Nothing Special   »   [go: up one dir, main page]

Academia.eduAcademia.edu
SAMPTA’09, International Conference on SAMPling Theory and Applications Laurent Fesquet, Bruno Torrésani To cite this version: Laurent Fesquet, Bruno Torrésani. SAMPTA’09, International Conference on SAMPling Theory and Applications. Laurent Fesquet and Bruno Torrésani. pp.384, 2010. <hal-00495456> HAL Id: hal-00495456 https://hal.archives-ouvertes.fr/hal-00495456 Submitted on 26 Jun 2010 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. SAMPTA'09 SAMPling Theory and Applications Centre International de Rencontres Mathématiques Marseille Luminy MAY 18-22, 2009 Editors: Laurent Fesquet and Bruno Torrésani Organized by TIMA, INP Grenoble and LATP, Université de Provence http://www.latp.univ-mrs.fr/SAMPTA09 SAMPTA'09 2 SAMPTA'09 Participants SAMPTA'09, the 8th international conference on Sampling Theory and Applications, was organized in Marseille-Luminy, on May 18-22, 2009. The previous conferences were held in Riga (Latvia) in 1995, Aveiro (Portugal) in 1997, Loen (Norway) in 1999, Orlando (USA) in 2001, Salzburg (Austria) in 2003, Samsun (Turkey) in 2005 and Thessaloniki (Greece) in 2007. The purpose of SAMPTA's is to bring together mathematicians and engineers interested in sampling theory and its applications to related fields (such as signal and image processing, coding theory, control theory, complex analysis, harmonic analysis, differential equations) to exchange recent advances and to discuss open problems. SAMPTA09 gathered around 160 participants from various countries and scientific areas, The conference benefited from the infrastructure of CIRM, the Centre International de Rencontres Mathématiques, an institute mainly sponsored by the french Centre National de la Recherche Scientifique (CNRS) and the French Mathematical Society (SMF). 
SAMPTA'09 3 Organizing committee: General chairs Local committee • Laurent Fesquet (TIMA, Grenoble Institute of Technology) • Bruno Torrésani (LATP, Université de Provence, Marseille) • • • • • Sandrine Anthoine (I3S, CNRS, Sophia-Antipolis) Karim Kellay (LATP, Université de Provence, Marseille) Matthieu Kowalski (LATP, Université de Provence, Marseille) Clothilde Melot (LATP, Université de Provence, Marseille) El Hassan Youssfi (LATP, Université de Provence, Marseille Program committee: Program chairs Special sessions organizers SAMPTA'09 • Yonina Eldar (Electrical Engineering, Technion, Israel Institute of Technology) • Karlheinz Gröchenig (Fakultät für Mathematik, University of Vienna) • Sinan Gunturk (Mathematics Department, Courant Institute, New York) • Michael Unser (Biomedical Imaging Group, EPFL Lausanne) • Bernhard Bodmann (Dept of Mathematics, University of Houston, USA) • Pierluigi Dragotti (Dept of Electronic and Alectrical Engineering, Imperial College London, UK) • Yonina Eldar (Electrical Engineering, Technion, Israel Institute of Technology, Israel) • Laurent Fesquet (TIMA, INP Grenoble, France) • Massimo Fornasier (RICAM, Linz University, Austria) • Hakan Johansson (Dept of Electrical Engineering, Linkoping University, Sweden) • Gitta Kutyniok (Universität Osnabrück, Germany) • Pina Marziliano (Nanyang Technological University, Singapore) • Götz Pfander (International University Bremen, Germany) • Hölger Rauhut (Hausdorff Center for Mathematics, University of Bonn, Germany) • Jared Tanner (School of Mathematics, University of Edimburgh, Scotland) • Christian Vogel (ISI, ETH Zurich, Switzerland) • Ozgur Yilmaz (Dept of Mathematics, University of British Columbia, Vancouver, Canada) 4 Acknowledgements The conference was extremely successful, thanks mainly to the participants, whose scientific contributions were remarkably good. Thanks are also due to the members of the organizing committee and the program committee, as well as all the reviewers who participated in the selection of contributions. We would also like to thank the CIRM staff for the practical organization (accomodation, conference facilities,...) and their constant availability, and the Faculté des Sciences de Luminy for lending a conference room for the plenary sessions. The secretary staff at LATP was instrumental in all aspects of the organization, from the scientific part to the social events. Finally, we would like to thank the sponsors of the conference: CIRM (CNRS and French Mathematical Society), Université de Provence, the European Excellence Center for Time-Frequency Analysis (EUCETIFA), the City of Marseille and the Conseil Général des Bouches du Rhône for their financial support. SAMPTA'09 5 SAMPTA'09 6 SampTA Technical Program Monday 18 May 2009 09:10 - 09:30 Opening Session – Amphi 8 09:30 - 10:30 Plenary talk - Amphi 8 – Chair: K. Gröchenig Gabor frames in Complex Analysis , Yura Lyubarskii 10:30 - 11:00 Coffee break 11:00 - 12:00 Plenary talk - Amphi 8 – Chair: K. Gröchenig A Prior-Free Approach to Signal and Image Denoising: the SURE-LET Methodology, Thierry Blu 12:00 - 14:00 Lunch 14:00 - 16:00 Special session – Auditorium Sparse approximation and high-dimensional geometry - Chair: J. Tanner #192. Dense Error Correction via L1-Minimization, John Wright, Yi Ma #204. Recovery of Clustered Sparse Signals from Compressive Measurements, Volkan Cevher, Piotr Indyk, Chinmay Hegde, Richard G. Baraniuk #206. Sparse Recovery via lq-minimization for 0 < q ≤ 1, Simon Foucart #210. 
The Balancedness Properties of Linear Subpaces and Signal Recovery Robustness in Compressive Sensing, Weiyu Xu #207. Phase Transitions Phenomena in Compressed Sensing, Jared Tanner #167 Optimal Non-Linear Models, Akram Aldroubi, Carlos Cabrelli,Ursula Molter 14:00 - 16:00 General session – room 1 General sampling - Chair: Y. Lyubarskii #78. Linear Signal Reconstruction from Jittered Sampling, Alessandro Nordio, Carla-Fabiana Chiasserini, Emanuele Viterbo #81. Zero-two derivative sampling, Gerhard Schmeisser #115. On average sampling restoration of Piranashvili-type harmonizable processes, Andriy Ya. Olenko, Tibor K. Pogany #86. Uniform Sampling and Reconstruction of Trivariate Functions, Alireza Entezari #184. On Subordination Principles for Generalized Shannon Sampling Series, Andi Kivinukk and Gert Tamberg #104. The Class of Bandlimited Functions with Unstable Reconstruction under Thresholding, Holger Boche, Ullrich J. Mönich 16:00 - 16:30 Coffee break 16:30 – 18:30 Special Session – Auditorium Compressed sensing - Chair: Y. Eldar #150. Sampling Shift-Invariant Signals with Finite Rate of Innovation, Kfir Gedalyahu, Yonina C. Eldar #105. Compressed sensing signal models - to infinity and beyond?, Thomas Blumensath, Mike Davies #110. Compressed sampling Via Huffman Codes, Akram Aldroubi, Haichao Wang, Kourosh Zaringhalam #126. On Lp minimisation, instance optimality, and restricted isometry constants for sparse approximation, Michael Davies, Rémi Gribonval #73. Signal recovery from incomplete and inaccurate measurements via ROMP, Deanna Needell, Roman Vershynin #169 Sparse approximation and the MAP, Akram Aldroubi, Romain Tessera 16:30 – 18:30 Special Session – room 1 Frame theory and oversampling - Chair: B. Bodmann #118. Invariance of Shift Invariance Spaces, Akram Aldroubi, Carlos Cabrelli, Christopher Heil, Keri Kornelson, Ursula Molter #188. Gabor frames with reduced redundancy, Ole Christensen, Hong Oh Kim, Rae Young Kim #141. Gradient descent of the frame potential, Peter G. Casazza, Matthew Fickus #201. Error Correction for Erasures of Quantized Frame Coefficients, Bernhard G. Bodmann, Peter G. Casazza ,Gitta Kutyniok, Steven Senger #199. Linear independence and coherence of Gabor systems in finite dimensional spaces, Götz E. Pfander SAMPTA'09 7 Tuesday 19 May 2009 09:10 - 10:30 Special Session – Auditorium Efficient design and implementation of sampling rate conversion, resampling and signal reconstruction methods - Chair: H. Johansson and C. Vogel #194. Efficient design and implementation of sampling rate conversion, resampling, and signal reconstruction methods, Håkan Johansson, Christian Vogel #171. Structures for Interpolation, Decimation, and Nonuniform Sampling Based on Newton's Interpolation Formula, Vesa Lehtinen, Markku Renfors #79. Chromatic Derivatives, Chromatic Expansions and Associated Function Spaces, Aleksandar Ignjatovic #84. Estimation of the Length and the Polynomial Order of Polynomial-based Filters, Djordje Babic, Heinz G. Göckler 09:10 - 10:30 General Session – room 1 Time frequency and frames - Chair: J.P. Antoine #121. An Efficient Algorithm for the Discrete Gabor Transform using full length Windows, Peter L. Søndergaard #82. Matrix Representation of Bounded Linear Operators By Bessel Sequences, Frames and Riesz Sequence, Peter Balazs #124. Nonstationary Gabor Frames, Florent Jaillet, Peter Balazs, Monika Dörfler #140. 
A Nonlinear Reconstruction Algorithm from Absolute Value of Frame Coefficients for Low Redundancy Frames, Radu Balan 10:30 - 11:00 Coffee break 11:00 - 12:00 Plenary talk – Amphi 8 – Chair: A. Aldroubi Harmonic and multiscale analysis of and on data sets in high-dimensions, Mauro Maggioni 12:00 - 14:00 Lunch 14:00 - 16:00 Special session – Auditorium Geometric multiscale analysis I - Chair: G. Kutyniok #74. Analysis of Singularities and Edge Detection using the Shearlet Transform, Glenn Easley, Kanghui Guo, Demetrio Labate #98. Discrete Shearlet Transform: New Multiscale Directional Image Representation, Wang-Q Lim #125. The Continuous Shearlet Transform in Arbitrary Space Dimensions, Frame Construction, and Analysis of singularities, Stephan Dahlke, Gabriele Steidl, Gerd Teschke #193. Computable Fourier Conditions for Alias-Free Sampling and Critical Sampling, Yue M. Lu, Minh N. Do, Richard S. Laugesen #149. Compressive-wavefield simulations, Felix J. Herrmann, Yogi Erlangga, Tim T. Y. Lin #164. Analysis of Singularity Lines by Transforms with Parabolic Scaling, Panuvuth Lakhonchai, Jouni Sampo, Songkiat Sumetkijakan 14:00 - 16:00 Special session – room 1 Sampling and communication - Chair: G. Pfander #146. Erasure-proof coding with fusion frames, Bernhard G. Bodmann, Gitta Kutyniok, Ali Pezeshki #175. Operator Identification and Sampling, Götz Pfander, David Walnut #116. A Kashin Approach to the Capacity of the Discrete Amplitude Constrained Gaussian Channel, Brendan Farrell, Peter Jung #147. Irregular and Multi-channel sampling in Operator Paley-Wiener spaces, Yoon Mi Hong, G. Pfander #136. Low-rate Wideband Receiver, Moshe Mishali, Yonina Eldar #151. Representation of operators by sampling in the time-frequency domain, Monika Dörfler, Bruno Torrésani 16:00 - 16:30 Coffee break 16:30 - 17:30 Special session – Auditorium Geometric multiscale analysis II - Chair: G. Kutyniok #120. Geometric Wavelets for Image Processing: Metric Curvature of Wavelets, Emil Saucan, Chen Sagiv, Eli Appleboim #102. Image Approximation by Adaptive Tetrolet Transform, Jens Krommweh #202. Geometric Separation using a Wavelet-Shearlet Dictionary, David L. Donoho, Gitta Kutyniok SAMPTA'09 8 16:30 - 17:30 General session – room 1 Sparsity and compressed sensing - Chair: R. Gribonval #127. Sparse Coding in Mass Spectrometry, Stefan Schiffler, Dirk Lorenz,Theodore Alexandrov #161.Quasi-Random Sequences for Signal Sampling and Recovery, Miroslaw Pawlak, Ewaryst Rafajlowicz 17:30 - 18:30 Poster session #75. Sparse representation with harmonic wavelets, Carlo Cattani #85. Reconstruction of signals in a shift-invariant space from nonuniform samples, Junxi Zhao #92. Spline Interpolation in Piecewise Constant Tension, Masaru Kamada, Rentsen Enkhbat#95. The Effect of Sampling Frequency on a FFT Based Spectral Estimator, Saeed Ayat #99. Nonlinear Locally Adaptive Wavelet Filter Banks, Gerlind Plonka and Stefanie Tenorth #111. Continuous Fast Fourier Sampling, Praveen K. Yenduri, Anna C. Gilbert #134. Double Dirichlet averages and complex B-splines, Peter Massopust #135. Sampling in cylindrical 2D PET, Yannick Grondin, Laurent Desbat, Michel Desvignes #148. Significant Reduction of Gibbs' Overshoot with Generalized Sampling Method, Yufang Hao, Achim Kempf #156. Optimized Sampling Patterns for Practical Compressed MRI, Muhammad Usman, Philip G. Batchelor #160. A study on sparse signal reconstruction from interlaced samples by l1-norm minimization, Akira Hirabayashi #162. 
Multiresolution analysis on multidimensional dyadic grids, Douglas A. Castro, Sônia M. Gomes, Anamaria Gomide, Andrielber S. Oliveira, Jorge Stolfi #165. Adaptive and Ultra-Wideband Sampling via Signal Segmentation and Projection, Stephen D. Casey, Brian M. Sadler #174. Non-Uniform Sampling Methods for MRI, Steven Troxler #187. On approximation properties of sampling operators defined by dilated kernels, Andi Kivinukk, Gert Tamberg Wednesday 20 May 2009 09:10 - 10:30 Special Session – Auditorium Sampling and industrial applications - Chair: L. Fesquet #182. A coherent sampling-based method for estimating the jitter used as entropy source for True Random Number Generators, Boyan Valtchanov, Viktor Fischer, Alain Aubert #91. Orthogonal exponential spline pulses with application to impulse radio, Masaru Kamada, Semih Özlem, Hiromasa Habuchi #117. Effective Resolution of an Adaptive Rate ADC, Saeed Mian Qaisar, Laurent Fesquet, Marc Renaudin #157. An Event-Based PID Controller With Low Computational Cost, Sylvain Durand, Nicolas Marchand 09:10 -10:30 General Session – room 1 Wavelets, multiresolution and multirate sampling – Chair: D. Walnut #189. Asymmetric Multi-channel Sampling in Shift Invariant Spaces, Sinuk Kang,Kil Hyun Kwon #89. Sparse Data Representation on the Sphere using the Easy Path Wavelet Transform, Gerlind Plonka, Daniela Rosca #114. On the incoherence of noiselet and Haar bases, Tomas Tuma, Paul Hurley #138. Adaptive compressed image sensing based on wavelet modeling and direct sampling, Shay Deutsch, Amir Averbuch, Shai Dekel10:30 - 11:00 Coffee break 11:00 - 12:00 Plenary talk – Amphi 6 – Chair: G. Teschke Recent Developments in Iterative Shrinkage/Thresholding Algorithms, Mario Figueiredo 12:00 - 14:00 Lunch 14:00 - 23:00 Social event Thursday 21 May 2009 09:10 - 10:30 General Session – Auditorium Adaptive techniques – Chair: N. Marchand #68. Adaptive transmission for lossless image reconstruction, Elisabeth Lahalle, Gilles Fleury, Rawad SAMPTA'09 9 Zgheib #129. A fully non-uniform approach to FIR filtering, Brigitte Bidegaray-Fesquet, Laurent Fesquet #72. Sampling of bandlimited functions on combinatorial graphs, Isaac Pesenson, Meyer Pesenson #01. Pseudospectral Fourier reconstruction with the inverse polynomial reconstruction method, Karlheinz Groechenig, Tomasz Hrycak 09:10 - 10:30 General Session – room 1 General sampling – Chair: A. Jerri #112. Geometric Sampling of Images, Vector Quantization and Zador's Theorem, Emil Saucan, Eli Appleboim, Yehoshua Y. Zeevi #168. On sampling lattices with similarity scaling relationships, Steven Bergner, Dimitri Van De Ville, Thierry Blu, Torsten Möller #83. Scattering Theory and Sampling of Bandlimited Hardy Space Functions, Ahmed I. Zayed, Marianna Shubov #119. Sampling of Homogeneous Polynomials, Somantika Datta, Stephen D. Howard, Douglas Cochran 10:30 - 11:00 Coffee break 11:00 - 12:00 Plenary talk – Amphi 6 – Chair: S. Güntürk A Taste of Compressed Sensing, Ron DeVore 12:00 - 14:00 Lunch 14:00 - 16:00 Special Session – Auditorium Sampling using finite rate of innovation principles I - Chair: P. Dragotti and P. Marziliano #113. The Generalized Annihilation Property --- A Tool For Solving Finite Rate of Innovation Problems, Thierry Blu #100. Sampling of Sparse Signals in Fractional Fourier Domain, Ayush Bhandari, Pina Marziliano #153. A method for generalized sampling and reconstruction of finite-rate-of-innovation signals, Chandra Sekhar Seelamantula, Michael Unser #80. 
An ``algebraic'' reconstruction of piecewise-smooth functions from integral measurements, Dima Batenkov, Niv Sarig, Yosef Yomdin #108. Estimating Signals With Finite Rate of Innovation From Noisy Samples: A Stochastic Algorithm, Vincent Y. F. Tan, Vivek K. Goyal 14:00 - 16:00 Special Session – room 1 Mathematical aspects of compressed sensing - Chair: H. Rauhut #195. Orthogonal Matching Pursuit with random dictionaries, Paweł Bechler, Przemysław Wojtaszczyk #178. A short note on nonconvex compressed sensing, Rayan Saab, Ozgur Yilmaz #190. Domain decomposition methods for compressed sensing, Massimo Fornasier, Andreas Langer, Carola-Bibiane Schönlieb #197. Free discontinuity problems meet iterative thresholding, Rachel Ward, Massimo Fornasier #198. Concise Models for Multi-Signal Compressive Sensing, Mike Wakin #196. Average case analysis of multichannel Basis Pursuit, Yonina Eldar, Holger Rauhut 16:00 - 16:30 Coffee break 16:30 - 17:30 Special session – Auditorium Sampling using finite rate of innovation principles II - Chair: P. Dragotti and P. Marziliano #96. Distributed Sensing of Signals Under a Sparse Filtering Model, Ali Hormati, Olivier Roy, Yue M. Lu, Martin Vetterli #154. Multichannel Sampling of Translated, Rotated and Scaled Bilevel Polygons Using Exponential Splines, Hojjat Akhondi Asl, Pier Luigi Dragotti 16:30 - 17:30 General session – room 1 Signal Analysis and compressed sensing – Chair: A. Cohen #176. General Perturbations of Sparse Signals in Compressed Sensing, Matthew Herman, Thomas Strohmer #203. Limits of Deterministic Compressed Sensing Considering Arbitrary Orthonormal Basis for Sparsity, Arash Amini, Farokh Marvasti #185. Analysis of High-Dimensional Signal Data by Manifold Learning and Convolutions, Mijail Guillemard, Armin Iske 17:30 - 18:30 Poster session SAMPTA'09 10 See Tuesday poster session. Friday 22 May 2009 09:10 - 10:30 General Session – Auditorium Kernels and unusual Paley-Wiener spaces – Chair: G. Schmeisser #131. Geometric Reproducing Kernels for Signal Reconstruction, Eli Appleboim, Emil Saucan, Yehoshua Y. Zeevi #137. Concrete and discrete operator reproducing formulae for abstract Paley-Wiener space, John R. Higgins #132. Multivariate Complex B-Splines, Dirichlet Averages and Difference Operators, Brigitte Forster, Peter Massopust #143. Explicit localization estimates for spline-type spaces, José Luis Romero 09:10 - 10:30 General Session – room 1 Reconstruction, time and frequency analysis - Chair: R. Balan #70. Daubechies Localization Operator in Bargmann-Fock Space and Generating Function of Eigenvalues of Localization Operator, Kunio Yoshino #97 Optimal Characteristic of Optical Filter for White Light Interferometry based on Sampling Theory, Hidemitsu Ogawa and Akira Hirabayashi #90. Signal-dependent sampling and reconstruction method of signals with time-varying bandwidth, Modris Greitans , Rolands Shavelis #177. A Fast Fourier Transform with Rectangular Output on the BCC and FCC Lattices, Usman Raza Alim, Torsten Moeller 10:30 - 11:00 Coffee break 11:00 - 12:00 Plenary talk – Amphi 6 Compressed Sensing in Astronomy, Jean-Luc Starck 12:00 - 14:00 Lunch 14:00 - 16:00 Special Session – Auditorium Sampling and quantization - Chair: O. Yilmaz #180. Finite Range Scalar Quantization for Compressive Sensing, Jason N. Laska, Petros Boufounos, Richard G. Baraniuk #107. Quantization for Compressed Sensing Reconstruction, John Z. Sun, Vivek K Goyal #106. Determination of Idle Tones in Sigma-Delta Modulation by Ergodic Theory, Nguyen T. Thao #172. 
Noncanonical reconstruction for quantized frame coefficients, Alexander M. Powell #166. Stability Analysis of Sigma-Delta Quantization Schemes with Linear Quantizers, Percy Deift, Sinan Güntürk, Felix Krahmer 14:00 - 16:00 Special Session – room 1 Sampling and inpainting - Chair: M. Fornasier #191. Image Inpainting Using a Fourth-Order Total Variation Flow, Carola-Bibiane Schönlieb, Andrea Bertozzi, Martin Burger, Lin He #139. Reproducing kernels and colorization, Minh Q. Ha, Sung Ha Kang, Triet M. Le #123. Edge Orientation Using Contour Stencils, Pascal Getreuer #71. Image Segmentation Through Efficient Boundary Sampling, Alex Chen, Todd Wittman, Alexander Tartakovsky, Andrea Bertozzi #103. Report on Digital Image Processing for Art Historians, Bruno Cornelis, Ann Dooms, Ingrid Daubechies, Peter Schelkens #158. Smoothing techniques for convex problems. Applications in image processing, Pierre Weiss, Mikael Carlavan, Laure Blanc-Féraud, Josiane Zerubia SAMPTA'09 11 SAMPTA'09 12 SAMPTA'09 Special Sessions SAMPTA'09 13 SAMPTA'09 14 Special session on Sparse approximation and high-dimensional geometry Chair: Jared TANNER SAMPTA'09 15 SAMPTA'09 16 Recovery of Clustered Sparse Signals from Compressive Measurements Volkan Cevher(1) , Piotr Indyk(1,2) , Chinmay Hegde(1) , and Richard G. Baraniuk(1) (1) Electrical and Computer Engineering, Rice University, Houston, TX (2) Computer Science and Artificial Intelligence Lab, MIT, Cambridge, MA Abstract: ing algorithms for dimensionality reduction. We introduce a new signal model, called (K, C)-sparse, to capture K-sparse signals in N dimensions whose nonzero coefficients are contained within at most C clusters, with C < K ≪ N . In contrast to the existing work in the sparse approximation and compressive sensing literature on block sparsity, no prior knowledge of the locations and sizes of the clusters is assumed. We prove that O (K + C log(N/C))) random projections are sufficient for (K, C)-model sparse signal recovery based on subspace enumeration. We also provide a robust polynomialtime recovery algorithm for (K, C)-model sparse signals with provable estimation guarantees. Fortunately, it is possible to design CS recovery algorithms that exploit the knowledge of structured sparsity models with provable performance guarantees [3, 5, 6]. In particular, the model-based CS recovery framework in [3] generalizes to any structured-sparsity model that has a tractable model-based approximation algorithm. This framework has been applied productively to two structured signal models: block sparsity and wavelet trees with robust recovery guarantees from O (K) measurements [3]. To recover signals that have structured sparsity, problem-specific convex relaxation approaches are also used in the literature with recovery guarantees similar to those in [3]; e.g., for block sparse signals, see [5, 6]. 1. In this paper, we introduce a new structured sparsity model, called the (K, C)-model, that constrains the Ksparse signal coefficients to be contained within at most C-clusters. In contrast to the block sparsity model in [5, 6], our proposed model does not assume prior knowledge of the locations and sizes of the coefficient clusters. We show that O (K + C log(N/C))) random projections are sufficient for (K, C)-model signal recovery using a subspace counting argument. We also provide a polynomialtime model-based approximation algorithm based on dynamic programming and a CS recovery algorithm based on the model-based recovery framework of [3]. 
In contrast to the clustered sparse recovery algorithm based on the probabilistic Ising model in [7], the (K, C)-model has provable performance guarantees. Introduction Compressive sensing (CS) is an alternative to Shannon/Nyquist sampling for the acquisition of sparse or compressible signals in an appropriate basis [1, 2]. By sparse, we mean that only K of the N basis coefficients are nonzero, where K ≪ N . By compressible, we mean the basis coefficients, when sorted, decay rapidly enough to zero so that they can be well-approximated as K-sparse. Instead of taking periodic samples of a signal, CS measures inner products with random vectors and then recovers the signal via a sparsity-seeking convex optimization or greedy algorithm. The number of compressive measurements M necessary to recover a sparse signal under this framework grows as M = O (K log(N/K)) In many applications, including imaging systems and high-speed analog-to-digital converters, such a saving can be dramatic; however, the dimensionality reduction from N to M is still not on par with state-of-the-art transform coding systems. While many natural and manmade signals can be described to a first-order as sparse or compressible, their sparse supports often have an underlying domain specific structure [3–6]. Exploiting this structure in CS recovery has two immediate benefits. First, the number of compressive measurements required for stable recovery decreases due to the reduction in the degrees of freedom of a sparse or compressible signal. Second, true signal information can be better differentiated from recovery artifacts during signal recovery, which increases recovery robustness. Only by exploiting a priori information on coefficient structure in addition to signal sparsity, can CS hope to be competitive with the state-of-the-art transform cod- SAMPTA'09 The paper is organized as follows. Section 2 provides the necessary theoretical and algorithmic background on model-based CS. Section 3 introduces the (K, C)-model, derives its sampling bound for CS recovery, and describes a dynamic programming solution for optimal (K, C)model approximation. Section 4 discusses the aspect of compressibility and highlights some connections to the block sparsity model. Simulation results are given in Section 5 to demonstrate the effectiveness of the (K, C)model. Section 6 provides our conclusions. 2. Model-based CS Background N A K-sparse signal  vector x lives in ΣK ⊂ R , which N is a union of K subspaces of dimension K. Other than its K-sparsity, there are no further constraints on the support or values of its coefficients. A union-of-subspaces 17 signal model (a signal model in the sequel for brevity) endows the K-sparse signal x with additional structure that allows certain K-dimensional subspaces in ΣK and disallows others [4, 8]. More formally, let x|Ω represent the entries of x corresponding to the set of indices Ω ⊆ {1, . . . , N }, and let ΩC denote the complement of the set Ω. A signal model MK is then defined as the union of mK canonical Kdimensional subspaces MK = m K [ m=1 Xm , Xm := {x : x|Ωm ∈ RK , x|ΩCm = 0}. Each subspace Xm contains all signals x with supp(x) ∈ Ωm . Thus, the signal model MK is defined by the set of possible supports {Ω1 , . . . , ΩmK }. Signals from MK are called K-model sparse. Likewise, we may define McK to be the set of c-wise differences of signals belonging to MK . Clearly, MK ⊆ ΣK and M4K ⊆ Σ4K . 
In the sequel, we will use an algorithm M(x; K) that returns the best K-term approximation of the signal x under the model MK . If we know that the signal x being acquired is Kmodel sparse, then we can relax the standard restricted isometry property (RIP) [1] of the CS measurement matrix Φ and still achieve stable recovery from the compressive measurements y = Φx. The model-based RIP MK -RIP requires that (1 − δMK )kxk22 ≤ kΦxk22 ≤ (1 + δMK )kxk22 (1) hold for signals x ∈ MK [4, 8], where δMK is the modelbased RIP constant. Blumensath and Davies [4] have quantified the number of measurements M necessary for a subgaussian CS matrix to have the MK -RIP with constant δMK and with probability 1 − e−t to be   12 2 ln(2mK ) + K ln + t . (2) M≥ 2 cδMK δMK This bound can be used to recover the conventional CS  N result by substituting mK = K ≈ (N e/K)K . To take practical advantage of signal models in CS, we can integrate them into a standard CS recovery algorithm based on iterative greedy approximation. The key modification is surprisingly simple [3]: we merely replace the best K-term approximation step with the best K-term model-based approximation M(x; K). For example, in the CoSaMP algorithm [9], the best LK-term approximation (with L a small integer) is modified to incorporate a best LK-term model-based approximation. The resulting algorithm (see [3]) then inherits the following model-based CS recovery guarantee at each iteration i, when the measurement matrix Φ has the M4K -RIP with δM4K ≤ 0.1: kx − x bi k2 ≤ 2−i kxk2 + 20 kx − xMK k2 ! 1 + √ kx − xMK k1 + knk2 , K SAMPTA'09 where xMK = M(x; K) is the best model-based approximation of x within MK . 3. The (K, C)-Model Motivation: The block sparsity model is used in applications where the significant coefficients of a sparse signal appear in designated blocks on the ambient signal dimension, e.g., group sparse regression problems, DNA microarrays, MIMO channel equalization, source localization in sensor networks, and magnetoencephalography [3, 5, 6, 10–14]. It has been shown that recovery algorithms provably improve standard CS recovery by exploiting this block-sparse structure [3, 5]. The (K, C)-model generalizes the block sparsity model by allowing the significant coefficients of a sparse signal to appear in at most C clusters of unknown size and location (Figure 1(a)). This way, the (K, C)-model further accommodates additional applications in, e.g., neuroscience problems that are involved with decoding of natural images in the primary visual cortex (V1) or understanding the statistical behavior of groups of neurons in the retina [15]. In this section, we formulate the (K, C)model as a union of subspaces and pose an approximation algorithm on this union of subspaces. To define the set of (K, C)-sparse signals, without loss of generality, we focus on canonically sparse signals in N + 2 dimensions whose first and last coefficients are zero. Consider expressing the support of such signals via run-length coding with a vector β = (β1 , . . . , β2C+1 ) (βj 6= 0), where βodd counts the number of continuous zero-signal values and βeven counts the number of continuous nonzero-signal values (i.e., clusters). The (K, C)-sparse signal model M(K,C) is Definition: defined as M(K,C) = ( x∈R N+2 2C+1 X βi = N + 2, i=1 C X β2i = K i=1 ) . (3) Sampling Bound: The number of subspaces m(K,C) in M(K,C) can be obtained by counting the number of positive solutions to the following integer equations: β1 + β2 + . . . + β2C+1 = N + 2, β2 + β4 + . . . 
+ β2C = K, which can be rewritten as β1 + β3 + . . . + β2C+1 = N + 2 − K, β2 + β4 + . . . + β2C = K. (4) Note that the number of positive integer solutions to the following problem: β1 + β2 + β3 + . . . + βn = N,  −1 is given by N n−1 . Then, we can count the solutions to the two of decoupled problems in (4) and multiply the number of solutions to obtain m(K,C) :    N +1−K K −1 m(K,C) = . (5) C C −1 18 4. Additional Remarks Plugging (5) into (2), we obtain the sampling bound for M(K,C) :   N M = O K + C log . C Compressibility: Just as compressible signals are nearly K-sparse and live close to the union of subspaces ΣK in RN , (K, C)-compressible signals are nearly (K, C)-model sparse and live close to the restricted union of subspaces M(K,C) . Here, we rigorously introduce a (K, C)-compressible signal model in terms of the decay of their (K, C)-model approximation error. (6) Note that the (K, C)-sampling bound (6) becomes the standard CS bound of M = O K log N K when C ≈ K. Model Approximation Algorithm: In this section we focus on designing an algorithm M(x; K, C) for finding the best (K, C)-model approximation to a given signal x. The algorithm uses the principle of dynamic programming [16]. For simplicity, we focus on the problem of finding the cost of the best (K, C)-clustered signal approximation in ℓ2 . This solution generalizes to the best (K, C)-clustered signal approximation in ℓp for p ≥ 1. The actual sparsity pattern can be then recovered using standard back-tracing techniques; see [16] for the details. We first define the ℓ2 error incurred by approximating x ∈ RN by the best approximation in M(K,C) : σM(K,C) (x) , Ms = min k⋆ =0...k SAMPTA'09 (7) . We use the restricted amplification property (RAmP) and the nested approximation property (NAP) in [3] to ensure that the (K, C)-model based CoSaMP recovery possesses the following guarantee for (K, C)-model scompressible signals at each iteration i:  SM kx − x bi k2 ≤ 2 kxk2 + 35 knk2 + s (1 + ln⌈N/K⌉) , K (8) −i ) cost[i, j ⋆ , k ⋆ , c⋆ ] × cost[j ⋆ + 1, j, k − k ⋆ , c − c⋆ ] . The correctness of the algorithm follows from the following observation. Let v be the best (k, c)-clustered approximation of xi:j . Unless all entries of xi:j can be included in the approximation v (in which case j − i + 1 ≥ k and the entry has been already computed during initialization), then there must exist an index l ∈ [i, . . . , j] such that xl is not included in v. Let l⋆ = l if l < j, and l⋆ = j − 1 otherwise. Let k ⋆ be the number of non-zero entries present in the left segment of v i:l⋆ , and let c⋆ be the number of clusters present in that left segment. Then, it must be the case that v i:l⋆ is the best (k ⋆ , c⋆ )-approximation to xi:l , and v l+1:j is the best (k − k ⋆ , c − c⋆ )-approximation to x(l⋆ +1):j . Otherwise, those better approximations could have been concatenated together to yield an even better (k, c)-approximation of xi:j . Thus, the recursive formula will identify the optimal split and compute the optimal approximation cost.  The cost table contains O N 2 KC entries. Each entry can be computed in O (N KC) time.  Thus, the running time of the algorithm is O N 3 K 2 C 2 . ) Define SM as the smallest value of S for which this condition holds for x and s. min j ⋆ =i...j−1 x ∈ RN : σMj(K,C) (x) ≤ S(jK)−1/s , N 1 ≤ K ≤ N, S < ∞, j = 1, . . . 
, K (Main loop) All other cost entries can then be computed using the following recursion: min (  (Initialization) When either c = 0 or k = 0, the signal approximation costs can be computed directly, since cost[i, j, 0, c] = kxi:j k22 and cost[i, j, k, 0] = kxi:j k22 , for all valid indices i, j, k, c. Moreover, for all entries i, j, k, c such that c > 0 and j − i + 1 ≤ k, we have cost[i, j, k, c] = 0 since we can include all j − i + 1 coordinates of the vector xi:j in the approximation. c⋆ =0...c kx− x̄k2 = kx−M(x; K, C)k2 . The decay of the (K, C)-model approximation error in (7) defines the (K, C)-compressibility of a signal. Then, a set of (K, C)-model s-compressible signals is given by The algorithm M(x; K, C) computes an array cost[i, j, k, c], where 1 ≤ i ≤ j ≤ N , 0 ≤ k ≤ K, and 0 ≤ c ≤ C. At the end of the algorithm, each entry cost[i, j, k, c] contains the smallest cost of approximating xi:j , the signal vector restricted to the index set [i, . . . , j], using at most k non-zero entries that span at most c clusters. M(x; K, C) performs the following operations. cost[i, j, k, c] = ( inf x̄∈M(K,C)  when Φ has the M4(K,C) -RIP with δM4(K,C) ≤ 0.1 and the (ǫK , r)-RAmP with ǫK ≤ 0.1 and r = s − 1. Simulation via Block Sparsity: It is possible to recover (K, C)-sparse signals by using the block sparsity model if we are willing to pay an added penalty in terms of the number of measurements. To demonstrate this, we define uniform blocks of size K/C (e.g., average cluster length) on the signal space. Then, it is straightforward to see that the number of active blocks B in the block sparse model is upper-bounded by B ≤ 2(C − 1) + K − 2(C − 1) ≤ 3C. K/C (9) To reach this upper bound, we first construct a (K, C)sparse signal that has (C − 1)-clusters with 2 coefficients and a single cluster with the remaining sparse coefficients. We then place the clusters with two coefficients at the boundary of the block sparse model so that each cluster activate two blocks in the block sparse model to arrive at (9). Then, the (K, C)-equivalentblock sparse model requires M = O BK/C + B log N B samples, where B = 3C. 19 (K,C)−based recovery CoSaMP 0.8 0.8 0.6 0.4 0.2 0 2 (a) A (10, 2)-model signal 1 Average distortion Probability of perfect signal recovery 1 0.6 0.4 0.2 (K,C)−based recovery CoSaMP 2.5 3 3.5 M/K 4 (b) Reconstruction probability 4.5 5 0 2 2.5 3 3.5 M/K 4 4.5 5 (c) Recovery error Figure 1: Monte Carlo simulation results for (K, C)-model based recovery with K = 10, C = 2.. 5. Experiments In this section we demonstrate the performance of (K, C)model based recovery. Our test signals are the class of length-100 clustered-sparse signals with K = 10, C = 2. We run both the CoSaMP algorithm as well as (K, C)model based CoSaMP algorithm [3] until convergence for 1000 independent trials. In Fig. 1(a), a sample realization of the signal is displayed. It is evident from Figs. 1(b) and (c) that enforcing the structured sparsity model in the recovery process significantly improves CS reconstruction performance. In particular, Fig. 1(b) demonstrates that approximately 85% of the signals are almost perfectly recovered at M = 2.5K, whereas CoSaMP fails to recover any signals at this level of measurements. Instead, traditional sparsity-based recovery requires M ≥ 4.5K to attain comparable performance. Similarly, Figure. 1(c) displays the rapid decrease in average recovery distortion of our proposed method, as compared to the conventional approach. 
The (K, C)-sparse approximation algorithm codes are available at dsp.rice.edu/software/KC. 6. Conclusions In this paper, we have introduced a new sparse signal model that generalizes the block-sparsity model used in the CS literature. To exploit the provable model-based CS recovery framework of [3], we developed a dynamic programming algorithm that computes, for any given signal, its optimal ℓ2 -approximation within our clustered sparsity model. We then demonstrated that significant performance gains can be made by exploiting the clustered signal model beyond the simplistic sparse model that are prevalent the CS literature. Acknowledgments The authors would like to thank Marco F. Duarte for useful discussions and Andrew E. Waters for converting the (K, C)-model MATLAB code into C++. VC, CH and RGB were supported by the grants NSF CCF-0431150 and CCF-0728867, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936 and N00014-081-1112, AFOSR FA9550-07-1-0301, ARO MURI W311NF-071-0185, and the Texas Instruments Leadership University Program. PI is supported in part by David and Lucille Packard Fellowship and by MADALGO (Center for Massive Data Algorithmics, funded by the Danish National Research Association) and by NSF grant CCF-0728645. SAMPTA'09 References: [1] E. J. Candès, “Compressive sampling,” in Proc. International Congress of Mathematicians, vol. 3, (Madrid, Spain), pp. 1433–1452, 2006. [2] D. L. Donoho, “Compressed sensing,” IEEE Trans. Info. Theory, vol. 52, pp. 1289–1306, Sept. 2006. [3] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, “Model-based compressive sensing,” 2008. Preprint. Available at http://dsp.rice.edu/cs. [4] T. Blumensath and M. E. Davies, “Sampling theorems for signals from the union of finite-dimensional linear subspaces,” IEEE Trans. Info. Theory, Dec. 2008. [5] Y. Eldar and M. Mishali, “Robust recovery of signals from a union of subspaces,” 2008. Preprint. [6] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,” Mar. 2008. Preprint. [7] V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk, “Sparse signal recovery using Markov Random Fields,” in Proc. Workshop on Neural Info. Proc. Sys. (NIPS), (Vancouver, Canada), Dec. 2008. [8] Y. M. Lu and M. N. Do, “Sampling signals from a union of subspaces,” IEEE Signal Processing Mag., vol. 25, pp. 41– 47, Mar. 2008. [9] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, June 2008. [10] J. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal Processing, vol. 86, pp. 572–588, Apr. 2006. [11] Y. Kim, J. Kim, and Y. Kim, “Blockwise sparse regression,” Statistica Sinica, vol. 16, no. 2, p. 375, 2006. [12] L. Meier, S. van de Geer, and P. Buhlmann, “The group lasso for logistic regression,” Journal of Royal Stat. Society: Series B (Statistical Methodology), vol. 70, no. 1, pp. 53–71, 2008. [13] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi, “Recovering Sparse Signals Using Sparse Measurement Matrices in Compressed DNA Microarrays,” IEEE Journal of Selected Topics in Sig. Proc., vol. 2, no. 3, pp. 275–285, 2008. [14] I. F. Gorodnitsky, J. S. George, and B. D. Rao, “Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm,” Electroenceph. and Clin. Neurophys., vol. 95, no. 4, pp. 231–251, 1995. [15] P. J. Garrigues and B. A. 
Olshausen, “Learning Horizontal Connections in a Sparse Coding Model of Natural Images,” in Advances in Neural Info. Proc. Sys. (NIPS), 2008. [16] T. H. Corman, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press and McGraw-Hill, New York, USA, 2001. 20 Special session on Compressed Sensing Chair: Yonina ELDAR SAMPTA'09 21 SAMPTA'09 22 Compressed sensing signal models - to infinity and beyond? Thomas Blumensath and Michael Davies IDCOM & Joint Research Institute for Signal and Image Processing Edinburgh University, King’s Buildings, Mayfield Road, Edinburgh, UK thomas.blumensath@ed.ac.uk, mike.davies@ed.ac.uk Abstract: Compressed sensing is an emerging signal acquisition technique that enables signals to be sampled well below the Nyquist rate, given a finite dimensional signal with a sparse representation in some orthonormal basis. In fact, sparsity in an orthonormal basis is only one possible signal model that allows for sampling strategies below the Nyquist rate. We discuss some recent results for more general signal models based on unions of subspaces that allow us to consider more general structured representations. These include classical sparse signal models and finite rate of innovation systems as special cases. We consider the dimensionality conditions for two aspects of the compressed sensing inverse problem: the existence of one-to-one maps to lower dimensional observation spaces and the smoothness of the inverse map. On the surface Lipschitz smoothness of the inverse map appears to limit the applicability of compressed sensing to infinite dimensional signal models. We therefore discuss conditions where smooth inverse maps are possible even in infinite dimensions. Finally we conclude by mentioning some recent work [14] which develops the these ideas further allowing the theory to be extended beyond exact representations to structured approximations. 1. Introduction Since Nyquist and Shannon we are used to sampling continuous signals at a rate that is twice the bandwidth of the signal. However recently, under the umbrella title of compressed sensing, researchers have begun to explore how and when signals can be recovered using much fewer samples, but relying on known signal structure. Importantly the papers by Candes, Romberg and Tao [4], [5], [6] and by Donoho [8] have shown that under certain conditions on the signal sparsity and the sampling operator (which are often satisfied by certain random matrices), finite dimensional signals can be stably reconstructed when the number of observations is of the order of the signal sparsity and only logarithmically dependent on the ambient space dimension. Furthermore the reconstruction can be performed using practical polynomial time algorithms. Here we discuss a generalization of the sparse signal model that enables us to consider more structured signal types. We are interested in when the signals can be stably reconstructed (or in some cases approximated). We SAMPTA'09 finish the paper by considering the implications of these results for ∞-dimensional signal models and extending from structured representations to structured approximation. 2. Signal models and problem statement The problem can be formulated as follows. A continuous or discrete signal f from some separable Hilbert space is to be sampled. This is done by using M linear measurements {hf, φn i}n , where h·, ·i is the inner product and where {φn } is a set of vectors from the Hilbert space under consideration. 
Through the choice of an appropriate orthonormal basis, PN ψ we can replace f by the vector x such that f = i=1 ψi xi . Let Φ ∈ RM ×N be the sensing matrix with entries hψi , φj i. The observation can then be written as y = Φx. (1) In compressed sensing it is paramount to consider signals x that are highly structured and in the original papers, x was assumed to be an exact k-sparse vector, i.e. a vector with not more than k non-zero entries (we discuss a relaxation of this in section 6.). This naturally defines the signal model as a union of N -choose-k k-dimensional subspaces, K. A nice generalization of this model, introduced in [12], is to consider the signal x to be an element from a union of arbitrary subspaces A, defined formally as A= L [ j Sj , Sj = {y = Ωj a, Ωj ∈ RN ×kj , a ∈ Rkj }, (2) where the Ωj are bases for linear subspaces. This general signal model incorporates many previously considered compressed sensing settings, including: • The exact k-sparse signal model, K • Finite Rate of Innovation (FRI) [15] signal models, if we allow an uncountable number of subspaces (e.g. filtered streams of Dirac functions) • signals that are k-sparse in a general, possibly redundant dictionary • exact k-sparse signals whose non-zero elements form a tree 23 • multi-dimensional signals that are k-sparse with common support Importantly this model allows us to incorporate additional structure which can in turn be advantageous by for example reducing signal complexity (as in the tree-constrained sparse model). The aim of compressed sensing is to select a linear sampling operator, Φ, such that there exists a unique inverse −1 map Φ|K : Φ(K) 7→ K. Moreover, for stability, we generally desire Φ(K) to be a bi-Lipschitz embedding of K. In standard compressed sensing this stability is captured by the restricted isometry property [1]. When considering the union of subspaces model we can similarly look for a Φ with a unique stable (Lipschitz) in−1 verse map Φ|A : Φ(A) 7→ A. Below we will discuss both necessary and sufficient conditions for this. 3. Existence of a unique inverse map In [12] it was shown that a necessary condition for a unique inverse map to exist is that M ≥ Mmin := maxi6=j ki + kj . If this is not the case we can find a vector x ∈ Si ⊕ Sj , x 6= 0 such that Φx = 0. The authors further go on to show that when there are a countable number of finite dimensional subspaces then the set of such sampling operators, Φ giving a unique inverse is dense. In [3] we presented a slight refinement of this result for the case where the number of subspaces is finite. In this case almost every sampling operator, Φ, M ≥ Mmin has a unique inverse on A. Furthermore even when maxi ki < M < Mmin for almost every Φ the set of points in A without a unique inverse has zero measure (with respect to the largest subspace). All this suggests that we might be able to perform compressed sensing from only slightly more observations than the dimension of the signal model, i.e. M > dim(A). Unfortunately we have so far ignored the issue of stability which we will see presents additional complications. 4. Stability of the inverse map We now consider when the inverse mapping for the union of subspaces model is stable. Here we are particularly interested in the Lipschitz property of this inverse map and we derive conditions for the existence of a bi-Lipschitz embedding from A into a subset of RM . 
The Lipschitz property is an important aspect of the map which ensures stability of any reconstruction to perturbations of the observation and in effect specifies the robustness of compressed sensing against noise and quantization errors. Furthermore, in the k-sparse model, the bi-Lipschitz property has also played an important role in demonstrating the existence of efficient and robust reconstruction algorithms through the k-restricted isometry property (RIP) [4, 5, 6, 8]. A natural extension of the k-restricted isometry for the union of subspaces model is [12, 3]: Definition: (A-restricted isometry) For any matrix Φ and any subset A ⊂ RN we define the A-restricted isom- SAMPTA'09 etry constant δA (Φ) to be the smallest quantity such that (1 − δA (Φ)) ≤ kΦxk22 ≤ (1 + δA (Φ)), kxk22 (3) holds for all x ∈ A. If we define the set Ā = {x = x1 + x2 : x1 , x2 ∈ A} then δĀ < 1 controls the Lipschitz constants of Φ and −1 Φ|A (in the standard compressed sensing this is directly equivalent to δ2m ). Specifically let us define: kΦ(y1 ) − Φ(y2 )k2 kΦ|−1 A (x1 ) − ≤ KF ky1 − y2 k2 Φ|−1 A (x2 )k2 ≤ KI kx1 − x2 k2 (4) (5) then a straight forward consequence of the Ā-RIP definition is that: p KF ≤ 1 + δĀ (6) 1 (7) KI ≤√ 1−δĀ Note, as always with RIP, it is prudent to consider appropriate scaling of Φ to balance the upper and lower inequalities in (3). The following results, proved in [3], give neccessary and sufficient conditions for Φ to be an A-restricted isometry. 4.1 Sufficient conditions Theorem 1 For any t > 0, let µ µ ¶ ¶ 2 12 M≥ ln(2L) + k ln +t , cδA δA (8) then there exist a matrix Φ ∈ RM ×N and a constant c > 0 such (1 − δA (Φ))kxk22 ≤ kΦxk22 ≤ (1 + δA (Φ))kxk22 (9) holds for all x from the union of L arbitrary k dimensional subspaces A. What is more, if Φ is generated by randomly drawing i.i.d. entries from an appropriately scaled subgaussian distribution then this matrix satisfies equation (9) with probability at least 1 − e−t . (10) The proof follows the same lines as the construction of random matrices with k-RIP [1]. In contrast to the previous results on the existence of a unique inverse map this sufficient condition is logarithmic in the number of subspaces considered. 4.2 Necessary conditions We next show that the logarithmic dependence on L is in fact necessary. This can be done by considering the distance between the optimally packed unit norm vectors in A as a function of the number of observations. To this end it is useful to define a measure of separation between vectors in the different subspaces: 24 Definition: S (∆(A) subspace separation) Let A = i Si be the union of subspaces Si and let A/Si be the union of subspaces with the ith subspace excluded. The subspace separation of A is defined as    inf   ∆(A) = inf  sup  i xi ∈Si kxi k2 =1 xj ∈A/Si kxj k2 =1  kxi − xj k2  (11) Theorem 2 Let A be the union of L subspaces of dimension no more than k. In order for a linear map Φ : A 7→ RN to exist such that it has a Lipschitz constant KF and such that its inverse map ΦA −1 : Φ(A) 7→ A has a Lipschitz constant KI , it is necessary that ln ln(L) ³ ´. 4KF KI ∆(A) (12) Therefore, for a fixed subspace separation, the necessary number of samples grows logarithmically with the number of subspaces. This last fact suggests that extending the compressed sensing framwork to infinite dimensional signals may be problematic. 
For example, it implies that the log(N ) dependence in the standard k-sparse signal model p is necessary (from the easily derived bound ∆(A) ≥ 2/k) and therefore such a framework does not directly map to infinite dimensional signal models. 5. 2 routes to infinity Most of the results in compressed sensing assume that the ambient signal space, N , is finite dimensional. This also implies in the case of the k-sparse signal model (k < ∞) that the number of subspaces, L, in the signal model is also finite. In fact we would ideally like to understand when we can perform compressed sensing when either or both the quantities, N and L, are infinite. Specifically when might a stable unique inverse for Φ|A exist based upon a finite number of observations. For example the Finite Rate of Innovation (FRI) sampling framework introduced by Vetterli et al. [15] provides sampling strategies for signals composed of the weighted sum of a finite stream of diracs. In this case both N and L are uncountably infinite while M > 2k is sufficient to reconstruct the signal. Below we consider two possible routes to infinity and comment on their stability. Note other routes to infinity also exist, such as when we let k, M and N → ∞ while keeping k/M and M/N finite [9], or in the blind multi-band signal model [10, 13], where the sampling rate, M/N , is finite but where M, N → ∞. 5.1 k, L finite and N infinite We begin with the easy case that the reader might consider to be a bit of a cheat. SAMPTA'09 U := L M Si (13) i=1 We can now state the following necessary condition for the existence of an A-restricted isometry in terms of ∆(A) and the observation dimension. M≥ Consider a signal model A ⊂ H, where H is an infinite dimensional separable Hilbert space (i.e. N = ∞). Assume that both k and L are finite. In this case the union of subspace model A automatically lives within a finite dimensional subspace, U ⊂ H defined as: Note that dim(U ) ≤ kL < ∞. We can therefore first project onto the finite dimensional subspace U and then apply the above theory to guarantee both the existence and stability of inverse mappings in this setting. Two signal models that naturally fit into this framework are: the block-based sparsity model [11], which is related to the multiple measurement vectors problem and has been used recently in a blind multi-band signal acquisition scheme [13]; and the tree-based sparsity model where the usual k-sparse model is constrained to form a rooted subk tree where L ≤ (2e) k+1 independent of N [3] and naturally occurs in multi-resolution modelling. This model has also been recently extended to include tree-compressible signals [14]: see section 6.. 5.2 k finite, L and N infinite From Theorem 2 the only way in which the number of subspaces can be infinite (or even un-countable) while permitting a stable inverse mapping, Φ|−1 A , with M finite is if the subspace separation, ∆(A) = 0. In such a case the union of subspace model may often form a nonlinear signal manifold. Note also that when we have an uncountable union of k-dimensional subspaces the dimension of the signal model may well be greater than k. As an example let us consider the case of a simple Finite Rate of Innovation process [15]. Such models can be described as an uncountable union of subspaces and the key existence results from [12] immediately apply. However this tells us nothing about stability. 
For simplicity we will limit ourselves to a basic form of periodic FRI signal on T = R/Z which can be written as: x(t) = G(τ, a)(t) := k−1 X i=0 ai ψ(t − τi ) (14) where ψ are also periodic on T, τ = {τ1 , . . . , τk } and a = {a1 , . . . , ak } ∈ Rk . In [15] the possibility of a periodic Dirac stream is considered, i.e. ψ(t) = δ(t), t ∈ [0, 1]. Here we avoid the Dirac stream by restricting to the case where ψ(t) ∈ L2 (T) and directly consider the signal model defined by the parametric mapping: G : U × Rk 7→ L2 (T) (15) where U = {τ ∈ Rk : τi < τj , ∀i < j}. Individual subspaces can be identified with a given τ . Furthermore the continuity of the shift operator implies that for any ψ(t) ∈ L2 (T), the associated union of subspace model, A has ∆(A) = 0. Equivalently we can only find a finite SL ′ number of subspaces, Sj′ , whose union, A′ := j Sj′ ⊂ 25 A has ∆(A′ ) ≥ ǫ > 0). Theorem 2 can then be used to lower bound the Lipschitz constants of any embedding in terms of the number of subspaces, L′ of any such A′ . We have seen that Theorem 2 does not preclude a stable embedding for such systems. However there is clearly more work needed to determine when such models can have finite dimensional stable embeddings. One possible avenue of research would be to examine the recently derived sufficient conditions for stable embedding of general smooth manifolds [7, 2]. 6. ...and beyond? In reality all the union of subspace models we have considered are an idealization. In practise we can expect to, at most, be able to approximate a signal by one from a union of subspaces model. In traditional compressed sensing this is the difference between finding a sparse representaion of an exact k-sparse signal and finding a good sparse approximation of a compressible signal (i.e. one that is well approximated by a k-sparse signal). Recent work at Rice university [14] has shown that for the special case of restricted k-sparse models (such as the tree-restricted sparsity) the exact union of subspace model can be extended to approximate union of subspace models that are subsets of compressible signal models. In order to go beyond exact representations further conditions are introduced. Notably: 1. Nested Approximation Property (NAP) - this specifies sets of models, MK , that are naturally nested. 2. Restricted Amplification Property (RAmP) - this imposes additional regularity on the sensing matrix Φ when acting on the difference between the MK subspaces and the MK−1 subspaces (in the k-sparse case it is interesting to note that the RAmP condition is automatically satisfied by the k-RIP condition). There are therefore a number of interesting open questions. For example, are such additional conditions typically necessary to go beyond exact subspace representations? Furthermore can these additional tools be applied successfully to arbitrary union of subspace models (i.e. ones that are not subsets of the standard k-sparse model)? 7. Acknowledgements This research was supported by EPSRC grants D000246/1 and D002184/1. MED acknowledges support of his position from the Scottish Funding Council and their support of the Joint Research Institute with the Heriot-Watt University as a component part of the Edinburgh Research Partnership. [2] R Baraniuk and M Wakin. Random projections of smooth manifolds. Foundations of Computational Mathematics, 2007. [3] T. Blumensath and M. E. Davies. Sampling theorems for signals from the union of linear subspaces. Awaiting Publication, IEEE Transactions on Information Theory, 2008. [4] E. 
Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2):489–509, Feb 2006. [5] Emmanuel Candès and Justin Romberg. Quantitative robust uncertainty principles and optimally sparse decompositions. Foundations of Comput. Math, 6(2):227 – 254, 2006. [6] Emmanuel Candès and Terence Tao. Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. on Information Theory, 52(12):5406 – 5425, 2006. [7] K. L. Clarkson. Tighter bounds for random projections of manifolds. In Proceedings of the twentyfourth annual symposium on Computational geometry, pages 39–48, 2008. [8] D. Donoho. Compressed sensing. IEEE Trans. on Information Theory, 52(4):1289–1306, 2006. [9] D. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. Journal of the AMS, 2009. [10] Y. C. Eldar. Compressed sensing of analog signals. submitted to IEEE Trans. on Signal Processing, 2008. [11] Y. C. Eldar and M. Mishali. Robust recovery of signals from a union of subspaces. Submitted to IEEE Trans. Inf Theory, arXiv.org 0807.4581, 2008. [12] Y. Lu and M. Do. A theory for sampling signals from a union of subspaces. IEEE transactions on signal processing, 56(6):2334–2345, 2008. [13] M. Mishali and Y. C. Eldar. Blind multiband signal reconstruction: Compressed sensing for analog signals. IEEE Trans. Signal Proc., 57(3):993–1009, 2009. [14] M.F. Duarte R. G. Baraniuk, V. Cevher and C. Hegde. Model based compressed sensing. Submitted to IEEE Transactions on Information Theory, 2008. [15] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Transactions on Signal Processing, 50(6):1417–1428, 2002. References: [1] R. Baraniuk, M. Davenport, R. De Vore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, 2008. SAMPTA'09 26 Special session on Frame Theory and Oversampling Chair: Bernhard BODMANN SAMPTA'09 27 SAMPTA'09 28 Gradient descent of the frame potential Peter G. Casazza (1) and Matthew Fickus (2) (1) Department of Mathematics, University of Missouri, Columbia, MO 65211 USA. (2) Department of Mathematics & Statistics, Air Force Institute of Technology, WPAFB, OH 45433 USA. pete@math.missouri.edu, matthew.fickus@afit.edu Abstract: Unit norm tight frames provide Parseval-like decompositions of vectors in terms of possibly nonorthogonal collections of unit norm vectors. One way to prove the existence of unit norm tight frames is to characterize them as the minimizers of a particular energy functional, dubbed the frame potential. We consider this minimization problem from a numerical perspective. In particular, we discuss how by descending the gradient of the frame potential, one, under certain conditions, is guaranteed to produce a sequence of unit norm frames which converge to a unit norm tight frame at a geometric rate. This makes the gradient descent of the frame potential a viable method for numerically constructing unit norm tight frames. 1. Introduction The analysis operator of some finite sequence of vectors {fm }M m=1 in an N -dimensional Hilbert space HN is the operator F : HN → CM , (F f )(m) := hf, fm i. The corresponding frame operator is F ∗ F : HN → HN , F ∗F f = M X hf, fm ifm . 
m=1 Generally speaking, frame theory is the study of how ∗ {fm }M m=1 may be chosen in order to guarantee that F F M is well-conditioned. In particular, {fm }m=1 is a frame for HN if there exists frame bounds 0 < A ≤ B < ∞ such that AI ≤ F ∗ F ≤ BI, and is a tight frame if A = B, that is, if F ∗ F = AI. Typically, one’s choice of fm ’s is restricted according to some nonlinear, application-specific constraints. Of particular interest is the case of unit norm tight frames, that is, tight frames for which kfm k = 1 for all m = 1, . . . , M ; such frames, known to exist for any M ≥ N , provide Parseval-like decompositions in terms of vectors of unit length, even though these vectors are possibly nonorthogonal. Despite an ever-growing list of specific constructions of such frames, little is known about the manifold structure of the set of all unit norm tight frames. In the hunt for unit norm tight frames, the frame potential, specifically defined as: FP({fm }M m=1 ) := M X m,m′ =1 SAMPTA'09 |hfm , fm′ i|2 M for any sequence {fm }M m=1 ∈ HN , is a useful tool. Specifically, the frame potential quantifies the total orthogonality of a system of vectors by measuring the total potential energy stored within that system under a certain force which encourages orthogonality. Regarded as a functional over  M M SM N = {fm }m=1 ∈ HN : kfm k = 1, m = 1, . . . , M , one may show that when M ≥ N , the local minimizers of the frame potential are precisely the unit norm tight frames of M elements for HN . In particular, as the frame potential is continuous and SM N is compact, one may conclude that such frames indeed exist for any M ≥ N . In this paper, we consider the minimization of the frame potential from a numerical perspective. In particular, in the next section, we compute the gradient of the frame M potential, namely a specific direction {gm }M m=1 ∈ HN M in which to push {fm }m=1 so as to achieve the greatest instantaneous decrease of FP. Then, in an improvement over typical uses of gradient descent, we compute an exact step size in which to travel in this direction so as to produce a certain decrease in potential. In the third section, we estimate the size of this decrease in relation to how far the frame potential is from its minimum; under sufficient conditions, this estimate may be used to show that by descending the gradient of the frame potential, one may produce a sequence of unit norm frames which converge to a unit norm tight frame at a geometric rate. The frame potential was introduced in [1], with its domain of optimization being later generalized in [4]. It has been used to characterize tight filter bank frames [5, 6]. The frame potential may also be used to prove the existence of tight fusion frames [3], and the local minimizers of the fusion frame potential are themselves a subject of interest [7, 9]. Further generalizations of the frame potential are considered in [2, 8]. 2. The gradient of the frame potential Our goal is to numerically minimize the frame potential over SM N . As our domain of optimization is a product of spheres as opposed to the entire space HM N , our approach departs from the classical theory of gradients. In particuM M M lar, given {fm }M m=1 ∈ SN and any {gm }m=1 ∈ HN such that hfm , gm i = 0 for all m = 1, . . . , M , we shall compute the rate of change of the frame potential as each fm 29 is pushed along a great circle with tangent velocity gm . 
We then define the gradient of FP to be that particular {gm }M m=1 which makes this directional derivative as large as possible. We begin with the following result, which gives the first two derivatives of the frame potential of a single parameter family of frames: Lemma 1 (Lemma 2 of [3]). For any set of twicedifferentiable parameterized curves {fm (·)}M m=1 in HN , the first two derivatives of ϕ(t) := FP({fm (t)}M m=1 ) are:  ϕ̇(t) = 4ReTr Ḟ (t)F ∗ (t)F (t)F ∗ (t)) ,  ϕ̈(t) = 4ReTr F̈ (t)F ∗ (t)F (t)F ∗ (t) + 4kḞ (t)F ∗ (t)k2HS ∗ + 2kḞ (t)F (t) + F ∗ To compute the terms in (4), note that f˙m (t) = −kgm k sin(kgm kt)fm + cos(kgm kt)gm (5) for any m such that gm 6= 0. As (5) also immediately holds when gm = 0, we have f˙m (0) = gm for all m. Thus, by Lemma 1,  ϕ̇(0) = 4ReTr Ḟ (0)F ∗ (0)F (0)F ∗ (0)  = 4ReTr Ḟ (0)F ∗ F F ∗ = 4Re (t)Ḟ (t)k2HS , M X hḞ (0)F ∗ F F ∗ em , em i M X hF ∗ F fm , f˙m (0)i M X hF ∗ F fm , gm i. m=1 where Ḟ (t) and F̈ (t) are the analysis operators of M ¨ {f˙m (t)}M m=1 and {fm (t)}m=1 , respectively. We now use Lemma 1 along with Taylor’s theorem to asymptotically estimate the change in frame potential one M obtains by perturbing a given {fm }M m=1 ∈ SN along any choice of great circles. To be precise, letting:  ⊥ M := {gm }M ⊕fm m=1 ∈ HN : hfm , gm i = 0, ∀m , = 4Re m=1 = 4Re (6) m=1 Next, as taking the derivative of (5) yields f¨m (t) = −kgm k2 fm (t) for any m, we have: we have the following: M M ⊥ Theorem 2. For any {fm }M m=1 ∈ SN , {gm }m=1 ∈ ⊕fm , let: = fm (t) := cos(kgm kt)fm + (sin(kgm kt)/kgm k) gm kfm (t) − fm k2 ≤ t2 m=1 M X kgm k2 , M X hF̈ (t)F ∗ (t)F (t)F ∗ (t)em , em i M X hF ∗ (t)F (t)fm (t), f¨m (t)i M X hF ∗ (t)F (t)fm (t), −kgm k2 fm (t)i m=1 whenever gm 6= 0 and let fm (t) := fm otherwise. Then, M {fm (t)}M m=1 ∈ SN for any t ∈ R, and satsifies: M X Tr(F̈ (t)F ∗ (t)F (t)F ∗ (t)) (1) = m=1 = m=1 m=1 as well as: =− FP({fm (t)}M m=1 ) ≤ FP({fm }M m=1 ) + 4tRe M X M X kgm k2 kF (t)fm (t)k2 . (7) m=1 hF ∗ F fm , gm i In particular, combining (7) with Lemma 1 gives: m=1 + 8M t 2 M X kgm k2 . (2) m=1 ϕ̈(t) = −4 M X kgm k2 kF (t)fm (t)k2 m=1 Proof. It is straightforward to show that kfm (t)k = 1 for all m = 1, . . . , M and all t ∈ R. To show (1), note that for any m such that gm 6= 0, we have: 2 kfm (t) − fm k2 = cos(kgm kt) − 1 + sin2 (kgm kt) = 4 sin2 (kgm kt/2) ≤ kgm k2 t2 . + 2kḞ ∗ (t)F (t) + F ∗ (t)Ḟ (t)k2HS . ϕ(t) ≤ ϕ(0) + tϕ̇(0) + 12 t2 max |ϕ̈(s)|. s∈R (8) To bound (8), note that by (5), (3) As (3) also immediately holds for any m such that gm = 0, we may sum (3) over all m to conclude (1). To show (2), we apply Taylor’s theorem to ϕ(t) = FP({fm (t)}M m=1 ) at t = 0: SAMPTA'09 + 4kḞ (t)F ∗ (t)k2HS kF (t)k2HS = M X kfm (t)k2 = M, M X kf˙m (t)k2 = m=1 kḞ (t)k2HS = m=1 M X kgm k2 , m=1 (4) 30 M over all {gm }M m=1 ∈ SN and all t ∈ R. We note immediately from (12) that the optimal {gm }M m=1 and t are not unique, though we now show that their product is. Indeed, for any fixed m, letting Pm denote the orthogonal projection of HN onto the orthogonal complement of fm , we have: and thus, taking absolute values of (8), we have: |ϕ̈(t)| M X ≤4 kgm k2 kF (t)fm (t)k2 + 4kḞ (t)F ∗ (t)k2HS m=1 + 2kḞ ∗ (t)F (t) + F ∗ (t)Ḟ (t)k2HS M X ≤4 2 kgm k kF (t)k22 kfm (t)k2 + 4kḞ (t)F ∗ RehF ∗ F fm + 2M tgm , 2M tgm i = RehF ∗ F fm + 2M tgm , 2M tPm gm i (t)k2HS m=1 ∗ ∗ + 2 kḞ (t)F (t)kHS + kF (t)Ḟ (t)kHS ≤4 M X = RehPm F ∗ F fm + 2M tgm , 2M tgm i 2 = m=1 = 16M kgm k2 . 
(9) m=1 In light of the Taylor expansion (2), one, in light of Cauchy’s inequality, might expect the gradient of FP, M namely the {gm }M m=1 ∈ HN which maximizes the linear term M X hF ∗ F fm , gm i, Re gm = Pm F ∗ F fm = F ∗ F fm − hF ∗ F fm , fm ifm = F ∗ F fm − kF fm k2 fm , M Theorem 3. For any {fm }M m=1 ∈ SN , the minimizer of ⊥ the bound in (2) over all t ∈ R and {gm }M m=1 ∈ ⊕fm is given by t = −1/(4M ) and In particular, there exists M X {f˜m }M m=1 m = 1, . . . , M. ∈ SM N such that: kf˜m − fm k2 kgm k2 = hF ∗ F fm , gm i ≤ M  1 X kF ∗ F fm k2 − kF fm k4 , (10) 16M 2 m=1 and such that: M FP({f˜m })M m=1 ) − FP({fm }m=1 ) ≤− M  1 X kF ∗ F fm k2 − kF fm k4 . (11) 2M m=1 Proof. We seek to minimize: 4tRe M X hF ∗ F fm , gm i + 8M t2 m=1 M X kgm k2 m=1 M 2 X = RehF ∗ F fm + 2M tgm , 2M tgm i (12) M m=1 SAMPTA'09 = F ∗ F fm , F ∗ F fm − kF fm k2 fm = kF ∗ F fm k2 − kF fm k4 , which, when substituted into (1) and (2) yields (10) and (11), respectively, where f˜m := fm (−1/4M ). Note that as kF fm k4 = |hF ∗ F fm , fm i|2 ≤ kF ∗ F fm k2 for all m = 1, . . . , M , Theorem 3 provides a direction and M step size in which to travel from a given {fm }M m=1 ∈ SN so as to produce a concrete decrease in frame potential. In the next section, we estimate the size of this decrease in terms of how far the current potential is from its minimum, and in so doing, provide an upper bound on the rate at which repeated applications of Theorem 3 will asymptotically produce a unit norm tight frame. 3. m=1 (13) as claimed. Moreover, in light of (13), we have: m=1 ∗ to be given by gm = F F fm for all m = 1, . . . , M . Indeed, one may show that this would be the correct gradient if the frame potential was being regarded as a functional over the entire space HM N . However, as we are M ⊥ optimizing over SM N , we require that {gm }m=1 ∈ ⊕fm , M and as such, instead take {gm }m=1 to be the projection of ⊥ {F ∗ F fm }M m=1 onto ⊕fm . In the next result, we formally verify that such a choice is indeed optimal.  with equality if and only if Pm F ∗ F fm + 4M tgm = 0. Thus, to minimize (12), and consequently to minimize the upper bound in (2), we may take t = −1/(4M ) and Substituting (7) and (9) into (4) yields (2). gm = F ∗ F fm − kF fm k2 fm , kPm F ∗ F fm + 4M tgm k2 − kPm F ∗ F fm k2 ≥ − 14 kPm F ∗ F fm k2 , kgm k2 kF (t)k2HS + 12kḞ (t)k2HS kF (t)k2HS M X 1 4 Gradient descent of the frame potential We now consider the gradient descent of the frame potential: by repeatedly applying Theorem 3, we hope to produce a sequence of unit norm frames which are converging to a unit norm tight frame. Here, the main idea is to estimate the right hand side of (11) as a proportion of the difference between the current value of the frame potential and its minimum. To be clear, in [1], the minimum value of FP over SM N is found to be M 2 /N ; we now show how the quantity 2 FP({fm }M m=1 ) − M /N is a good metric on the tightness M of {fm }m=1 . Indeed, letting {λn }N n=1 be the eigenvalues of the corresponding frame operator F ∗ F , we have: N X n=1 λn = Tr(F ∗ F ) = Tr(F F ∗ ) = M X kfm k2 = M. m=1 (14) 31 M In particular, (14) implies that {fm }M m=1 ∈ SN is tight if M and only if λn = N for all n = 1, . . . , N . Moreover, as N  ∗ 2 X ∗ 2 ) = kF F k = Tr (F F ) = λ2n , FP({fm }M m=1 HS n=1 another consequence of (14) is that: FP({fm }M m=1 ) = N X (λn − M N N X (λn − M 2 N) + 2(0) + FP({f˜m })M m=1 ) − and thus: ≤ 1− M2 N FP({fm }M m=1 ) − = N X (λn − M 2 N) . 
(15) n=1 That is, the difference between the frame potential and its minimum is the square of the distance of the eigenvalues of F ∗ F from their optimal values. Using this fact, one may show: M Theorem 4. For any {fm }M m=1 ∈ SN , kF ∗ F fm k2 − kF fm k4 m=1 M2 N  , (16) where δ is defined as: δ := inf max min |hfm , en i|, m kF ∗ F fm k2 − kF fm k4 hfm , en ien 2 n=1 − D F ∗F N X hfm , en ien , fm E 2 n=1 = N X λn hfm , en ien 2 − n=1 = λ2n |hfm , en i|2 − N X λn − n=1 = N X hfm , en ihλn en , fm i 2 n=1 N X N X λn |hfm , en i|2 2 (18) n=1 n=1 N X 2 λp |hfm , ep i|2 |hfm , en i|2 , (19) p=1 where the equality of (18) and (19) arises from the fact that they both represent the variance of the random variable {λn }N n=1 with respect to the probability density function {|hfm , en i|2 }N n=1 . SAMPTA'09  , (20) M2 N  δ2 2M FP({fm }M m=1 ) − M2 N  , (21) By repeatedly applying Theorem 5, one produces a sequence of unit norm frames whose tightness, measured in terms of (15), improves at a geometric rate, provided all δ’s remain above some positive lower bound; finding such a bound is a subject of current research. Acknowledgments Casazza and Fickus were supported by NSF DMS 0704216 and AFOSR F1ATA07337J001, respectively. The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government. References: For sake of space, we omit the complete proof of Theorem 4; the main idea is to let {en }N n=1 be an orthonormal eigenbasis of F ∗ F , and note that for any m = 1, . . . , M , N X M2 N (17) n where the infimum is taken over all orthonormal bases {en }N n=1 of HN . = F ∗F FP({fm }M m=1 ) − where δ is given in (17). 4.  ≥ δ 2 FP({fm }M m=1 ) − N +1 16M and such that: M2 N , n=1 M X kf˜m − fm k2 ≤ m=1 n=1 = M Theorem 5. For any {fm }M m=1 ∈ SN , there exists M M {f˜m }m=1 ∈ SN such that: M X M 2 N) + The significance of Theorem 4 is that it bounds the decrease in frame potential given in Theorem 3 in terms M of (15), that is, how far {fm }M m=1 ∈ SN is from being tight. Indeed, using Theorem 4, one may show: [1] J.J. Benedetto and M. Fickus. Finite normalized tight frames. Adv. Comput. Math., 18:357–385, 2003. [2] I. Bengtsson and H. Granström. The frame potential, on average. Preprint. [3] P.G. Casazza and M. Fickus. Minimizing fusion frame potential. To appear in Acta Appl. Math. [4] P.G. Casazza, M. Fickus, J. Kovačević, M. Leon and J. Tremain. A physical interpretation of tight frames. In C. Heil, editor, Harmonic analysis and applications, pp. 51–76, 2006. [5] M. Fickus, B.D. Johnson, K. Kornelson, and K. Okoudjou. Convolutional frames and the frame potential. Appl. Comput. Harmon. Anal., 19:77–91, 2005. [6] B.D. Johnson and K. Okoudjou. Frame potential and finite abelian groups. Contemp. Math., 464:137– 148, 2008. [7] P. Massey. Optimal reconstruction systems for erasures and for the q-potential. Preprint. [8] P. Massey and M. Ruiz. Minimization of convex functionals over frame operators. To appear in Adv. Comput. Math. [9] P. Massey, M. Ruiz and D. Stojanoff. The structure of minimizers of the frame potential of fusion frames. Submitted. 32 Gabor frames with reduced redundancy Ole Christensen (1) , Hong Oh Kim (2) and Rae Young Kim (3) (1) Department of Mathematics, Technical University of Denmark, Building 303, 2800 Lyngby, Denmark. (2) Department of Mathematical Sciences, KAIST, Daejeon, Korea. (3) Department of Mathematics, Yeungnam University, Gyeongsan-si,Korea. 
Ole.Christensen@mat.dtu.dk, kimhong@kaist.edu, rykim@ynu.ac.kr This work was supported by the Korea Science and Engineering Foundation (KOSEF) Grant funded by the Korea Government(MOST)(R01-2006-000-10424-0) and by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-331-C00014). Abstract: 2. The range 2N1−1 < b < N1 Considering previous constructions of pairs of dual Gabor We first cite a result from [2]. It yields an explicit conframes, we discuss ways to reduce the redundancy. The struction of dual Gabor frames: focus is on B-spline type windows. 1. Introduction We will consider Gabor systems in L2 (R), i.e., families of functions {Emb Tn g}m,n∈Z, where Emb Tn g(x) := e 2πimbx g(x − na). If there exists a constant B > 0 such that X |hf, Emb Tn gi|2 ≤ B ||f ||2 , ∀f ∈ L2 (R), m,n∈Z then {Emb Tn g}m,n∈Z is called a Bessel sequence. If there exist two constants A, B > 0 such that X A ||f ||2 ≤ |hf, Emb Tn gi|2 ≤ B ||f ||2 , ∀f ∈ L2 (R), m,n∈Z then {Emb Tn g}m,n∈Z is called a frame. If {Emb Tn g}m,n∈Z is a frame with dual frame {Emb Tn h}m,n∈Z , then X f= hf, Emb Tn hiEmb Tn g, f ∈ L2 (R), m,n∈Z where the series expansion converges unconditionally in L2 (R). Our starting point is the duality condition for Gabor frames, originally due to Ron and Shen [4]. We use the version due to Janssen [3]: Lemma 1..1 Two Bessel sequences {Emb Tn g}m,n∈Z and {Emb Tn h}m,n∈Z form dual Gabor frames for L2 (R) if and only if X g(x − n/b + k)h(x + k) = bδn,0 (1..1) k∈Z for a.e. x ∈ [0, 1]. The Bessel condition in Lemma 1..1 is always satisfied for bounded windows with compact support, see [1]. Note that if g and h have compact support, we only need to check a finite number of conditions in (1..1). In this paper we will usually choose b so small that only the condition for n = 0 has to be verified. SAMPTA'09 Theorem 2..1 Let N ∈ N. Let g ∈ L2 (R) be a realvalued bounded function with supp g ⊂ [0, N ], for which X g(x − n) = 1. (2..1) n∈Z Let b ∈]0, 2N1−1 ]. Consider any scalar sequence −1 {an }N n=−N +1 for which a0 = b and an + a−n = 2b, n = 1, 2, · · · N − 1, (2..2) and define h ∈ L2 (R) by h(x) = N −1 X an g(x + n). (2..3) n=−N +1 Then g and h generate dual frames {Emb Tn g}m,n∈Z and {Emb Tn h}m,n∈Z for L2 (R). The above result can be extended: Corollary 2..2 Consider any b ≤ 1/N. With g and an as in Theorem 2..1, the function ! N −1 X h(x) = an g(x + n) χ[0,N ] (x) (2..4) n=−N +1 is a dual frame generator of g. Proof. Consider the condition (1..1) for n = 0; only the values of h(x) for x ∈ [0, N ] play a role, so since the condition holds for the function in (2..3), it also holds for the function in (2..4).  The cut-off in (2..4) yields a non-smooth function. However, for any b < 1/N, we might modify h slightly and obtain a smooth dual generator: In particular, we obtain the following: Corollary 2..3 Consider any b < 1/N, and take ǫ < 1/b − N. With g as in Theorem 2..1, the function h(x) = b, x ∈ [0, N ] has an extension to a function of desired smoothness, supported on [−ǫ, N + ǫ], which is a dual frame generator of g. 33 Proof. The choice an = b, n = −N + 1, . . . , N − 1, leads to 0.7 0.6 N −1 X 0.5 an g(x + n) = b, x ∈ [0, N ]. 0.4 n=−N +1 0.3 0.2 Given ǫ < 1/b − N and any functions φ1 : [−ǫ, 0[→ R and φ2 :]N, N + ǫ] → R, the function  φ1 (x), x ∈ [−ǫ, 0[,   PN −1  a g(x + n) = b, x ∈ [0, N ], n=−N +1 n h(x) =  φ2 , x ∈]N, N + ǫ],    0, x∈ / [−ǫ, N + ǫ], 0.1 0 -2 2 4 6 x Figure 1: B3 and the dual generator h3 in (2..5). 
0.25 0.20 will satisfy (1..1); in fact, for n 6= 0, the support of the functions g(· ± n/b) and h are disjoint, and for n = 0 we are (for all relevant values of x) back at the function in (2..4). The functions φ1 and φ2 can be chosen such that the function h has the desired smoothness.  The assumptions in Theorem 2..1 are tailored to B-splines, defined inductively by 0 0.15 0.10 0.05 K 1 0 1 2 3 4 Figure 2: The function h in (3..13).. B1 := χ[0,1] , BN +1 := BN ∗ B1 . h3 in (2..5) from [−2, 0] to [−1/2, 0] and from [3, 5] to [3, 31/2] and obtain the dual Direct calculations shows that B2 (x) =   x 2−x  0 if x ∈ [0, 1], if x ∈ [1, 2], otherwise, and B3 (x) =     1 2 2x 2 −x + 3x − 32 1 2 x − 3x + 29    2 0 if x ∈ [0, 1], if x ∈ [1, 2], if x ∈ [2, 3], otherwise. In general, the functions BN are (N − 2)−times differentiable piecewise polynomials (explicit expressions are known). Furthermore, supp BN = [0, N ], and the partition of unity condition (2..1) is satisfied. In case g = BN , the dual generators in Theorem 2..1 are splines, of the same smoothness as BN itself. By comP −1 a pressing the function N n=−N +1 n g(x + n) from the interval [−N + 1, 0] to [−ǫ, 0] and from [N, 2N − 1] to [N, N + ǫ] we obtain a dual in (2..3) with the same features: Example 2..4 For the B-spline B3 (x) and b = 1/5, Theorem 2..1 yields the symmetric dual  1/2 x2 + 2 x + 2, x ∈ [−2, −1[,    2  −1/2 x + 1, x ∈ [−1, 0[,   1  1, x ∈ [0, 3[, h3 (x) = −1/2 x2 + 3 x − 7/2, x ∈ [3, 4[, 5    1/2 x2 − 5x + 25/2, x ∈ [4, 5[,    0, x∈ / [0, 5[. (2..5) See Figure 1. Now, for b = 1/4, we can use Corollary 2..3 for ǫ < 4 − 3 = 1. Taking ǫ = 1/2, we compress the function SAMPTA'09 h(x) =  1/2 (4x)2 + 2 (4x) + 2, x ∈ [−1/2, −1/4[,    2  −1/2 (4x) + 1, x ∈ [−1/4, 0[,     1, x ∈ [0, 3[,   1  −1/2 (4(x − 3) + 3)2 + 3 (4(x − 3) + 3) − 7/2, x ∈ [3, 3 + 1/4[, 4   2  1/2 (4(x − 3) + 3) − 5(4(x − 3) + 3) + 25/2,     x ∈ [3 + 1/4, 3 + 1/2[,    0, x∈ / [−1/2, 3 + 1/2[.  8 x2 + 8 x + 2,     −8 x2 + 1,    1 1, = −8 x2 + 48 x − 71, 4    8 x2 − 56 x + 98,    0, See Figure 2. x ∈ [−1/2, −1/4[, x ∈ [−1/4, 0[, x ∈ [0, 3[, x ∈ [3, 3 + 1/4[, x ∈ [3 + 1/4, 3 + 1/2[, x∈ / [−1/2, 3 + 1/2[.  3. B2 and 1/2 < b < 1 In the following discussion, we consider dual windows associated with a Gabor frame {Emb Tn B2 }m,n∈Z generated by the B-spline B2 . The arguments can be extended to general functions supported on [0, 2]. Take any function h with values specified only on [0, 2] and such that X B2 (x + k)h(x + k) = 1, x ∈ [0, 1]. (3..1) k∈Z In fact, due to the support of B2 , only the values for h(x) for x ∈ [0, 2] play a role for that condition. We know that 34 for any b ≤ 1/2 the function generates – up to a certain scalar multiple – a dual of g. Now consider any 1/2 < b < 1; that is, we have 1 < 1/b < 2. Similarly, considering (3..3) for x ∈ [0, 1] = [0, 2 − 1/b] ∪ [2 − 1/b, 1] leads to (3..5) and Lemma 3..1 Assume that h(x), x ∈ [0, 2] is chosen such that (3..1) is satisfied. 
The the following hold: B2 (x + 1/b − 2)h(x − 2) + B2 (x + 1/b − 1)h(x − 1) (i) If X = 0, x ∈ [2 − 1/b, 1]; B2 (x − 1/b + k)h(x + k) = 0, x ∈ R, (3..2) k∈Z and X (3..7) the equation (3..7) only involves h(x) for x ∈ [−1/b, −1] ∪ [1 − 1/b, 0], B2 (x + 1/b + k)h(x + k) = 0, x ∈ R, (3..3) and (3..5) implies that k∈Z then h(x − 1) = B2 (x − 1/b)h(x) + B2 (x − 1/b + 1)h(x + 1) = 0, i.e., x ∈ [1/b, 2], (3..4) B2 (x + 1/b − 1)h(x − 1) + B2 (x + 1/b)h(x) = 0 h(x) = −B2 (x + 1/b)h(x) , x ∈ [0, 2 − 1/b], B2 (x + 1/b − 1) −B2 (x + 1/b + 1)h(x + 1) , x ∈ [−1, 1 − 1/b]. B2 (x + 1/b) For the proof of (ii), the condition h(x) = 0, x ∈ / [0, 2] ∪ [−1, 1 − 1/b] ∪ [1 + 1/b, 3], x ∈ [0, 2 − 1/b]. (3..5) implies that (3..6) and (3..7) are satisfied. By construction, (3..2) and (3..3) are satisfied.  These equations determine h(x) for x ∈ [−1, 1 − 1/b] ∪ [1 + 1/b, 3]. (ii) If h(x) for x ∈ [−1, 1 − 1/b] ∪ [1 + 1/b, 3] is chosen such that (3..4) and (3..5) are satisfied, and h(x) = 0, x ∈ / [0, 2] ∪ [−1, 1 − 1/b] ∪ [1 + 1/b, 3], Lemma 3..1 shows that if we want that (3..1), (3..2), and (3..3) hold for some b ∈]1/2, 1], then h in general will take values outside [0, 2]. However, the proof shows that we under certain circumstances can find a solution h having support in [0, 2]. In that case, the support will actually be a subset of [0, 2]: then (3..2) and (3..3) hold. Proof. We consider (3..2) for x ∈ [1, 2], and split into two cases: For x ∈ [1, 1/b], (3..2) yields that 0 = B2 (x − 1/b + 1)h(x + 1) +B2 (x − 1/b + 2)h(x + 2); (3..6) the equation only involve h(x) for x ∈ [2, 1 + 1/b] ∪ [3, 2 + 1/b]. For x ∈ [1/b, 2], (3..2) yields that 0 = B2 (x − 1/b)h(x) + B2 (x − 1/b + 1)h(x + 1); since h(x) is known, this implies that h(x + 1) = −B2 (x − 1/b)h(x) , x ∈ [1/b, 2], B2 (x − 1/b + 1) that is, h(x) = −B2 (x − 1/b − 1)h(x − 1) , x ∈ [1/b + 1, 3]. B2 (x − 1/b) SAMPTA'09 Corollary 3..2 Let b ∈]1/2, 1]. Assume that supp h ⊆ [0, 2] and that (3..1) and (3..2) holds. Then h(x) = 0, x ∈ [0, 2 − 1/b] ∪ [1/b, 2]. (3..8) Proof. According to the proof of Lemma 3..1, we obtain that h(x) = 0 on [1/b+1, 3] by requiring that h(x) = 0 for x ∈ [1/b, 2]; and we obtain that h(x) = 0 on [−1, 1 − 1/b] by requiring that h(x) = 0 for x ∈ [0, 2 − 1/b].  If supp h ⊆ [0, 2], the condition (3..8) implies that h at most can be nonzero on the interval [2 − 1/b, 1/b] having length 2/b − 2. In order for (3..1) to hold, this interval must have length at least 1; thus, we need to consider b such that 2/b − 2 ≥ 1, i.e., b ≤ 2/3. Note that if b ≤ 2/3, then 2/b ≥ 3 : that is, because B2 and h are supported on [0, 2], Janssen’s duality conditions in (1..1) are automatically satisfied for n = ±2, ±3, . . . . Corollary 3..3 Consider b ∈]1/2, 2/3]. Then there exists a function h with supp h ⊆ [0, 2] such that (3..1) and (3..2) hold; and bh(x) is a dual generator of B2 for these values of b. 35 Proof. For x ∈ [0, 2 − 1/b] ∪ [1/b, 2], let h(x) = 0. For x ∈ [0, 1], the equation (3..1) means that 1.5 1.25 1.0 xh(x) + (1 − x)h(x + 1) = 1. 0.75 0.5 This implies that 0.25 xh(x) = 1, x ∈ [1/b − 1, 1], (1 − x)h(x + 1) = 1, x ∈ [0, 2 − 1/b]; 0.0 0.5 0.0 1.0 1.5 2.0 x that is, Figure 3: The function h in (3..13).. h(x) = 1 , x ∈ [1/b − 1, 1], x (3..9) Put h(x) = 6x − 2, x ∈ [1/3, 1/2]. and h(x) = 1 , x ∈ [1, 3 − 1/b]. 2−x (3..10) Finally, for x ∈ [2 − 1/b, 1/b − 1] and x ∈ [3 − 1/b, 1/b], choose h(x) such that xh(x) + (1 − x)h(x + 1) = 1. By construction, bh(x) is a dual generator.  
For b = 3/5 we will now explicitly construct a continuous dual generator h of B2 with support in [0, 2]. Putting Corollary 3..2, (3..9), and (3..10) together, we can state a result about how a dual window supported on [0, 2] must look like on parts of [0, 2]: Lemma 3..4 For b = 3/5, every dual generator of B2 with support in [0, 2] has the form h(x) =  0    1 x 1     2−x 0 if x ≤ 1/3; if x ∈ [2/3, 1]; if x ∈ [1, 4/3]; if x ≥ 5/3. That is, we only have freedom on the definition of h on ]1/3, 2/3[∪]4/3, 5/3[. Note that on [2/3, 4/3], the function h is symmetric around x = 1. We will now show that it is possible to define h on ]1/3, 2/3[∪]4/3, 5/3[ in such a way that h becomes symmetric around x = 1. First, we note that this form of symmetry means that h(1 − x) = h(1 + x), x ∈]1/3, 2/3[. (3..11) Put together with the duality condition, we thus require that xh(x) = 1 − (1 − x)h(1 − x), x ∈]1/3, 2/3[. (3..12) The condition (3..12) shows that must define h(1/2) = 1. Now, taking any continuous function h defined on [1/3, 1/2] with the properties that h(1/3) = 0 and h(1/2) = 1, the condition (3..12) shows how to define h(x) on ]1/2, 2/3[; and, finally, the condition (3..11) shows how to define h on ]4/3, 5/3[ such that the resulting function is a symmetric dual generator. SAMPTA'09 Then, for x ∈ [1/2, 2/3], 1 − (1 − x)h(1 − x) x −6x2 + 10x − 3 . = x The condition h(1 + x) = h(1 − x), x ∈]1/3, 2/3[ can also be expressed as h(x) = h(2 − x), x ∈]4/3, 5/3[. Thus, for x ∈ [4/3, 3/2] we arrive at h(x) = h(x) = h(2 − x) = −6x2 + 14x − 7 , x ∈ [4/3, 3/2]; 2−x while, for x ∈ [3/2, 5/3], h(x) = h(2 − x) = 6(2 − x) − 2 = 10 − 6x. We have arrived at the following conclusion: Lemma 3..5 For b = 3/5, the function   0 if x ≤ 1/3;     if x ∈ [1/3, 1/2];  6x −2 2   −6x +10x−3  if x ∈ [1/2, 2/3];  x   1 if x ∈ [2/3, 1]; h(x) = x 1  if x ∈ [1, 4/3];  2−x   −6x2 +14x−7   if x ∈ [4/3, 3/2];  2−x    10 − 6x if x ∈ [3/2, 5/3];     0 if x ≥ 5/3 (3..13) is a continuous symmetric dual generator of B2 . References: [1] Christensen, O.: Frames and bases. An introductory course. Birkhäuser 2007. [2] Christensen, O. and Kim, R. Y.: On dual Gabor frame pairs generated by polynomials. J. Fourier Anal. Appl., accepted for publication. [3] Janssen, A.J.E.M.: The duality condition for WeylHeisenberg frames. In ”Gabor analysis: theory and applications” (eds. H.G. Feichtinger and T. Strohmer). Birkhäuser, Boston, 1998. [4] Ron, A. and Shen, Z.: Frames and stable bases for shift-invariant subspaces of L2 (Rd ). Canad. J. Math. 47 no. 5 (1995), 1051–1094. 36 Linear independence and coherence of Gabor systems in finite dimensional spaces Götz E. Pfander (1) , (1) Jacobs University, 28759 Bremen, Germany. g.pfander@jacobs-university.de Abstract: This paper reviews recent results on the geometry of Gabor systems in finite dimensions. For example, we discuss the coherence of Gabor systems, the linear independence of subsets of Gabor systems, and the condition number of matrices formed by a small number of vectors from a Gabor system. We state a result on the recovery of signals that have a sparse representation in certain Gabor systems. The results listed here are obtained by the author in collaborations with Jim Lawrence, Felix Krahmer, Peter Rashkov, Jared Tanner, Holger Rauhut, and David Walnut linear independence 1. 
Introduction and Notation The theory of Gabor systems in the Hilbert space of square integrable functions on the real line has received significant attention during the last ten to twenty years (see, for example, [4, 6, 8, 7] and references within). Much of the research concentrates on showing that certain Gabor systems are frames or Riesz bases for their closed linear span. The seemingly simpler concept of linear independence of vectors in a Gabor system was addressed in [10]. There, it was conjectured that any finite set of time–frequency shifted copies of a single square integrable function is linear independent. This conjecture still remains to be resolved. In the last years, in part due to the emergence of the theory of compressed sensing and sparse signal recovery, the structure of Gabor systems in finite dimensional spaces has received increased attention. Such finite Gabor systems on finite Abelian groups are described below. We let G denote a finite Abelian group. Its dual group b consists of the group homomorphisms ξ : G 7→ S 1 . G b ⊆ CG = {f : G −→ C}, the latter being We have G the space of complex valued functions on G. The support size of f ∈ CG is kf k0 := |{x : f (x) 6= 0}|. G The Fourier P transform of f ∈b C is normalized to be b f (ξ) = x∈G f (x) ξ(x), ξ ∈ G. Translation operators Tx , x ∈ G, and modulation operb on CG are unitary operators given by ators Mξ , ξ ∈ G, (Tx f )(t) = f (t − x) and (Mξ f )(t) = f (t) · ξ(t). Timeb frequency shift operators π(λ), λ = (x, ξ) ∈ G × G, are the unitary operator on CG represented by π(λ)f = b Tx ◦ Mξ f , λ = (x, ξ) ∈ G × G. SAMPTA'09 b ⊆ CG is called (full) The system {π(λ)g : λ ∈ G × G} Gabor system with window g ∈ CG , it consists of |G|2 vectors in a |G| dimensional space. The short-time Fourier transform with respect to g is given by Vg f (λ) = hf, π(λ)gi = X y∈G f (y)g(y − x)ξ(y), b f ∈ CG , λ = (x, ξ) ∈ G × G. We shall not make a distinction between the linear mapb ping Vg : CG −→ CG×G and its matrix representation with respect to the Euclidean basis. Full Gabor systems in finite dimensions share an important and very useful property: for any g 6= 0, the collection {π(λ)g}λ∈G×Gb forms a uniform tight finite frame for CG with frame bound n2 kgk2 , that is, X b λ∈G×G |hf, π(λ)gi|2 = n2 kgk2 kf k2 . This is a simple consequence of the representation theory of the Weyl–Heisenberg group [9, 12]. In this paper we are concerned with properties of subsets of full Gabor systems. In Section 2, we consider the linear independence of subsets of |G| elements of {π(λ)g}λ∈G×Gb . Recall that a finite set of vectors in CG is in general linear position if any subset of at most |G| of these vectors are linearly independent. While being a classical concept in mathematics, it is also relevant for communications, namely, for information transmission through a so-called erasure channel [2]. In fact, a frame n F = {xk }m k=1 in C is called maximally robust to erasures if the removal of any l ≤ m − n vectors from F leaves a frame. Moreover, we consider the coherence of Gabor systems in Section 3. We state probabilistic estimates of the coherence of a full Gabor system with respect to a randomly generated window. In Section 4, we consider the condition number of matrices formed by a small subset of a Gabor system. The results presented below were obtained over the last few years in collaboration with Jim Lawrence and David Walnut [12], Felix Krahmer and Peter Rashkov [11], and Holger Rauhut and Jared Tanner [14, 13]. 37 2. 
Gabor systems in general linear position The following simple observations illustrate the usefulness of Gabor systems which are in general linear position. Proposition 1 [11, 12] For g ∈ CG \ {0}, the following are equivalent: 1. {π(λ)g}λ∈G×Gb are in general linear position. 2. For all f ∈ CG \{0} we have kVg f k ≥ |G|2 −|G|+1. 3. For all f ∈ CG , Vg f is completely determined by its values on any set Λ with |Λ| = n. 4. {π(λ)g}λ∈G×Gb is maximally robust to erasures. 5. The |G| × |G|2 matrix Vg has the property that every minor of order n is nonzero. Corollary 2 [12] If {π(λ)g}λ∈G×Gb are in general linear g k0 = |G|. position, then kgk0 = |G| and kb Unfortunately, not each finite Abelian groups G permits the existence of a vector g ∈ CG satisfying one and therefore all conditions listed in Proposition 1. For example, for the group G = Z2 × Z2 , no such g exists [11]. The situation is different for G = Zp . Recall that E is of full measure if the Lebesgue measure of CG \ E is 0. Theorem 3 [12] If |G| is prime, that is, G = Zp , p prime, then there is a dense open set E of full measure in CG such that for every g ∈ E, the elements of the full Gabor system {π(λ)g}λ∈G×Gb are in general linear position. That is, for almost all g we have kVg f k ≥ |G|2 −|G|+1 for all f 6= 0. Rudimentary numerical experiments encourage us to ask the following question. Question 4 [12] For G cyclic, that is, G = Zn , n ∈ N, exists g ∈ CG so that the conclusions of Proposition 1, and, therefore, kVg f k ≥ |G|2 − |G| + 1, f ∈ CG , hold In fact, for |G| prime, Theorem 3 can be strengthened. Theorem 5 [11] Let G = Zp , p prime. For almost every g ∈ CG , we have kVg f k0 ≥ |G|2 − kf k0 + 1 (1) for all f ∈ CG \ {0}. Moreover, for 1 ≤ k ≤ |G| and 1 ≤ l ≤ |G|2 with k + l ≥ |G|2 + 1 there exists f with kf k0 = k and kVg f k0 = l. Proposition 6 [11] If |G| is not prime, then Vg has zero minors for all g ∈ CG . Hence, there is no g ∈ CG such that (1) holds for all f ∈ CG . Numerical experiments for Abelian groups of order less than or equal to 8, as well as our result for all cyclic groups of prime order, indicate that the following question might have an affirmative answer. SAMPTA'09 Question 7 [11] For every cyclic group G and almost every g ∈ CG , does  hold? (kf k0 , kVg f k0 ), f ∈ CG \{0}  = ( kf k0 , kfbk0 +|G|2 −|G| ), f ∈ CG \{0} The following result improves on Theorem 5. It allows for the construction of Gabor based equal norm tight frames of p2 elements in Cn , n ≤ p. To our knowledge, the only previously known equal norm tight frames that are maximally robust to erasures are so-called harmonic frames (see Conclusions in [2]). Proposition 8 [11] There exists a unimodular g ∈ CZp , p prime, that is, a g with |g(x)| = 1 for all x ∈ G satisfying the conclusions of Theorem 5. To construct an equal norm tight frame, we choose a g ∈ (S 1 )p satisfying the conclusions of Proposition 8. We remove p − n components of the equal norm tight frame {π(λ)g}λ∈G×Gb . The resulting frame remains an equal norm tight frame which is maximally robust to erasure. Note that this frame is not a Gabor frame proper. Reducing the number of vectors in the frame to m ≤ p2 vectors leaves an equal norm frame which is maximally robust to erasure but which might not be tight. With the restriction to frames with p2 elements, p prime, we have shown the existence of Gabor frames which share the usefulness of harmonic frames when it comes to transmission of information through erasure channels. 
Background and more details on frames and erasures can be found in [2, 15] and the references cited therein. Note that Theorem 5 has as direct consequence Theorem 9 [11] Let g ∈ CZp , p prime, satisfy the conclusion of Theorem 5. Then any f ∈ CZp with kf k0 ≤ 12 |Λ|, cp , is uniquely determined by Λ and rΛ Vg f . Λ ⊂ Zp ×Z Here, only the support size of f is known. No additional information on the support of f is required to determine f. In terms of sparse representations, X we consider the question whether any vector f = cλ π(λ)g can be deterλ∈Λ mined by a few entries of f in case that |Λ| is small. Theorem 10 [11] Let g ∈ CZp , p prime, satisfy the conclusion of Theorem 5. Then any f ∈ CZp with f = P c λ∈Λ cλ π(λ)g, Λ ⊂ Zp ×Zp is uniquely determined by B and rB f whenever |B| ≥ 2|Λ|. Note that similar to before, the efficient recovery of f from 2|Λ| samples of f in Theorem 10 does not require knowledge of Λ. The question asking how to recover f from a small number of entries of f efficiently will be briefly addressed with Theorem 14 38 3. Coherence of Gabor systems In the following we restrict our attention to cyclic groups G = Zn , n ∈ N. We consider the so-called Alltop window hA [15] with entries 3 1 hA (x) = √ e2πix /n , n x = 0, . . . , n−1, (2) and the randomly generated window hR with entries 1 hR (x) = √ ǫx , n x = 0, . . . , n−1, (3) where the ǫx are independent and uniformly distributed on the torus {z ∈ C, |z| = 1}. For khk2 = 1, the coherence of a full Gabor systems is µ = max (ℓ,p)6=(ℓ′ ,p′ ) |hMℓ Tp h, Mℓ′ Tp′ hi|. (4) In [16] it is shown that the coherence of {π(λ)hA : λ ∈ b n } ⊆ Cn given in (2) satisfies Zn × Z 1 µ = √ n (5) Theorem 12 [13] Let ε, δ ∈ (0, 1) and |Λ| = S. Suppose that δ2 n (7) S≤ 4e(log(S/ε) + c) with c = log(e2 /(4(e−1))) ≈ 0.0724. Then kIΛ − Ψ∗Λ ΨΛ k ≤ δ with probability at least 1 − ε; in other words the minimal and maximal eigenvalues of Ψ∗Λ ΨΛ satisfy 1 − δ ≤ λmin ≤ λmax ≤ 1 + δ with probability at least 1 − ε. Remark 13 [13] Assuming equality in condition (7) and solving for ε we deduce  2  e2 δ n ∗ S exp − P kIΛ − ΨΛ ΨΛ k > δ) ≤ 4(e−1) 4eS  2  δ n = CS exp − 4eS with C ≈ 1.075. Theorem 12 allows us to guarantee theX successful use of efficient algorithms to determine f = cλ π(λ)g from λ∈Λ for n prime. This is close to optimal since as the lower bound for the coherence of frames with n2 elements in 1 Cn is µ ≥ √n+1 [16]. Unfortunately, the coherence (4) of hA applies only for n prime. For arbitrary n we now consider the random window hR . Theorem 11 [14] Let n ∈ N and choose a random window hR with entries 1 hR (x) = √ ǫx , n x = 0, . . . , n−1, where the ǫx are independent and uniformly distributed on the torus {z ∈ C, |z| = 1}. Let µ be the coherence of the associated Gabor dictionary (4), then for α > 0 and n even, 2 α  P µ ≥ √ ≤ 4n(n−1)e−α /4 , n while for n odd,  n−1 2  n+1 2 α  P µ ≥ √ ≤ 2n(n−1) e− n α /4 + e− n α /4 . n (6) Up to the constant factor α, the coherence in Theorem 11 1 with high comes close to the lower bound µ ≥ √n+1 probability. (The probability depends on α). 4. Conditioning of submatrices of Vg For applications such as sparse signal recovery, not only linear independence of subsets of Gabor systems is required. It is rather needed, that small subsets of Gabor systems form well-conditioned matrices. 2 Throughout this section, we let Ψ = Vg ∈ Cn×n with g = hR being the randomly generated unimodular winb we denote by ΨΛ dow described in (3). For Λ ⊆ G×G the matrix consisting only of those columns indexed by λ ∈ Λ. 
SAMPTA'09 a few entries of f in case that |Λ| is small. Here, we will concentrate on algorithms based on Basis Pursuit. Basis Pursuit seeks the solution of the convex problem min kxk1 x subject to Ψg x = y, (8) P where kxk1 = λ∈Z2 |xλ | is the ℓ1 -norm of x. Efficient n convex optimization techniques for Basis Pursuit can be found in [1, 3, 5]. Theorem 14 [13] Assume x is an arbitrary S-sparse coefficient vector. Choose the random unimodular Gabor window g = hR defined in (3), that is, with random entries independently and uniformly distributed on the torus {z ∈ C, |z| = 1}. Assume that S≤C n log(n/ε) (9) for some constant C. Then with probability at least 1 − ε Basis Pursuit (8) recovers x from y = Ψx = Ψg x. References: [1] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2004. [2] Peter G. Casazza and Jelena Kovačević. Equal-norm tight frames with erasures. Adv. Comput. Math., 18(2-4):387–430, 2003. Frames. [3] S.S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by Basis Pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1999. [4] O. Christensen. An introduction to frames and Riesz bases. Applied and Numerical Harmonic Analysis. Birkhäuser Boston Inc., Boston, MA, 2003. [5] D.L. Donoho and Y. Tsaig. Fast solution of l1-norm minimization problems when the solution may be sparse. Preprint, 2006. 39 [6] H.G. Feichtinger and T. Strohmer, editors. Gabor Analysis and Algorithms: Theory and Applications. Birkhäuser, Boston, MA, 1998. [7] H.G. Feichtinger and T. Strohmer, editors. Advances in Gabor Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser Boston Inc., Boston, MA, 2003. [8] K. Gröchenig. Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston, MA, 2001. [9] A. Grossmann, J. Morlet, and T. Paul. Transforms associated to square integrable group representations. I. General results. J. Math. Phys., 26(10):2473–2479, 1985. [10] C. Heil, J. Ramanathan, and P. Topiwala. Linear independence of time–frequency translates. Proc. Amer. Math. Soc., 124(9), September 1996. [11] F. Krahmer, G.E. Pfander, and P. Rashkov. Uncertainty principles for time–frequency representations on finite abelian groups. Appl. Comp. Harm. Anal., 2008. doi:10.1016/j.acha.2007.09.008. [12] J. Lawrence, G.E. Pfander, and D. Walnut. Linear independence of Gabor systems in finite dimensional vector spaces. J. Fourier Anal. Appl., 11(6):715–726, 2005. [13] G.E. Pfander and H. Rauhut. Sparsity in time– frequency representations. 2008. Preprint. [14] G.E. Pfander, H. Rauhut, and J. Tanner. Identification of matrices having a sparse representation. IEEE Trans. Signal Proc., 2008. to appear. [15] T. Strohmer and R.W. Heath, Jr. Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal., 14(3):257–275, 2003. [16] Thomas Strohmer and Robert W.jun. Heath. Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal., 14(3):257–275, 2003. SAMPTA'09 40 Error Correction for Erasures of Quantized Frame Coefficients Bernhard G. Bodmann(1) , Peter G. Casazza(2) , Gitta Kutyniok(3) and Steven Senger(2) (1) Department of Mathematics, University of Houston, Houston, TX 77204, USA. (2) Department of Mathematics, University of Missouri, Columbia, MO 65211, USA. (3) Institute of Mathematics, University of Osnabrück, 49069 Osnabrück, Germany. 
bgb@math.uh.edu, pete@math.missouri.edu, kutyniok@math.uni-osnabrueck.de, senger@math.missouri.edu Abstract: In this paper we investigate an algorithm for the suppression of errors caused by quantization of frame coefficients and by erasures in their subsequent transmission. The erasures are assumed to happen independently, modeled by a Bernoulli experiment. The algorithm for error correction in this study embeds check bits in the quantization of frame coefficients, causing a possible, but controlled quantizer overload. If a single-bit quantizer is used in conjunction with codes which satisfy the Gilbert Varshamov bound, then the contributions from erasures and quantization to the reconstruction error is shown to have bounds with the same asymptotics in the limit of large numbers of frame vectors. 1. Introduction The versatility of redundant systems, in particular frames, has been demonstrated by their resilience to erasures and by their usefulness to suppress quantization errors. In the context of finite frames, the statistical error estimates by Goyal, Kovačević, Vetterli and Kelner [7, 6] were to the authors’ knowledge the first instance of a combined analysis of erasures and quantization. In recent years, the robustness of finite frames against erasures has been more extensively studied, for instance, in [5, 14, 9, 2, 10]. These studies typically provide estimates for the (average and worst case) blind reconstruction error, meaning all erased (unknown) coefficients are set to zero and the reconstruction relies on a fixed synthesis operator. It is well-known that if the frame vectors related to the non-erased coefficients still form a spanning set, then the frame operator of those can be inverted, leading to perfect reconstruction. However, the latency caused by the wait until all coefficients have been transmitted and the computational cost of inverting the frame operator make perfect reconstruction less practicable. On the other hand, Benedetto, Powell and Yilmaz [1] investigated an easily implementable, active error correction for the compensation of quantization errors with so-called sigma-delta algorithms, which provide highly accurate reconstruction. Recently, Boufounos, Oppenheim and Goyal [4] introducedSAMPTA'09 an erasure correction scheme with strong similarities to quantization-noise shaping, offering the possibility of a combined treatment of both types of errors. The idea of pre-compensation and error-forward projection deserves to be explored further, but the algorithm by Boufounos, Oppenheim and Goyal is computationally still more costly than a simple application of sigma-delta quantization. The need for results on low-complexity quantization-anderasure correcting algorithms motivated the present study, which investigates a rather simple strategy for error compensation, a modified sigma-delta algorithm with embedded check bits. The error correction algorithm we present allows precise bounds on quantization errors and also on the effect of erasures from unreliable transmissions of frame coefficients. 2. PCM quantization and blind reconstruction We first revisit erasure-averaged error bounds for PCM quantization of frame coefficients and blind reconstruction after transmission. Definition. Let H be a d-dimensional Hilbert space. A frame F = {f1 , f2 , . . . fN } for H is a spanning set. If all vectors in the frame the same norm, we call F equalPhave N norm. If x = A1 j=1 hx, fj ifj for all x ∈ H, then we say that F is A-tight. 
Quantizing frame coefficients simply means mapping them to a finite set of values. Definition. A function Q on R is called a quantizer with accuracy ǫ > 0 on the interval [−L, +L] if it has a finite range A and for any x ∈ [−L, +L], Q(x) satisfies |x − Q(x)| ≤ ǫ. The range A of the quantizer Q is also called the alphabet. If this alphabet consists of all integer multiples of a fixed step-size δ contained in the interval [−L − δ/2, +L + δ/2] and the quantizer assigns to x ∈ [−L, +L] the unique value mδ, m ∈ Z, satisfying (m − 21 )δ < x ≤ (m + 12 )δ then we call Q the uniform mid-tread quantizer with step-size δ [3]. Alternatively, if the alphabet is A = (Z + 21 )δ ∩ [−L − δ/2, +L + δ/2] and if Q assigns to x ∈ [−L, +L] the value (m+ 21 )δ such that mδ < x ≤ (m + 1)δ, then we speak of the so-called uniform mid-riser quantizer with step-size δ. In the latter part of this study, we focus on the single-bit mid-riser quantizer which rounds the input to A = {−δ/2, +δ/2}. We want to apply this quantizer to frame coefficients.41 Definition. Given a quantizer Q, the PCM quantization of a vector x in a real Hilbert space H of dimension dim(H) = d, equipped with an A-tight frame F = {fj }N j=1 , is defined by N QF (x) = 1 X Q(hx, fj i)fj . A j=1 Remark. We recall that the PCM quantization error resulting from a uniform quantizer Q with accuracy ǫ > 0 on [−L, +L], and a N/d-tight equal-norm frame F applied to any input vector x ∈ H satisfying kxk ≤ L is in norm bounded by N d X | uj hfj , vi| kvk=1 uj ∈{±1} N j=1 kQF (x) − xk ≤ max ≤ max N X √ d √ |hfj , vi|2 )1/2 = dǫ . ( N ǫ)( N j=1 This is in contrast to erasures, where the bound on the reconstruction error depends on the norm of the input vector. Definition. Given a probability measure P on the set of erasures, and the analysis operator V belonging to an Atight frame, we define the erasure-averaged reconstruction error to be 1 e(V, P) = E[k V ∗ E(ω)V − Ik] . A Hereby, E[·] is the expectation with respect to the probability measure P on Ω = {0, 1}N , and E : Ω → RN ×N is a random diagonal matrix with entries Ej,j = ωj . Theorem. Let H be a real Hilbert space of dimension d, equipped with an A-tight equal-norm frame F. If all the frame coefficients are erased with a probability 0 ≤ p ≤ 1, independently of each other, then the erasure-averaged reconstruction error is bounded by 1 p ≤ E[k V ∗ E(ω)V − Ik] ≤ pd . A Proof. The lower bound uses Jensen’s inequality and the convexity of the norm on the real vector space of Hermitian operators [12]. The upper bound relies on the identity for the operator norms kV ∗ (I − E)V k = k(I − E)V V ∗ (I − E)k and on the bound for entries in the Grammian, |(V V ∗ )j,k | ≤ kfj kkfk k = 1, derived from the Cauchy-Schwarz inequality, which implies E[k(I − E(ω))V V ∗ (I − E(ω))k] ≤ N p. Thus,√for a vector x for which pkxk is bigger than (δ/2) d, the bound on the worst case error due to erasures dominates that of PCM quantization. A similar phenomenon happens when the quantization is obtained with first and higher-order sigma delta quantization. For sufficiently large N , the bound for the worst-case quantization error, see e.g. [3], is smaller than the worstcase erasure error. This motivates investigating active error correction for erasures. 3. Sigma-delta quantization with embedded check bits Our main goal is to make the two error bounds for erasures SAMPTA'09 and quantization comparable. 
To this end, we use systematic binary error-correcting codes for packets of quantized coefficients, and replace a portion of the output from the sigma-delta quantizer by the check bits. Definition. A binary (n, k)-code is an invertible map C : Zk2 → Zn2 . The minimum distance of this code is the minimal number of bits by which any two code words (elements in the range of C) differ. A systematic (n, k)-code simply appends check bits, meaning q = (q1 , q2 , . . . qk ) maps to C(q) = (q1′ , q2′ , . . . qn′ ) such that qj′ = qj for all j ∈ {1, 2, . . . k}. The relevance of this definition is that among any block of n transmitted bits, the minimum distance is the number of bit erasures that cannot be corrected any more. The reconstruction strategy we study is given by incorporating check bits in the output of the quantizer, which are used by the receiver to correct a portion of the erased bits. The remaining, incorrectible bits are then omitted from reconstruction. As already mentioned, we will exploit a particular accompanying quantization strategy, which we briefly explain. Definition. Let Q be the binary mid-riser quantizer with stepsize δ > 0 and let F = {f1 , f2 , . . . fN } be an N/dtight frame for a d-dimensional real Hilbert space H. Also, assume that C is a binary (n, k)-code. Given an input vector x ∈ H, then the C-embedded sigma-delta PN quantization of x is QF ,C (x) = Nd j=1 qj fj , where the sequence {qj }∞ j=1 associated with the initialization value u0 = 0 is defined by ( Q(hx, fm+j i + um+j−1 ), 1 ≤ j ≤ k, qm+j := C((qm+1 , qm+2 , . . . qm+k ))j , else , for any m ∈ {0, n, 2n, . . . }, and j ∈ {1, 2, . . . n}, and the map for updating the internal variable is um+j := hx, fm+j i − qm+j + um+j−1 . Our first main theorem is the stability of this modified sigma-delta algorithm. Theorem. Let Q be a binary mid-riser quantizer with stepsize δ > 0, let F = {f1 , f2 , . . . fN } be an N/d-tight equal-norm frame for a d-dimensional real Hilbert space H, and let C be a systematic binary (n, k)-code, such that n divides N . If kxk ≤ αδ/2, α < 1, and k≥ n (1 + α) 2 then in the course of the C-embedded first-order sigmadelta quantization, the internal variable is bounded by |uj | ≤ δ k2 1 ((n − k + 1) + (n − k)α) ≤ δ(k − + ) 2 n 2 for all j ∈ {1, 2, . . . N }. Proof. We proceed by induction. At the end of the first block of n bits, if all n − k check bits were chosen incorrectly and the input is taken to be the worst case, then uN reaches the maximum magnitude stated in the theorem. 42In the course of quantizing the next block, due to the bound on the input, each bit allows the quantizer to recover at least δ/2 − αδ/2. With the inequality k ≥ n2 (1 + α) we deduce 1 α 1 k( − ) ≥ (n − k)(1 + α) 2 2 2 which means uj is contained in [−δ/2, δ/2] before the next check bit is encountered. Similarly as in [1] and [3], we deduce an error estimate from the bound on the internal variable. The relevant quantity in this estimate is derived from the frame geometry, as in [3], T (F) = k(f1 −f2 )±(f2 −f3 )±· · ·±(fN −1 −fN )±fN k . We define the maximal error caused by quantization to be eq(V, δ, α) = max kxk≤αδ/2 kQF ,C (x) − xk , where V is the analysis operator of the frame F. Theorem. Under the same assumptions as in the preceding theorem, eq(V, δ, α) ≤ d δ ( ((n − k)(1 + α) + 1)T (F) . N 2 Proof. This is an immediate consequence of the bound on the internal variable and the proof in [3]. 
In comparison with the unmodified first order sigma-delta quantization, we have a bound that is worse by at most a factor of 2(n − k). However, the advantage of the embedded check bits is the ability to correct erasures in each block. Assume the initial probability measure applies an erasure with a probability of p to each coefficient. Assume that the code C has minimal distance np + t with t > 0. Let P′ denote the probability measure governing the erasures remaining after the error correction has been applied in each block of length n. Definition. The combination of quantization, erasures and error correction gives the reconstruction error ec(V, δ, α, P′ ) = E[ max kxk≤αδ/2 k 1 X ωj qj fj − xk] , A j where ωj = 0 means that the j-th coefficient is erased. The following lemma helps bound the probability of erasures remaining, if the weight of the code is larger than the expected number of erasures before correction. Lemma. (Hoeffding [8]). Let E[ωj ] = 1 − p and assume that the minimum distance of C is bounded below by n(p + ǫ), ǫ > 0. The probability p′ of an individual coefficient being erased after the error correction is applied is bounded by p′ ≤ exp(−2nǫ2 ) . Now we can combine the two error estimates for quantization and erasures. Theorem. Let ǫ > 0, assume C has minimal distance n(p + ǫ). Let P′ be the probability measure governing the erasures after the error correction has been applied. Under the additional assumptions of the preceding theorem, SAMPTA'09 dδ exp(−2nǫ2 ) . ec(V, δ, α, P ) ≤ eq(V, C, δ, α) + 2 ′ Proof. First we apply Minkowski’s inequality to separate the error caused by quantization and by erasures. The expected number of erasures is N p′ , with p′ bounded in accordance with the preceding lemma. Each erased coefficient has magnitude δ/2, so the norm of the vectors which are omitted in the reconstruction can at most be δdp′ /2. The remaining question is which asymptotics can be achieved for the minimum distance with a suitable sequence of codes. To this end, we quote a version of the Gilbert-Varshamov bound. Lemma. Let 0 ≤ q ≤ 1/2, then there exist infinitely many systematic linear (n, k)-codes with minimum distance at least nq and rate k ≥ 1 − H2 (q) , n where H2 (q) = −q log2 q − (1 − q) log2 (1 − q) is the binary entropy. Proof. The usual form of the Gilbert Varshamov bound for linear codes [11, Ch. 17] can be re-stated as a bound for the maximal number of erasures that can be corrected by certain codes. In this form, it states the existence of linear codes for which any n−d+1 rows of the generator matrix have rank k if d ≥ nq, meaning up to d − 1 erasures can be corrected. Permuting the rows so that the first k have maximal rank and right-multiplying by the inverse of this k × k block gives the generator matrix for a systematic code that can correct the same number of erasures. We are ready to state the final result. Theorem. Let 0 ≤ p < q ≤ 1/2, H2 (q) ≤ (1 − α)/2, 0 < α < 1 and denote ǫ = q − p. Consider the sequence of systematic linear codes provided by the GilbertVarshamov bound for minimum distance bounded below 2 by nq and let N ≥ ne2nǫ , then ec(V, δ, α, P′ ) ≤ dδ ((2 ln N H2 (q)/ǫ2 + 1)T (F) 2N 1 + 2 ln N ) . 2ǫ 2 Proof. From the assumption, we have e2nǫ ≤ N and thus n ≤ 2ǫ12 ln N . By the Gilbert-Varshamov bound, n − k ≤ nH2 (q) ≤ 1 ln N H2 (q) . 2ǫ2 Using the Hoeffding inequality on the error due to the remaining erasures gives 2 e−2nǫ ≤ n 1 ln N ≤ 2 . N 2ǫ N Thus, the two error terms have the same asymptotic behavior. 
We note that this error bound is only worse by a term logarithmic in N compared to the quantization error without erasures. We also remark that even in the lossy regime, when the error correction fails with near certainty in any packet, then we still have p′ ≤ p and thus 43 ec(V, δ, α, P′ ) ≤dδ(( ln N p H2 (q)/ǫ2 + 1)T (F) + ) . N 2 Acknowledgment This work was partially supported by NSF DMS 0704216, NSF DMS 08-07399 and by the Deutsche Forschungsgemeinschaft (DFG) under Heisenberg Fellowship KU 1446/8-1. References: [1] J. J. Benedetto, A. M. Powell, and O. Yilmaz, SigmaDelta quantization and finite frames, IEEE Trans. Inform. Theory 52:1990–2005, 2006. [2] B. G. Bodmann and V. I. Paulsen. Frames, graphs and erasures. Linear Algebra Appl. 404:118–146, 2005. [3] B. G. Bodmann and V. I. Paulsen. Frame Paths and Error Bounds for Sigma-Delta Quantization. Appl. Comput. Harmon. Anal. 22:176–197, 2007. [4] P. Boufounos, A. V. Oppenheim, and V. K. Goyal. Causal Compensation for Erasures in Frame Representations. IEEE Trans. Signal Proc. 3:1071–1082, 2008. [5] P. Casazza and J. Kovačević, Equal-norm tight frames with erasures. (English summary) Frames. Adv. Comput. Math. 18:387–430, 2003. [6] V. K. Goyal, J. Kovačević, and J. A. Kelner. Quantized frame expansions with erasures. Appl. Comp. Harm. Anal. 10:203–233, 2001. [7] V. K. Goyal, J. Kovačević, and M. Vetterli. Quantized frame expansions as source-channel codes for erasure channels. In: Proc. Data Compr. Conf., Snowbird, UT, Mar. 1999. [8] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Stat. Assoc. 58 (301):13–30, 1963. [9] R. B. Holmes and V. I. Paulsen, Optimal frames for erasures, Linear Algebra Appl. 377:31–51, 2004. [10] D. Kalra, Complex equiangular cyclic frames and erasures, Linear Algebra Appl. 419:373–399, 2006. [11] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes. North-Holland, Amsterdam, 1977 [12] D. Petz, Spectral scale of self-adjoint operators and trace inequalities, J. Math. Anal. Appl. 109:74–82, 1985. [13] M. Püschel and J. Kovačević, Real, Tight Frames with Maximal Robustness to Erasures, Proc. Data Compr. Conf., Snowbird, UT, March 2005, pp. 63– 72. [14] T. Strohmer and R. Heath, Grassmannian frames with applications to coding and communication, Appl. Comput. Harmon. Anal. 14:257–275, 2003. SAMPTA'09 44 Special session on Efficient Design and Implementation of Sampling Rate Conversion, Resampling and Signal Reconstruction Methods Chair: Hakan Johansson and Christian Vogel SAMPTA'09 45 SAMPTA'09 46 Structures for Interpolation, Decimation, and Nonuniform Sampling Based on Newton’s Interpolation Formula Vesa Lehtinen and Markku Renfors Department of Communications Engineering, Tampere University of Technology P.O.Box 553, FI-33101 Tampere, Finland {vesa.lehtinen,markku.renfors}@tut.fi where Abstract: m–1 The variable fractional-delay (FD) filter structure by Tassart and Depalle performs Lagrange interpolation in an efficient way. We point out that this structure directly corresponds to Newton’s interpolation (backward difference) formula, hence we prefer to refer to it as the Newton FD filter. This structure does not function correctly when the fractional delay is made time-variant, e.g., in sample rate conversion. We present a simple modification that enables time-variant usage such as fractional sample rate conversion and nonuniform resampling. We refer to the new structure as the Newton (interpolator) structure. 
Almost all advantages of the Newton FD structure are preserved. Furthermore, we suggest that by transposing the Newton interpolator we obtain the transposed Newton structure which can be used in decimation as well as reconstruction of nonuniformly sampled signals, analogously to the transposed Farrow structure. The presented structures are a competitive alternative for the Farrow structure family when low complexity and flexibility are required. 1. Introduction In [1][2][3], Tassart and Depalle as well as Candan derive an efficient implementation structure for FD filters, depicted in Fig. 1, from Lagrange’s interpolation formula. It turns out that the obtained filter structure directly corresponds to Newton’s (backward difference) interpolation formula [4] (with some subexpression sharing) which indeed is equivalent with Lagrange interpolation [5]. Newton’s backward difference formula is f (t + τ) = ∑ ∞ m=0 τ(m)∆m f ( t ) ---------------------------- , m! (1) τ(m) = ∏ k=0 (τ + k ) (2) is the rising factorial, and ∆ is the backward difference operator such that ∆ m f ( t ) = ∆ m – 1 f ( t ) – ∆ m – 1 f ( t – 1 ) and ∆ 0 f ( t ) = f ( t ) , resulting in ∆m f ( t ) = ∑ m k=0 m ( – 1 ) k   f ( t – k ).  k (3) Newton’s backward difference formula provides an efficient means to realise piecewise-polynomial interpolation for DSP. Its complexity is only O(M) (where M is the interpolator order)–cf. equivalent Lagrange implementations based on the Farrow structure [6] having O(M2) complexity [3]. The subfilters are multiplier-free and extremely simple. The structure is modular, as highlighted by the grey shading in Fig. 1, and the interpolator order can be changed in real time [3]. Unfortunately, the structure presented in Fig. 1 does not function correctly in sample rate conversion (SRC). Because the multiplications are performed between the subfilters, making them time-variant will result in incorrect output. This is because each output sample should only depend on the current value of the delay parameter D; in Fig. 1, past values of D contribute to the output through the delayed paths through the subfilters. Therefore, the structure in Fig. 1 is only useful in single-rate, time-invariant or slowly-varying fractional-delay filtering. We propose a slightly modified structure that allows arbitrary resampling, including increasing the sample rate by arbitrary, also fractional, factors (fractional interpolation). We also point out that the structure can be transposed to obtain a decimator structure that possesses all the advantages of the Newton interpolation structure. This work was supported by the Graduate School in Electronics, Telecommunications and Automation (GETA). SAMPTA'09 47 1 – z –1 1–z –1 –D+1 -------------2 –D ... 1–z –1 1–z –1 – D+M –1 -----------------------M –D+2 -------------3 ... Figure 1. The fractional-delay filter structure proposed in [1][3], based on Newton’s interpolation formula. 1–z –1 H&S 1 1–z –1 H&S – 1⁄2 1⁄3 H&S – D ( t )–1 1–z –1 1⁄M H&S – – D(t ) ... 1–z –1 H&S ... D ( t )–2 D ( t )–M +1 Figure 2. The Newton interpolator structure suitable for sample rate conversion. The hold & sample (H&S) blocks perform the sampling at the output sample instants. 2. The Newton structure for interpolation In order to allow fractional SRC and arbitrary resampling, the Newton structure must work correctly with a time-variant fractional delay. 
This is achieved through two simple steps: (i) We invert the summation order at the output part of the structure from that presented in [1][3] (this was already done in [2]). (ii) The time-varying multiplications can now be implemented in the high-rate part between the adders. The improved structure is shown in Fig. 2. We refer to it as the Newton interpolator structure or the Newton structure for short. Also the improved structure is modular, permitting changing the interpolator order in real time. In single-rate FD filtering, the improved structure is equivalent to [1][2][3]. In Fig. 2, the H&S blocks stand for hold & sample, i.e., each output sample obtains the value of the previously arrived input sample. In fractional interpolation, i.e., increasing the sample rate by a fractional factor, we use the common notation illustrated in Fig. 3. The time interval between the previous input sample and the next output sample to be generated is expressed using the fractional interval variable µ which is normalised with respect to the input sample interval so that µ ∈ [0, 1) . Interpolation of uniformly spaced input samples can be modelled as convolution [5], leading to the generic model depicted in Fig. 4 [7]. The continuous-time (CT) linear time-invariant (LTI) model filter is piecewise polynomial, with M + 1 pieces, each with duration equal to the input sample interval T in . Hence the impulse response length is ( M + 1 )T in . SAMPTA'09 Input samples Output samples T in µ l–1 T in ( k –1 )T in µ l+1 T in ( k +1 )T in k T in ( l–1 )T out lT out ( l+1 )T out Figure 3. Definition of the fractional interval µ for interpolation. x[n] CT @ F in H CT ( f ) 1 x CT ( t ) = -------F in Figure 4. factors. y[n] = y CT ( nT out ) x CT ( t ) DT y CT ( t ) @ F out - ∑n x [ n ]δ  t – ------F in n The generic model for SRC by arbitrary The composite transfer function of m cascaded subfilters is ( 1 – z –1 ) m = ∑ m n=0 m ( – 1 ) n   z –n ,  n (4) cf. (3). The output of the interpolator is 48 D(t ) D ( t )–1 D ( t )–2 D ( t )–M +1 ... A&D A&D 1 1–z –1 – A&D 1⁄2 – 1–z –1 – A&D 1⁄3 1–z –1 – A&D 1⁄M 1–z –1 ... Figure 5. The transposed Newton structure for decimation and reconstruction of signals from nonuniformly spaced samples. y ( ( k + µ )T in ) = ∑ M n=0 Input samples h ( ( n + µ )T in )x [ k – n ] m ( D0 – µ )m = ∑ x [ k – n ] ( –1 ) n ∑ ( – 1 ) m   ----------------------- n n=0 m=n m! M M T out (2.1) where n<0∨n>m ( l–1 )T in (5) for m ≥ 0 , and m–1 ( x )m = ∏ k=0 (x – k) (6) is the falling factorial. The delay of the interpolator is D 0 T in . The parameter D 0 can be chosen quite freely, but the best amplitude response and linear phase response are obtained with D 0 = ( M + 1 ) ⁄ 2 [1]. The continuous-time model impulse response of the interpolator is then (cf. the expression of the filter input in Fig. 4) m ( D0 – µ )m M 1 ( – 1 ) n + m   ------------------------. h ( ( n + µ )T in ) = ------- ∑  n T in m = n m! (7) The reversed summation order in the high-rate part comes with a price: the structure is more costly to pipeline than those in [1][3] because the signal paths cannot share pipeline registers. 3. The transposed Newton structure There exists a duality1 between decimation and interpolation that allows transforming a decimator into an interpolator and vice versa through network transposition [7]. By transposing the Newton interpolator, we obtain the structure depicted in Fig. 5. We refer to this as the transposed Newton structure. 
The transpose is obtained by inverting the flow direction of all signals and replacing each block with its dual. For instance, the H&S block is replaced with the accumulate & dump 1. There exist a number of definitions for duality, including the adjoint. Here we use the generalised duality/transpose as defined in [7]. SAMPTA'09 µ l+1 T out µ l–1 T out ( k –1 )T out  m = 0,  n Output samples ( k +1 )T out k T out lT in ( l+1 )T in Figure 6. Definition of the fractional interval µ for the transposed structure (dual of interpolation). (A&D) block, which sums up all its input samples since the previous output sample. This is also the most straightforward way to obtain the transposed Farrow structure from the Farrow structure2 [9]. The output samples of the transposed Newton structure are uniformly spaced, but the input samples may arrive at arbitrary time instants. The generic SRC model (Fig. 4) is valid also for the transposed Newton structure. The model impulse response is again piecewise-polynomial, now with the piece duration equal to the output sample interval. The model impulse response is obtained by replacing T in with T out in (7) and redefining µ according to Fig. 6 (reflecting the duality between decimation and interpolation). For an input sample arriving at time instant t , the fractional interval is t t µ ( t ) = --------– --------- ∈ [0, 1). T out T out (8) For fractional decimation, the fractional interval for the lth input sample is lT in lT in – ---------. µ l = --------T T out out (9) The impulse response in the generic model is now m ( D0 – µ )m M 1 h ( ( n + µ )T out ) = --------- ∑ ( – 1 ) n + m   ----------------------- n m! T out m = n (10) 2. The structure in [8] (transposed structure I in [9]) is not the true transpose of the Farrow structure even though the duality of responses holds. 49 with integer n. Again, D 0 = ( M + 1 ) ⁄ 2 for the best response. In the frequency response, the model filter has M + 1 zeros at each (nonzero) integer multiple of the output sample rate, hence realising antialiasing regardless of the decimation factor. The transposed Newton structure is able to receive input samples at arbitrary time instants, which makes it a potential building block for reconstruction of signals from nonuniformly spaced samples (e.g., in algorithms like [10][11]), as earlier suggested for the transposed Farrow structure in [12]. The transposed Newton structure shares the advantages and disadvantages of the Newton interpolator, such as modularity, O ( M ) complexity and the inefficient zero locations. 4. Computational complexity In interpolation by factor R, the Newton structure will perform ( 1 + R )M additions and ( 1 + R )M multiplications per input sample on average. In decimation by R, the transposed Newton structure will perform ( R – 1 ) ( 1 + M ) + 2M additions and ( 1 + R )M multiplications per output sample. The first term in the addition count comes from the A&D block. Multiplication by a constant inverse of a small integer requires only few additions/subtractions. Unambiguous complexity comparison between the proposed structures and alternatives, mainly the Farrow family, would require specifying the implementation technology and the SRC factor. However, the following points can be made: (i) The basis multipliers are more complex in the Newton structures (integer part present in the time-variant coefficients) than in Farrow structures (no integer part). Hence, large SRC factors are unfavourable to the Newton family. 
(ii) If the Lagrange response suffices, the ultimate simplicity of the subfilters makes the Newton family superior to the Farrow structure when the SRC factor is small. (iii) The response of the Newton structures can be improved only by increasing the order (i.e., number of stages). In designs with a low oversampling factor and/or strict performance requirements, this may lead to a very high filter order. In such cases, an optimised Farrow design with a non-Lagrange response will have a lower complexity and smaller delay. 5. Conclusions The proposed structures allow efficient piecewise Newton interpolation for SRC and arbitrary resampling as well as its dual for decimation and reconstruction of nonuniformly sampled signals. The advantages of the proposed structures include SAMPTA'09 low, O(M) complexity (high orders are feasible at the cost of a long delay), very simple subfilters and run-time adjustability of the filter order. As a drawback, the basis multipliers running at the high-rate end of the filter have longer wordlengths than in the Farrow counterparts. Due to their simplicity, the Newton structures may be useful as building blocks of more complicated algorithms for interpolation, decimation, and reconstruction of nonuniformly sampled signals. References: [1] S. Tassart and Ph. Depalle, “Fractional delays using Lagrange interpolators,“ in Proc. Nordic Acoustic Meeting, Helsinki, Finland, 12–14 June, 1996. [2] S. Tassart, Ph. Depalle, “Analytical approximations of fractional delays: Lagrange interpolators and allpass filters,” in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc. (ICASSP’97), 21–24 Apr 1997, pp. 455–458. [3] Ç. Candan, “An efficient filtering structure for Lagrange interpolation,” IEEE Signal Processing Letters, Vol. 14, No. 1, Jan 2007, pp. 17–19. [4] E.W. Weisstein, "Newton’s Backward Difference Formula." Available: http://mathworld.wolfram.com/ NewtonsBackwardDifferenceFormula.html. Visited: 22 Jan 2008. [5] E. Meijering, “A chronology of interpolation: From ancient astronomy to modern signal and image processing,” in Proc. of the IEEE, Vol. 90, No. 3, Mar 2002, pp. 319–342. [6] C.W. Farrow, “A continuously variable digital delay element,” in Proc. IEEE Int. Symp. Circ. Syst. (ISCAS’88), Espoo, Finland, June 1988, pp. 2641–2645. [7] R.E. Crochiere, L.R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, 1983. [8] T. Hentschel, G. Fettweis, “Continuous-time digital filters for sample-rate conversion in reconfigurable radio terminals,” in Proc. European Wireless, Dresden, Germany, Sep 2000, pp. 55–59. [9] D. Babic, J. Vesma, T. Saramäki, M. Renfors, “Implementation of the transposed Farrow structure,” in Proc. IEEE Int. Symp. Circ. Syst., May 2002, pp. IV-5–IV-8. [10] F. Marvasti, M. Analoui, M. Gamshadzahi, “Recovery of signals from nonuniform samples using iterative methods,” IEEE Trans. Signal Proc., Vol. 39, No. 4, Apr 1991, pp. 872–878. [11] F.A. Marvasti, P.M. Clarkson, M.V. Dokic, U. Goenchanart, C. Liu, “Reconstruction of speech signals with lost samples,” IEEE Trans. Signal Proc., Vol. 40, No. 12, Dec 1992, pp. 2897–2903. [12] D. Babic and M. Renfors, “Reconstruction of non-uniformly sampled signal using transposed Farrow structure,” in Proc. Int. Symp. Circ. Syst. (ISCAS), Vancouver, Canada, May 2004, Vol. III, pp. 221–224. 
50 Chromatic Derivatives, Chromatic Expansions and Associated Function Spaces Aleksandar Ignjatović School of Computer Science and Engineering, University of New South Wales, and National ICT Australia (NICTA), Sydney, Australia; ignjat@cse.unsw.edu.au Abstract: We present the basic properties of the chromatic derivatives and the chromatic expansions as well as a motivation for introducing these notions. The chromatic derivatives are special, numerically robust linear differential operators; the chromatic expansions are the associated local expansions, which possess the best features of both the Taylor and the Nyquist expansions. This makes them potentially useful in fields involving sampled data, such as signal and image processing. 1. Motivation The Nyquist–(Whittaker–Kotelnikov–Shannon) expanP∞ sion f (t) = f (n) sin π(t − n)/π(t − n) of a n=−∞ π-band limited signal of finite energy f (t) ∈ BL(π) is of global nature, because it requires samples of the signal at integers of arbitrarily large absolute value. On the other hand, since signals from BL(π) are analytic functions, they can be represented by the Taylor expanPalso ∞ sion, f (t) = n=0 f (n) (0) tn /n!. Such expansion is of local nature, because the values of the derivatives f (n) (0) are determined by the values of the signal in an arbitrarily small neighborhood of zero. While the Nyquist expansion has a central role in digital signal processing, the Taylor expansion is of very limited use there, for several reasons. (1) Numerical evaluation of higher order derivatives of a signal from its samples is very noise sensitive; in general, one is cautioned against numerical differentiation of signals given by empirical samples. (2) The Taylor expansion of a signal f ∈ BL(π) converges non-uniformly; its truncations are unbounded and have rapid error accumulation. (3) The Nyquist expansion of a signal f ∈ BL(π) converges to f in BL(π) and thus the action of a filter A on any f ∈ BL(π) can be expressed using the samples of f and the impulse response A[sinc ] of A, i.e., A[f ](t) = ∞ X n=−∞ 2. Chromatic Derivatives To explain our notions, we first consider normalized and rescaled Legendre polynomials PnL (ω) which satisfy Z π 1 L P L (ω) Pm (ω)dω = δ(m − n), 2π −π n and then define operator polynomials µ ¶ d 1 L n . Kt = n Pn i dt i (2) It is easy to verify that for f ∈ BL(π) and its Fourier transform f[ (ω) we have Z π 1 n (ω) ei ωt dω. K [f ](t) = in PnL (ω)f[ 2π −π Figure 1 compares the plots of PnL (ω) and ω n /π n for n = 15 to n = 18, which are the transfer functions (save a factor of in ) of the operators Kn and of the (normalized) derivatives 1/π n dn /dtn , respectively. While the transfer functions of the normalized “standard derivatives” 1/π n dn /dtn obliterate the spectrum of the signal, leaving only its edges which in practice contain mostly noise, the transfer functions of operators Kn form a family of well separated, interleaved and increasingly refined comb filters. Due to their spectrum preserving property, we call the operators Kn the chromatic derivatives associated with the Legendre polynomials. Both analytic estimates and empirical tests have shown that the chromatic derivatives 2 0.5 1 -3 -2 1 -1 2 3 -3 -2 -1 1 2 3 -1 f (n) A [sinc ] (t − n). (1) In contrast, the polynomials obtained by truncating the Taylor series do not belong to BL(π) and nothing similar to (1) holds for the Taylor expansion. 
SAMPTA'09 The chromatic derivatives and the chromatic expansions and approximations were introduced to obtain local signal representations which do not suffer from these problems. -0.5 -2 Figure 1: Graphs of PnL (ω) (left) and ω n /π n (right), for n = 15 − 18. 51 can be accurately and robustly evaluated from samples of the signal taken at a small multiple (2 to 4) of the usual Nyquist rate, thus solving problem (1) associated with numerical evaluation of the standard derivatives, mentioned above. Chromatic expansions, on the other hand, were introduced to solve problems (2) and (3). 3. Chromatic Approximations Proposition 1 Let Kn be the chromatic derivatives associated with the Legendre polynomials, and let f (t) be any analytic function; then for all t, f (t) = P∞ n n=0 (−1) K n [f ](u) K n [sinc ](t − u). (3) If f ∈ BL(π) the series converges uniformly and in L2 . The series in (3) is denoted by CE[f, u](t) and is called the chromatic expansion of f (t) associated with the Legendre polynomials; a truncation of this series up to first n + 1 terms is denoted by CA[f, n, u](t) and is called a chromatic approximation of f (t). Just like a Taylor approximation, a chromatic approximation is also a local approximation: its coefficients are the values of differential operators Km [f ](u) at a single instant ¯ u, and for all k ≤ n, f (k) (u) = dk /dtk CA[f, n, u](t)¯t=u . Figure 2 compares the behavior of the chromatic approximation (black) of a signal f ∈ BL(π) (gray) with the behavior of the Taylor approximation of f (t) (dashed). Both approximations are of order sixteen. The plot reveals that, when approximating a signal f ∈ BL(π), a chromatic approximation has a much gentler error accumulation when moving away from the point of expansion than the Taylor approximation of the same order. Functions Kn [sinc ](t) appearing in the chromatic expansion associated with the Legendre polynomials are given √ by Kn [sinc ](t) = (−1)n 2n + 1 jn (πt), where jn is the spherical Bessel function of the first kind of order n. Thus, unlike the monomials that appear in the Taylor formula, functions Kn [sinc ](t) belong to BL(π) and satisfy |Kn [sinc ](t)| ≤ 1 for all t ∈ R. Consequently, the chromatic approximations are bounded on R and belong to BL(π). Also, as Proposition 1 asserts, the chromatic approximation of a signal f ∈ BL(π) converges in BL(π). Thus, if A is a filter, then A commutes with the differential operators Kn and for every f ∈ BL(π), we have the 2.0 1.5 1.0 0.5 -15 -10 5 -5 10 15 following analogue of (1): A[f ](t) = ∞ X n=0 (−1)n Kn [f ](0) Kn [A[ sinc ]](t). Thus, while local, the chromatic expansion possesses the features that make the Nyquist expansion useful in signal processing. This, together with numerical robustness of the chromatic derivatives, makes chromatic approximations applicable in fields involving empirically sampled data, such as digital signal and image processing. The next proposition demonstrates another remarkable feature of the chromatic derivatives which is relevant to signal processing. Proposition 2 Let Kn be the chromatic derivatives associated with the (re-scaled and normalized) Legendre polynomials, and f, g ∈ BL(π). Then ∞ X K n [f ](t)2 = K n [f ](t)K n [g](t) = n=0 K n Z ∞ Z ∞ f (x)2 dx; f (x)g(x)dx; −∞ n=0 ∞ X ∞ −∞ n=0 ∞ X Z [f ](t)Ktn [g(u − t)] = −∞ f (x)g(u − x)dx. Thus, the sums on the left hand side of the above equations do not depend on the choice of the instant t. 
Note that the above equations provide local representations of the usual norm, the scalar product and the convolution, respectively, which are defined in L2 globally, as improper integrals. Given the above properties of the Legendre polynomials, it is natural to ask if other families of orthonormal polynomials have similar properties. This question was answered in [1]. 4. General Chromatic Derivatives Let M : Pω → R be a linear functional on the vector space Pω of real polynomials in the variable ω. Such M is called a moment functional and µn = M(ω n ) is the moment of M of order n. Definition 1 A moment functionals M is chromatic if it satisfies the following conditions (condition (iii) is not essential, but simplifies the technicalities): (i) M is positive definite; 1/n (ii) lim supn→∞ µn /n < ∞; (iii) M is symmetric, i.e., µ2n+1 = 0 for all n. -0.5 -1.0 -1.5 Figure 2: Chromatic approximation (black) and Taylor’s approximation (dashed) of a signal from BL(π) (gray). For functionals M which satisfy conditions (i) and (iii) there exists a family of real polynomials {PnM (ω)}n∈N , such that PnM (ω) contains only powers of ω of the same parity as n and which are orthonormal with respect to M; i.e., for all m, n, M M(Pm (ω) PnM (ω)) = δ(m − n). SAMPTA'09 52 The family {PnM (ω)}n∈N is a family of orthonormal polynomials which corresponds to a symmetric positive definite moment functional M just in case there exists a sequence of positive reals {γn }n∈N such that M Pn+1 (ω) = 1 γn ω PnM (ω) − γn−1 γn M Pn−1 (ω). (4) For every positive definite moment functional there exists a non-decreasing bounded function a(ω), called an m– distribution function, such that for the associated Stieltjes integral we have R∞ n (5) ω da(ω) = µn , −∞ R∞ M M P (ω) Pm (ω) da(ω) = δ(m − n). (6) −∞ n If M is chromatic, then condition (3) implies that {PnM (ω)}n∈N is a complete system in L2a(ω) . Let ϕ ∈ L2a(ω) ; we can define a corresponding function fϕ : R → C by R∞ (7) fϕ (t) = −∞ ϕ(ω)eiωt da(ω), and one can show that (7) can be differentiated under the integral sign any number of times. Setting ¡ ¢ Kn = i1n PnM (ω) i ddt we get that for all t Kn [fϕ ](t) = R∞ −∞ in PnM (ω) ϕ(ω) eiωt da(ω), (8) i.e., hϕ(ω)eiωt , PnM (ω)ia(ω) = (− i)n Kn [fϕ ](t). Thus, ϕ(ω)eiωt = (− i)n Kn [fϕ ](t)PnM (ω), and by Parseval’s Theorem, for every t ∈ R, P∞ 2 n 2 iωt k2 a(ω) = kϕ(ω)ka(ω) . n=0 |K [fϕ ](t)| = kϕ(ω)e P∞ Thus, if ϕ ∈ L2a(ω) , then the sum n=0 |Kn [fϕ ](t)|2 converges to a constant function on R. If we let R∞ (9) m(t) = −∞ eiωt da(ω), then (5) implies m(k) (0) = ik µk . It can be shown that condition (iii) of Definition 1 implies that m(t) is analytic at every t ∈ R (moreover, it is analytic on a strip in C; see [2]). For the chromatic approximation associated with M, Pn CAM [f, n, u](t) = k=0 (−1)k Kk [f ](u)Kk [m](t − u), one can show that ¯ ¯2 P∞ |fϕ (t) − CAM [fϕ , n, u](t)| < k=n+1 ¯Kk [fϕ ](u)¯ . P∞ Thus, fϕ (t) = k=0 (−1)k Kk [fϕ ](u) Kk [m](t − u), and the convergence is uniform on R. Definition 2 LM 2 denotes P∞ the space of functions analytic on R which satisfy k=0 Kk [f ](0)2 < ∞. Let f (t) ∈ LM 2 ; then P∞ ϕf (ω) = k=0 (−i)k Kk [f ](0)PkM (ω) belongs to L2a(ω) and for all t, R∞ f (t) = −∞ ϕf (ω) eiωt da(ω). On the space LM 2 one can now introduce locally defined norm, inner product and convolution using equations from Proposition 2, and for every fixed u, the chromatic expansion of an f ∈ LM 2 is just the Fourier series of f in the orthonormal and complete base {Kun [m(t − u)]}n∈N . SAMPTA'09 5. 
Examples Example 1. (Legendre polynomials/Spherical Bessel functions) Let√Ln (ω) be the Legendre polynomials; if we set PnL (ω) = 2n + 1 Ln (ω/π), then Rπ L L P (ω)Pm (ω) d2πω = δ(m − n). −π n The corresponding recursion coefficients pin equation (4) are given by the formula γn = π(n+1)/ 4(n + 1)2 − 1. In thisp case m(t) = sinc t, and Kn [m](t) = (−1)n (2n + 1) jn (πt), where jn (x) is the spherical Bessel function of the first kind of order n. The corresponding space LM 2 consists of all analytic functions which belong to L2 and have a Fourier Transform supported in [−π, π]. Example 2. (Chebyshev polynomials of the first kind/Bessel functions) Let PnT (ω) be the family of orthonormal polynomials obtained by normalizing and rescaling the Chebyshev polynomials of the √ first kind, Tn (ω), by setting P0T (ω) = 1 and PnT (ω) = 2 Tn (ω/π) for n > 0. In this case Rπ T T P (ω)Pm (ω) q dω ω 2 = δ(n − m). −π n π 2 1−( π ) The corresponding function (9) is m(t) = J0 (πt) and √ Kn [m](t) = (−1)n 2 Jn (πt) for n > 0, where Jn (t) is the Bessel function of the first kind of order n. In the recurrence √ relation (4) the coefficients are given by γ0 = π/ 2 and γn = π/2 for n > 0. The corresponding space LM 2 consists of analytic functions whose Fourier transform f[ (ω) is supported in (−π, π) and satisRπ p fies −π 1 − (ω/π)2 |f[ (ω)|2 dω < ∞. The chromatic expansion of a function f (t) is the Neumann series √ P∞ n f (t) = f (0)J0 (πt) + 2 n=1 K [f ](0)Jn (πt). Thus, the chromatic expansions corresponding to various families of orthogonal polynomials can be seen as generalizations of the Neumann series, while the families of corresponding functions {Kn [m](t)}n∈N can be seen as generalizations (and a uniform representation) of some familiar families of special functions. Example 3. (Hermite polynomials/Gaussian monomial functions) Let Hn (ω) be the Hermite polynomials; then the polynomials given by PnH (ω) = (2n n!)−1/2 Hn (ω) satisfy R∞ −∞ H PnH (ω)Pm (ω) 2 −ω e√ π dω = δ(n − m). The corresponding function defined by (9) √ is m(t) = 2 2 −t /4 n n n −t /4 e and K [m](t) = (−1) t e / 2n n!. The corresponding recursion coefficients are given by γn = p (n + 1)/2. The corresponding space LM 2 consists of analytic functions whose Fourier transform f[ (ω) satisfies R∞ 2 |f[ (ω)|2 eω dω < ∞. The chromatic expansion of −∞ 2 f (t) is just the Taylor expansion of f (t) et 2 by e−t /4 . /4 , multiplied 53 6. Weakly Bounded Moment Functionals 7. To study local (i.e., non-uniform) convergence of chromatic expansions, we somewhat restrict the class of moment functionals we consider. If M is weakly bounded, the functions do not Pperiodic ∞ n belong to LM K [sin ωt]2 diverges. 2 ; for example, n=0 We now consider some inner product spaces in which pure harmonic oscillations have finite positive norms ([3, 2]). Definition 3 Let M be a symmetric positive definite moment functional and let γn > 0 be such that (4) holds. (i) M is weakly bounded if there exist some M ≥ 1, some 0 ≤ p < 1 and some integer r, such that for all n ≥ 0, 1/M ≤ γn ≤ M (n + r)p and γn /γn+1 ≤ M 2 . (ii) M is bounded if there exists some M ≥ 1 such that 1/M ≤ γn ≤ M for all n ≥ 0. Thus, every bounded moment functional is also weakly bounded with p = 0. Functionals in our Example 1 and Example 2 are bounded. For bounded moment functionals the corresponding m-distribution a(ω) has a finite support and consequently m(t) is a band-limited signal. However, m(t) can be of infinite energy (i.e., not in L2 ) as is the case in our Example 2. 
Moment functional in Example 3 is weakly bounded but not bounded (p = 1/2). We note that all important examples of classical orthogonal polynomials which correspond to weakly bounded moment functionals in fact satisfy a stronger condition 0 < limn→∞ γn /np < ∞ for some 0 ≤ p < 1. Lemma 3 If M is a weakly bounded moment functional, 1/k then limk→∞ (µk /k!) P∞ n= 0.n Thus, M is chromatic; moreover, m(z) = n=0 i µn z /n! is an entire function on C. Lemma 4 Let M be weakly bounded and p < 1 as in Definition 3(i); then for every integer k ≥ 1/(1 − p) there exists K > 0 and a polynomial P (x) such that for every n ∈ N and every z ∈ C, k |Kn [m](z)| < |Kz|n P (|z|)e|Kz| /n!1−p . This Lemma is used to prove the following Proposition. Proposition 5 Let M be as in Lemma 4, f (z) an entire function and u ∈ C. If limn→∞ |f (n) (u)/n!1−p |1/n = 0, then the chromatic expansion of f (z) centered at u converges everywhere to f (z), and the convergence is uniform on every disc of finite radius. Thus, if M is bounded (p = 0) and f is an entire function, then the chromatic expansion CE[f, u](t) converges to f (t) for all t. Many well known equalities for the Bessel functions Jn (t) are just the special cases of chromatic expansions. For example, the chromatic expansions of f (t) = eiωt , f (t) = 1 and f (t) = m(t + u) yield P n M n eiωt = ∞ n=0 i Pn (ω) K [m](t); ³ ´ Qn γ2k−2 P∞ 2n m(t) + n=1 k=1 γ2k−1 K [m](t) = 1, P∞ m(t + u) = n=0 (−1)n Kn [m](u)Kn [m](t), which generalize the following well known equalities: P ei ωt = J0 (t) + 2 ∞ in Tn (ω)Jn (t); P∞ n=1 J0 (t) + 2 n=1 J2n (t) = 1; P∞ J0 (t + u) = J0 (u)J0 (t) + 2 n=1 (−1)n Jn (u)Jn (t). SAMPTA'09 Non-Separable Inner Product Spaces Definition 4 Assume again that M is weakly bounded and let p be as in Definition 3. We denote by C M the vector space of analytic functions such that the sequence Pn νnf (t) = 1/(n + 1)1−p k=0 Kk [f ](t)2 converges uniformly on every finite interval. Proposition 6 Let f, g ∈ C M and Pn σnf g (t) = 1/(n + 1)1−p k=0 Kk [f ](t)Kk [g](t); then the sequence {σnf g (t)}n∈N converges to a constant function. In particular, νnf (t) is constant. Corollary 7 Let C0M be the vector space consisting of analytic functions f (t) such that limn→∞ νnf (t) = 0; then in the quotient space C2M = C M /C0M the limit limn→∞ σnf g (t) is independent of t and defines a scalar product on C2M . Proposition 8 Let M correspond to Chebyshev polynomials as in our Example √ 2; then functions fω (t) = √ 2 sin ωt and gω (t) = 2 cos ωt for all 0 < ω < π form an uncountable orthonormal system of vectors in C2M . Proposition 9 Let M correspond to Hermite polynomials as in our Example 3; then for all ω > 0 functions fω (t) = sin ωt and gω (t) = cos ωt form an uncountM able orthogonal system of vectors in C2M , and kfω k = √ 2 M 4 ω /2 kgω k = e / 2π. Conjecture 1 Assume that for some 0 ≤ p < 1 the recursion coefficients γn in (4) are such that γn /np converges to a finite positive limit. Then, for the corresponding family of orthogonal polynomials we have Pn 0 < limn→∞ 1/(n + 1)1−p k=0 PkM (ω)2 < ∞ for all ω in the support sp(a) of the corresponding mdistribution function a(ω). Thus, in the corresponding space C2M all pure harmonic oscillations with positive frequencies ω ∈ sp(a) have finite positive norm and are mutually orthogonal. Detailed presentation of the theory of chromatic derivatives can be found in our references; preprints of some unpublished manuscripts are available at http://www.cse.unsw.edu.au/˜ignjat/diff. References: [1] A. Ignjatovic. 
Local approximations based on orthogonal differential operators. Journal of Fourier Analysis and Applications, 13(3), 2007. [2] A. Ignjatovic. Chromatic derivatives and associated function spaces. manuscript, 2008. [3] A. Ignjatovic. Chromatic derivatives and local approximations. to appear in: IEEE Transactions on Signal Processing, 2009. 54 Estimation of the Length and the Polynomial Order of Polynomial-based Filters Djordje Babic(1), and Heinz G. Göckler(2) (1) Faculty of Computer Science, University Union, Belgrade, Knez Mihailova 6/VI, 11000 Belgrade, Serbia. (2) DISPO, Faculty of Electrical Engineering and Information Sciences, Ruhr-Universität, Bochum, Germany. djbabic@raf.edu.rs, goeckler@nt.rub.de Abstract: In many signal processing applications it is beneficial to use polynomial-based interpolation filters for sampling rate conversion. Actual implementations of these filters can be performed effectively by using the Farrow structure or its modifications. In the literature, several design methods have been proposed. However, estimation formulae for the number of polynomialsegments defining the finite length of the underlying continuous-time filter impulse response and the order of polynomials have not been known. This contribution presents estimation formulae for the length and the polynomial order of polynomial-based filters for various types of requirements. The formulae presented here can save time in designing, since they provide good starting values of length and order for a given set of requirements. 1. Introduction In many signal processing applications it is required to determine signal samples at arbitrary positions between existing samples of a discrete-time signal. In these cases, it is beneficial to use polynomial-based interpolation filters. For these filters, an efficient overall implementation can be achieved by using a continuoustime impulse response ha(t) having the following properties [1], [2]; First, ha(t) is nonzero only in a finite interval 0≤t<NT with N being an integer. Second, in each subinterval nT≤t<(n+1)T, for n=0, …, N−1, ha(t) is expressible as a polynomial of t of a given (low) order M. Third, ha(t) is symmetric with respect to t = NT/2 to guarantee phase linearity of the resulting overall system. The length of polynomial segments, T, can be selected to be equal to the input Tin or output Tout sampling interval, a fraction of the input or output sampling interval, or an integer multiple of the input or output sampling interval. The advantage of the above system lies in the fact that the actual implementation can be efficiently performed by using the Farrow structure [3] or its modifications [4], [5]. In the literature, several design methods have been proposed [1], [2], [4]. However, estimation formulae for the number N of polynomial-segments and the order M of polynomial have not been known. This contribution presents the missing estimation formulae for the length N SAMPTA'09 and polynomial order M for various types of requirements. The formulae presented subsequently can save time for the filter designers, because they get suitable starting values for N and M that can be used for the given set of requirements. The formulae can also be used to estimate implementation costs of Farrow filter as subsystem of general sampling rate converters, for example, in optimal factorization of multistage decimation (interpolation). 2. 
Polynomial-based filters As it has been originally suggested in [1], [2] when deriving the modified Farrow structure for interpolation, it is beneficial to construct ha(t) as follows: N −1 M ha (t ) = ∑ ∑ cm (n) f m (n, T , t ) n =0 m = 0 (1) where the number of polynomial segments N is an integer. The basis functions fm(n, T, t), as defined in [1], are given by m  2(t − nT )   − 1 for nT ≤ t < (n + 1)T f m (n, T , t ) =  T   otherwise, 0 (2) where the common polynomial order of all segments is M. The coefficients cm(n) are the adjustable parameters being related to each other by  cm (n) for m even cm ( N − 1 − n) =  − cm (n) for m odd (3) for n = 0, 1,…, N−1, as consequence of the symmetry properties required above. The resulting ha(t) is characterized by the following properties: (i) ha(t) is nonzero for 0≤ t < NT and zero elsewhere; (ii) in each subinterval nT ≤ t < (n +1)T for n = 0 , …, N−1, ha(t) is expressed as a polynomial of degree M; (iii) ha(t) is symmetric about t = NT/2, that is, ha(NT−t) = ha(t) . Based on Property (iii), it is guaranteed that the resulting overall system has a linear phase, a very attractive property for many applications. Furthermore, the generation of the above ha(t) guarantees that, in the frequency domain, the zero-phase frequency response, when omitting the linear-phase term, is expressible as (see [1] for details) N / 2 −1 M H a ( j 2πf ) = ∑ ∑ cm (n)Gm (n, T , f ) , n =0 m =0 (4) where Gm(n, T, f ) is the Fourier transform of 55 g m (n, T , t ) = (− 1) f m (n, T , t − NT / 2) m + f m ( N − 1 − n, T , t − NT / 2) . (5) Since the above approximating function is linear with respect to the unknown coefficients cm(n), it enables one to optimize the overall filter to meet the given criteria in a manner similar to that used for synthesizing various types of linear-phase FIR filters [6]. In the above, T, the length of the polynomial segments, can be used to define different implementation structures as discussed in [4], [5]. As seen in [4], [5], T can be chosen as T = βTin or T = βTout, where β is unity, an integer, or one divided by an integer. The selection depends on whether decimation or interpolation is under consideration, and on the structural needs for efficient implementation. The actual implementation can be efficiently performed by using the Farrow structure [3] or its modifications [4], [5]. For all these structure the number of fixed coefficients depends on the number N of polynomial segments and the order M of the polynomial in each segment. The total number of multipliers, exploiting the symmetry properties of (3), is given by for N even  N ⋅ ( M + 1) / 2 S= , (6) ( N − 1)(M + 1) / 2 + ( M + 1) / 2 for N odd. For the purpose of illustration, the modified Farrow structure [1] is used with T=Tin. It should be pointed out that, in a practical realization, the coefficients’ symmetry of the FIR branches will be exploited, and a single delay line can be shared with all branches. 3. Review of minimax design method This section reviews minimax design method of polynomial-based filters of arbitrary length and order, as presented in [1], [2], for which we estimate N and M. To this end, we assume a lowpass signal x(n)↔X(ejΩin). Its sampling rate Fin=1/Tin shall be converted by an arbitrary ration according to Fout=RFin yielding y(l)↔Y(ejΩout). In case of R>1 (R<1) the system realizes interpolation (decimation). 
The ultimate aim is to determine a continuous-time, finite-length impulse response ha(t) of the sampling rate conversion system such that the Fourier transform of ha(t) meets following requirements [4] , [7]: (1 − δ p ) ≤ H a ( f ) ≤ (1 + δ p ) for f ≤ f p = αF / 2 Ha ( f ) ≤ δs for f ∈ Φ s , (7) where [F / 2, ∞ ] ∞  Φ s =  kF − f p , kF + f p  k =1  F − f ,∞ p  [ [ ] for Case A ] for Case B (8) for Case C. In all three cases, the signal is preserved according to the given tolerance in the passband region [0, fp]. Furthermore, the aliasing components are attenuated in the defined manner. In Case A, all components aliasing into the baseband [0, F/2] are attenuated. In Case B, all SAMPTA'09 components aliasing into the passband [0, fp] are attenuated, but aliasing is allowed in the transition band [fp, F/2]. In Case C, aliasing into the transition band [fp, F/2] is allowed only from the band [F/2, F+fp]. In the above discussion and in (7) and (8) F stands for Fout in a decimation case, and Fin in an interpolation case. The minimax optimization method introduced in [1], [2] is probably the most convenient and the most flexible solution for designing polynomial-based interpolation filters: Minimax Optimization Problem: Given N, M, and a compact subset Φ ⊂ [0,∞) as well as a desired function D( f ) being continuous for f ∈ Φ and a weight function W( f ) being positive for f ∈ Φ , find the (M +1)N/2 unknown coefficients cm(n) to minimize δ ∞ = max W ( f )[H a ( f ) − D( f )] f ∈Φ (9) subject to the given time-domain conditions of ha(t). Here, Ha( f ) is the real-valued frequency response and D(f ) is the desired function according to specifications. (For details refer to [2]). The design procedure has been generalized, and modified for optimization of prolonged and transposed prolonged polynomial-based filters [4]. The minimax design method has several design parameters. First of all, the design parameters include passband and stopband regions Φp and Φs. The desired filter may have several passbands and stopbands as stated in [2]. Next, the minimum stopband attenuation δs, and maximum allowable passband ripple δp are also included. Other design parameters are the number of polynomial segments N and the order M of the polynomial, which determine the number of multipliers in the overall structure, see (6). Finally, some weighting function can be used to give different weights to passband and stopband [2]. Hence we give estimation formulae for the number N of polynomial segments and the order M of polynomial for a minimax design. 4. Estimation of N and M In the previous section, we have seen that the number of polynomial segments N and the order M of the polynomial, are the design parameters that highly influence the performance of the filter in the frequency domain. Furthermore, the cost of realization, i.e. the number of multipliers, of a filter can be estimated by introducing the required values for N and M into (6). It would be very beneficial to estimate N and M by only using the given specifications of the filter in the frequency domain. Similar order estimation formulae exist for FIR filters, for example Kaiser order estimation [6], [8]. In the actual implementation, polynomial-based filters can be modeled as FIR filters [4]. Thus, we can start from the Kaiser formula and adapt it to polynomialbased filters. To this end, a lot of filters were designed, by using different system specifications, in order to adapt the Kaiser formula to polynomial-based case. 
The obtained estimation formula for the number of polynomial segments N, is rather similar to Kaiser formula for the order estimation of FIR filters. The 56  A − 10 log10 (W ) − 8.4  N e = 2 s   30.4( f s − f p ) / F  65 60 Stopband attenuation As in dB where As=-20log10(δs) is the required attenuation in stopband, and W=δp/δs represents weighting between required tolerances in passband and stopband. The next problem is to find the minimum value of the polynomial order M to meet the specifications. It has been observed that the required value of M depends on the type of requirements from (7) and (8). Never the less, it is possible to consider the following estimate as good starting point for all three types of requirements: 55 50 45 40 35 2 4 6 8 10 12 14 16 18  A − 20 ⋅ log10 (W )  + log10 (W ) +1. Me =  s 2 .5   20 Number of polynomial segments N (a) 65 55 s Stopband attenuation A in dB (12) It has been observed that if transition band is relatively large to the sampling frequency, that is when (fs-fp)/F ≥0.5, the required value of polynomial order M is lowered by one. The estimation formula cannot be used when the transition band is very small, i.e., in the case when (fs-fp)/F<0.1. However, even in this border situation required value of M is always smaller than Me given by (12). Thus, the estimation formula (12) for the polynomial order M can be used to estimate the upper border for M for all types of requirements. 60 50 45 40 35 (11) 5. Design Examples 0 1 2 3 4 5 6 7 Polynomial order M (b) Fig. 1. Case A specifications: The passband and stopband edges are at fp=0.4Fin and at fs=0.5Fin, and stopband weighting W=100. (a) The curves are shown for M equals 0 to 7. Dashed line is plot obtained from the estimation formula for N. (b) The curves are shown for N equals 2 to 20. Dashed line is plot obtained from the estimation formula for M. estimation formula for N, which can be found in [9], is not accurate enough. Hence, we propose the more accurate formula:  − 20 log10 ( δ pδ s ) − 8.4   N e = 2  30.4( f s − f p ) / F    (10) where δp and δs are the maximum deviations of the amplitude response from unity for f∈[0,fp] and the maximum deviation from zero for f∈Φs, respectively. Here, x stands for the smallest integer which is larger or equal to x. It has been observed that in most cases the above estimation formula is rather accurate with only a 2% error. The formula above is valid for all three types of requirements, i.e., A, B, and C, as given by (7) and (8). However, if the transition band is narrow, i.e., in the case when (fs-fp)/F≤0.1, the required value of N should be increased by 2. Further, in the case of very narrow transition band ((fs-fp)/F ≤0.05) the formula can not be used. The kernel of the estimation formula for the number N of polynomial segments can be expressed in a different form: SAMPTA'09 This part gives several examples to illustrate the performance of the presented formulae. To illustrate this, the following specifications are considered: Case A specifications: The passband and stopband edges are at fp=0.4Fin and at fs=0.5Fin. Case B specifications: The passband and stopband edges are at fp=0.35Fin and at fs=0.65Fin. Case C specifications: The passband and stopband edges are at fp=0.35Fin and at fs=0.65Fin. In each case, several filters have been designed in minimax sense with the passband weighting equal to unity and stopband weightings of W=100. The degree of the polynomial in each subinterval M varies from 0 to 7. 
The number of intervals N varies from 2 to 20. Recall that N is an even integer. Figures 1 give the results for Case A, the similar results for Case B are given in Fig. 2, and for Case C in Fig. 3. It can be observed that the estimation formulae are relatively good, as they estimate the border performance for the given set of requirements (dashed lines in Figs 1-3). 6. Conclusions In this paper, the estimation formulae for the number N of polynomial segments and the polynomial order M are presented. It has been shown that these estimates give the border performance of the filter for the given set of specifications. Formulae for N and M can be used to estimate the starting value of these two parameters in minimax optimization. Furthermore, the formulae for N and M can be used to estimate implementation costs of 57 110 110 100 100 Stopband attenuation A in dB 120 90 90 s Stopband attenuation As in dB 120 80 70 60 80 70 60 50 50 40 40 30 2 4 6 8 10 12 14 16 18 30 20 2 4 6 Number of polynomial segments N 8 10 (a) 14 16 18 20 (a) 130 120 120 110 110 Stopband attenuation A in dB 130 Stopband attenuation As in dB 12 Number of polynomial segments N 100 s 100 90 80 70 90 80 70 60 60 50 50 40 0 1 2 3 4 5 6 7 Polynomial order M 40 0 1 2 3 4 5 6 7 Polynomial order M (b) (b) Fig. 2. Case B specifications: The passband and stopband edges are at fp=0.35Fin and at fs=0.65Fin, and stopband weighting W=100. (a)The curves are shown for M equals 0 to 7. Dashed line is plot obtained from the estimation formula for N. (b) The curves are shown for N equals 2 to 20. Dashed line is plot obtained from the estimation formula for M. Fig. 3. Case C specifications: The passband and stopband edges are at fp=0.35Fin and at fs=0.65Fin, and stopband weighting W=100. (a) The curves are shown for M equals 0 to 7. Dashed line is plot obtained from the estimation formula for N. (b) The curves are shown for N equals 2 to 20. Dashed line is plot obtained from the estimation formula for M. the Farrow based filters for the given set of requirements. Formulae can also be used to estimate implementation costs of composed sampling rate converters containing Farrow, for example, in optimal factorization for multistage decimation (interpolation). Processing SMMSP’02 , Toulouse, France, September 2002, pp. 57−64. [5] D. Babic, Techniques for sampling rate conversion by arbitrary factors with applications in flexible communications receivers, Doctoral Thesis, Tampere University of Technology, 2004. [6] T. Saramäki, “Finite impulse response filter design,” Chapter 4 in Handbook for Digital Signal Processing, edited by S. K. Mitra and J. F. Kaiser, John Wiley & Sons, New York, 1993. [7] D. Babic, J. Vesma, T. Saramäki, M. Renfors, “Implementation of the transposed Farrow structure,” in Proc. 2002 IEEE Int. Symp. Circuits and Systems, Scotsdale, Arizona, USA, 2002, vol. 4, pp. 4−8. [8] J.F. Kaiser, "Nonrecursive Digital Filter Design Using the - sinh Window Function," Proc. 1974 IEEE Symp. Circuits and Systems, (April 1974), pp.20-23. [9]T. Saramäki, "Multirate Signal Processing," Lecture Notes, http://www.cs.tut.fi/~ts/ References: [1] J. Vesma and T. Saramäki, “Interpolation filters with arbitrary frequency response for all-digital receivers,” in Proc. 1996 IEEE Int. Symp. Circuits and Systems, Atlanta, Georgia, May 1996, pp. 568−571. [2] J. Vesma and T. Saramäki, “Polynomial-based interpolation Filters - Part I: Filter synthesis," Circuits, Systems, and Signal Processing, vol. 26, no. 2, pp. 115-146, March/April 2007. [3] C. W. 
Farrow, “A continuously variable digital delay element,”in Proc. 1988 IEEE Int. Symp. Circuits and Systems, Espoo, Finland, June 1988, pp. 2641−2645. [4] D. Babic, T. Saramäki, M. Renfors, “Conversion between arbitrary sampling rates using polynomialbased interpolation filters,” in Proc. 2nd Int. TICSP Workshop on Spectral Methods and Multirate Signal SAMPTA'09 58 Special session on Geometric Multiscale Analysis Chair: Gitta Kutyniok SAMPTA'09 59 SAMPTA'09 60 The Continuous Shearlet Transform in Arbitrary Space Dimensions, Frame Construction, and Analysis of Singularities S. Dahlke (1) , G. Steidl (2) and G. Teschke (3) (1) Philipps-Universität Marburg, FB12 Mathematik und Informatik, Hans-Meerwein Straße, Lahnberge, 35032 Marburg, Germany. (2) Universität Mannheim, Fakultät für Mathematik und Informatik, Institut für Mathematik, 68131 Mannheim, Germany. (3) University of Applied Sciences Neubrandenburg, Institute for Computational Mathematics in Science and Technology, Brodaer Str. 2, 17033 Neubrandenburg, Germany. dahlke@mathematik.uni-marburg.de, steidl@math.uni-mannheim.de, teschke@hs-nb.de Abstract: This note is concerned with the generalization of the continuous shearlet transform to higher dimensions. Similar to the two-dimensional case, our approach is based on translations, anisotropic dilations and specific shear matrices. We show that the associated integral transform again originates from a square-integrable representation of a specific group, the full n-variate shearlet group. Moreover, we verify that by applying the coorbit theory, canonical scales of smoothness spaces and associated Banach frames can be derived. We also indicate how our transform can be used to characterize singularities in signals. So far, the shearlet transform is well developed for problems in R2 . However, for analyzing higher-dimensional data sets, there is clearly an urgent need for further generalizations and applications. This is exactly the concern of this paper. One particular field of application is the geometrical structure analysis of multi-dimensional data, e.g. multimodal spectral measurements in meteorology. To our best knowledge, it seems that there exist only few results in this direction: some important progress has been achieved for the curvelet case in [1] and for surfacelets in [16]. However, for the shearlet approach the question has been completely open. 2. 1. Introduction Modern technology allows for easy creation, transmission and storage of huge amounts of data. Confronted with a flood of data, such as internet traffic, or audio and video applications, nowadays the key problem is to extract the relevant information from these sets. To this end, usually the first step is to decompose the signal with respect to suitable building blocks which are well–suited for the specific application and allow a fast and efficient extraction. In this context, one particular problem which is currently in the center of interest is the analysis of directional information. Due to the bias to the coordinate axes, classical approaches such as, e.g., wavelet or Gabor transforms are clearly not the best choices, and hence new building blocks have to be developed. In recent studies, several approaches have been suggested such as ridgelets [2], curvelets [3], contourlets [7], shearlets [14] and many others. For a general approach see also [13]. 
Among all these approaches, the shearlet transform stands out because it is related to group theory, i.e., this transform can be derived from a square-integrable representation π : S → U(L2 (R2 )) of a certain group S, the socalled shearlet group, see [5]. Therefore, in the context of the shearlet transform, all the powerful tools of group representation theory can be exploited. SAMPTA'09 Multivariate Continuous Shearlet Transform In this section, we introduce the shearlet transform on L2 (Rn ). This requires the generalization of the twodimensional parabolic dilation matrix and of the shear matrix, respectively. Let In denote the (n, n)-identity matrix and 0n , resp. 1n the vectors with n entries 0, resp. 1. For a ∈ R∗ := R \ {0} and s ∈ Rn−1 , we set ! a 0Tn−1 Aa := 1 0n−1 sgn(a)|a| n In−1 and Ss :=  1 sT 0n−1 In−1  . Lemma 1 The set R∗ × Rn−1 × Rn endowed with the operation (a, s, t) ◦ (a′ , s′ , t′ ) = (aa′ , s + |a|1−1/n s′ , t + Ss Aa t′ ) is a locally compact group S which we call full shearlet group. The left and right Haar measures on S are given by 1 dµl (a, s, t) = n+1 da ds dt |a| 61 and ω3 ✻ 1 da ds dt. dµr (a, s, t) = |a| In the following, we use only the left Haar measure and use the abbreviation dµ = dµl . For f ∈ L2 (Rn ) we define 1 −1 π(a, s, t)f (x) = fa,s,t (x) := |a| 2n −1 f (A−1 a Ss (x − t)). (1) It is easy to check that π : S → U(L2 (Rn )) is a mapping from S into the group U(L2 (Rn )) of unitary operators on L2 (Rn ). Recall that a unitary representation of a locally compact group G with the left Haar measure µ on a Hilbert space H is a homomorphism π from G into the group of unitary operators U(H) on H which is continuous with respect to the strong operator topology. 1 2 ❇ A nontrivial function ψ ∈ L2 (Rn ) is called admissible, if Z |hψ, π(a, s, t)ψi|2 dµ(a, s, t) < ∞. S If π is irreducible and there exits at least one admissible function ψ ∈ L2 (Rn ), then π is called square integrable. The following result shows that the unitary representation π defined in (1) is square integrable. Theorem 3 A function ψ ∈ L2 (Rn ) is admissible if and only if it fulfills the admissibility condition Z |ψ̂(ω)|2 Cψ := dω < ∞. (2) n Rn |ω1 | Then, for any f ∈ L2 (Rn ), the following equality holds true: Z |hf, ψa,s,t i|2 dµ(a, s, t) = Cψ kf k2L2 (Rn ) . (3) PP PP PP PP PP P P ❇ ✠ ω2 Lemma 2 The mapping π defined by (1) is a unitary representation of S. 2 ❇ ❇ ❇ ✲ ω1 ❇❇ Figure 1: Support of the shearlet ψ̂ for ω1 ≥ 0. 3.1 Shearlet Coorbit Spaces We consider weight functions w(a, s, t) = w(a, s) that are locally integrable with respect to a and s, i.e., n ′ ′ ′ w ∈ Lloc 1 (R ) and fulfill w ((a, s, t) ◦ (a , s , t )) ≤ ′ ′ ′ w(a, s, t)w(a , s , t ) and w(a, s, t) ≥ 1 for all (a, s, t), (a′ , s′ , t′ ) ∈ S. For 1 ≤ p < ∞, let Lp,w (S) := {F measurable : kF kLp,w (S) := Z S  p1 |F (g)| w(a, s, t) dµ(a, s, t) < ∞}, p p and let L∞,w be defined with the usual modifications. In order to construct the coorbit spaces related to the shearlet group we have to ensure that there exists a function ψ ∈ L2 (Rn ) such that SHψ (ψ) = hψ, π(a, s, t)ψi ∈ L1,w (S). S (4) In particular, the unitary representation π is irreducible and hence square integrable. Fortunately, it turns out that (4) can be satisfied in our setting. An example of a continuous shearlet can be constructed as follows: Let ψ1 be a continuous wavelet with ψ̂1 ∈ C ∞ (R) and supp ψ̂1 ⊆ [−2, − 12 ] ∪ [ 12 , 2], and let ψ2 be such that ψ̂2 ∈ C ∞ (Rn−1 ) and supp ψ̂2 ⊆ [−1, 1]n−1 . 
Then the function ψ ∈ L2 (Rn ) defined by   1 ω̃ ψ̂(ω) = ψ̂(ω1 , ω̃) = ψ̂1 (ω1 ) ψ̂2 ω1 Theorem 4 Let ψ be a Schwartz function such that supp ψ̂ ⊆ ([−a1 , −a0 ] ∪ [a0 , a1 ]) × Qb ,where Qb := [−b1 , b1 ] × · · · × [−bn−1 , bn−1 ]. Then we have that SHψ (ψ) ∈ L1,w (S), i.e., is a continuous shearlet. The support of ψ̂ is depiced for ω1 ≥ 0 in Fig. 1. 3. Multivariate Shearlet Coorbit Theory In this section we want to establish a coorbit theory based on the square integrable representation (1) of the shearlet group. We mainly follow the lines of [4]. For further information on coorbit space theory, the reader is referred to [8, 9, 10, 11, 12]. SAMPTA'09 khψ, π(·)ψikL1,w (S) = Z |SHψ (ψ)(a, s, t)| w(a, s, t) dµ(a, s, t) < ∞. S For ψ satisfying (4) we can consider the space H1,w := {f ∈ L2 (Rn ) : SHψ (f ) ∈ L1,w (S)}, (5) with norm kf kH1,w := kSHψ f kL1,w (S) and its anti∼ dual H1,w , the space of all continuous conjugate-linear ∼ functionals on H1,w . The spaces H1,w and H1,w are π-invariant Banach spaces with continuous embeddings ∼ H1,w ֒→ H ֒→ H1,w , and their definition is independent of the shearlet ψ. Then the inner product on L2 (Rn ) × 62 ∼ L2 (Rn ) extends to a sesquilinear form on H1,w × H1,w , ∼ therefore for ψ ∈ H1,w and f ∈ H1,w the extended representation coefficients k(cλ (f ))λ∈Λ kℓp,w ≤ Ckf kSCp,w SHψ (f )(a, s, t) := hf, π(a, s, t)ψiH∼ 1,w ×H1,w are well-defined. Now, for 1 ≤ p ≤ ∞, we define the shearlet coorbit spaces ∼ : SHψ (f ) ∈ Lp,w (S)} SCp,w := {f ∈ H1,w Shearlet Banach Frames ℓp,w := {c = (cλ )λ∈Λ : kckℓp,w := kcwkℓp < ∞}, where w = (w((a, s, t)λ ))λ∈Λ . versely, if (c (f )) ∈ ℓp,w , λ λ∈Λ P f = λ∈Λ cλ π((a, s, t)λ )ψ is in SCp,w and kf kSCp,w ≤ C ′ k(cλ (f ))λ∈Λ kℓp,w . The Feichtinger-Gröchenig theory provides us with a machinery to construct atomic decompositions and Banach frames for our shearlet coorbit spaces SCp,w . In a first step, we have to determine, for a compact neighborhood U of e ∈ S with non-void interior, so-called U –dense sets. A (countable) family X = ((a, s, t)λ )λ∈Λ in S is said to be U -dense if ∪λ∈Λ (a, s, t)λ U = S, and separated if for some compact neighborhood Q of e we have (ai , si , ti )Q ∩ (aj , sj , tj )Q = ∅, i 6= j, and relatively separated if X is a finite union of separated sets. 1 1 (7) Then the sequence 1 {(ǫαj , βαj(1− n ) k, S 1 βαj(1− n ) k j ∈ Z, k ∈ Z n−1 , m ∈ Z n , ǫ ∈ {−1, 1}} (8) is U -dense and relatively separated. Next we define the U –oscillation as k oscU kL1,w (S) < 1/kSHψ (ψ)kL1,w (S) . (9) Then, the following decomposition theorem, which was proved in a general setting in [8, 9, 10, 11, 12], says that discretizing the representation by means of an U -dense set produces an atomic decomposition for SCp,w . Theorem 6 Assume that the irreducible, unitary representation π is w-integrable and let an appropriately normalized ψ ∈ L2 (Rn ) which fulfills sup |hψ, π(u)ψi| ∈ L1,w (S) u∈(a,s,t)U (10) be given. Choose a neighborhood U of e so small that k oscU kL1,w (S) < 1. (11) Then for any U -dense and relatively separated set X = ((a, s, t)λ )λ∈Λ the space SCp,w has the following atomic decomposition: If f ∈ SCp,w , then X f= cλ (f )π((a, s, t)λ )ψ (12) λ∈Λ SAMPTA'09 if ii) there exist two constants 0 < D ≤ D′ < ∞ such that D kf kSCp,w ≤ k(hf, π((a, s, t)λ )ψiH∼ ) k ≤ D′ kf kSCp,w ; 1,w ×H1,w λ∈Λ ℓp,w (16) u∈U M hψ, π(a, s, t)i := (15) Then, for every U -dense and relatively separated family X = ((a, s, t)λ )λ∈Λ in G the set {π((a, s, t)λ )ψ : λ ∈ Λ} is a Banach frame for SHp,w . 
This means that oscU (a, s, t) := sup |SHψ (ψ)(u ◦ (a, s, t)) − SHψ (ψ)(a, s, t)|. (14) Theorem 7 Impose the same assumptions as in Theorem 6. Choose a neighborhood U of e such that if and only i) f ∈ SCp,w (hf, π((a, s, t)λ )ψiH∼ ∈ ℓ ; ) ×H λ∈Λ p,w 1,w 1,w Aαj γm) : Conthen Given such an atomic decomposition, the problem arises under which conditions a function f is completely determined by its moments hf, π((a, s, t)λ )ψi and how f can be reconstructed from these moments. This is answered by the following theorem which establishes the existence of Banach frames. Lemma 5 Let U be a neighborhood of the identity in S, and let α > 1 and β, γ > 0 be defined such that [α n −1 , α n ) × [− β2 , β2 )n−1 × [− γ2 , γ2 )n ⊆ U. (13) with a constant C depending only on ψ and with ℓp,w being defined by (6) with norms kf kSCp,w := kSHψ f kLp,w (S) . It holds that SC1,w = H1,w and SC1,1 = L2 (Rn ). 3.2 where the sequence of coefficients depends linearly on f and satisfies iii) there exists a bounded, linear reconstruction operatorR from ℓp,w to SCp,w such that  ) R (hf, ψ((a, s, t)λ )ψiH∼ = f. 1,w ×H1,w λ∈Λ It can be checked that the conditions (10), (11) and (15) can be satisfied, see [6] for details. 4. Analysis of Singularities In this section, we deal with the decay of the shearlet transform at hyperplane singularities. The following analysis generalizes techniques and results presented in [15] for two dimensions. An (n − m)-dimensional hyperplane in Rn , 1 ≤ m ≤ n − 1, not containing the x1 -axis can be written w.l.o.g. as       xm+1 0 x1  ..   ..   ..   .  + P  .  = . , xm | {z } xA | xn {z xE 0 } 63  p1T   P :=  ...  ∈ Rm,n−m .  pTm Then we obtain for νm := δ(xA + P xE ) with the Delta distribution δ that Z ν̂m (ω) = δ(xA + P xE )e−2πi(hxA ,ωA i+hxE ,ωE i) dx Rn Z = e−2πi(−hP xE ,ωA i+hxE ,ωE i) dxE Rn−m = δ(ωE − P T ωA ). (17) The following theorem describes the decay of the shearlet transform at hyperplane singularities. We use the notation SHψ f (a, s, t) ∼ |a|r as a → 0, if there exist constants 0 < c ≤ C < ∞ such that c|a|r ≤ SHψ f (a, s, t) ≤ C|a|r as a → 0. Theorem 8 Let ψ ∈ L2 (Rn ) be a shearlet satisfying ψ̂ ∈ C ∞ (Rn ). Assume further that ψ̂(ω) = ψ̂1 (ω1 )ψ̂2 (ω̃/ω1 ), where supp ψ̂1 ∈ [−a1 , −a0 ] ∪ [a0 , a1 ] for some a1 > a0 ≥ α > 0 and supp ψ̂2 ∈ Qb . If (sm , . . . , sn−1 ) = (−1, s1 , . . . , sm−1 ) P and (t1 , . . . , tm ) = −(tm+1 , . . . , tn ) P T , then SHψ νm (a, s, t) ∼ |a| 1−2m 2n as a → 0. (18) Otherwise, the shearlet transform SHψ νm decays rapidly as a → 0. Similar results can be derived for point singularities, see again [6] for details. References: [1] L. Borup and M. Nielsen, Frame decomposition of decomposition spaces, J. Fourier Anal. Appl., to appear. [2] E. J. Candès and D. L. Donoho, Ridgelets: a key to higher-dimensional intermittency?, Phil. Trans. R. Soc. Lond. A. 357 (1999), 2495–2509. [3] E. J. Candès and D. L. Donoho, Curvelets - A surprisingly effective nonadaptive representation for objects with edges, in Curves and Surfaces, L. L. Schumaker et al., eds., Vanderbilt University Press, Nashville, TN (1999). [4] S. Dahlke, G. Kutyniok, G. Steidl, and G. Teschke, Shearlet Coorbit Spaces and Associated Banach Frames, Preprint Nr. 2007-5, Philipps-Universitt Marburg, 2007. SAMPTA'09 [5] S. Dahlke, G. Kutyniok, P. Maass, C. Sagiv, H.-G. Stark, and G. Teschke, The uncertainty principle associated with the continuous shearlet transform, Int. J. Wavelets Multiresolut. Inf. Process. 6 (2008), 157181. [6] S. Dahlke, G. Steidl, and G. 
Teschke, The continuous shearlet transform in arbitrary space dimensions, Preprint Nr. 2008–7, Philipps-Universität Marburg 2008. [7] M. N. Do and M. Vetterli, The contourlet transform: an efficient directional multiresolution image representation, IEEE Transactions on Image Processing 14(12) (2005), 2091–2106. [8] H. G. Feichtinger and K. Gröchenig, A unified approach to atomic decompositions via integrable group representations, Proc. Conf. “Function Spaces and Applications”, Lund 1986, Lecture Notes in Math. 1302 (1988), 52–73. [9] H. G. Feichtinger and K. Gröchenig, Banach spaces related to integrable group representations and their atomic decomposition I, J. Funct. Anal. 86 (1989), 307–340. [10] H. G. Feichtinger and K. Gröchenig, Banach spaces related to integrable group representations and their atomic decomposition II, Monatsh. Math. 108 (1989), 129–148. [11] H. G. Feichtinger and K. Gröchenig, Non– orthogonal wavelet and Gabor expansions and group representations, in: Wavelets and Their Applications, M.B. Ruskai et.al. (eds.), Jones and Bartlett, Boston, 1992, 353–376. [12] K. Gröchenig, Describing functions: Atomic decompositions versus frames, Monatsh. Math. 112 (1991), 1–42. [13] K. Guo, W. Lim, D. Labate, G. Weiss, and E. Wilson, Wavelets with composite dilations and their MRA properties. Appl. Comput. Harmon. Anal. 20 (2006), 220–236. [14] K. Guo, G. Kutyniok, and D. Labate, Sparse multidimensional representations using anisotropic dilation und shear operators, in Wavelets und Splines (Athens, GA, 2005), G. Chen und M. J. Lai, eds., Nashboro Press, Nashville, TN (2006), 189–201. [15] G. Kutyniok and D. Labate, Resolution of the wavefront set using continuous shearlets, Trans. Amer. Math. Soc. 361 (2009), 2719-2754. [16] Y. Lu and M.N. Do, Multidimensional directional filterbanks and surfacelets IEEE Trans. Image Process. 16 (2007) 918–931. 64 Compressive-wavefield simulations Felix J. Herrmann, Yogi Erlangga, and Tim. T. Y. Lin Department of Earth and Ocean Sciences, the University of British Columbia, Canada fherrmann,yerlangga,tlin@eos.ubc.ca Abstract: Full-waveform inversion’s high demand on computational resources forms, along with the non-uniqueness problem, the major impediment withstanding its widespread use on industrial-size datasets. Turning modeling and inversion into a compressive sensing problem—where simulated data are recovered from a relatively small number of independent simultaneous sources—can effectively mitigate this high-cost impediment. The key is in showing that we can design a sub-sampling operator that commutes with the time-harmonic Helmholtz system. As in compressive sensing, this leads to a reduction in simulation cost. Moreover, this reduction is commensurate with the transform-domain sparsity of the solution, implying that computational costs are no longer determined by the size of the discretization but by transform-domain sparsity of the solution of the CS problem which forms our data. The combination of this sub-sampling strategy with our recent work on implicit solvers for the Helmholtz equation provides a viable alternative to full-waveform inversion schemes based on explicit finite-difference methods. 1. Introduction With the recent resurgence of full-waveform inversion— i.e., adjoint-state methods applied to solve PDEconstrained optimization problems—the computational cost of solving forward modeling has become one of the major impediments withstanding successful application of this technology to industry-size data volumes. 
To overcome this impediment, we argue that further improvements will depend on a problem formulation with a computational complexity that is no longer strictly determined by the size of the discretization but by transform-domain sparsity of its solution. In this new paradigm, we bring computational costs in par with our ability to compress solutions to certain PDEs. This premise is related to two recent developments. First, there is the new field of compressive sensing [CS in short throughout the paper, 4, 5]—where the argument is made, and rigorously proven—that compressible signals can be recovered from severely sub-Nyquist sampling by solving a sparsity promoting program. Second, there is in the seismic community the recent resurgence of simultaneous-source acquisition [1, 13, 2, 18, 12], and continuing efforts to reduce the cost of seismic modeling, imaging, and inversion through phase encoding of simultaneous sources [16, 21, 13, 12] and the removal of subsets SAMPTA'09 of angular frequencies [22, 17, 15, 12] or plane waves [24]. All these approaches correspond to instances of CS. By using CS principles, we have been able to remove the associated sub-sampling interferences through a combination of exploiting transform-domain sparsity, properties of certain sub-sampling schemes, and the existence of sparsity promoting solvers. 2. Compressive full-waveform inversion Full-waveform inversion entails solving PDE-constrained optimization problems of the following type:  1 min kRM d − DU k22 U, m 2 s.t. H[m]U = B, (1) where d and U are the observed data volumes and the solution of the multi-source (in its columns)-frequency Helmholtz equation over the domain of interest, D represents the detection operator that extracts the simulated data from time-harmonic solutions at the receiver locations, H a matrix with the discretized multi-frequency Helmholtz equation, and B a matrix with the frequency-transformed source distributions in its columns. In the above optimization problem (from which—after casting Eq. 1 in its unconstrained form—most quasi-Newton type full-waveform inversion schemes derive), solutions for the unknown velocity model, m, and for the wave equation, U, that minimize the energy mismatch are pursued. Because Eq. 1 is nonlinear in the model variables collected in the vector m, solutions of Eq. 1 require multiple solves of the (implicit) Helmholtz equation. Even after preconditioning (yielding a complexity for this solver of O(n4 ) in 2-D [7, 6]), this may prove computationally prohibitive. We address this problem by using CS [20, 12] to reduce the size of the seismic data volume through y = RMd where sub sampler }| { random phase encoder Ω RΣ z 1 ⊗ I ⊗ R1  {  }|   ∗  .. îθ RM =  ⊗ I F3 ,  F2 diag e . Ω RΣ ⊗ I ⊗ R n s′ n s′ z  with F2,3 the 2,3-D Fourier transforms, and θ = Uniform([0, 2π]) a random phase rotation. The matrices RΩ and RΣ represent CS-subsampling matrices (see Figure 1) acting along the rows (frequency coordinate) and columns (source coordinate) of the data volume, respectively. 
As shown by [12] application of this CS-sampling 65 matrix, RM, to the data is equivalent to applying it to the source wavefields directly, which turns single-impulsive sources into a smaller set (n′s ≪ ns with ns the number of separated single-impulsive sources) of time-harmonic simultaneous sources that are randomly phase encoded and that have for each source-experiment a different set of angular frequencies missing—i.e., there are n′f ≪ nf (with nf the number of frequencies of fully sampled data) frequencies non-zero (see Figure 1). This implies that the sub-sampling operator commutes with the Helmholtz system and this allows us to recast Eq. 1 into the following reduced form (consisting of fewer frequencies and fewer right-hand sides): 1 min ky − DUk22 U, m 2 s.t. H[m]U = B, (2) where the underlined quantities are related to the reduced Helmholtz system. 3. The time-harmonic Helmholtz system Since their inception, iterative implicit matrix-free solutions to the time-harmonic Helmholtz equation have been plagued by lack of numerical convergence for decreasing mesh sizes and increasing angular frequencies [19]. The inclusion of deflation—a way to handle small eigenvalues that lead to slow convergence [7, 6]—can successfully remove this impediment, bringing 2- and 3-D solvers for the time-harmonic Helmholtz into reach. For a given source (right-hand side b) and angular frequency ω (:= 2πf , with f the temporal frequency in Hz), the frequency-domain wavefield u is computed with a Krylov method that involves the following system of equations: H[ω]M−1 Qû = b, u = M−1 Qû, where H[ω], M, and Q represent the discretized monochromatic Helmholtz equation, the preconditioner, and the projection matrices, respectively. As shown by [8, 9], convergence is guaranteed by defining the preconditioning matrix M in terms of the discretized shifted or ω2 damped Helmholtz operator M := −∇ · ∇ − c(x) 2 (1 − √ β î), î = −1, with β > 0. With this preconditioning, the eigenvalues of HM−1 are clustered into a circle in the complex plane. By the action of the projector matrix Q, these eigenvalues move towards unity on the real axis. These two operations lower the condition number, which explains the superior performance of this solver. 4. Source-solution CS-sampling equivalence Aside from the required number of frequencies, the computational cost of full-wavefield simulation is determined by the number of sources—i.e., the number of right-hand sides. In the current simulation paradigm, the number of sources coincides with the number of single-impulsive source simulations. As prescribed by CS, this number can be reduced by designing a survey that consists of a relatively small number of simultaneous experiments with simultaneous sources that contain subsets of angular frequencies. Mathematically, we can accomplish this by applying a CSsampling matrix, RM, to the individual-impulsive sources collected in the vector s. If we can show that the solution SAMPTA'09 from this set of “compressed” sources s = RMs, is identical to the compressively sampled solution yielded from modeling the complete, we are in the position to speed up our computations. This speed up is the result of a decreased number of experiments and angular frequencies that are present in the simultaneous source vector. For this to work, the solution y must be equivalent to the solution y, obtained by compressively sampling the full solution. 
More specifically, we need to demonstrate that the solutions for the full and compressed systems are equivalent—i.e., y = y in   B = D∗    s |{z} impulsive sources  HU = B    y = RMDU := RMd   B = D∗ s = D∗    HU = B    y = DU. RMs | {z } sim. sources Here, H = diag(H[ωi ]) is the block-diagonal discretized Helmholtz equation for each ωi := 2πi · ∆f, i = 1 · · · nf , with nf the number of frequencies and ∆f its sample interval. The adjoint (denoted by ∗ ) of the detection matrix D injects the individual sources into the multiple righthand sides, B = [b1 b2 · · · bns ], with ns the number of shots. This detection matrix extracts data at the receiver positions. Its adjoint inserts data at the co-located source positions. Each column of U contains the wavefields for all frequencies induced by the shots located in the columns of B. Consequently, the full simulation requires the inversion of the block-diagonal system (for all shots), followed by a detection—i.e., we have d = DH−1 B, with H−1 = diag(H−1 [ωi ]), i = 1 · · · ns . After CS sampling, this volume is reduced to y = RMd by applying the flat rectangular CS-sampling matrix RM (defined explicitly in the next section) to the full simulation. Applying RM directly to the sources s leads to a compressed system H, which after inversion gives y. To illustrate why y is equivalent to y, consider a compressive sampling of the solution over frequency by the subsampling matrix RΩ (for clarity, we removed the orthonormal measurement matrix). This restriction matrix removes arbitrary rows from the right-hand side. By virtue of the block-diagonal structure of our system, we have RΩ H−1 = H−1 RΩ with H−1 = diag(H−1 [ωi ]), i ⊂ {1 · · · nf }, yielding RΩ U = H−1 B = U, where B := RΩ B. This means that frequency subsampling the right-hand side, followed by solving the system for the corresponding frequencies, is the same as solving the full system, followed by frequency subsampling. A similar argument holds when subsampling the shots (removing arbitrary columns of B). Now, we have the reduced system RΩ U(RΣ )∗ = H−1 B = U, with B := RΩ B(RΣ )∗ . Using Kronecker products, these relations can be written succinctly as (RΣ ⊗ RΩ )vec (U) = vec (U) and (RΣ ⊗ RΩ )vec (B) = vec (B) with vec (·) being a linear operator that maps a matrix into a lexicographicallysorted array. The inversion of HU = B is easier because it involves only a subset of angular frequencies and simultaneous shots—i.e., {U, B} contain only n′s columns with n′f frequency components each. Finally, the matrix D extracts the compressed data from the solution. 66 5. Recovery by sparsity promotion Aside from CS sampling the recovery from simultaneous simulations depends on a sparsifying transform that compresses seismic data, is fast, and reasonably incoherent with the CS sampling matrix. We accomplish this by defining the sparsity transform as the Kronecker product between the 2-D discrete curvelet transform [3] along the sourcereceiver coordinates, and the discrete wavelet transform along the time coordinate—i.e., S := C ⊗ W with C, W the curvelet- and wavelet-transform matrices, respectively. We reconstruct the seismic wavefield by solving the following nonlinear optimization problem e = arg min kxk1 x subject to Ax = y, subsampling ratios. As expected, the SNR for the simple model is better because of the reduced complexity, whereas the numbers in Table 1 for the complex model confirm increasing recovery errors for increasing subsampling ratios. 
Moreover, the bandwidth limitation of seismic data explains improved recovery with decreasing frequencyto-shot ratio for a fixed subsampling ratio. Because the speedup of the solution is roughly proportional to the subsampling ratio, we can conclude that speedups of four to six times are possible with a minor drop in SNR. (3) x e = S∗ x e the reconstruction, A := RMS∗ the CS with d matrix, and y (= y) the compressively simulated data (cf. Equation 2-right). Equation 3 is solved by SPGℓ1 [23], a projected-gradient algorithm with root finding. 6. Computational complexity analysis According to [19], the cost of the iterative Helmholtz solver equals nf ns nit O(nd ), typically with nit = O(n) the number of iterations. For d = 2 and assuming ns = nf = O(n), this cost becomes O(n5 ). Under the same assumption, the cost of a time-domain solver is O(n4 ). The iterative Helmholtz solver can only become competitive if nit = O(1), yielding an O(n4 ) computational complexity. [7, 6] achieve this by the method explained earlier. Despite this improvement, this figure is still overly pessimistic for simulations that permit sparse representations. As long as the simulation cost exceeds the ℓ1 -recovery cost (cf. Equation 3), CS will improve on this result. This reduction depends on the cost of A, which is dominated by the CS-matrix. For naive choices, such as Gaussian projections, these sampling matrices cost O(n3 ) for each frequency, which offers no gain. However, with our choice of fast O(n log n) projections with random convolutions [20], we are able to reduce this cost to O(n2 log n). Note that these costs are of the same order as those of calculating the sparsifying transforms. Now, the leading order cost of the ℓ1 recovery is reduced to O(n3 log n), which is significantly less than the cost of solving the full Helmholtz system, especially for large problems (n → ∞) and for extensions to d = 3. 7. Example To illustrate CS-recovery quality, we conduct a series of experiments for two velocity models, namely the complex model used in [10], and a simple single-layer model. These models generate seismic lines that differ in complexity. During these experiments, we vary the subsampling ratio and the frequency-to-shot subsampling ratio. All simulations are carried out with a fully parallel Helmholtz solver for a spread with 128 collocated shots and receivers sampled at a 30 m interval. The time sample interval is 0.004 s and the source function is a Ricker wavelet with a central frequency of 10 Hz. By solving Equation 3, we recover the full simulation for the two datasets. Comparison between the full and compressive simulations in Figure 2 shows remarkable high-fidelity results even for increasing SAMPTA'09 Subsample ratio 0.25 n′f /n′s 0.15 0.07 recovery error (dB) 2 1 0.5 14.3 18.2 22.2 12.1 14.5 16.5 8.6 10.2 10.7 Speed up (%) 400 670 1420 Table 1: Signal-to-noise ratios based on the complex model, e 2 dk SNR = −20 log10 ( kd− kdk2 ) for reconstructions with the curvelet-wavelet sparsity transform for different subsample and frequency-to-shot ratios.. 8. Discussion, extensions, and conclusions Compressive sampling (CS) can be considered a paradigm shift because objects of interest that exhibit transformdomain sparsity can be recovered from degrees of subsampling commensurate their sparsity. This new paradigm can be applied to reduce the computational complexity of solving PDEs that lie at the heart of PDE-constrained optimization problems. 
In this paper, we demonstrate that this principle leads to simultaneous source experiments that reduce the cost of computer simulations. Similar cost reductions are possible during actual acquisition in situations where we have control over the physical sources; such as during acquisition on land [14]. These results are exciting because CS decouples simulation- and acquisitionrelated costs from the discretization size. Instead, these costs depend on sparsity. Because the image space is even sparser after focusing seismic energy, we obtain further improvements when we extend CS principles to promote joint sparsity through mixed (1, 2)-norm minimization [11]. References [1] Craig J. Beasley. A new look at marine simultaneous sources. The Leading Edge, 27(7):914–917, 2008. [2] A. J. Berkhout. Changing the mindset in seismic data acquisition. The Leading Edge, 27(7):924–938, 2008. [3] E. J. Candès, L. Demanet, D. L. Donoho, and L. Ying. Fast discrete curvelet transforms. Multiscale Modeling and Simulation, 5:861– 899, 2006. [4] E.J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006. [5] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006. [6] Y. A. Erlangga and F. J. Herrmann. An iterative multilevel method for computing wavefields in frequency-domain seismic inversion. In SEG Technical Program Expanded Abstracts, volume 27, pages 1957–1960. SEG, November 2008. [7] Y A Erlangga and R Nabben. On multilevel projection Krylov method for the preconditioned Helmholtz system. 2007. Submitted for publication. 67 Figure 1: Compressive sampling with simultaneous sources. (a) Amplitude spectrum for the source signatures emitted by each source as part of the simultaneous-source experiments. These signatures appear noisy in the shot-receiver coordinates because of the phase encoding (cf. Equation 1). Observe that the frequency restrictions are different for each simultaneous source experiment. (b) CS-data after applying the inverse Fourier transform. Notice the noisy character of the simultaneous-shot interferences.. [8] Y A Erlangga, C Vuik, and C W Oosterlee. On a class of preconditioners for solving the Helmholtz equation. Applied Numerical Mathematics, 50:409–425, 2004. [9] Y A Erlangga, C Vuik, and C W Oosterlee. Comparison of multigrid and incomplete LU shifted-Laplace preconditioners for the inhomogeneous Helmholtz equation. Applied Numerical Mathematics, 56:648–666, 2006. [10] F. J. Herrmann, U. Boeniger, and D. J. Verschuur. Non-linear primary-multiple separation with directional curvelet frames. Geophysical Journal International, 170:781–799, 2007. [11] Felix J. Herrmann. Compressive imaging by wavefield inversion with group sparsity. Technical Report TR-2009-01, UBC-SLIM, 2009. [12] Felix J. Herrmann, Yogi A. Erlangga, and Tim T.Y. Lin. Compressive simultaneous full-waveform simulation. TR-2008-09. to appear in geophysics. 2009. [13] C.E. Krohn and R. Neelamani. Simultaneous sourcing without compromise. In Rome 2008, 70th EAGE Conference & Exhibition, page B008, 2008. [14] Tim T Y Lin and Felix J Herrmann. Designing simultaneous acquisitions with compressive sensing. In Amsterdam 2009, 71th EAGE Conference & Exhibition, 2009. [15] T.T.Y. Lin, E. Lebed, Y. A. Erlangga, and F. J. Herrmann. Interpolating solutions of the helmholtz equation with compressed sensing. In SEG Technical Program Expanded Abstracts, volume 27, pages 2122–2126. 
SEG, November 2008. [16] S. A. Morton and C. C. Ober. Faster shot-record depth migrations using phase encoding. In SEG Technical Program Expanded Abstracts, volume 17, pages 1131–1134. SEG, 1998. [17] W. Mulder and R. Plessix. How to choose a subset of frequencies in frequency-domain finite-difference migration. 158:801–812, 2004. [18] N. Neelamani, C. Krohn, J. Krebs, M. Deffenbaugh, and J. Romberg. Efficient seismic forward modeling using simultaneous random sources and sparsity. In SEG International Exposition and 78th Annual Meeting, pages 2107–2110, 2008. [19] C. D. Riyanti, Y. A. Erlangga, R.-E. Plessix, W. A. Mulder, C. Vuik, and C. Oosterlee. A new iterative solver for the time-harmonic wave equation. Geophysics, 71(5):E57–E63, 2006. [20] J. Romberg. Compressive sensing by random convolution. submitted, 2008. [21] L. A. Romero, D. C. Ghiglia, C. C. Ober, and S. A. Morton. Phase encoding of shot records in prestack migration. Geophysics, 65(2):426–436, 2000. [22] Laurent Sirgue and R. Gerhard Pratt. Efficient waveform inversion and imaging: A strategy for selecting temporal frequencies. Geophysics, 69(1):231–248, 2004. [23] E. van den Berg and M. P. Friedlander. Probing the pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2):890–912, 2008. [24] Denes Vigh and E. William Starr. 3d prestack plane-wave, fullwaveform inversion. Geophysics, 73(5):VE135–VE144, 2008. SAMPTA'09 Figure 2: Comparison between conventional and compressive simulations in for simple and complex velocity models. (a) Crossing-planes view of the seismic line for the simple model. (b) The same for the complex model. (c). Recovered simulation (with a SNR of 28.1 dB) for the simple model from 25 % of the samples with the ℓ1 -solver running to convergence. (d) The same but for the complex model now with a SNR of 18.2 dB.. 68 Computable Fourier Conditions for Alias-Free Sampling and Critical Sampling Yue M. Lu (1)(2) , Minh N. Do (2) and Richard S. Laugesen (2) (1) Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland (2) University of Illinois at Urbana-Champaign, Urbana IL 61801, USA yue.lu@epfl.ch, minhdo@illinois.edu, Laugesen@illinois.edu Abstract: We propose a Fourier analytical approach to the problems of alias-free sampling and critical sampling. Central to this approach are two Fourier conditions linking the above sampling criteria with the Fourier transform of the indicator function defined on the underlying frequency support. We present several examples to demonstrate the usefulness of the proposed Fourier conditions in the design of critically sampled multidimensional filter banks. In particular, we show that it is impossible to implement any cone-shaped frequency partitioning by a nonredundant filter bank, except for the 2-D case. 1 Introduction The search for alias-free sampling lattices for a given frequency support, and in particular for those lattices achieving minimum sampling densities, is a fundamental issue in various signal processing applications that involve the design of efficient acquisition schemes for bandlimited signals. As a special case of alias-free sampling, the concept of critical sampling also plays an important role in the theory and design of critically sampled (a.k.a. maximally decimated) multidimensional filter banks [9]. The study of alias-free (and critical) sampling lattices is a classical problem [8, 4]. 
So far, most existing work in the literature approaches the problem from a geometrical perspective: The primary tools employed include the theories from Minkowski’s work [2], as well as various geometrical intuitions and heuristics. In this paper, we propose a Fourier analytical approach to the problems of alias-free sampling and critical sampling. Central to this approach are two Fourier conditions linking the above sampling criteria with the Fourier transform of the indicator function defined on the underlying frequency support (see Theorem 1 and Proposition 2). An important feature of the proposed conditions is that they open the door to purely analytical and computational solutions to the sampling lattice selection problem. The rest of the paper is organized as follows. In Section 2, we briefly review some relevant concepts on sampling bandlimited signals. We present in Section 3 a novel condition linking the alias-free sampling (as well as critical sampling) with the Fourier transform of the indicator function defined SAMPTA'09 on the given frequency support. In Section 4, we present an application of the proposed Fourier conditions in the design of multidimensional nonredundant filter banks. We conclude the paper in Section 5. The material in this paper was presented in part in [5] and [7]. As a novel aspect, we present in this paper a different proof for Theorem 1, which provides important new insights into this key result. Notation: The Fourier transform of a function f (ω) defined on RN is defined by Z b f (x) = f (ω) e−2πj x·ω dω. (1) RN Calligraphic letters, such as D, represent bounded and open frequency domains in RN , with m(D) denoting the Lebesgue measure (i.e. volume) of D. Given a nonsingular matrix M and a vector τ , we use M (D + τ ) to represent the set of points of the form M (ω + τ ) for ω ∈ D. Finally, we denote by 1D (ω) the indicator function of the domain D, i.e., 1D (ω) = 1 if ω ∈ D and 1D (ω) = 0 otherwise. 2 Background In multidimensional multirate signal processing, the sampling operations are usually defined on lattices, each of which can be generated by an N × N nonsingular matrix M as def ΛM = {M n : n ∈ ZN }. (2) We denote by Λ∗M the corresponding reciprocal lattice (a.k.a. polar lattice), defined as def Λ∗M = {M −T ℓ : ℓ ∈ ZN } (3) In the rest of the paper, when it is clear from the context what the generating matrix is, we will drop the subscripts in ΛM and Λ∗M , and use Λ and Λ∗ for simplicity. Let f (x) be a continuous-domain signal, whose Fourier transform is bandlimited to a bounded open set D ⊂ RN . def The discrete-time Fourier transform of the samples s[n] = f (M n) is supported in [9] ! [ T S =M (D + k) . (4) k∈Λ∗ For appropriately chosen sampling lattices, the aliasing components in (4) do not overlap with the baseband frequency 69 support D. In this important case, we can fully recover the original continuous-domain signal f (x) by applying an ideal interpolation filter spectrally supported on D to the discrete samples s[n]. Definition 1 We say a frequency support D allows an aliasfree M -fold sampling, if different shifted copies of D in (4) are disjoint, i.e., D ∩ (D + k) = ∅ for all k ∈ Λ∗ \ {0} . (5) Furthermore, we say D can be critically sampled by M , if in addition to the alias-free condition in (5), the union of the shifted copies also covers the entire spectrum, i.e., [ (D + k) = RN , up to a set of measure zero. (6) Proof Consider the autocorrelation function Z RD (ω) = 1D (τ ) 1D (τ − ω) dτ . Clearly, RD (ω) ≥ 0 for all ω. 
Meanwhile, we can verify that supp RD (ω) = (D − D). Thus, we can apply Lemma 1 and obtain that, D allows an M -fold alias-free sampling if and only if Z X RD (k) = RD (0) = 1D (τ ) dτ = m(D). k∈Λ∗ Applying the Poisson summation formula to the above equality (see Appendix A of [7] for a justification of the pointwise equality), we have k∈Λ∗ m(D) = The focus of this work is to present two Fourier analytical conditions for alias-free sampling and critical sampling. Our discussions will be based on the following geometrical argument [2], which can be easily verified from (5). Proposition 1 The alias-free sampling condition in (5) is equivalent to requiring Λ∗ ∩ (D − D) = {0} , (7) def where D − D = {ω − τ : ω, τ ∈ D} is the Minkowski sum of the open set D and its negative −D. 3 Fourier Analytical Conditions In this section, we study the problems of alias-free sampling and critical sampling with Fourier techniques. The key observation is a link between the alias-free sampling condition and the Fourier transform of the indicator function 1D (ω) defined on the frequency support D. 31 Alias-Free Sampling Lemma 1 Let D be a frequency region, and f (ω) a positive function supported on (D − D), i.e., f (ω) > 0 for ω ∈ (D − D) and f (ω) = 0 otherwise. Then D allows an M fold alias-free sampling if and only if X f (k) = f (0). (8) k∈Λ∗ Proof By construction, (8) holds if and only if Λ∗ ∩ (D − D) = {0}. Applying Proposition 1, we are done. Theorem 1 A frequency region D allows an M -fold aliasfree sampling if and only if X b D (n)|2 = m(D), |M | |1 (9) n∈Λ b D (x) is the Fourier transform of 1D (ω), and |M | where 1 is the absolute value of the determinant of M . SAMPTA'09 X RD (k) = |M | X n∈Λ k∈Λ∗ bD (n). R (10) From the definition of RD (ω), its Fourier transform is bD (x) = |1 b D (x)|2 . Substituting this formula into (10), R we are done. 32 Critical Sampling Here we focus on the special case of critical sampling, and begin by mentioning, without proof, a standard result: Lemma 2 A frequency support D can be critically sampled by a sampling matrix M if and only if M is an aliasfree sampling matrix for D with sampling density 1/|M| = m(D). Proposition 2 A frequency support D can be critically sampled by a matrix M if and only if b D (0) = m(D) = 1 1 |M | b D (n) = 0 and 1 (11) for all n ∈ Λ \ {0}. Proof Suppose (11) holds. Then it follows that X b D (n)|2 = |1 b D (0)|2 = m(D) , |1 |M | n∈Λ and hence from Theorem 1, M is an alias-free sampling 1 matrix for D. Meanwhile, since m(D) = |M | , we can apply Lemma 2 to conclude that D is critically sampled by M . By reversing the above line of reasoning, we can also show the necessity of (11). Remark: The result of Proposition 2 is previously known in various disciplines. In approximation theory, the condition (11) is often called the interpolation property (see, for example, [4]). The usefulness of this condition in the context of lattice tiling was first pointed out by Kolountzakis and Lagarias [3] and applied to investigate the tiling of various high dimensional shapes. 70 33 Computational Aspects The Fourier conditions proposed in Theorem 1 and Proposition 2 can lead to practical computational algorithms for testing alias-free and critical sampling. Here, we briefly comment on two important computational aspects in applying the proposed conditions. First, as a prerequisite to using the proposed Fourier condibD (x). 
This evalutions, we must know the expression for 1 ation can be a cumbersome task if we need to do the derivation by hand for each given D. However, when the frequency regions D are arbitrary polygonal and polyhedral domains, b D (x) via the we can obtain the closed-form expressions for 1 divergence theorem [1, 7]. Another potential issue in practical implementations is that the Fourier conditions in (9) and (11) both involve an infinite number of lattice points. We show in [7] that the infinite sum in (9) can be well-approximated by a truncated finite sum. Moreover, with high probability, we actually only need to evaluate the Fourier transform on a very small number of points in a lattice (e.g. 4 points in 2-D) in order to show aliasing occurs, thus ruling out the lattice. 4 Application: Filter Bank Design In this section we present an application of Proposition 2 in the design of multidimensional critically sampled filter banks. 41 Frequency Partitioning of Critically Sampled Filter Banks Consider a general multidimensional filter bank, where each channel contains a subband filter and a sampling operator. As an important step in filter bank design, we need to specify the ideal passband support of each subband filter, all of which form a partitioning of the frequency spectrum. Not every possible frequency partitioning can be used for filter bank implementation though. In particular, if we want to have a nonredundant filter bank, then the ideal passband support of each subband filter must be critically sampled by the sampling matrix in that channel. Consequently, whenever given a possible frequency partitioning, we must first perform a “reality check” of seeing whether the above condition is met, before proceeding to actual filter design. The critical sampling condition is commonly verified geometrically (i.e. by drawing figures). Although intuitive and straightforward, this geometrical approach becomes cumbersome when the shape of the passband support is complicated, or when we work in 3-D and higher dimensional cases. Applying the result of Proposition 2, we propose in the following a computational procedure, which can systematically check and determine the critical sampling matrices of a given polytope region. Notice that the algorithm only searches among integer matrices, since the filter banks considered here operate on discrete-time signals. Procedure 1 Let D be a given polytope-shaped frequency support region. SAMPTA'09 ω2 0 1 ω3 ω2 0,3 0,1 0,0 5 3 4 4 3 5 2 1 (a) 1,3 0,2 0 2 1,2 1,1 1,0 3,3 3,2 3,1 3,0 1 ω1 2,3 2,2 2,1 2,0 ω2 ω1 ω1 1 3,0 3,1 3,2 0 0 3,3 2,0 2,1 2,2 2,3 (b) 1,0 1,1 1,2 1,3 0,0 0,1 0,2 0,3 (c) Figure 1: The ideal frequency partitioning of several filter banks. (a) A directional filter bank which decomposes the frequency cell (− 12 , 12 ]2 into 6 subbands. (b) A directional multiresolution frequency partitioning. (c) A 3-D directional frequency decomposition with pyramid-shaped passband supports. 1. Calculate δ = 1/m(D). From (11), any matrix M that can critically-sample D must satisfy |M | = δ. If δ is not an integer, then stop the procedure, since in this case it is impossible for D to be critically sampled by any integer matrix. b D (x). 2. Construct a closed-form formula [7] for 1 3. Based on the Hermite normal form, construct an exhaustive list of matrices of determinant δ, each corresponding to a distinct sampling lattice [7]. 4. 
For every matrix M in the above list, test the following condition b D (M n) = 0 for all n ∈ ZN \{0} with knk∞ ≤ r, 1 (12) where r is a large positive integer. 5. Present all the matrices in the list that satisfy (12). If there is no such matrix, then D cannot be critically sampled by any integer matrix. To be clear, the expression (12) is a necessary condition for D to be critically sampled by M . It is not sufficient since we only check for integer points within a finite radius r, and so in principle, even if M satisfies (12) for all b D (M n) 6= 0 for some n knk∞ ≤ r, it might happen that 1 with knk∞ > r. However, by choosing r sufficiently large, we can gain confidence in the validity of the original infinite condition (11) as required in Proposition 2. We leave the quantitative analysis of this approximation to [7]. In the following examples, we choose r = 10000. Example 1 Figure 1(a) presents the frequency decomposition of a directional filter bank (DFB). Applying the algorithm in Procedure 1, we can easily verify that this frequency decomposition can be critically sampled. The corresponding sampling matrices, denoted by M k for the kth subband, are   6 3 M0 = M1 = M2 = . 0 1 M 3 , M 4 and M 5 can be inferred by symmetry. Example 2 We show in Figure 1(b) a directional and multiresolution decomposition of the 2-D frequency spectrum. Applying Procedure 1 confirms that such a frequency partitioning can be critically sampled as well. The sampling 71 matrices for two representative subbands (marked as dark regions in the figure) are     4 0 8 4 M0 = and M 1 = . 0 4 0 4 Example 3 Figure 1(c) shows an extension of the original 2-D DFB to the 3-D case [6]. Applying Procedure 1, we find that the 3-D frequency partitioning shown in Figure 1(c) cannot be critically sampled; in other words, redundancy is unavoidable for a 3-D DFB. 42 Critical Sampling of General Cone-Shaped Frequency Regions in Higher Dimensions The result in Example 3 can be generalized to higher dimensions, and to cases where the subbands take different directional shapes. As an application of the Fourier condition in Proposition 2, we show here a much more general statement: it is impossible to implement any cone-shaped frequency partitioning by a nonredundant filter bank, except for the 2-D case. We consider the following ideal subband supports in N -D: D = {ω : a ≤ |ωN | ≤ b, (ω1 , . . . , ωN −1 ) ∈ ωN B}, (13) N −1 where B is some bounded set in R . Geometrically, D takes the form of a two-sided cone in RN , truncated by hyperplanes |ωN | = a and |ωN | = b, where 0 ≤ a < b. The “base” region B in (13) is the intersection between the cone and the hyperplane ωN = 1. The formulation in (13) is flexible enough to characterize, up to a rotation, any directional subband shown in Figure 1. For example, the 3-D pyramid-shaped subband (1, 1) in Figure 1(c) can be presented by a = 0, b = 12 , and B = [− 21 , 0]2 . However, the class of frequency shapes that can be described by (13) is far beyond those shown in Figure 1, since the formulation (13) allows for arbitrary configuration of the cross section heights a and b (not necessarily the dyadic decomposition as in Figure 1(b)) and arbitrary shape for the base B (not necessarily lines or squares). Lemma 3 If a frequency support D can be critically sampled by an integer matrix M , then bD (|M | n) = 0, for all n ∈ ZN \ {0}. 1 (14) Proof It is easy to verify that, for any integer matrix M , the vector |M | n belongs to the lattice Λ generated by M . 
The condition (14) then follows from (11) in Proposition 2. Theorem 2 For arbitrary choice of 0 ≤ a < b and the base shape B, the frequency domain support D given in (13) cannot be critically sampled by any integer matrix in N dimensions, N ≥ 3. Remark: For 2-D, we established the positive result in Examples 1 and 2. Proof We argue by contradiction. Suppose for N ≥ 3, and for some particular choices of 0 ≤ a < b and B, the corresponding frequency region D in (13) can be critically sampled by an integer matrix M . It then follows from (14) in SAMPTA'09 Lemma 3 that b D (0, . . . , 0, |M | n) = 0, for all n ∈ Z \ {0}. 1 (15) From the definition of D, we have bD (0, . . . , 0, x) 1   Z Z −2πj x ωN = dωN e 1 dω1 . . . dωN −1 a≤|ωN |≤b ωN B Z = e−2πj x ω m(ω B) dω a≤|ω|≤b Z = e−2πj x ω |ω|N −1 m(B) dω a≤|ω|≤b = 2 m(B) Z b ω N −1 cos(2πx ω) dω. a After a change of variable, we can now rewrite (15) as R 2π|M |b N −1 ω cos(n ω) dω = 0, for all n ∈ Z \ {0}, which 2π|M |a is impossible when N ≥ 3 by Appendix C of [7]. 5 Conclusions By linking the alias-free (and critical) sampling of a given frequency support region with the Fourier transform of the indicator function, we presented two simple yet powerful conditions for checking alias-free sampling and critical sampling. We demonstrated the usefulness of the proposed conditions in the design of multidimensional critically sampled filter banks. As an interesting result, we show that it is impossible to construct a nonredundant directional filter bank with a general cone-shaped frequency decomposition, except for the 2-D case. References: [1] L. Brandolini, L. Colzani, and G. Travaglini. Average decay of Fourier transforms and integer points in polyhedra. Ark. Mat., 35:253–275, 1997. [2] P. M. Gruber and C. G. Lekkerkerker. Geometry of Numbers. Elsevier Science Publishers, Amsterdam, second edition, 1987. [3] M. N. Kolountzakis and J. C. Lagarias. Tilings of the line by translates of a function. Duke Math. J., 82(3):653–678, 1996. [4] H. R. Künsch, E. Agrell, and F. A. Hamprecht. Optimal lattices for sampling. IEEE Trans. Inf. Theory, 51(2):634–47, Feb. 2005. [5] Y. M. Lu and M. N. Do. Finding optimal integral sampling lattices for a given frequency support in multidimensions. In Proc. IEEE Int. Conf. on Image Proc., San Antonio, USA, 2007. [6] Y. M. Lu and M. N. Do. Multidimensional directional filter banks and surfacelets. IEEE Trans. Image Process., 16(4):918– 931, April 2007. [7] Y. M. Lu, M. N. Do, and R. S. Laugesen. A computable Fourier condition generating alias-free sampling lattices. IEEE Trans. Signal Process., to appear, 2009. [8] D. P. Peterson and D. Middleton. Sampling and reconstruction of wavenumber-limited functions in N -dimensional Euclidean spaces. Inform. Contr., 5:279–323, 1962. [9] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice-Hall, Englewood Cliffs, NJ, 1993. 72 Analysis of Singularities and Edge Detection using the Shearlet Transform Glenn Easley (1) , Kanghui Guo (2) , and Demetrio Labate (3) (1) System Planning Corporation 1000 Wilson Boulevard, Arlington, VA 22209, USA. (2) Missouri State University, Springfield, MO 65804, USA. (3) University of Houston, 651 Phillip G Hoffman, Houston, TX 77204-3008, USA. geasley@sysplan.com, KanghuiGuo@MissouriState.edu, dlabate@math.uh.edu Abstract: The continuous curvelet and shearlet transforms have recently been shown to be much more effective than the traditional wavelet transform in dealing with the set of discontinuities of functions and distributions. 
In particular, the continuous shearlet transform has the ability to provide a very precise geometrical characterization of general discontinuity curves occurring in images. In this paper, we show that these properties are useful to design improved algorithms for the analysis and detection of edges. 1. Introduction One of the most useful properties of the wavelet transform is its ability to deal very efficiently with the discontinuities of functions and distributions. Consider, for example, a function f on R2 which is smooth except for a discontinuity at x0 ∈ R2 , and let Wψ f (a, t) be the continuous wavelet transform of f . This is defined as the mapping Z ¡ ¢ Wψ f (a, t) = a−1 f (x) ψ a−1 (x − t) dx, R2 where a > 0, t ∈ R2 and ψ ∈ L2 (R2 ) is an appropriate well-localized function. Then Wψ f (a, t) decays rapidly as a → 0 everywhere, unless t is near x0 [5]. Hence, the wavelet transform is able to signal the location of the singularity of f through its asymptotic decay at fine scales. It was recently shown that certain “directional” extensions of the wavelet transform have the ability to provide a much finer description of the set of singularities of a function. Namely, the recently introduced curvelet and shearlet transforms are able to identify not only the location of singularities of a function, but also the orientation of discontinuity curves. In particular, using the continuous shearlet transform, one can precisely characterize the geometrical information of general discontinuity curves, including discontinuity curves which contain irregularities such as corner and junction points. In this paper, we show that one can take advantage of the properties of the shearlet transform to design improved algorithms for the analysis and detection of edges in images. Indeed, multiscale techniques based on wavelets have a history of successful applications in the study of edges. With respect to traditional wavelets, the shearlet framework has the ability to capture directly the information about edge orientation and this is useful to improve the SAMPTA'09 robustness of edge detection algorithms in the presence of noise. The paper is organized as follows. In Section 2. we recall the definition of the shearlet transform and its main results concerning the analysis of edges. In Section 3. we present some representative numerical experiments of edge detection, comparing the shearlet approach against wavelets and other standard edge detection techniques. 2. The Shearlet Transform For a > 0 s ∈ R and t ∈ R2 , let Mas be the matrices µ √ ¶ a − as Mas = √ 0 a and, corresponding to those, let ψast (x) = 1 2 −1 (x − t)), where ψ ∈ L2 (R³ ). It is | det Mas |− 2 ψ(Mas a 0 ´ useful to notice that Mas = Bs Aa , where Aa = √ 0 a ³ 1 −s ´ and Bs = . Hence to each matrix Mas are 0 1 associated two distinct actions: an anisotropic dilation produced by the matrix Aa and a shearing produced by the non-expansive matrix Bs . For f ∈ L2 (R2 ), the continuous shearlet transform is defined as the mapping f → SHψ f (a, s, t) = hf, ψast i, a > 0, s ∈ R, t ∈ R2 . The generating function ψ is chosen to be a well localized function satisfying appropriate admissibility conditions [7, 4], so that each f ∈ L2 (R2 ) satisfies the generalized Calderòn reproducing formula: Z Z ∞Z ∞ da hf, ψast i ψast 3 ds dt. 
f= a R2 −∞ 0 The significance of the shearlet representation is that any function f is broken up with respect to well-localized analyzing elements defined not only at various scales and locations, as in the traditional multiscale approach, but also at various orientations associated with the shearing parameter s. Figure 1 shows the frequency support of the shearlet analyzing functions ψ̂ast for some values of s and a. Thanks to this directional multiscale decomposition, the continuous shearlet transform is able to precisely capture the geometry of edges through its asymptotic decay at fine 73 ξ2 (a, s) = otherwise, if s = s0 corresponds to one of the normal directions of Γ at t then 1 (a, s) = ( 32 , 1) ❅ ❅ ❘ 3 0 < lim a− 4 |SHψ B(a, s0 , t)| < ∞. + a→0 ( 14 , 0) ❅ ❅ ❘ ❅ ξ1 ✻ 1 , 0) (a, s) = ( 32 Thus, the continuous shearlet transform has rapid asymptotic decay, as a → 0, everywhere except for locations t on the edges and orientations s which are normal to the edges. We refer to [7, 4, 3] for additional detail, including a more precise description of the behavior at the corner points. We also refer to [1] for some similar (even if more restricted) results based on the curvelet transform. 2.1 Figure 1: Frequency support of same representative shearlet analyzing functions ψ̂ast . scales (a → 0). To precisely describe these properties, let us introduce the following model of images. S L Let Ω = [0, 1]2 and consider the partition Ω = n=1 Ωn ∪ Γ, where: 1. each “object” Ωn , for n = 1, . . . , L, is a connected open set; SL 2. the set of edges of Ω is given by Γ = n=1 ∂Ω Ωn , where each boundary ∂Ω Ωn is a piecewise smooth curve of finite length. Hence, we consider the space of images u ∈ I(Ω) of the form u(x) = L X un (x) χΩn (x) for x ∈ Ω\Γ n=1 where, for each n = 1, . . . , L, un ∈ C01 (Ω) has bounded partial derivatives, and the sets Ωn are pairwise disjoint in measure. We have the following result, which is a significant refinement with respect to the simple detection of singularities obtained using traditional wavelets. Theorem 2.1. Let f ∈ I(Ω). (i) If t ∈ / Γ, then, for each N ∈ N lim a−N SHψ f (a, s, t) = 0. a→0+ (ii) If t ∈ Γ is a regular point and s does not correspond to the normal direction of Γ at t then lim a−N SHψ B(a, s, t) = 0, a→0+ for all N > 0; otherwise, if s = s0 corresponds to the normal direction of Γ at t then 3 0 < lim a− 4 |SHψ B(a, s0 , t)| < ∞. + a→0 (iii) If t ∈ Γ is a corner point and s does not correspond to any of the normal directions of Γ at t, then 9 lim+ a− 4 |SHψ B(a, s, t)| < ∞; a→0 SAMPTA'09 Lipschitz regularity The notion of Lipschitz regularity is a method to quantitatively describe the local regularity of functions and distributions. Given α ≥ 0, a function f is Lipschitz α at x0 ∈ R2 if there exists a positive constant K and a polynomial px0 of degree m = ⌊α⌋ such that, for all x in a neighborhood of x0 : α |f (x) − px0 (x)| ≤ K |x − x0 | . (1) A function f is uniformly Lipschitz α over an open set Ω ⊂ R2 if there exists a constant K > 0, independent of x0 , such that the above inequality holds for all x0 ∈ Ω. If f is uniformly Lipschitz α > m in a neighborhood of x0 , then f is necessarily m times differentiable at x0 . Also notice that if 0 ≤ α < 1, then px0 = f (x0 ) and condition (1) becomes α |f (x) − f (x0 )| ≤ K |x − x0 | . If f is Lipschitz α with α < 1 at x0 , then f is not differentiable at x0 . The closer the Lipschitz exponent is to 0, the more “singular” the function is. 
If f is bounded but discontinuous at x0 , then it is Lipschitz 0 at x0 , indicating the presence of an edge. Also recall that if f (x) is Lipschitz α, then its primitive g(x) is Lipschitz α + 1 (the converse however is not true; that is, if a function is Lipschitz α at x0 , then its derivative need not be Lipschitz α - 1 at the same point). This observation explains the following definition which extends the concept of Lipschitz regularity to distributions. Let α be a real number. A tempered distribution f is uniformly Lipschitz α on Ω ⊂ R2 if its primitive is uniformly Lipschitz α + 1 on Ω ⊂ R2 . It follows that a distribution may have a negative Lipschitz exponent. For example, one can show that if f is a Dirac delta distribution centered at x0 , then f is Lipschitz -1 at x0 . We refer to [8] and to the references indicated there for more details. The function ψ satisfies the property that for each n ∈ N, there exists a constant cn > 0 such that |ψ(x)| ≤ cn (1 + |x|)−n for all x ∈ R2 (for details, seeR [4], p. 26). As a conRsequence, weα obtain kψk1 = R2 |ψ(x)| dx < ∞, and |ψ(x)||x| dx < ∞. R2 The following result (whose proof is reported in the appendix) is an adaptation of a similar theorem about the 74 continuous wavelet transform due to Jaffard [6]. If we R assume ψ has n vanishing moments, i.e. tk ψ(t) dt = 0 for all k = 0, . . . , n − 1, we would need to add the condition α ≤ n. However, the general construction of ψ implies that ψ has an infinite number of vanishing moments. Thus this assumption is unnecessary. Theorem 2.2. If f ∈ L2 (R2 ) is Lipschitz α > 0 at t0 , then there exists a constant C > 0 such that, for all a < 1, ¯ 1 ¯´ ³ 1 3 ¯ ¯ |SHψ f (a, s, t)| ≤ C a 2 (α+ 2 ) 1 + ¯a− 2 (t − t0 )¯ . The theorem can be extended to the case where f is a distribution. In addition, the estimation of the decay of the shearlet transform of the Dirac delta and other distributions was computed in [7]. These results show that, for locations t corresponding to delta-type singularities, the shearlet transform has a very different behavior from edge points. In fact, the amplitude of |SHψ f (a, s, t0 )| grows 1 like O(a− 4 ) as a → 0. Similarly, for spike singularities, one can show that the amplitude of the shearlet transform increases at fine scales. This shows that classification of points by their Lipschitz regularity is important as it can be used to distinguish true edge points from points corresponding to noise. This principle was already exploited, for example, in [8]. 3. Shearlet-based Edge Detection Taking advantage of the theoretical observations reported above, a discrete version of the shearlet transform was developed and applied to the purpose of locating and identifying edges in images. Because of space limitations, we will limit ourselves to presenting a few numerical demonstrations. A detailed account of the discrete shearlet transform and shearlet-based edge detection algorithms is found in [2, 10]. Figures 2 and 3 compare a shearlet-based edge detection routine against a wavelet-based routine using a consistent set of predetermined default parameters. For a base-line comparison against standard routines, we also used the Sobel and Prewitt methods using their default parameters. The results highlight the superior performance of the shearlet-based method. To assess the performance of the edge detector, we have given the value of the Pratt’s Figure of Merit (FOM), which is a fidelity measure ranging from 0 to 1, with 1 indicating a perfect edge detector [9]. 
Acknowledgments DL acknowledges partial support from NSF DMS 0604561 and DMS (Career) 0746778. 4. Appendix: Proof of Theorem 2.2. Proof of Theorem 2.2. Since f is Lipschitz α at t0 , there is a polynomial pt0 (x) and a constant K > 0 such that Since SHψ pt0 (a, s, t) = 0, then ≤ ≤ = ≤ ≤ ≤ |SHψ f (a, s, t)| Z −1 −3/4 |ψ(A−1 a a Bs (x − t))| |f (x) − pt0 (x)| dx R2 Z −1 α |ψ(A−1 K a−3/4 a Bs (x − t))| |x − t0 | dx 2 Z R 3/4 |ψ(y)| |t + Bs Aa y − t0 |α dy Ka R2 µ Z α 3/4 |ψ(y)| |y|α dy kBs kα kAa kα K2 a R2 ¶ Z α |ψ(y)| |t − t0 | dy + R2 µ Z α 3/4 K2 a C(s)α aα/2 |ψ(y)| |y|α dy R2 ¶ Z α |ψ(y)| dy + |t − t0 | R2 ´ ³ 1 3 C a 2 (α+ 2 ) 1 + |a−1/2 (t − t0 )|α . Here we have used the fact that kAa k = a1/2 , i.e. the largest eigenvalue of the matrix Aa . Similarly kBs k is the largest eigenvalue of the matrix Bs , which is 1. References: [1] E. J. Candès and D. L. Donoho, “Continuous curvelet transform: I. Resolution of the wavefront set”, Appl. Comput. Harmon. Anal., Vol. 19, pp. 162–197, 2005. 162–197. [2] G. Easley, D. Labate, and W-Q. Lim “Sparse Directional Image Representations using the Discrete Shearlet Transform”, Appl. Comput. Harmon. Anal. Vol. 25, pp. 25–46, 2008. [3] K. Guo and D. Labate, “Characterization and analysis of edges using the Continuous Shearlet Transform”, preprint, 2008 [4] K. Guo, D. Labate and W. Lim, “Edge analysis and identification using the continuous shearlet transform”, to appear in Appl. Comput. Harmon. Anal.. [5] M. Holschneider, Wavelets. Analysis tool, Oxford University Press, Oxford, 1995. [6] S. Jaffard “Pointwise smoothness, two-localization and wavelet coefficients”, Publicacions Mathematique, Vol. 35, pp. 155–168, 1991. [7] G. Kutyniok and D. Labate, “Resolution of the Wavefront Set using Continuous Shearlets”, Trans. Am. Math. Soc., Vol. 361 pp. 2719-2754, 2009. [8] S. Mallat and W. L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inf. Theory, vol. 38, no. 2, 617-643, Mar. 1992. [9] W.K. Pratt, Digital Image Processing, Wiley Interscience Publications, 1978. [10] S. Yi, D. Labate, G. R. Easley, and H. Krim, “A Shearlet Approach to Edge Analysis and Detection”, to appear in IEEE Trans. Image processing, 2008. |f (x) − pt0 (x)| ≤ K |x − t0 |α . SAMPTA'09 75 Figure 2: Results of edge detection methods. From top left, clockwise: Original image, noisy image (PSNR=28.10 dB), Sobel result (FOM=0.24), shearlet result (FOM=0.44), wavelet result (FOM=0.29), and Prewitt result (FOM=0.23). Figure 3: Results of edge detection methods. From top left, clockwise: Original image, noisy image (PSNR=24.58 dB), Sobel result (FOM=0.15), shearlet result (FOM=0.45), wavelet result (FOM=0.27), and Prewitt result (FOM=0.15). SAMPTA'09 76 Discrete Shearlet Transform : New Multiscale Directional Image Representation Wang-Q Lim Department of Mathematics, University of Osnabrück, Osnabrück, Germany wlim@mathematik.uni-osnabrueck.de Abstract: It is now widely acknowledged that analyzing the intrinsic geometrical features of an underlying image is essentially needed in image processing. In order to achieve this, several directional image representation schemes have been proposed. In this report, we develop the discrete shearlet transform (DST) which provides efficient multiscale directional representation. We also show that the implementation of the transform is built in the discrete framework based on a multiresolution analysis. We further assess the performance of the DST in image denoising and approximation applications. 
In image approximation, our adaptive approximation scheme using the DST significantly outperforms the wavelet transform (up to 3.0dB) and other competing transforms. Also, in image denoising, the DST compares favorably with other existing methods in the literature. 1. Introduction Sharp image transitions or singularities such as edges are expensive to represent and intergrating the geometric regularity in the image representation is a key challenge to improve state of the art applications to image compression and denoising. To exploit the anisotropic regularity of a surface along edges, the basis must include elongated functions that are nearly parallel to the edges. Several image representations have been proposed to capture geometric image regularity. They include curvelets [1], contourlets [2] and bandelets [3]. In particular, the construction of curvelets is not built directly in the discrete domain and they do not provide a multiresolution representation of the geometry. In consequence, the implementation and the mathematical analysis are more involved and less efficient. Contourlets are bases constructed with elongated basis functions using a combination of a multiscale and a directional filter bank. However, contourlets have less clear directional features than curvelets, which leads to artifacts in denoising and compression. Bandelets are bases adapted to the function that is represented. Asymptotically, the resulting bandelets are regular functions with compact support, which is not the case for contourlets. However, in order to find bases adapted to an image, the bandelet transform searches for the optimal geometry. For an image of N pixels, the complexity of this SAMPTA'09 best bandelet basis algorithm is O(N 3/2 ) which requires extensive computation [3]. Recently, a new representation scheme has been introduced [4]. These so called shearlets are frame elements which yield (nearly) optimally sparse representations [5]. This new representation system is based on a simple and rigorous mathematical framework which not only provides a more flexible theoretical tool for the geometric representation of multidimensional data, but is also more natural for implementations. As a result, the shearlet approach can be associated to a multiresolution analysis [4]. However constructions proposed in [4] do not provide compactly supported shearlets and this property is essentially needed especially in image processing applications. In fact, in order to capture local singularities in images efficiently, basis functions need to be well localized in the spatial domain. In this report, we construct compactly supported shearlets and show that there is a multiresolution analysis associated with this construction. Based on this, we develop the fast discrete shearlet transform (DST) which provides efficient directional representations. 2. Shearlets A family of vectors {ϕn }n∈Γ constitutes a frame for a Hilbert space H if there exist two positive constants A, B such that for each f ∈ H we have X |hf, ϕn i|2 ≤ Bkf k2 . Akf k2 ≤ n∈Γ In the event that A = B, the frame is said to be tight. Let us next introduce some notations that we will use throughout this paper. For f ∈ L2 (Rd ), the Fourier transform of f is defined by Z ˆ f (x)e−2πix·ω dx. f (ω) = Rd Also, for t ∈ Rd and A ∈ GLd (R), we define the following unitary operators: Tt (f )(x) = f (x − t) and 1 DA (f )(x) = |A|− 2 f (A−1 x). 
Finally, for q ∈ ( 21 , 1] and a > 1, we define ¶ µ µ q ¶ a 0 1 1 and B0 = A0 = 1 0 1 0 a2 (1) 77 and A1 = µ 1 a2 0 0 aq ¶ and B1 = µ 1 1 ¶ 0 . 1 (2) We are now ready to define a shearlet frame as follows. For c ∈ R+ , ψ01 , . . . , ψ0L , ψ11 , . . . , ψ1L ∈ L2 (R2 ) and φ ∈ L2 (R2 ), we define i,0 Ψ0c = {ψjkm : j, k ∈ Z, m ∈ Z2 , i = 1, . . . , L}, i,1 Ψ1c = {ψjkm : j, k ∈ Z, m ∈ Z2 , i = 1, . . . , L}, Figure 1: Examples of shearlets in the spatial domain. The i,0 top row illustrates shearlet functions ψjk0 associated with matrices A0 and B0 in (1). The bottom row shows shearlet i,1 functions ψjk0 associated with matrices A1 and B1 in (2).. and Ψ2c = {Tcm φ : m ∈ Z2 } i,0 ∪{ψjkm : j ≥ 0, −2j ≤ k ≤ 2j , m ∈ Z2 , i = 1, . . . , L} i,1 ∪{ψjkm : j ≥ 0, −2j ≤ k ≤ 2j , m ∈ Z2 , i = 1, . . . , L} Theorem 3..1. [7] For i = 1, . . . , L, we define ψ0i (x1 , x2 ) = γ i (x1 )θ(x2 ) such that where ′ i,ℓ ψjkm = DA−j B −k Tcm ψℓi ℓ ℓ |ω1 |α |γ̂ (ω1 )| ≤ K1 (1 + |ω1 |2 )γ ′ /2 i (3) for ℓ = 0, 1, m ∈ Z2 , i = 1, . . . , L and j, k ∈ Z. If Ψpc i,ℓ is a frame for L2 (R2 ), then we call the functions ψjkm in the system Ψpc shearlets. i,ℓ Observe that each element ψjkm in Ψpc is obtained by applying an anisotropic scaling matrix Aℓ and a shear matrix Bℓ to fixed generating functions ψℓi . This implies that the system Ψpc can provide window functions which can be elongated along arbitrary directions. Therefore, the geometrical structures of singularities in images can be efficiently represented and analyzed using those window functions. In fact, it was shown that 2-dimensional piecewise smooth functions with C 2 -singularities can be approximated with nearly optimal approximation rate using shearlets. We refer to [5] for details. Furthermore, one can show that shearlets can completely analyze the singular structures of piecewise smooth images [6]. In fact, this property of shearlets is useful especially in signal and image processing, since singularities and irregular structures carry essential information in a signal. For example, discontinuities in the intensity of an image indicate the presence of edges. Figure 1 displays examples of shearlets which can be elongated along arbitrary direction in the spatial domain. 3. Construction of Shearlets In this section, we will introduce some useful sufficient conditions to construct compactly supported shearlets. Using these conditions, we will show that the system Ψpc can be generated by simple separable functions associated with a multiresolution analysis. Furthermore, this leads to the fast DST, and we will discuss this in the next section. We first discuss sufficient conditions for the existence of compactly supported shearlets.³ For this,´ let α > 1 max (1, (1 − p)γ) and γ > max α+1 be fixed p , 1−p positive numbers for 0 < p < 1. We choose α′ , γ ′ > 0 such that α′ ≥ α + γ and γ ′ ≥ α′ − α + γ. Then we obtain SAMPTA'09 the following results [7]. and |θ̂(ω1 )| ≤ K2 (1 + |ω1 |2 )−γ ′ /2 . If ess inf |ω1 |≤1/2 |θ̂(ω1 )|2 ≥ K3 > 0 (4) and ess inf a−q ≤|ω1 |≤1 L X i=1 |γ̂ i (ω1 )|2 ≥ K4 > 0, (5) then there exists c0 > 0 such that Ψ0c is a frame for L2 (R2 ) for all c ≤ c0 . Observe that the functions ψ01 , . . . , ψ0L are separable functions, and the one-dimensional scaling function θ and wavelets γ i can be chosen with sufficient vanishing moments in this case. We now show some concrete examples of compactly supported shearlets using Theorem 3.1. Assume that a = 4 and q = 1 in (1) and (2). Let us consider a box spline [1] of order m defined as follows. 
³ sin πω ´m+1 1 e−iǫω1 , θ̂m (ω1 ) = πω1 where ǫ = 1 if m is even, and ǫ = 0 if m is odd. Obviously, we have the following two scaling equation: θ̂m (2ω1 ) = m0 (ω1 )θ̂m (ω1 ) and m0 (ω1 ) = (cos πω1 )m+1 e−iǫπω1 . Let α′ and γ ′ be positive real numbers as in Theorem 3.1. We now define ´ℓ √ ³ ψ̂01 (ω) = (i)ℓ 2 sin πω1 θ̂m (ω1 )θ̂m (ω2 ) and ³ ω1 πω1 ´ℓ ψ̂02 (ω) = (i)ℓ sin θ̂m ( )θ̂m (ω2 ), 2 2 78 where ℓ ≥ α′ and m + 1 ≥ γ ′ . Then, by Theorem 3.1, ψ01 and ψ02 generate a frame Ψ0c for c ≤ c0 with some c0 > 0. There are infinitely many possible choices for ℓ and m. For example, one can choose ℓ = 9 and m = 11. Define φ(x1 , x2 ) = θm (x1 )θm (x2 ), ´ℓ √ ³ ψ̂11 (ω) = (i)ℓ 2 sin πω2 θ̂m (ω2 )θ̂m (ω1 ) and ³ πω2 ´ℓ ω2 ψ̂12 (ω) = (i)ℓ sin θ̂m ( )θ̂m (ω1 ). 2 2 Figure 2: Examples of anisotropic discrete wavelet decomposition: (a) Anisotropic discrete wavelet decomposition by W, (b) Anisotropic discrete wavelet decomposif tion by W. Then similar arguments show that ψ11 and ψ12 generate a frame Ψ1c for c ≤ c0 with some c0 > 0. Furthermore, the functions φ, ψℓi for ℓ = 0, 1 and i = 1, 2 generate a frame Ψ2c with c ≤ c0 for some c0 > 0. where fJ (n) = h f, D2−J I2 Tn φ i. For h = 0, 1, let us define maps Dhk,j : ℓ2 (Z2 ) → ℓ2 (Z2 ) by X k,j (Dhk,j x)(d) = dh (d, m)x(m) 4. Discrete Shearlet Transform where dk,j h (d, m) = h D In the previous section, we constructed compactly supported shearlets generated by separable functions associated with a multiresolution analysis. In this section, we will show that this multiresolution analysis leads to the i,ℓ fast DST which computes hf, ψjkm i. To be more specific, we let a = 4 and q = 1 in (1) and (2). For notational convenience, we let n = (n1 , n2 ), m = (m1 , m2 ), d = (d1 , d2 ) ∈ Z2 and I2 be a 2 by 2 identity matrix. Let θ ∈ L2 (R) be a compactly supported function such that {θ(· − n1 ) : n1 ∈ Z} is an orthonormal sequence and X √ (6) h(n1 ) 2θ(2x1 − n1 ). θ(x1 ) = n1 ∈Z Define γ(x1 ) = X n1 ∈Z √ g(n1 ) 2θ(2x1 − n1 ) (7) such that γ has sufficient vanishing moments and the pair of the filters h and g is a pair of conjugate mirror filters. We assume that γ and θ satisfy decay conditions (4) and (5) in Theorem 3.1. We also define φ(x1 , x2 ) = θ(x1 )θ(x2 ), ψℓ1 (x1 , x2 ) = γ(xℓ+1 )θ(x2−ℓ ) and (8) xℓ+1 )θ(x2−ℓ ) (9) 2 for ℓ = 0, 1. Then Theorem 3.1 can be easily generalized to show that the functions ψ01 , ψ02 , ψ11 , ψ12 and φ generate a shearlet frame Ψ2c with c < c0 for some c0 > 0. Let J be a positive odd integer. Based on a multiresolution analysis associated with the two-scale equation (6), we can now easily derive a fast algorithm for computing shearlet i,ℓ coefficients hf, ψjkm i for ℓ = 0, 1,j = 1, . . . , J−1 2 , and −2j ≤ k ≤ 2j as follows. First, assume that X fJ (n)D2−J I2 Tn φ (10) SAMPTA'09 f = 1 ψℓ2 (x1 , x2 ) = 2− 2 γ( n∈Z2 m∈Z2 Also we define k/2j Bh X H(ω1 ) = Tm φ, Td φ i and x ∈ ℓ(Z2 ). h(n1 )e−2iπω1 n1 and G(ω1 ) = X g(n1 )e−2iπω1 . n1 hj , gj0 Finally, we let and gj1 be the Fourier coefficients of  QJ−j−1 ³ k ´   for J − j > 0, H 2 ω2 H (ω ) =  j 2 k=0 QJ−2j−2 0 k Gj (ω1 ) = k=0 H(2 ω1 )G(2J−2j−1 ω1 ),  Q  G1 (ω ) = J−2j−1 H(2k ω )G(2J−2j ω ), 1 1 1 j k=0 (11) respectively. 
Then we obtain  1,0 hf, ψjkm i = (((D0k,j fJ ) ∗r hj )↓2J−j ∗c g 0j )↓2J−2j (m),    hf, ψ 2,0 i = (((Dk,j f ) ∗ h ) J−j ∗ g 1 ) J−2j+1 (m), J r j ↓2 c j ↓2 0 jkm k,j 1,1 0 hf, ψjkm J−j ∗r g )↓2J−2j (m), h ) i = (((D f ) ∗ j J c  ↓2 j 1   2,1 k,j hf, ψjkm i = (((D1 fJ ) ∗c hj )↓2J−j ∗r g 1j )↓2J−2j+1 (m), (12) where ∗c and ∗r are convolutions along the vertical and horizontal axes respectively, ↓ 2j is the downsampling by 2j and h(n) = h(−n) for given filter coefficients h(n). From (12), we observe that the shearlet transform i,ℓ hf, ψjkm i is the application of the shear transformation D k/2j to f ∈ L2 (R2 ) followed by the wavelet transform Bℓ associated with anisotropic scaling matrix Aℓ . In this case, applying Dℓk,j to fJ ∈ ℓ2 (Z2 ) corresponds to applying the shear transform D k/2j in the discrete domain. Thus Bℓ we simply replace the operator Dℓk,j by the discrete shear ℓ transform Pk,j for fJ ∈ ℓ2 (Z2 ), where we define the dis0 1 crete shear transforms Pk,j and Pk,j as follows: ( ¡ ¢ 0 (Pk,j fJ )(n) = fJ n1 + ⌊(k/2j )n2 ⌋, n2 , ¡ ¢ (13) 1 (Pk,j fJ )(n) = fJ n1 , n2 + ⌊(k/2j )n1 ⌋ . 1 0 Let M be a fixed positive integer. Since Pk,j and Pk,j 79 2 are unitary operators on ℓ(Z ), we can extend the shearlet transform defined in (12) to a linear transform S consisting of finitely many orthogonal transforms SkM and S̃kM where 0 f 1 (fJ ) SkM (fJ ) = WPk,M (fJ ) and S̃kM (fJ ) = WP k,M f are the wavelet transform associated with and W and W an anisotropic sampling matrices A0 and A1 , respectively. f we refer to [7]. For the precise definitions of W and W, In this case, the linear transform S, which we call DST, is defined by M M M M S = (S−2 M , . . . , S2M , S̃−2M , . . . , S̃2M ) for a given M ∈ Z+ . Notice that redundancy of the DST is K = 2M +2 + 2 and the DST merely requires O(KN ) operations for an image of N pixels. It is obvious that the inverse DST is simply the adjoint of S with normalization. 5. Image Approximation Using DST In this section, we present some results of the DST in image compression applications. In this case, we use adaptive image representation using the DST. The main idea of this is similar to the matching pursuit introduced by Mallat and Zhong [8]. The matching pursuit selects vectors one by one from a given basis dictionary at each iteration step. On the other hand, our approximation scheme searches the optimal directional index k0 at each iteration step so that corresponding the orthogonal transform SkM0 or S̃kM0 provides an optimal nonlinear approximation with P nonzero terms among all possible 2M +2 + 2 orthogonal transforms in S. For a detailed description of this algorithm, we refer to [7]. For numerical tests, we compare the performance of the DST to other transforms such as the discrete biorthogonal CDF 9/7 wavelet transform (DWT)[9] and contourlet transform (CT)[2] in image compression (see Figure 3). We used only 2 directions (horizontal and vertical) and 4 level decomposition for our DST. In this case, our numerical tests indicate that only a few iterations (15) can give significant improvement over other transforms and computing time is comparable to the wavelet transform. For more results, we refer to [8]. Figure 3: Compression results of ’Barbara’ image of size 512 × 512: The image is reconstructed from 5024 most significant coefficients. 
Top left: Zoomed original image, Top right: Zoomed image reconstructed by the DWT (PSNR = 25.11), Bottom left: Zoomed image reconstructed by the CT (PSNR = 25.88), Bottom right: Zoomed image reconstructed by the DST with only 1 iteration step (PSNR = 26.73). References: [1] [2] [3] [4] 6. Conclusion [5] We have constructed compactly supported shearlet systems which can provide efficient directional image representations. We also have developed the fast discrete implementation of shearlets called the DST. This algorithm consists of applying the shear transforms in the discrete domain followed by the anisotropic wavelet transforms. Applications of our proposed transform in image approximation and denoising were studied. In image approximation, the results obtained with our adaptive image representation using the DST are significantly superior to those of other transforms such as the DWT and CT both visually and with respect to PSNR. In denoising, we studied the performance of the DST coupled with a (partially) translation invariant hard tresholding estimator. Our results indicate that the DST consistently outperforms other competing transforms. For deSAMPTA'09 tailed numerical results, we refer to [7]. [6] [7] [8] [9] E. Candes and D. Donoho, ”New tight frames of curvelets and optimal representations of objects with piecewise C 2 singularities,” Commun. Pure Appl. Math, vol. 57, no. 2, pp. 219-266, Feb. 2004. M. Do and M. Vetterli, ”The contourlet transform: An efficient directional multiresolution image representation,” IEEE Trans. Image Process., vol. 14, no. 12, pp. 2091-2106, Dec. 2005. G. Peyre and S. Mallat, ”Discrete Bandelets with Geometric Orthogonal Filters,” Proceedings of ICIP, Sept. 2005. D. Labate, W. Lim, G. Kutyniok and G. Weiss ”Sparse Multidimensional Representation using Shearlets”, Proc. of SPIE conference on Wavelet Applications in Signal and Image Processing XI, San Diego, USA, 2005. K. Guo and D. Labate, ”Optimally Sparse Multidimensional Representation using Shearlets,”, SIAM J Math. Anal., 39 pp. 298-318, 2007. K. Guo, D. Labate and W. Lim, ”Edge Analysis and identification using the Continuous Shearlet Transform”, to appear in Appl. Comput. Harmon. Anal. W. Lim, ”Compactly Supported Shearlet Frames and Their Applications”, submitted. S. Mallat and S. Zhang, ”Matching Pursuits With Time-Frequency Dictionaries,” IEEE Trans. Signal Process., pp. 3397-3415, Dec. 1993. A. Cohen, I. Daubechies and J. Feauveau, ”Biorthogonal bases of compactly supported wavelets,” Commun. on Pure and Appl. Math., 45:485-560, 1992. 80 Image Approximation by Adaptive Tetrolet Transform Jens Krommweh Department of Mathematics, University of Duisburg-Essen, Campus Duisburg, 47048 Duisburg, Germany. jens.krommweh@uni-due.de Abstract: In order to get an efficient image representation we introduce a new adaptive Haar wavelet transform, called Tetrolet Transform. Tetrolets are Haar-type wavelets whose supports are tetrominoes which are shapes made by connecting four equal-sized squares. The corresponding filter bank algorithm is simple but enormously effective. Numerical results show the strong efficiency of the tetrolet transform for image compression. 1. Introduction The main task in every kind of image processing is finding an efficient image representation that characterizes the significant image features in a compact form. In the last years a lot of methods have been proposed to improve the treatment with orientated geometric image structures. 
Curvelets [1], contourlets [2], shearlets [5], and directionlets [10] are wavelet systems with more directional sensitivity than classical tensor product wavelets. Instead of choosing a priori a basis or a frame one may adapt the function system depending on the local image structures. Wedgelets [3] and bandelets [7] stand for this second class of image representation schemes which is a wide field of further research. Very recent approaches are the grouplets [8] or the EPWT [9] which are based on an averaging in adaptive neighborhoods of data points. In [6] we have introduced a new adaptive algorithm whose underlying idea is similar to the idea of digital wedgelets where Haar functions on wedge partitions are considered. We divide the image into 4 × 4 blocks, then we determine in each block a tetromino partition which is adapted to the image geometry in this block. Tetrominoes are shapes made by connecting four equal-sized squares, each joined together with at least one other square along an edge. On these geometric shapes we define Haar-type wavelets, called tetrolets, which form a local orthonormal basis. The main advantage of Haar-type wavelets is the lack of pseudo-Gibbs artifacts. The corresponding filter bank algorithm decomposes an image into a compact representation. The tetrolet transform is also very efficient for compression of real data arrays. SAMPTA'09 2. The Adaptive Tetrolet Transform 2.1 Definitions and Notations Let be I = {(i, j) : i, j = 0, . . . , N − 1} ⊂ Z2 the index set of a digital image a = (a[i, j])(i,j)∈I with N = 2J , J ∈ N. We determine a 4-neighborhood of an index (i, j) ∈ I by N4 (i, j) := {(i − 1, j), (i + 1, j), (i, j − 1), (i, j + 1)}. An index that lies at the boundary has three neighbors, an index at the vertex of the image has two neighbors. A set E = {I0 , . . . , Ir }, r ∈ N, of subsets Iν ⊂ I is a!disjoint partition of I if Iν ∩ Iµ = ∅ for ν '= µ and r ν=0 Iν = I. In this paper we consider disjoint partitions of the index set I that satisfy two conditions for all Iν : 1. each subset Iν contains four indices, i.e. #Iν = 4, 2. every index of Iν has a neighbor in Iν , i.e. ∀(i, j) ∈ Iν ∃(i" , j " ) ∈ Iν : (i" , j " ) ∈ N4 (i, j). We call such subsets Iν tetromino, since the tiling prob2 lem of the square [0, N ) by shapes called tetrominoes is a well-known problem being closely related to our partitions of the index set I = {0, 1, . . . , N − 1}2 . We shortly introduce this tetromino tiling problem in the next subsection. 2.2 Tilings by Tetrominoes Tetrominoes were introduced by Golomb in [4]. They are shapes formed from a union of four unit squares, each connected by edges, not merely at their corners. The tiling problem with tetrominoes became popular through the famous computer game classic ’Tetris’. Disregarding rotations and reflections there are five different shapes, the so called free tetrominoes, see Figure 1. 2 It is clear that every square [0, N ) can be covered by tetrominoes if and only if N is even. But the number of different coverings explodes with increasing N . There are 117 solutions for disjoint covering of a 4 × 4 board with four tetrominoes. As represented in Figure 2, we have 22 Figure 1: The five free tetrominoes. 81 In other words, we first divide the index set I of an im2 age a into N16 squares Qi,j and then we consider the admissible tetromino partitions there. Among the 117 solutions we compute an optimal partition in each image block such that the wavelet coefficients defined on the tetrominoes have minimal l1 -norm. 3. 
Detailed Description of the Algorithm Figure 2: The 22 fundamental forms tiling a 4 × 4 board. Regarding additionally rotations and reflections there are 117 solutions. fundamental configurations (disregarding rotations and reflections). One solution (first line) is unaltered by rotations and reflections, four solutions (second line) give a second version applying the isometries. Seven forms can occur in four orientations (third line), and ten asymmetric cases in eight directions (last line). 2.3 The Idea of Tetrolets In the two-dimensional classical Haar case, the low-pass filter and the high-pass filters are just given by the averaging sum and the averaging differences of each four pixel values which are arranged in a 2 × 2 square, i.e., with Ii,j = {(2i, 2j), (2i + 1, 2j), (2i, 2j + 1), (2i + 1, 2j + 1)} for i, j = 0, 1, . . . , N2 − 1, we have a dyadic partition E = {I0,0 , . . . , I N −1, N −1 } of the image index set I. Let 2 2 L be a bijective mapping which maps the four pixel pairs (i, j) to the scalar set {0, 1, 2, 3}, that means it brings the pixels into a unique order. Then we can determine the lowN −1 2 pass part a1 = (a1 [i, j])i,j=0 as well as the three high-pass N 2 −1 parts wl1 = (wl1 [i, j])i,j=0 for l = 1, 2, 3 with a1 [i, j] = " "[0, L(i" , j " )] a[i" , j " ] " "[l, L(i" , j " )] a[i" , j " ], (1) (2) where the coefficients "[l, m], l, m = 0, . . . , 3, are entries from the Haar wavelet transform matrix   1 W := 1 1 1 1 1 1 −1 −1 =  . 2 1 −1 1 −1 1 −1 −1 (3) Going into detail our main attention shall be turned to step 2 of the algorithm where the adaptivity comes into play. −1 We start with the input image a0 = (a[i, j])N i,j=0 with J N = 2 , J ∈ N. In the rth-level, r = 1, . . . , J − 1, we apply the following computations. 1. Divide the low-pass image ar−1 into blocks Qi,j of size 4 × 4, i, j = 0, . . . , 4Nr − 1. 2. In each block Qi,j we compute analogously to (1) and (2) the pixel averages for every admissible tetromino covering c = 1, . . . , 117 by " ar,(c) [s] = "[0, L(m, n)] ar−1 [m, n], as well as the three high-pass parts for l = 1, 2, 3 " r,(c) "[l, L(m, n)] ar−1 [m, n], wl [s] = (c) (m,n)∈Is s = 0, . . . , 3, where the coefficients are given in (3) and L is the mapping mentioned above. Then we choose the covering c∗ such that the l1 -norm of the tetrolet coefficients becomes minimal 1 Obviously, the fixed blocking by the dyadic squares Ii,j is very inefficient because the local structures of an image are disregarded. Our idea is, to allow more general partitions such that the local image geometry is taken into account. Namely, we use tetromino partitions. As described in the previous subsection we shall restrict us to 4 × 4 blocks. This leads to a third condition for the desired disjoint partition E of the index set I introduced in Section 2.1: 3. Each 4 × 4 square Qi,j := {4i, . . . , 4i + 3} × {4j, . . . , 4j + 3}, i, j = 0, 1, . . . , N4 − 1, is covered by four subsets (tetrominoes) I0 , . . . , I3 . SAMPTA'09 Table 1: Adaptive tetrolet decomposition algorithm. (c) (i! ,j ! )∈Ii,j ("[l, m])3l,m=0 Adaptive Tetrolet Decomposition Algorithm −1 J Input: Image a = (a[i, j])N i,j=0 with N = 2 , J ∈ N. 1. Divide the image into 4 × 4 blocks. 2. Find in each block the sparsest tetrolet representation. 3. Rearrange the low- and high-pass coefficients of each block into a 2 × 2 block. 4. Store the tetrolet coefficients (high-pass part). 5. Apply step 1 to 4 to the low-pass image. Output: Decomposed image ã. (m,n)∈Is (i! ,j ! 
)∈Ii,j wl1 [i, j] = The rough structure of the tetrolet filter bank algorithm is described in Table 1. c∗ = arg min c 3 " 3 " r,(c) |wl [s]|. (4) l=1 s=0 Hence, for every block Qi,j we get an optimal tetro∗ r,(c∗ ) r,(c∗ ) r,(c∗ ) let decomposition [ar,(c ) , w1 , w2 , w3 ]. By doing this, the local structure of the image block is adapted. The best configuration c∗ is a covering whose tetrominoes do not intersect an important structure like an edge in the image ar−1 . Because the tetrolet coefficients become as minimal as possible a sparse image representation will be obtained. We have to store for each block Qi,j which covering c∗ has been chosen, since this information is necessary for reconstruction. 82 3. In order to be able to apply further levels of the tetrolet decomposition algorithm, we rearrange the entries ∗ r,(c∗ ) of the vectors ar,(c ) and wl into 2 × 2 matrices, ) r,(c∗ ) * ∗ a [0] ar,(c ) [2] r ∗ ∗ , a|Q = i,j ar,(c ) [1] ar,(c ) [3] and in the same way wl|r Q i,j for l = 1, 2, 3. 4. After finding a sparse representation in every block Qi,j for i, j = 0, . . . , 4Nr − 1, we store (as usually done) the low-pass matrix ar and the high-pass matrices wlr , l = 1, 2, 3, replacing the low-pass image ar−1 by the matrix ) r * a w2r . w1r w3r After a suitable number of decomposition steps, one can apply a shrinkage to the tetrolet coefficients in order to get a sparse image representation. 4. An Orthonormal Basis of Tetrolets We describe the discrete basis functions which correspond to the above algorithm. Remember that the digital image a = (a[i, j])(i,j)∈I is a subset of l2 (Z2 ). For any tetromino Iν of I we define the discrete functions + 1/2, (m, n) ∈ Iν , φIν [m, n] := 0, else, + "[l, L(m, n)], (m, n) ∈ Iν , ψIl ν [m, n] := 0, else. Due to the underlying tetromino support, we call φIν and ψIl ν tetrolets. As a straightforward consequence of the orthogonality of the standard 2D Haar basis functions and the disjoint partition of the discrete space by the tetromino supports, we have the following essential statement. Theorem 1 For every admissible covering {I0 , I1 , I2 , I3 } of a 4 × 4 square Q ⊂ Z2 the tetrolet system {φIν : ν = 0, 1, 2, 3} ∪ {ψIl ν : ν = 0, 1, 2, 3; l = 1, 2, 3} is an orthonormal basis of l2 (Q). 5. Cost of Adaptivity: Modified Tetrolet Transform We will address the costs of storing additional adaptivity information. Our observations will lead to some relaxed versions of the tetrolet transform in order to reduce these costs. It is well known that a vector of length N and with entropy E can be stored with N · E bits. Hence, the entropy describes the required bits per pixel (bpp) and is an appropriate measure for the quality of compression. In the following, we propose three methods of entropy reduction in order to reduce the adaptivity costs. An application of these modified transforms as well as of combinations of them is given in the last section. SAMPTA'09 The simplest approach of entropy reduction is reduction of the symbol alphabet. The tetrolet transform uses the alphabet {1, . . . , 117} for the chosen covering in each image block. If we restrict ourselves to 16 essential configurations that feature different directions we considerably reduce the entropy as well as the computation time. A second approach to reduce the entropy is to change the distribution of the symbols. Relaxing the tetrolet transform we could ensure that only very few tilings are preferred. 
Hence, we allow the choice of an almost optimal covering c∗ in (4) in order to get a tiling which is already frequently chosen. More precisely, we replace (4) by the two steps: 1. Find the set of almost optimal configurations that satisfy 3 " 3 " l=1 s=0 r,(c) |wl [s]| ! min c 3 " 3 " r,(c) |wl [s]| + θ l=1 s=0 with a predetermined tolerance parameter θ. 2. Among these tilings choose the covering c which is chosen most frequently in the previous image blocks. Using an appropriate relaxing parameter θ, we achieve a satisfactory balance between low entropy (low adaptivity costs) and minimal tetrolet coefficients. The third method also reduces the entropy by optimization of the tiling distribution. After an application of an edge detector we use the classical Haar wavelet transform inside flat image regions. In the image blocks that contain edges we make use of the strong adaptivity of the proposed tetrolet transform. More details of the modified versions can be found in [6]. 6. Numerical Experiments We apply a complete wavelet decomposition of an image and use a shrinkage with global hard-thresholding. The detail ’monarch’ image in Figure 3 shows the enormous efficiency in handling with several directional edges due to the high adaptivity. It can be well noticed that the tetrolet transformation gives excellent results for piecewise constant images. Though the tetrolets are not continuous the approximation of the ’cameraman’ image in Figure 4 illustrates that even for natural images the tetrolet filter bank outperforms the tensor product wavelets with the biorthogonal 9-7 filter bank, since no pseudo-Gibbs phenomena occur. This confirms the fact already noticed with wedgelets [3] and bandelets [7]: While nonadaptive methods need smooth wavelets for excellent results, well constructed adaptive methods need not. See [6] for more numerical examples. Considering the adaptivity costs we compare the standard tetrolet transform with its modified versions. Of course, reduction of adaptivity cost produces a loss of approximation quality. Hence, a satisfactory balance is necessary. For a rough estimation of the complete storage costs of the compressed image with N 2 pixels we apply a simplified scheme costf ull = costW + costP + costA , 83 10 10 20 20 30 30 40 40 50 50 60 50 50 100 100 150 150 200 200 60 250 10 20 30 40 50 60 10 10 20 30 40 50 250 50 60 100 150 200 250 50 100 150 200 250 50 100 150 200 250 10 20 20 30 30 40 40 50 50 60 60 50 50 100 100 150 150 200 200 250 250 50 10 20 30 40 50 60 10 20 30 40 50 100 150 200 250 60 Figure 3: Approximation with 256 coefficients. (a) Input, (b) classical Haar, PSNR 18.98, (c) Biorthogonal 9-7, PSNR 21.78, (d) Tetrolets, PSNR 24.43. Figure 4: Approximation with 2048 coefficients. (a) Input, (b) classical Haar, PSNR 25.47, (c) Biorthogonal 9-7, PSNR 27.26, (d) Tetrolets, PSNR 29.17. References: where costW = 16 · M/N 2 are the costs in bpp of storing M non-zero wavelet coefficients with 16 bits. The term costP gives the cost for coding the position of these M co2 M N 2 −M M log2 ( N N−M ). The efficients by − N 2 log2 ( N 2 ) − 2 N2 third component appearing only with the tetrolet transform contains the cost of adaptivity, costA = E · R/N 2 , for R adaptivity values and the entropy E previously discussed. Table 2 presents some results for the monarch detail image (Fig. 3) where different versions of the tetrolet transform are compared with the tensor product wavelet transformation regarding to quality and storage costs. 
We have tried to balance the modified tetrolet transform such that the full costs are in the same scale as with the 9-7 filter. For the relaxed versions we have used the parameter θ = 25. Tensor Haar Tensor 9-7 filter Tetrolet Tetro 16 Tetro rel Tetro edge Tetro 16 edge rel coeff 300 300 256 256 256 256 256 PSNR 19.58 22.62 24.43 23.56 24.51 24.24 23.48 entropy 0.53 0.30 0.32 0.43 0.21 costf ull 1.55 1.55 1.86 1.64 1.66 1.77 1.55 Table 2: Comparison between tensor wavelet transforms and the different versions of the tetrolet transform regarding quality (PSNR) and storage cost (costf ull in bpp). 7. Acknowledgments The research is funded by the project PL 170/11-1 of the Deutsche Forschungsgemeinschaft (DFG). This is gratefully acknowledged. SAMPTA'09 [1] E.J. Candes and D.L. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise C 2 singularities. Communications on Pure and Applied Mathematics, 57(2):219–266, 2004. [2] M.N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12):2091–2106, 2005. [3] D.L. Donoho. Wedgelets: Nearly-minimax estimation of edges. Annals of Statistics, 27(3):859–897, 1999. [4] S.W. Golomb. Polyominoes. Princeton University Press, 1994. [5] K. Guo and D. Labate. Optimally sparse multidimensional representation using shearlets. SIAM Journal on Mathematical Analysis, 39(1):298–318, 2007. [6] Jens Krommweh. Tetrolet transform: A new adaptive Haar wavelet algorithm for sparse image representation. 2009. [7] E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Transactions on Image Processing, 14(4):423–438, 2005. [8] S. Mallat. Geometrical grouplets. Applied and Compu-tational Harmonic Analysis, 26(2):161–180, 2009. [9] G. Plonka. Easy path wavelet transform: A new adaptive wavelet transform for sparse representation of two-dimensional data. Multiscale Modeling and Simulation, 7(3):1474–1496, 2009. [10] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L. Dragotti. Directionlets: Anisotropic multidirectional representation with separable filtering. IEEE Transactions on Image Processing, 15(7):1916– 1933, 2006. 84 Geometric Wavelets for Image Processing: Metric Curvature of Wavelets Emil Saucan (1) , Chen Sagiv (2) and Eli Appleboim (3) (1) Department of Mathematics, Technion - Israel Institute of Technology, Haifa 32000, Israel. (2) SagivTech Ltd. Israel. (3) Electrical Engineering Department, Technion - Israel Institute of Technology, Haifa 32000, Israel. semil@tx.technion.ac.il, chensagivron@gmail.com, eliap@ee.technion.ac.il Abstract: We introduce a semi-discrete version of the FinslerHaantjes metric curvature to define curvature for wavelets and show that scale and curvature play similar roles with respect to image presentation and analysis. More precisely, we show that there is an inverse relationship between local scale and local curvature in images. This allows us to use curvature as a geometrically motivated automatic scale selection in signal and image processing, this being an incipient bridging of the gap between the methods employed in Computer Graphics and Image Processing. A natural extension to ridgelets and curvelets is also given. Further directions of study, in particular the development of a curvature transform and the study of its link with wavelet and the scale transforms are also suggested. 1. 
Introduction The versatility and adaptability of wavelets for a variety of tasks in Image Processing and related fields is too well established in the scientific community, and the bibliography pertaining to it is far too extensive, to even begin to review it here. We do, however, stress the fact that the multiresolution property of wavelets has been already applied in determining the curvature of planar curves [1] and to the intelligence and reconstruction of meshed surfaces (see, e.g. [18], [26], amongst many others). Moreover, the intimate relation between scale and differentiability in natural images has also been stressed [10]. We have presented in [24] and other related works, an extension of Shannon’s Sampling Theorem when images are viewed as higher dimensional objects (i.e. manifolds), rather than 2-dimensional signals. More precisely, our approach to Shannon’s Sampling Theorem is based on sampling the graph of the signal, considered as a manifold, rather than sampling of the domain of the signal, as is customary in both theoretical and applied signal and image processing, motivated by the framework of harmonic analysis. The main tool for proving our geometric sampling theorem, resides in the confluence of Differential Topology and Differential Geometry. More precisely, we consider piecewise-linear (P L) approximations of the manifold, where the geometric feature (i.e. curvature) determines the proper size and shape-ration of the simplices of SAMPTA'09 the constructed triangulation. Naturally, the question is whether the implementation of the geometric sampling scheme is feasible. We do not address here the purely geometric aspects, that would be highly relevant in Computer Graphics implementation (besides, these were partly addressed in [24]). Instead, we focus on the far more important and popular Image Processing tool of wavelets. The versatility and adaptability of wavelets to a variety of tasks in Image Processing and related fields is too well established in the scientific community, and the bibliography pertaining to it is far to extensive, to even begin to review it here. Unfortunately, in contrast to Computer Graphics experts, for many investigators concerned with wavelets applications, piecewise-linear approximations are not necessarily among their most familiar tools. It is, therefore, a challenge to consider the integration of tools practiced by both communities. Although it may appear to be a surprising result to those primarily familiar with classical wavelets, the Strömberg wavelets [27], are based on piecewise-linear functions. Another, more intriguing issue is whether one can replace the intuitive trade-off between scale and curvature, by a formal concept of wavelet curvature, in particular in cases such as those of the Strömberg wavelets, or, in the more difficult case of Haar wavelets that are not even piecewise linear. Interestingly enough, this can be done by using metric curvatures [2] (and [21] for a short presentation). It turns out that the best candidate, for the desired metric curvature is the Finsler-Haantjes curvature, due to its adaptability to both continuous and discrete settings. A more suitable approach to surface reconstruction could, for example, implement ridgelets [5], or the more generalized, curvelets [6]. 2. 
Mathematical Background The central mathematical concept of the present paper is the following metric notion of curvature suggested by Finsler and developed by Haantjes [12]: Definition 1 Let (M,d) be a metric space, let c : I = ∼ [0, 1] → M be a homeomorphism, and let p, q, r ∈ c(I), q, r 6= p. Denote by qr b the arc of c(I) between q and r, and by qr segment from q to r. We say that c has 85 ✁ r ☎ C ✟ qr ✆✂ ✝ qr p ✡ ✠ ✞ ✄ q Figure 2: A piecewise-linear wavelet. Figure 1: A metric arc and a metric segment. 3 Finsler-Haantjes curvature κF H (p) at the point p iff: κ2F H (p) = 24 lim q,r→p l(qr) b − d(q, r) ¡ ¢3 ; d(q, r)) (1) where “l(qr)” b denotes the length, in intrinsic metric induced by d, of qr b – see Figure 1. (Here we assume that the arc qr b has finite length.) 3. Finsler-Haantjes Curvature of Wavelets In [23] we have introduced, in the context of both vertex and edge weighted graphs, a discretization of the FinslerHaantjes curvature, (for applications in DNA analysis). Here we consider a semi-discrete (or semi-continuous) version, as follows: Let ϕ be the typical piecewise-linear wavelet depicted in d be the arc of curve between the points A Figure 2, let AE and E, and let d(A, E) is the length of the line-segment d = a + b + c + d and d(A, E) = e + f . AE. Then l(AE) 2 Then κF H (ϕ) = 24[(a + b + c + d) − (e + f )]/(a + b + c + d)3 . Note that, in addition to the “total” curvature of ϕ, one can also compute the “local” curvatures at the “peaks” B and D: κ2F H (B) = 24(a + c − e)/(a + b)3 and κ2F H (D) = 24(c + d − f )/(a + b)3 , as well as the mean curvature of these peaks: κ = [κF H (B) + κF H (B)]/2. Even if these variations may prove to be useful in certain applications, we believe that the correct approach, in the sense that it best corresponds to the scale of the wavelet, would be to compute the total curvature of ϕ. Let us compare the relationship between curvature and scale, for a concrete piecewise-linear wavelet – the Meyer wavelet [19] – see Figure 3. The results indicating the relationship between scale and curvature, for this wavelet, can be seen in the graph in Figure 4. However, had the definition of Finsler-Haantjes curvature been limited solely to piecewise-linear wavelets, its applicability would have also been diminished. We show, SAMPTA'09 1/2 2 -1 Note that, while highly intuitive and definable for a very large class of curves in general rather metric spaces, this definition of curvature would remain some esoteric notion, without the following theorem (see [2]): Theorem 2 Let c ∈ C 3 (I) be a smooth curve in R3 , and let p ∈ c be a regular point. Then κF H (p) exists and, moreover, κF H (p) = k(p) – the classical (differential) curvature of c at p. 1 0 -1 Figure 3: The Meyer wavelet. however, that it is also definable for the “classical” Haar wavelets, in a rather straightforward manner. For example, consider the basic Haar wavelet and Haar scaling function, illustrated in Figure 5. Then for the scaling function we d = d(A, B) + d(B, C) + d(C, D) = 3, while have: l(AE) d(A, D) = 1. Analogously, for the Haar wavelet we get: d = d(M, N, ) + d(N, P ) + d(P, R) + d(R, S) + l(AE) d(S, T ) = 5 and d(M, T ) = 1. The expression for κHF follow easily in both cases and we present the results for the first 10 scales in Figure 6 and Figure 7, respectively. Moreover, while perhaps of lesser interest, it should be mentioned that κHF (ϕ) can also be computed for smooth wavelets, using the p classical formula for the arc-length: R d = 1 + (ϕ′ )2 . l(AE) Suppϕ 4. 
Ridgelets and beyond The wavelet curvature definition introduced above is applicable, through standard methods, for image processing goals, by using separable 2-dimensional wavelets. However, while practical in many cases, this presumption contravenes to real geometric structure of images, as emphasized, for instance, in [24]. In addition, as it has already been pointed out by Candès [5], “that wavelets can efficiently represent only a small range of the full diversity of interesting behavior”, since wavelets can cope well with pointlike singularities, but they are not fitted for the analysis and reconstruction of singularities of dimension greater that 0, that are distributed along lines (and more general curves), planes (and other surfaces), etc. It is therefore natural to ask whether the notion of curvature defined for wavelets can be extended to ridgelets as well. The perhaps somewhat surprising answer is that such an extension is not only possible, it is in fact more straight- 86 Figure 4: Curvature as a function of scale: Meyer wavelets. C N 1/2 P 1 1 1 1 A D M Q B 1 Figure 6: Curvature as a function of scale: The Haar scaling functions. T 1 1 S R 1/2 Figure 5: The Harr scaling function and wavelet. forward and canonical. Indeed, 2-dimensional ridgelets are, in fact, piecewise C 2 surfaces (with line singularities). For these geometrical objects an almost standard notion of curvature exists: the principal curvatures (i.e maximal and minimal normal sectional curvatures – see [8]) at any point of the surfaces. For ridgelets, we consider only the maximal absolute curvature at points on the ridges (since, along the ridge-line, curvature is 0 (cf. [8]) – see Figure 8. The sectional curvature of curves normal to the ridge is then computed using the method described in the previous section. (See also [22] for the application of the this method to piecewise-flat surfaces.) Note that similar consideration apply with regard to curvelets (and, evidently, to nonseparable 2-dimensional wavelets as well). However, as far as curvature is concerned, there exists a basic difference between curvelets and ridgelets, which is a direct consequence of the difference between the geometric models employed. Namely, as already noted above, the principal curvature associated with the feature of interest (i.e. the ridge) vanishes. In consequence, Gaussian curvature, being the product of the principal curvatures, will also equal 0 for any point on the ridge (see Figure 8). In contrast, curvelets, being modeled on more flexible types of surfaces, can – and will – exhibit Gaussian curvatures different from 0, both positive and negative. This geometric analysis can also be applied to shear- SAMPTA'09 Figure 7: Curvature as a function of scale: The Haar wavelets. lets. As Figure 9 illustrates, shearlets display “peaks” of high positive Gauss curvature. In consequence, they are ideally suited for modeling phenomena which, in geometric terms, are characterized by positive curvature concentrated at specific points. In view of this, shearlets may be viewed, in the context of our geometric approach, as a complementary tool to ridgelets. Indeed, recall that ridgelets were developed as an extension of wavelets, befitting the modeling of line-type singularities. Point type singularities can still occur in conjunction to 1-dimensional singularities (not least as noise), hence a combination of both type of tools, in a common, integrated “dictionary” is, indeed, required. 
The geometric approach presented above enables us to build such a “dictionary” in natural manner. 5. Future work – Theory and Applications As we have seen, curvature can serve as a local scale estimator that is natural, i.e. intrinsic to the geometry of the image. Moreover, it can be easily calculated and used for image analysis and enhancement, especially in edge detection and texture discrimination (since in both cases curvature either large and/or exhibits a large variation). Results 87 Figure 8: Lines of curvatures on a ridgelet (after [9]). should be validated using previous work of Brox & Weickert [3] and Lindenberg [17]. It’s extension to ridgelets (and curvelets) should be compared with such benchmark works as [6]. Moreover, in view of such works as [4], [15], [16] (to cite only a few), further applications to image compression also impose themselves as naturally stemming from our curvature analysis. In addition, feature extraction is also a natural application for our method, since it allows for a better correlation between the internal scale of he image (i.e. curvature) and wavelets’ scale. (In fact, experiments in this direction are currently in progress.) On the theoretical end of the spectrum, one would like to develop a full multi-curvature analysis framework, where images are constructed using basis functions that are curve-related to one another. This is not an impossible task as it seems, since, as we have already mentioned, we have shown in [24] that image sampling and reconstruction based on their curvature is possible. In fact, in the said paper, we have proven that, in the geometric approach, the radius of curvature (see [8]) substitutes for the condition of the Nyquist rate, even in the 1-dimensional case. Since (sectional) curvature is defined as 1/(curvature radius), the relationship between scale and curvature becomes even clearer, in the light of the results presented herein. Therefore, we aim at presenting a curvature transform, akin to the wavelet transform and to the scale transform of [7]. Of course, in the context of curvatures of ridgelets and curvelets one should consider the appropriate types of transforms. We conclude with a further natural application of metric curvatures, lying at the confluence of theory and practice, namely to the fractals and their use, in conjunction with wavelets or independent of them, to image processing (see, e.g. [11], [13]). While a metric curvature – namely Menger’s metric curvature (see [2], [21]) – was already applied in a purely theoretical context to fractal analysis [20], our geometric method allows for a more flexible and coherent approach, that provides a unified treatment of wavelets (including their extensions mentioned above) and fractals. SAMPTA'09 Figure 9: Lines of curvatures on shearlets (after [14]). Note the high positive curvature concentrated at the “apex”. 6. Acknowledgments The authors would like to thank Professor Yehoshua Y. Zeevi for possing the problem, and to Professor Peter Maass, for his constructive critique and encouragement. The first author would also like to thank Professor Shahar Mendelson – his warm support is gratefully acknowledged. References: [1] Jean-Pierre Antoine and Laurent Jaques. Measuring a curvature radius with directional wavletes. In J-P. Gazeau, R. Kerner, J-P. Antoine, S. Metens, J-Y. Thibon, editors, GROUP 24: Physical and Mathematical Aspects of Symmetries, Inst. Phys. Conf. Series 173, pages 899–904, 2003. [2] Leonard M. Blumenthal and Karl Menger. 
Studies in Geometry Freeman & co., San Francisco, 1970. [3] Thomas Brox and Joackim Weickert. A TV flow based local scale estimate and its application to texture discrimination. Journal of Visual Communication and Image Representation, 17(5): 1053–1073, October 2006. [4] A. R. Calderbank, Ingrid Daubechies, Wim Sweldens and Boon-Lock Yeo Lossless image compression us- 88 ing integer to integer wavelet transforms. In Proceedings of ICIP 1997, vol.1, pages 596–599, 1997. [5] Emmanuel J. Candès and David L. Donoho. Ridgelets: a key to higher-dimensional intermittency? Phil. Trans. R. Soc. Lond. A., 357, 24952509. In L. L. Schumaker et al. editors, Curves and Surfaces., 1999. [6] Emmanuel J. Candès and David L. Donoho. Curvelets - a surprisingly effective nonadaptive representation for objects with edges. In L. L. Schumaker et al. editors, Curves and Surfaces., pages 1–10, 1999. [7] Leon Cohen. The Scale Representation IEEE Trans. Signal Processing, 41(12): 3275–3292, December 1993. [8] Manfredo P. do Carmo. Differential Geometry of Curves and Surfaces, Prentice-Hall, Englewood Cliffs, N.J., 1976. [9] David L. Donoho. Ridgelets and Ridge Functions NSF-SIAM Conference Board in the Mathematical Sciences Lectures, 2000. [10] Luc Florack, Bart. M. ter Haar Romeny, Jan J. Koenderink and Max A. Viergever. Scale and the differential structure of images, Image Vision Comput. 10(6), 376–388, 1992. [11] Éric Guérin, Éric Tosan and Atilla Baskurt. Fractal approximation and compression using projected IFS In Interdisciplinary Approaches in Fractal Analysis, IAFA’2003, pages 39-45, 2003. [12] Johannes Haantjes. Distance geometry. Curvature in abstract metric spaces. Indagationes Math., 9: 302314, 1947 [13] Houssam Hnaidi, Éric Guérin and Samir Akkouche. Fractal/Wavelet representation of objects In 3rd International Conference onInformation and Communication Technologies: From Theory to Applications, ICTTA 2008, pages 1-5, 2008. [14] Gitta Kutyniok and Tomas Sauer. From Wavelets to Shearlets and back again In M.Neamtu, L. L. Schumaker, editors, Approximation Theory XII: San Antonio 2007, pages 201–209, 2008. [15] Erwan Le Pennec and Stéphane Mallat. Image compression with geometrical wavelets In Proceedings of ICIP 2000, vol.1, pages 661–664, 2000. [16] Adrian S. Lewis and G. Knowles. Image Compression Using the 2-D Wavelet Transform. IEEE Transactions on Image Processing 1(2): 244–250, 1992. [17] Tony Lindeberg. Edge Detection and Ridge Detection with Automatic Scale Selection. International Journal of Computer Vision 30(2): 117–154, 1998. [18] John M. Lounsbery, Anthony D. DeRose, and Joe Warren. Multiresolution Analysis For Surfaces Of Arbitrary Topological Type ACM Transactions on Graphics, 16(1): 34–73 1997. [19] Yves Meyer. Wavelets : Algorithms & Applications. SIAM, University of Michigan, 1993. [20] Hervé Pajot. Analytic Capacity, Rectificabilility, Menger Curvature and the Cauchy Integral. Lecture Notes in Mathematics 1799, Springer-Verlag, Berlin, 2002. SAMPTA'09 [21] Emil Saucan. Curvature – Smooth, Piecewise-Linear and Metric. In G. Sica, editor, What is Geometry?, Advanced Studies in Mathematics and Logic, pages 237–268, 2006. [22] Emil Saucan. Surface triangulation - the metric approach. Preprint (arxiv:cs.GR/0401023), 2004. [23] Emil Saucan, and Eli Appleboim. Curvature Based Clustering for DNA Microarray Data Analysis. Lecture Notes in Computer Science, 3523:405–412, 2005. [24] Emil Saucan, Eli Appleboim, and Yehoshua Y Zeevi. 
Sampling and Reconstruction of Surfaces and Higher Dimensional Manifolds. Journal of Mathematical Imaging and Vision 30(1):105–123, 2008. [25] Emil Saucan, Eli Appleboim, and Yehoshua Y Zeevi. Geometric Approach to Sampling and Communication. Technion CCIT Report #707, November 2008. [26] Sébastien Valette and Rémy Prost. WaveletBased Multiresolution Analysis Of Irregular Surface Meshes. IEEE Transaction on Visualization and Computer Graphics, (10)2:113–122, 2004. [27] Jan-Olov Strömberg. A modified Franklin system and high order spline systems on Rn as unconditional bases for Hardy spaces. In W. Beckner, editor, Conference on Harmonic Analysis in honor of A. Zygmund, pages 475–494, Wadeworth International Group, Belmont, California, 1983. 89 SAMPTA'09 90 Analysis of Singularity Lines by Transforms with Parabolic Scaling Panuvuth Lakhonchai (1) , Jouni Sampo (2) and Songkiat Sumetkijakan (1) (1) Department of Mathematics, Chulalongkorn University, Phyathai Road, Patumwan, Bangkok 10330, Thailand. (2) Department of Applied Mathematics, Lappeenranta University of Technology, Lappeenranta, Finland. panuvuth@hotmail.com, jouni.sampo@lut.fi, songkiat.s@chula.ac.th Abstract: Using Hart Smith’s, curvelet, and shearlet transforms, we investigate L2 functions with sufficiently smooth background and present here sufficient and necessary conditions, which include the special case with 1-dimensional singularity line. Specifically, we consider the situation where regularity on a line in a non-parallel direction is much lower than directional regularity along the line in a neighborhood and how this is reflected in the behavior of the three transforms. 1. Introduction Wavelet transforms, both continuous and discrete, have proved to be a very efficient tool in detecting point singularities. However, due to its isotropic scaling, wavelet transforms are not ideal tools in detecting one-dimensional singularities like singularity lines or curves. Recently, wavelet-like transforms with parabolic scaling, such as Hart Smith’s and curvelet transforms, were introduced and applied successfully in edge detection. Our goal is then to investigate how these transforms can be used in detecting point, line, and curve singularities. New necessary and new sufficient conditions for an L2 (R2 ) function to possess Hölder regularity, uniform and pointwise, with exponent α > 0 are given. Similar to the characterization of Hölder regularity by the continuous wavelet transform, the conditions here are in terms of bounds of the Smith and curvelet transforms across fine scales. However, due to the parabolic scaling, the sufficient and necessary conditions differ in both the uniform and pointwise cases, with larger gap in pointwise regularities. Naturally, global conditions for pointwise singularities can be weakened. We then investigate functions with sufficiently smooth background in one direction and potential singularity in the perpendicular (non-parallel) direction. Specifically, sufficient and necessary conditions, which include the special case with one-dimensional singularity line, are derived for pointwise Hölder exponent. Inside their “cones” of influence, these conditions are practically the same, giving near-characterization of direction of singularity. 2. Directional Regularity We shall restrict our definition to a real-valued function f of two variables. Generalization to a function of several SAMPTA'09 variables is straightforward. 
For a given positive exponent α not in N, its pointwise, uniform, and directional Hölder (or Lipschitz) regularities are defined as follows. Fix a point u ∈ R2 at which regularity is under investigation. f is said to be pointwise Hölder regular with exponent α at u, denoted by f ∈ C α (u), if there exists a polynomial Pu of degree less than α and a constant C = Cu such that for all x in a neighborhood of u |f (x) − Pu (x − u)| ≤ Ckx − ukα . (1) If there exists a uniform constant C so that for all u in an open subset Ω of R2 there is a polynomial Pu of degree less than α such that (1) holds for all x ∈ Ω, then we say that f is uniformly Hölder regular with exponent α on Ω or f ∈ C α (Ω). The uniform Hölder exponent of f on Ω is defined to be αl (Ω) := sup{α : f ∈ C α (Ω)}, (2) and the pointwise Hölder exponent is defined in an analogous manner. Following [9], the local Hölder exponent of f at u is defines as αl (u) = lim αl (In ). n→∞ where {In }n∈N is a family of nested open sets in R2 , i.e. In+1 ⊂ In , with intersection ∩n In = {u}. In order to define directional regularity, let v ∈ Rd be a fixed unit vector representing a direction and u be a point in Rd . f is said to be pointwise Hölder regular with exponent α at u in the direction v, denoted by f ∈ C α (u; v), if there exist a constant C = Cu,v and a polynomial Pu,v of degree less than α such that |f (u + λv) − Pu,v (λ)| ≤ C|λ|α (3) holds for all λ in a neighborhood of 0 ∈ R. We next define directional regularity on a set Ω1 ⊆ R2 . Let Ω2 be an open neighborhood of Ω1 representing a set on which the Hölder estimate holds. Then f is said to be in C α (Ω1 , Ω2 ; v) if there exists a constant C = Cv so that for all u ∈ Ω1 there is a polynomial Pu,v of degree less than α such that (3) holds for all λ ∈ R with u + λv ∈ Ω2 . If Ω1 = Ω2 , then we denote C α (Ω1 , Ω2 ; v) simply by C α (Ω1 ; v). Of course, the directional pointwise and uniform Hölder exponents could be defined in the same way as (2). In the pointwise case, this directional 91 Hölder exponent measures one-dimensional regularity of f at u on the line passing through u and parallel with v. See [5]. For C α (Ω1 , Ω2 ; v), the set Ω1 in our context of line singularity will usually be a line and v points in a direction that is nonparallel with the line. In this situation, f ∈ C α (Ω1 , Ω2 ; v) has a ridge along the line provided that hte regularity in the direction of the line is sufficiently high. See Theorem 4. 3. Three Transforms with Parabolic Scaling 3.1 Hart Smith Transform Originally defined in [10], the Hart Smith transform was described in [1, 2] as follows. For a given ϕ ∈ L2 (R2 ), we define ³ ´ 3 ϕabθ (x) = a− 4 ϕ D a1 R−θ (x − b) , for θ ∈ [0, 2π), b ∈ R2 , and 0 < a0 , where a0 is a ³ a<´ 1 √1 fixed coarsest scale, D a1 = diag a , a , and R−θ is the matrix affecting planar rotation of θ radians in clockwise direction. Hart Smith transform can then be defined as Γf (a, b, θ) := hϕabθ , f i . We define vector v θ := Rθ (0, 1)T so that v θ is parallel to the major axis of the ellipse kvka,θ = 1. Reconstruction Formula [10, 1, 2] There exists a Fourier multiplier M of order 0 so that whenever f ∈ L2 (R2 ) is a high-frequency function supported in frequency space kξk > a20 , then, in L2 (R2 ) Z a0 0 = Z Z 2π 0 a0 0 Z 2π 0 Z Z R2 hϕabθ , M f i ϕabθ db dθ da a3 R2 hϕabθ , f i M ϕabθ db dθ da . a3 (4) 3.2 Continuous Curvelet Transform Following Candès and Donoho[1, 2], the continuous curvelet transform (CCT) is defined in the polar coordinates (r, ω) of the Fourier domain. 
Let W ¡be a¢positive real-valued C ∞ function supported inside 12 , 2 , called a radial window, and let V be a real-valued C ∞ function supported on [−1, 1], called an angular window, for which the following admissibility conditions hold: Z ∞ W (r) 0 2 dr r =1 and Z 1 V (ω)2 dω = 1. −1 At each scale a, 0 < a < a0 , γa00 is defined by ¡ √ ¢ 3 γd a00 (r cos(ω), r sin(ω)) = a 4 W (ar) V ω/ a SAMPTA'09 γabθ (x) = γa00 (Rθ (x − b)) , (5) for x ∈ R2 . (6) The continuous curvelet transform of f ∈ L( R2 ) is Γf (a, b, θ) = hγabθ , f i for 0 < a < a0 , b ∈ R2 , and θ ∈ [0, 2π). The admissibility conditions (5) and the polar coordinate design of curvelets yield the following: Reconstruction formula [2] There exists a bandlimited purely radial function Φ such that for all f ∈ L2 (R2 ), Z a0 Z 2π Z da f = f˜ + hγabθ , f i γabθ db dθ 3 , (7) a 0 R2 0 R where f˜ = R2 hΦb , f i Φb db and Φb (x) = Φ(x − b). For analysis of singularities of f , the low frequency part f˜ is not an issue as it is always C ∞ . Unlike Smith transform, curvelet transform does not use a true affine parabolic scaling as a slightly different generating function γa00 is used at each scale a > 0. 3.3 This gives a true affine transform that uses parabolic scaling. For each scale a and direction θ, let us define the norm ° ° ° ° kvka,θ := °D a1 R−θ v ° for v ∈ R2 . f= for r ≥ 0 and ω ∈ [0, 2π). For each 0 < a < a0 , b ∈ R2 , and θ ∈ [0, 2π), a curvelet γabθ is defined by Continuous Shearlet Transform We will follow mainly the definitions and notations in G. Kutyniok and D. Labate[6]. Let ψ1 , ψ2 ∈ L2 (R) and ψ ∈ L2 (R2 ) be given by µ ¶ ξ2 , ξ1 6= 0, ξ2 ∈ R, (8) ψ̂(ξ1 , ξ2 ) = ψ̂1 (ξ1 )ψ̂2 ξ1 where ψ1 satisfies the admissibility condition and ψ̂1 ∈ C0∞ (R) with supp ψ̂1 ⊂ [−2, − 21 ] ∪ [ 21 , 2] while ψ̂2 ∈ C0∞ (R) with supp ψ̂2 ⊂ [−1, 1], ψ̂2 > 0 on (−1, 1), and kψk2 = 1. Given such a shearlet function ψ, a continuous shearlet system is the family of functions ψast , a ∈ R+ , s ∈ R, t ∈ R2 , where ¡ ¢ 3 −1 ψast = a− 4 ψ D−1 a Bs (· − t) µ ¶ 1 −s where Bs is the shear matrix and Da is the di0 1 µ ¶ a √0 agonal matrix . The continuous shearlet trans0 a form of f is then defined for such (a, s, t) by SHψ f (a, s, t) = hf, ψast i . Many properties of the continuous shearlet are more evident in the frequency domain. So we note here that each ψ̂ast is supported on the set ¯ ¯ ¾ ½ ¯ √ 1 2 ¯ ξ2 (ξ1 , ξ2 ) : ≤ |ξ1 | ≤ , ¯¯ − s¯¯ ≤ a . 2a a ξ1 Reconstruction Formula [6] Let ψ ∈ L2 (R2 ) be a shearlet function. Then, for all f ∈ L2 (R2 ), Z Z Z da (9) f= hψast , f i ψast 3 ds dt in L2 . a R2 R R+ 92 If supp fˆ ⊂ C = then f= Z R2 Z 2 −2 Z 0 1 ¯ ¯ n o ¯ ¯ (ξ1 , ξ2 ) : |ξ1 | ≥ 2 and ¯ ξξ12 ¯ ≤ 1 , hψast , f i ψast da ds dt in L2 . a3 (10) Even though the second reconstruction formula (10) is valid only for functions with frequency support in the union C of two infinite horizontal trapezoids, it has the advantage that the integral involves only scales a and shear parameters s in bounded sets. A complementary shearlet (v) system ψast can be similarly defined so that one has a reˆ construction ¯ for ¯ f owith supp f ⊂ n formula which is valid ¯ ξ2 ¯ (v) C = (ξ1 , ξ2 ) : |ξ2 | ≥ 2 and ¯ ξ1 ¯ > 1 . Finally, ev- ery f ∈ L2 (R2 ) can be decomposed into three functions with frequency supports in C, C (v) , and W = [−2, 2]2 . The former two functions can then be reconstructed from (v) ψast and ψast respectively, while the latter is C ∞ . 
Therefore, regularity analysis can be carried out by considering the continuous shearlet transform with respect to these two shearlet systems. For more details, see [6]. 4. Common Properties of the Transforms We shall suppose from this point onward that ϕ̂ ∈ C ∞ and that there exist C1′ > C1′ > 0 and C2 > 0 such that supp(ϕ̂) ⊂ ([−C1′ , −C1 ]∪[C1 , C1′ ])×[−C2 , C2 ]. This assumption ensures that all our three kernel functions, Hart Smith, curvelet, and shearlet functions, have Fourier supports away from the Y -axis, which in turns results in crucial properties needed to prove our main results. 4.1 Vanishing Directional Moments A function f of two variables is said to have an L-order vanishing directional moments along a direction v = (v1 , v2 )T 6= 0 if Z bn f (bv+w)db = 0, R for all w ∈ R2 and 0 ≤ n < L. Lemma 1: Let v = (v1 , v2 )T be a unit vector. 1. There exists C < ∞ (independent √ of a, b and θ) such that if |θ + arctan( vv21 )| ≥ C a then the curvelet functions γabθ and the Smith functions ϕabθ and M ϕabθ have vanishing directional moments of any order L < ∞ along the direction v. ¯ ¯ √ ¯ ¯ 2. If ¯s + vv12 ¯ > a then the shearlet functions ψast have vanishing directional moments of any order L < ∞ along the direction v. Here, if v2 is 0 then vv12 are treated as ∞ so that the assumed inequality holds for all a ∈ (0, 1) and s ∈ [−2, 2], hence ψast has vanishing directional moments of any order L < ∞ along the direction v = (v1 , 0). SAMPTA'09 4.2 Smoothness and Decay Properties Lemma 2: For each N = 1, 2, ... there is a constant CN such that for all x ∈ R2 and ν ∈ N20 |∂ ν γabθ (x)| ≤ and CN a−3/4−|ν| °2N ° ° ° 1 + °D a1 R−θ (x − b)° √ CN a−3/4−|ν| ( a + |s|)ν2 |∂ ψast (x)| ≤ ° °2N . 1 + °D1/a B−s (x − t)° ν (11) (12) Moreover, (11) also holds for functions ϕabθ and M ϕabθ . 5. Singularity Lines Let φabθ denote any of the γabθ , ϕabθ , or M ϕabθ . Let us quote the following results.[8, 7] Theorem 1: Let f ∈ L2 (R2 ), u ∈ R2 , and assume that α > 0 is not an integer. If there exist α′ < 2α, θ0 ∈ [0, 2π], and A, C < ∞ such that |hφabθ , f i| is bounded by  à ° ′! °  ° b − u °α √ 5  α+ °  Ca 4 1 + ° , if |θ − θ0 | ≥ A a  ° a1/2 °  à ° ′! °  ° b − u °α √  α+ 43  ° , if |θ − θ0 | ≤ A a 1 + ° 1/2 °   Ca ° a for all a ∈ (0, a0 ), b ∈ R2 , and θ ∈ [0, 2π), then f ∈ C α (u). Theorem 2: Let f ∈ L2 (R2 ), u ∈ R2 , and assume that α > 0 is not an integer. If there exist α′ < 2α, −2 ≤ s0 ≤ 2, and C, C ′ < ∞ such that, for each 0 < a < 1, −2 ≤ s ≤ 2, and t ∈ R2 , |hψast , PC1 f i| is bounded by  à ° ° α′ ! ° °  √ t − u 5  α+ °  ° , if |s − s0 | > C ′ a,   Ca 4 1 + ° a1/2 ° à ° ° ′!  ° t − u °α √ 3  α+  ° ° , if |s − s0 | ≤ C ′ a,   Ca 4 1 + ° a1/2 ° (13) and à °α′ ! ° ¯D E¯ ° ° 5 t − u ¯ ¯ (v) ° , (14) ¯ ψast , PC2 f ¯ ≤ Caα+ 4 1 + ° ° a1/2 ° then f ∈ C α (u). holds if the inequality E D Similar statement (v) (13) holds for ψast , PC2 f and the inequality (14) holds for hψast , PC1 f i. Theorem 3 Let f be bounded with local Hölder exponent α ∈ (0, 1] at point u and f ∈ C 2α+1+ε (R2 , v θ0 ) for some θ0 ∈ [0, 2π) with any fixed ε > 0. Then there exist α′ ∈ [α − ε, α] and A, C < ∞ such that for a > 0 and b ∈ R2 , |hφabθ , f i| is bounded by  √ α+ 5 if |θ − θ0 | ≥ A a,   Ca 4 , à °α′ ! ° °b − u° √ α′ + 34 °  , if |θ − θ0 | ≤ A a. 1+°  Ca ° a ° 93 For s0 ∈ [−2, 2] and u = (u1 , u2 ) ∈ R2 , let Γu denote the vertical line passing though u and Γu,s0 denote the line passing through u with slope − s10 . 
Observe that we may write Γu = Γu,0 so that (x1 , x2 ) ∈ Γu,s0 if and only if x1 = −s0 (x2 − u2 ) + u1 . Recall that if Γ ⊆ R2 and ρ > 0, then Γ(ρ) is the ρ-neighborhood of Γ, i.e. the set of all points whose distance to Γ is less than ρ. Theorem 4 Let f ∈ C α (Γu,s0 , Γu,s0 (ρ) ; (1, 0)) and bounded for some α ∈ (0, 1], u ∈ R2 , s0 ∈ [−2, 2] and ρ > 1. Suppose also that f is in C 2α+1+ε (Γu,s0 (ρ) ; Bs0 (0, 1)) for some fixed ε > 0. Then there exists C < ∞ such that if 0 < a < a0 < 1 and t ∈ Γu (r) with r < ρ/2 and s ∈ [−2, 2], the continuous shearlet transform hψast , f i is bounded in magnitude by  √ 5 if |s − s0 | > a,  Caα+ 4 ,µ ¯ ¯α ¶ ¯ d (t, u) ¯ 3 ¯ , if |s − s0 | ≤ √a,  Caα+ 4 1 + ¯¯ s0 ¯ a [7] P. Lakhonchai, J. Sampo, and S. Sumetkijakan. Shearlet transforms and hölder regularities. 2009. Preprint. [8] J. Sampo and S. Sumetkijakan. Estimations of Hölder regularities and direction of singularity by Hart Smith and curvelet transforms. Journal of Fourier Analysis and Applications, 15(1):58–79, 2009. [9] S. Seuret and J. Lévy Véhel. The local Hölder function of a continuous function. Appl. Comput. Harmon. Anal., 13(3):263–276, 2002. [10] Hart F. Smith. A Hardy space for Fourier integral operators. J. Geom. Anal., 8(4):629–653, 1998. [11] S. Yi, D. Labate, G.R. Easley, and H. Krim. Edge detection and processing using shearlets. 2008. Preprint. where ds0 (t, u) = |t1 + s0 t2 − u1 − s0 u2 | denotes the distance between the parallel lines with slope − s10 (vertical line if s0 = 0) and passing through t and u respectively. Edge analysis has been done successfully using the continuous shearlet transform ([11, 4, 3, 6]). They consider the shearlet transform of the characteristic function of a set with piecewise smooth boundary and found that, at a regular boundary point t, the shearlet transform decays like a3/4 if s = s0 = ± vv12 and decays rapidly at other s 6= s0 , where v = (v1 , v2 ) is the normal vector of the boundary curve at t. Since this characteristic function has Hölder exponent 0 (bounded and discontinuous) at any boundary point in the normal direction, this decay rate of a3/4 at s = s0 = 0 agrees with that of Theorem 4. However, when s0 6= 0 the two directions in Theorem 4 along which regularity is assumed are not perpendicular. More comparisons of our results and the aforementioned work are needed. References: [1] Emmanuel J. Candès and David L. Donoho. Continuous curvelet transform. I: Resolution of the wavefront set. Appl. Comput. Harmon. Anal., 19(2):162– 197, 2005. [2] Emmanuel J. Candès and David L. Donoho. Continuous curvelet transform. II: Discretization and frames. Appl. Comput. Harmon. Anal., 19(2):198– 222, 2005. [3] K. Guo, Labate D., and W-Q. Lim. Edge analysis and identification using the continuous shearlet transform. Appl. Comput. Harmon. Anal., 2008. In Press. [4] K. Guo and D. Labate. Characterization and analysis of edges using the continuous shearlet transform. 2008. Preprint. [5] S. Jaffard. Multifractal functions: Recent advances and open problems. Menuscript, 2004. [6] G. Kutyniok and D. Labate. Resolution of the wavefront set using continuous shearlets. Trans. AMS., 105(1):157–175, 2007. SAMPTA'09 94 Geometric Separation using a Wavelet-Shearlet Dictionary David L. Donoho (1) and Gitta Kutyniok (2) (1) Department of Statistics, Stanford University, Stanford, CA 94305, USA. (2) Institute of Mathematics, University of Osnabrück, 49069 Osnabrück, Germany. 
donoho@stanford.edu, kutyniok@uni-osnabrueck.de Abstract: Astronomical images of galaxies can be modeled as a superposition of pointlike and curvelike structures. Astronomers typically face the problem of extracting those components as accurate as possible. Although this problem seems unsolvable – as there are two unknowns for every datum – suggestive empirical results have been achieved by employing a dictionary consisting of wavelets and curvelets combined with ℓ1 minimization techniques. In this paper we present a theoretical analysis in a model problem showing that accurate geometric separation can be achieved by ℓ1 minimization. We introduce the notions of cluster coherence and clustered sparse objects as a machinery to show that the underdetermined system of equations can be stably solved by ℓ1 minimization. We prove that not only a radial wavelet-curvelet dictionary achieves nearly-perfect separation at all sufficiently fine scales, but, in particular, also an orthonormal wavelet-shearlet dictionary, thereby proposing this dictionary as an interesting alternative for geometric separation of pointlike and curvelike structures. To derive this final result we show that curvelets and shearlets are sparsity equivalent in the sense of a finite p-norm (0 < p ≤ 1) of the cross-Grammian matrix. 1. Introduction Cosmological data analysts face tasks of geometric separation. Gravitation, acting over time, drives an initially quasi-uniform distribution of matter in 3D to concentrate near lower-dimensional structures: points, filaments, and sheets. It would be desirable to process single ‘maps’ of matter density and somehow extract three ‘pure’ maps containing just the points, just the filaments, and just the sheets around which matter is concentrating. However, this problem contains three unknowns for every datum which seems impossible to solve on mathematical grounds. Surprisingly, astronomer Jean-Luc Starck and collaborators have recently been empirically successful in numerical experiments with component separation. They used two or more overcomplete frames, each one specially adapted to particular geometric structures, and were able to obtain separation despite the fact that the underlying system of equations is highly underdetermined. Here we analyze such approaches in a mathematical SAMPTA'09 framework where we can show that success stems from an interplay between geometric properties of the objects to be separated, and the harmonic analysis for singularities of various geometric types. 1.1 Singularities and Sparsity As a mathematical idealization of ’image’, consider a Schwartz distribution f with domain R2 . The distribution f will be given singularities with specified geometry: points and curves. We plan to represent such an ’image’ using tools of harmonic analysis; in particular bases and frames. While many such representations are conceivable, we are interested here just in those bases or frames which can sparsely represent f . The type of basis which best sparsifies f depends on the geometry of its singularities. If the singularities occur at a finite number of (variable) points, then wavelets give what is, roughly speaking, an optimally sparse representation. If the singularities occur at a finite number of smooth curves, then one of the recently studied directional multiscale representations (curvelets or shearlets) will do the best job of sparsification. 
Since we are concerned with f being a mixture of content types, i.e., points and curves, presumably both systems are needed to represent f sparsely. 1.2 Minimum ℓ1 Decomposition and Perfect Separation In the early 1990’s, R. R. Coifman, Wickerhauser and coworkers became interested in the problem of representing signals using more than one basis and started a first heuristic exploration motivated intuitively, see [5]. A few years later, one of us worked with S .S. Chen to develop a formal, optimization-based approach to the multiple-basis representation problem [4]. Given bases Φi , i = 1, 2, one solves the following problem (BP) min kα1 k1 +kα2 k1 subject to S = Φ1 α1 +Φ2 α2 , thereby exploiting that the ℓ1 norm has a tendency to find sparse solutions when they exist. This can be regarded as the starting point for ℓ1 decomposition techniques. For theoretical work on this topic we refer to, e.g., [6, 10, 15, 16], and for empirical work see, for instance, [9,12,14,15]. 95 For further references we would like to mention the survey paper [1]. 1.3 A Geometric Separation Problem The work just cited, while suggestive and inspiring, concerns discretely indexed signal/image processing, and so is either empirical or else rigorously analytical but not directly relevant to geometric separation tasks, which will involve always continuum ideas. In this paper we develop related methods in a mathematical setting where the notion of successful separation can be made definitionally precise and can be established by mathematical analysis. For this, we pose a simple but clear model problem of geometric separation. Consider a ‘pointlike’ object P made of point singularities: P X |x − xi |−1 . P= Figure 1: Frequency tilings of radial wavelets and curvelets as well as of orthonormal wavelets and shearlets (from left to right). Since the scaling subband of each pair are similar as illustrated in Figure 1, we can define two families of filters (FjC )j and (FjS )j which allows to decompose a function f into pieces fjC (resp. fjS ) with different scales j. The piece fjC (resp. fjS ) at subband j arises from filtering f using FjC (resp. FjS ): i=1 Consider as well a curvelike object C, a singularity along a closed curve τ : [0, 1] 7→ R2 : Z C = δτ (t) dt, where δx is the usual Dirac Delta at x. By this choice, we arrange that one of the two distributions does not become dramatically larger than the other as we go to finer and finer scales; rather the ratio of energies is more or less independent of scale. This makes the separation problem challenging at every scale. Now assume that we observe the ‘Signal’ f = P + C, (1) however, the distributions P and C are unknown to us. The Geometric Separation Problem now consists in recovering P and C from knowledge of f . fjC = FjC ⋆ f and fjS = FjS ⋆ f, so that the Fourier transform fˆiC (resp. fˆjS ) is supported in the scaling subband of scale j of the associated pair of tight frames. The filters are defined in such as way, that we can reconstruct the original function from these pieces using the formula f= X FjC ⋆ fjC = j X FjS ⋆ fjS , f ∈ L2 (R2 ). j For the precise construction of those filters and further properties, we refer to [7]. We can now use these tools to attack the Geometric Separation Problem scale-by-scale. For this, we filter the model problem (1) to derive the sequences of filtered images fjC = PjC + CjC and fjS = PjS + CjS for all scales j. 
(2) 1.4 Two Geometric Frames 1.5 We focus on two pairs of overcomplete systems for representing the object f : In Section 2 we will develop and analyze the decomposition technique based on ℓ1 minimization we intend to employ, first in a very general Hilbert space setting. These results will then be applied to the scale-dependent Geometric Separation Problem (2) proving that the radial waveletcurvelet as well as the orthonormal wavelet-shearlet dictionary achieves nearly-perfect separation at all sufficient fine scales (Theorems 1 and 3). The sparsity equivalence between curvelets and shearlets we derive in Subsection 3.2 thereby allows transference of this result from the radial wavelet-curvelet to the orthonormal wavelet-shearlet dictionary. • Radial Wavelets – a tight frame with perfectly isotropic generating elements. • Curvelets – a highly directional tight frame with increasingly anisotropic elements at fine scales. as well as the pair • Orthonormal Separable Meyer Wavelets – an orthonormal basis of perfectly isotropic generating elements. • Shearlets – a highly directional tight frame with increasingly anisotropic elements at fine scales and a unified treatment of both the continuous and digital setting. We pick these because, as is well known, point singularities are coherent in wavelets and curvilinear singularities are coherent in curvelets/shearlets. For the precise definitions we refer to [2, 3], [11, 13], as well as [7]. SAMPTA'09 2. Outline General Component Separation We now first study the behavior of ℓ1 minimization in the general two-frame case. Suppose we have two tight frames Φ1 , Φ2 in a Hilbert space H, and a signal vector S ∈ H. We know a priori that there exists a decomposition S = S10 + S20 , 96 where S10 is sparse in Φ1 and S20 is sparsely represented by Φ2 . Our analysis will center on the use of cluster coherence to exploit the geometric structure of the sparse expansions rather than merely the fact that the vector is sparse. Typically, separation results employ the notion of mutual coherence between two tight frames Φ = (φi )i and Ψ = (ψj )j , µ(Φ, Ψ) = max max |hφi , ψj i|, j • the nonzeros of sparse vectors often do not arise in arbitrary patterns, but are rather highly structured, and that • the interactions between the dictionary elements in ill-posed problems are not arbitrary, but rather geometrically driven. These key observations lead to the following new notion. Definition 1. Given tight frames Φ = (φi )i and Ψ = (ψj )j and an index subset S associated with expansions in frame Φ, we define the cluster coherence µc (S; Φ, Ψ) = max j X |hφi , ψj i|. 2.2 Component Separation by ℓ1 Minimization Now consider the following optimization problem: argminS1 ,S2 kΦT1 S1 k1 + kΦT2 S2 k1 subject to S = S1 + S2 . Notice that in this problem, the norm is placed on the analysis coefficients rather than on the synthesis coefficients as in (BP) to avoid ‘self-terms’ in the frame expansions. The introduction of cluster coherence now ensures that the principle (S EP) gives a successful approximate separation. 2δ , 1 − 2µc where µc = max(µc (S1 ; Φ1 , Φ2 ), µc (S2 ; Φ2 , Φ1 )). 3. Geometric Separation of Pointlike and Curvelike Structures 3.1 Radial Wavelet-Curvelet Dictionary The concepts of the previous section will now be applied to S = fjC = PjC + CjC , our signal of interest from (2). The tight frames are Φ1 , the full radial wavelet frame, and Φ2 , the full curvelet tight frame. 
The subsignals S1⋆ , S2⋆ we derive by applying the optimization problem (S EP) will be relabel to Wj , the wavelet component, and Cj , the curvelet component. The main difficulty in applying Proposition 1 consists in choosing the sets of significant coefficients suitably. We achieve this by using microlocal analysis to understand heuristically the location of the significant coefficients in phase space. Roughly speaking, we then employ the HartSmith phase space metric defined by d((b, θ); (b′ , θ′ )) = |heθ , b − b′ i| + |heθ′ , b − b′ i| +|b − b′ |2 + |θ − θ′ |2 i∈S Thus cluster coherence bounds between a single member of frame Ψ and a cluster of members of frame Φ, clustered at S, in contrast to mutual coherence, which can be thought of as singleton coherence. A related notion called ‘cumulative coherence’ was introduced in [16], but notice that here we fix a specific set of significant coefficients and do not maximize over all such subsets. The key idea for our analysis is that the index subsets we consider are not abstract, but have a specific geometric interpretation. Maximizing over all subsets with a common combinatorial property would prohibit utilizing this interpretation, hence cumulative coherence is not suitable for our purposes. SAMPTA'09 kS1⋆ − S10 k2 + kS2⋆ − S20 k2 ≤ i whose importance was shown by [6], as a means to impose conditions on the interactions between the dictionary elements. However, this notion is too weak for our purposes. Our novel contribution to sparse recovery and ℓ1 minimization consists in exploiting the facts that (S1⋆ , S2⋆ ) = k1S1c ΦT1 S10 k1 + k1S2c ΦT2 S20 k1 ≤ δ. Let (S1⋆ , S2⋆ ) solve (S EP). Then 2.1 Cluster Coherence (S EP) Proposition 1 ( [7]). Suppose that S can be decomposed as S = S10 + S20 so that each component Si0 is relatively sparse in Φi , i = 1, 2, i.e., to define an ‘approximate’ set of significant wavelet coefficients Λ1,j = {wavelet lattice} ∩{(b, θ) : d((b, θ); W F (P)) ≤ ηj aj } and an ‘approximate’ set of significant curvelet coefficients Λ2,j = {curvelet lattice} ∩{(b, θ) : d((b, θ); W F (C)) ≤ ηj aj } for carefully chosen ηj ; W F denotes the wavefront set. Tedious, highly technical estimates then lead to the following separation result: Theorem 1 ( [7]). A SYMPTOTIC S EPARATION USING A R ADIAL WAVELET-C URVELET D ICTIONARY. kWj − PjC k2 + kCj − CjC k2 → 0, kPjC k2 + kCjC k2 j → ∞. This result shows that components are recovered asymptotically: at fine scales, the energy in the curvelike component is all captured by the curvelet coefficients and the energy in the pointlike component is all captured by the wavelet coefficients. 97 4. 3.2 Sparsity Equivalence We now aim to show that curvelets and shearlets are sparsity equivalent in the sense that, for 0 < p ≤ 1, the ℓp norm of the curvelet coefficient sequence is finite if and only if the same is true for the shearlet coefficient sequence. First we observe that for two tight frames Φ = (φi )i and Ψ = (ψj )j , their cross-Grammian matrix M (i, j) = hφi , ψj i contains all information on the relation between coefficient sequences ΦT S and ΨT S for some signal S. Sparsity equivalence can therefore be proven by analyzing the p-norm, 0 < p ≤ 1 defined by ³ X |M (i, j)|p )1/p , kM kp = max (sup i (sup j j X |M (i, j)|p )1/p i ´ of a cross-Grammian matrix M . Now setting (ση )η to be the shearlet tight frame and (γµ )µ to be the curvelet tight frame, we derive the following result. 
We remark that the low frequency part has to be dealt with particular care, but for these technicalities we refer to [7]. Proposition 2 ( [8]). For all 0 < p ≤ 1, k(hση , γµ i)η,µ kp < ∞. Using basic estimates from frame theory and the previous proposition, we can show that shearlets and curvelets are indeed sparsity equivalent, thereby allowing us to easily transfer results about sparsity from one system to the other. Theorem 2 ( [8]). Let f ∈ L2 (R2 ) and 0 < p ≤ 1. Then k(hf, ση i)η kp < ∞ if and only if k(hf, γµ i)µ kp < ∞. 3.3 Orthonormal Wavelet-Shearlet Dictionary Similar to Subsection 3.1, S = fjS = PjS + CjS (see (2)) is now our signal of interest, and the tight frames are Φ1 , the full orthonormal wavelet frame, and Φ2 the full shearlet tight frame. The subsignals S1⋆ , S2⋆ , we derive by applying the optimization problem (S EP) will be relabel to Wj , the wavelet component, and Sj , the shearlet component. The results from Subsection 3.2 as well as similar correspondences between radial wavelets and orthonormal wavelets now form the backbone for the transfer of Theorem 1 to the orthonormal wavelet-shearlet dictionary. Careful application of those to the key estimates in the proof of Theorem 1 leads to a similar result for the orthonormal wavelet-shearlet dictionary. Theorem 3 ([7]). A SYMPTOTIC S EPARATION USING AN O RTHONORMAL WAVELET-S HEARLET D ICTIONARY. kWj − PjS k2 + kSj − CjS k2 → 0, kPjS k2 + kCjS k2 SAMPTA'09 j → ∞. Conclusion We first considered signals, being a superposition of two subsignals, each of which is relatively sparse with respect to some tight frame. As a model procedure for separation we considered ℓ1 minimization of the analysis (rather than synthesis) frame coefficients. By introducing cluster coherence as a new concept for analyzing the interaction of the two tight frames by taking the geometry of the sparse component expansions into account, we derived an estimate for the ℓ2 norm of the separation error. We then considered signals, which are a superposition of pointlike and curvelike structures. Using the previously derived estimate, we proved that for both pairs of tight frames (radial wavelets/curvelets) as well as (orthonormal wavelets/shearlets) at sufficiently fine scale, nearlyperfect separation is achieved using the model procedure, thereby proposing the orthonormal wavelet-shearlet dictionary as an interesting alternative for geometric separation of pointlike and curvelike structures. The sparsity equivalence between curvelets and shearlets we further proved thereby allows to derive this separation result only for one dictionary and easily transfer it to the other one. Acknowledgment The authors would like to thank Emmanuel Candès, Michael Elad, and Jean-Luc Starck, for numerous discussions on related topics. The second author would like to thank the Department of Statistics at Stanford University and the Department of Mathematics at Yale University for their hospitality and support during her long-term visits. The authors would also like to thank the Newton Institute of Mathematics in Cambridge, UK for providing an inspiring research environment which led to the completion of a significant part of this work during their stay. This work was partially supported by NSF DMS 05-05303 and DMS 01-40698 (FRG), and by Deutsche Forschungsgemeinschaft (DFG) Heisenberg Fellowship KU 1446/8-1. We further thank the anonymous referee for useful comments and suggestions. References: [1] A.M. Bruckstein, D.L. Donoho, and M. Elad. 
From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images. SIAM Review 51:34–81, 2009. [2] E. J. Candès and D. L. Donoho. Continuous curvelet transform: I. Resolution of the wavefront set. Appl. Comput. Harmon. Anal. 19:162–197, 2005. [3] E. J. Candès and D. L. Donoho. Continuous curvelet transform: II. Discretization of frames. Appl. Comput. Harmon. Anal. 19:198–222, 2005. [4] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Review 43:129–159, 2001. [5] R. R. Coifman and M. V. Wickerhauser. Wavelets and adapted waveform analysis. A toolkit for signal processing and numerical analysis, In Different 98 [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] perspectives on wavelets (San Antonio, TX, 1993), 47:119–153, Proc. Sympos. Appl. Math., Amer. Math. Soc., Providence, RI, 1993. D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47:2845–2862, 2001. D. L. Donoho and G. Kutyniok. Microlocal Analysis of the Geometric Separation Problem. Preprint, 2009. D. L. Donoho and G. Kutyniok. Sparsity Equivalence of Anisotropic Decompositions. Preprint, 2009. M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho. Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal. 19:340–358, 2005. R. Gribonval and M. Nielsen. Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49:3320–3325, 2003. K. Guo, G. Kutyniok, and D. Labate. Sparse Multidimensional Representations using Anisotropic Dilation und Shear Operators. In Wavelets und Splines (Athens, GA, 2005), G. Chen und M. J. Lai, eds., Nashboro Press, Nashville, TN (2006), 189–201. M. Kowalski and B. Torrésani. Sparsity and Persistence: mixed norms provide simple signal models with dependent coefficients. Signal, Image and Video Processing, to appear. G. Kutyniok and D. Labate. Resolution of the Wavefront Set using Continuous Shearlets. Trans. Amer. Math. Soc. 361:2719–2754, 2009. F. G. Meyer, A. Averbuch, and R. R. Coifman. Multilayered Image Representation: Application to Image Compression. IEEE Trans. Image Proc. 11:1072– 1080, 2002. J.-L. Starck, M. Elad, and D. L. Donoho. Image decomposition via the combination of sparse representations and a variational approach. IEEE Trans. Image Proc. 14:1570–1582, 2005. J. A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50:2231–2242, 2004. SAMPTA'09 99 SAMPTA'09 100 Special session on Sampling and Communication Chair: Götz PFANDER SAMPTA'09 101 SAMPTA'09 102 A Kashin Approach to the Capacity of the Discrete Amplitude Constrained Gaussian Channel Brendan Farrell (1) and Peter Jung (2) (1) Heinrich-Hertz Lehrstuhl, Technische Universität Berlin, Einsteinufer 25, 10587 Berlin, Germany. (2) Fraunhofer German-Sino Lab for Mobile Communications - MCI, Einsteinufer 37, 10587 Berlin, Germany. brendan.farrell@mk.tu-berlin.de, peter.jung@hhi.fhg.de Abstract: We derive an explicit lower bound on the capacity of the discrete amplitude–constrained Gaussian channel by proving the existence of tight frames that permit redundant vector representations with small coefficients. Our method encodes the information in subspaces that are optimal in terms of the power to amplitude ratio. In a recent paper, Lyubarskii and Vershynin discuss how the work of Kashin (1977) implies the existence of such representations, and they term them Kashin respresentations. 
We use this work from frame theory to address the relationship between signal redundancy, peak–to–average power ratio and achievable data rates. 1. Introduction Communication at high data rates and with moderate cost on hardware and complexity provide challenging topics in engineering and applied mathematics. An important problem in this direction is efficient signaling and coding under an amplitude constraint. In general, the cost for high data rate is related to a power budget. However, in practical communication systems, there sometimes exist disruptive or non-linear effects that only occur at high signal amplitudes. The information–theoretic treatment of amplitude–constrained channel is completely different from the power–constrained channel. On the other hand, coding for power–constrained Gaussian channels is well understood. Clearly, if a loss in data rate is accepted, signals can be constructed with lower maximum amplitude. The optimal scaling between power and amplitude and an explicit relation to achievable rates will be given in this paper. In this case, the data-rate loss is caused by considering redundant representations. Here, the original vectors are expanded with respect to a particular frame and the coefficients are then transmitted. We show that there exist frames which allow the standard coding approach to be used for the amplitude-constrained channel. Our result is Theorem 2, which comes at the end of the paper. This theorem states that for the amplitude constrained, Gaussian channel the rate   1 Signal Power log 1 + λmin (1) 2λmin Noise Power is achievable for a redundancy λmin that is an explicit function of the peak-to-average power ratio. We note SAMPTA'09 that by making the amplitude constraint compatible with Gaussian codebooks, we make the developed tools and understanding of Gaussian codebooks applicable to the amplitude–constrained channel. Results from frame theory, thus, allow us to address a question in information theory. While the results used from functional analysis are well known there, we show a new application. 1.1 The Information–Theoretic Problem The capacity of a communication channel is the maximum amount of information per unit of time that can be sent from a sender through the channel to the receiver. Shannon made this operational concept mathematically rigorous by formulating it in terms of entropy [7]. In [7] Shannon addressed the discrete–time model: Y = X + Z, (2) for the noisy channel, where X and Y denote the (real) channel input and output, and the additive noise Z is a Gaussian random variable with variance σ 2 . Let X n be a random vector in Rn according to a distribution to be determined and Z n the random vector having n identical independent distributed (iid) copies of Z. Shannon introduced two concepts of a capacity for this model. The information capacity C (i) is the supremum of the information rates: 1 (3) sup I(X n ; Y n ) C (i) = lim n→∞ n µn ∈F n taken over all distributions µn of X n from a particular subset F n ⊂ P n of probability distributions P n . I(X n ; Y n ) denotes the mutual information between the random variables X n and Y n and is equal to the entropy of Y n minus the entropy of Y n given X n , I(X n ; Y n ) = h(Y n ) − h(Y n |X n ). From its concavity in µn it follows that the optimum µnopt is at least achieved for a product distribution, i.e. single letter coding with a measure µ = µ1 is optimal in this sense. 
Shannon considered an averaged power constraint P which corresponds to the set F = F 1 of single–letter distributions: Z (4) F = {µ ∈ P | |x|2 dµ(x) ≤ P } or equivalently n 1X E|xi |2 ≤ P. n i=1 (5) 103 He found that the optimum µopt is attained for a Gaussian distribution with variance P and that C (i) = 1 P log(1 + 2 ). 2 σ (6) Shannon further showed with a so called coding theorem that it is even possible to get arbitrary close to that value justifying the term channel capacity. That is, for each rate nR R < C (i) there exist 2nR codewords {X(ω)}ω=2 in Rn ω=1 nR n (called a (2 , n) code) such that X(ω) + Z can be distinguished at the receiver with error probability going to zero as n increases. (X will now denote codewords and be indexed by ω.) Each admissible Pn codeword satisfies the average power constraint n1 i=1 |Xi (ω)|2 ≤ P ; however, to achieve the capacity it may be necessary to use codewords having maximum amplitudes which scale with √ n. We address an additive, white Gaussian noise (AWGN) channel under the assumption that there is both a power constraint, n 1X |Xi (ω)|2 ≤ P, (7) n i=1 and a strict amplitude constraint: max |Xi (ω)| ≤ A, i=1,...,n (8) for two positive, real numbers P and A and for all ω = 1, ..., 2nR . The information capacity under a constraint A on the amplitudes of the signals was solved by Smith [8]. Similar to the Gaussian channel with power constraint only, Smith showed that the capacity of the amplitude–constrained channel is attained when the entries xi are independent. The set of (single–letter) input distributions is in this case: F = {µ ∈ P | µ({|x| > A}) = 0}. (9) Smith found that the optimum measure µopt has discrete and finite support. Similar results are known for other noise densities (see for example [6]). A characterization of the number of mass points in the Gaussian case is unknown. For a given assumption on this number the values and the positions can be computed. From this Smith gave an algorithm which numerically computes C (i) . Smith establishes an algorithm to determine the optimal input probability measure given the constraints A, P and σ 2 . However, to date there is not a general strategy applicable for a practical range of these parameters. 1.2 Frames and Banach Geometry We will work strictly with real numbers. Pn We have the following norms for Rn : kxklpn = ( i=1 |xi |p )1/p and n n = maxi=1,...,n |xi |. B kxkl∞ p will denote the unit ball in n R with respect to the ℓp -norm. We denote by UnN an ndimensional subspace of RN , N ≥ n. We will often speak of a matrix U ∈ Rn×N whose rows are orthonormal and span UnN or whose columns constitute a tight frame for Rn . SAMPTA'09 n Definition 1. A set of vectors {ui }N i=1 ⊂ R is a tight n frame for R if kxk22 = N X i=1 = |hx, ui i|2 (10) for all x ∈ Rn . It follows that the columns of an n×N matrix U constitute a tight frame for Rn if and only if U U ∗ = In , where In denotes the identity matrix of size n. In the proof of the coding theorem (see, for example, [1]) for the Gaussian channel with average power constraint P , the constructed √ codewords X ∈ Rn satisfy the constraint kXkℓn2 ≤ nP . Similarly, in the amplitude constrained channel codewords must satisfy kXkℓn∞ ≤ A. In other words, admissible signals X for the amplitude constraint channel lie in a n . And for a power conscaled cube, i.e. X ∈ A · B∞ strained channel the signals are contained in an increasing √ ball X ∈ nP · B2n . Of course the difficult aspect of this channel is the amplitude constraint. 
We do not require that the random input variables {xi }ni=1 be independent, which allows us to use redundant representations. The basic idea for our approach is the following: given N n n vectors {ui }N i=1 spanning R , N > n, a vector x ∈ R may be expressed, in general, in multiple ways as a linear combination of the vectors {ui }N i=1 : x= N X bi u i . (11) i=1 In light of the amplitude constraint, the question is whether one of the possible expressions (10) satisfies N ≤ A. If this is possible, then we may transmit the kbkl∞ vector b and suffer an efficiency loss of N − n symbols. The representation (10) is called a Kashin representation [5] of the vector x if kbk∞ ≤ Ckxk2 . We first address a general frame setting and then focus on the Kashin representations in Section 3. 2. General Frame Setting As we have seen, the capacity of the discrete Gaussian channel with average power constraint P and noise variance σ 2 is 12 log(1 + σP2 ). This means, If R < 12 log(1 + P nR codewords, and all adσ 2 ) is the rate, then there are 2 missible codewords for the power con√ this channel satisfy straint kX(ω)k2 ≤ nP , ω = 1, ..., 2nR . If one has a n tight frame {ui }N i=1 for R , N = [λn], then one can also achieve the rate: λP 1 log(1 + 2 ) 2λ σ (12) nR by transmitting codewords {Y (ω)}2ω=1 ⊂ RN satisfying U Y (ω) = X(ω) for ω = 1, ..., 2nR . Since columns of U ∈ Cn×N form a tight frame for Rn , 2 = kY (ω)kl2 , and thus: kU X(ω)klN n N n 1 X 1 X |Yi (ω)|2 = |Xi (ω)|2 ≤ P. N i=1 λn i=1 (13) 104 The key point is that a vector Y (ω) that satisfies U Y (ω) = X(ω) is, in general, not unique. For a given additional constraint, one may ask if there exists a set Y ⊂ RN satisfying the additional constraint and a tight frame with matrix U such that: U Y = {x|x ∈ Rn , kxk2 = 1}. (14) The existence of such a set and a corresponding tight 1 log(1 + λP frame is sufficient to imply that 2λ σ 2 ) is an achievable rate for the discrete Gaussian channel with the additional constraint. The additional constraint of interest here is the amplitude N ≤ A for all constraint; that is, it is required that kY (ω)kl∞ nR 2 codewords Y (ω). Thus, for √ a given codebook {X(ω)}ω=1 satisfying kX(ω)kl2n ≤ nP for all ω, we would like to nR determine a second codebook {Y (ω)}2ω=1 ⊂ RN satisfyN ≤ A and a tight frame so that U Y (ω) = ing kY (ω)kl∞ X(ω) for all ω. For completeness and clarity, we include the communication strategy. The next section will show that Step 2 is possible for an appropriate λ. Communication Strategy: n 1. The set of vectors {ui }N i=1 form a tight frame for R and are known to both transmitter and receiver. 2. Each codeword X(ω) satisfies the power constraint , and its Kashin representation Y (ω) ∈ RN satisfying N ≤ A is determined. kY (ω)kl∞ 3. To transmit the message ω, the transmitter sends Y (ω). 4. Y (ω) + Z N ∈ RN is received. 5. Receiver multiplies Y (ω) + Z N by U to obtain X(ω) + U Z N ∈ Rn . 6. Receiver decodes X(ω) + U Z N ∈ Rn . We note that, in contrast to the approach of Smith [8], this approach is still based on Gaussian codebooks, and, therefore, the extensive tools developed for Gaussian codebooks are still applicable. matrix on Rn . One possible coefficient vector for equation (14) is a = U ∗ x. For this vector, we note p N hU ∗ x, U ∗ xi (17) ≤ kakl2N = kakl∞ p = (18) hIn x, xi = kxkl2n . Consequently, for a tight frame, it is always possible to N ≤ kxkln , and thus equafind a vector a satisfying kakl∞ 2 tion (15) can √be satisfied for every tight frame with Kashin level K = N . 
Of course the study of Kashin representations is concerned with optimally small constants and their relation to the redundancy λ = N/n. We will be interested in the dependence of K = K(λ) on λ, but we postpone the discussion of the constant K(λ) until the next section. Now, we show a lower bound on the achievable capacity when the ampli√ tude constraint is K(λ) P (or greater). If we set any n orthonormal vectors in RN to be the rows of a matrix U , then U U ∗ = In , and the columns of U constitute a tight frame for Rn . Thus, a tight frame for Rn can be constructed from any n-dimensional subspace of RN . For U ∈ Cn×N , let UnN denote the subspace of RN spanned by its rows. Then U (B2N ∩ UnN ) = B2n . Therefore, for any x ∈ B2n , as long as the rows of U are linearly independent there exists a y ∈ (B2N ∩ UnN ) such that x = U y. In the higher dimensional space, we have N -norm constraint. We thus want to find an nan k · kl∞ dimensional subspace of RN that can be mapped isometrically with respect to the k · kl2n -norm to Rn , and we must be able to cover B2n in this way. First results on the smallest constant C, such that a projecN covers B2n was given by Kashin in tion of the ball C · B∞ [3]. There he showed that the scaling is O(n−1/2 ), and the exact optimal scaling was then determined in [2]. Since the k · k2 -isometric projection is equivalent to the existence of a tight frame, we formulate their result in terms of frames. Theorem 1 ([3, 2]). For all positive integers N and n, N > n, there exists a tight frame for Rn consisting of N vectors such that every vector in Rn has a Kashin representation of level: K(λ) := C 3. Kashin Representations or Optimal Subspaces Definition 2 (Kashin Representations). For a set of vecn tors {ui }N i=1 ⊂ R , N > n, the expansion x= N X ai ui (15) i=1 is a Kashin representation with level K of the vector x ∈ Rn if Kkxkl2n √ N ≤ kakl∞ , i = 1, ..., N. (16) N See [3, 4, 5]. We denote by U the n × N dimensional matrix with columns {ui }N i=1 . If these vectors constitute a tight frame, then U U ∗ = In , where In denotes the identity SAMPTA'09   1/2 λ λ log 1 + , λ−1 λ−1 (19) where λ = N/n with respect to this frame. See also [4, 5] for further discussion of this result. In [5] Lyubarskii and Vershynin have recently given an algorithm for determining a Kashin representation. In the same paper they discuss various ways to generate the required frames and determine their Kashin constants. Theorem 2. For a given amplitude constraint A, there exists a constant λmin such that the capacity CP,A of the discrete Gaussian channel with average power constraint P , amplitude constraint A and noise variance σ 2 is lower bounded by   λmin P 1 log 1 + . (20) CP,A ≥ 2λmin σ2 105 Proof Theorem 1 shows the existence of a frame with the necessary properties, as discussed in the communication strategy in Section 2. Denoting the matrix corresponding to this frame by U , for each codeword X(ω) ∈ Rn , there exists a codeword Y (ω) ∈ RN such that X(ω) = U Y (ω), and N kY (ω)kl∞ K(λmin ) √ kX(ω)kl2n N √ ≤ K(λmin ) P . ≤ (21) (22) Lastly, λmin is the solution to C  1/2 A λ λ =√ , log(1 + ) λ−1 λ−1 P which exists and is unique since is monotone increasing. 4.  λ λ−1 log(1 + (23) 1/2 λ λ−1 ) Conclusion We have considered an application of the redundant representations found in frame theory and geometric functional analysis to a fundamental question in information theory. References: [1] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991. [2] A. 
Garnaev and E. D. Gluskin. The widths of euclidean balls. Doklady An. SSSR., 277:1048–1052, 1984. [3] B. S. Kashin. Diameters of some finite-dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat., 41(2):334–351, 478, 1977. English transl. in Math. USSR IZV. 11 (1978), 317-333. [4] B. S. Kashin and V. N. Temlyakov. A remark on compressed sensing. Mathematical Notes, 82(5):748–755, Nov 2007. [5] Y Lyubarskii and R. Vershynin. Uncertainty principles and vector quantization. preprint. [6] W. Oettli. Capacity-achieving input distributions for some amplitude-limited channels with additive noise (corresp.). IEEE Transactions on Information Theory, 20(3):372–374, May 1974. [7] C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423,623– 656, 1948. [8] Joel G. Smith. The Information Capacity of Amplitude and Variance Constrained Scalar Gaussian Channels. Information and Control, 18:203–219, 1971. SAMPTA'09 106 Erasure-proof coding with fusion frames Bernhard G. Bodmann(1) , Gitta Kutyniok(2) and Ali Pezeshki(3) (1) Department of Mathematics, University of Houston, Houston, TX 77204, USA (2) Institute of Mathematics, University Osnabrueck, 49069 Osnabrueck, Germany (3) Electrical and Computer Engineering Department, Colorado State University, Fort Collins, CO 80523, USA bgb@math.uh.edu, kutyniok@uni-osnabrueck.de, pezeshki@engr.colostate.edu Abstract: The main goal of this paper is the design of frames for transmitting vectors through a memoryless analog erasure channel. The channel transmits the frame coefficients perfectly or discards them, depending on the outcomes of Bernoulli trials with a failure probability q. For sufficiently small q, we construct frames which encode above a fixed non-zero rate and allow the receiver to recover part of the erased coefficients so that the remaining mean-square error vanishes as the frame size increases. We give examples for which the mean-square reconstruction error remaining after corrections are applied decays faster than any inverse power of the number of frame vectors. 1. Introduction We are concerned with the linear transmission of vectors through a memoryless channel that either transmits a coefficient perfectly or discards it, in accordance with the outcomes of independent, identically distributed Bernoulli trials. The problem of reconstructing a vector in a finitedimensional real or complex Hilbert space when not all of its frame coefficients are known has already received much attention in the literature [1–9]. However, many results focus on optimal performance for the smallest possible number of erased coefficients [4, 7–9], which is not typical for transmissions via a memoryless erasure channel. Other results on so-called maximally robust frames guarantee recovery from a certain fraction of lost frame coefficients [10], but this may involve inverting an arbitrarily ill-conditioned matrix. The notion of a memoryless analog erasure channel is simply one that transmits each frame coefficient independently with a given success probability q and otherwise erases it, meaning it does not let the receiver access the coefficient. Within this error model for transmissions, we investigate the performance of fusion frames [11–13], previously also referred to as frames of subspaces [14] or weighted projections resolving the identity [15], which lend themselves to various methods of error correction. 
What makes the fusion frames useful for error correction purposes is that they have many subsets which are frames for their span. Thus, one can design hierarchical methods SAMPTA'09 for error correction which make error estimates feasible. The main result presented here is that for a fixed, sufficiently small erasure probability q, we design fusion frames such that their associated coding rate is bounded away from zero and the mean-square error remaining after error correction is applied decays faster than any polynomial in terms of the number of frame vectors. The techniques for our results involve combinatorial elements similar to the construction of product codes initially investigated by Elias [16], together with some framespecific arguments. 2. Preliminaries Throughout the paper, we let H be a real or complex Hilbert space. Instead of expanding vectors in Hilbert spaces with orthonormal bases, many applications nowadays use frames, stable, non-unique (redundant) expansions, for various purposes. We first briefly recall the basic terminology, and refer the reader to [17] for further details. Definition 1. We call a family of vectors F = {fj }j∈J in H a frame if there exist constants PA, B > 0 such that for all x ∈ H with kxk = 1, A ≤ j∈J |hx, fj i|2 ≤ B. If we can choose A = B, then we say that the frame is A-tight. In case A = B = 1 we call F a Parseval frame. A frame is called equal-norm if there is a c > 0 such that all vectors have the norm kfj k = c. With each frame F, we associate the analysis operator V : H → ℓ2 (J), which maps a vector to its frame coefficients, (V x)j = hx, fj i. The fact that a vector is over-determined by its frame coefficients helps correct errors which may occur in the course of a transmission, or when frame coefficients are stored in an unreliable medium. A main goal of frame design is to optimize the performance of a frame given certain constraints. This could be, for example, the dimension of the Hilbert space and the number of frame vectors, or their ratio. In analogy with binary codes, we define a coding rate for a given frame. Definition 2. Let H be a Hilbert space of dimension d and F a frame for H consisting of n vectors. We say that F has a coding rate of R = d/n. The coding and error correction method we discuss hereafter relies on frames arising from tensor product constructions. These frames are a special type of a fusion 107 frame, see e.g. [12–15]. ily of vectors F = {fj1 ⊗ fj2 ⊗ · · · ⊗ fjm : ji ∈ Ji for all i} is a tight frame for H = H1 ⊗H2 ⊗· · ·⊗Hm . We call this frame F a tight product frame. Remark 1. If F is a Parseval frame then (V ∗ EV − I)x = V ∗ (E − I)V x and the inverse can be obtained from Neumann series (V ∗ EV )−1 = P∞ the norm-convergent ∗ n n=0 (V (I − E)V ) . Applying this operator to the output of blind reconstruction gives perfect reconstruction of the input vector. Remark 1. We note that if we fix all but one index, say the (1) (2) (m−1) last, then the resulting set fj1 ⊗fj2 ⊗· · ·⊗fjm−1 ⊗F (m) is a tight frame for its span. Therefore, F has a natural fusion frame architecture. Similarly, fixing only the first m − k indices of the frame vectors in the tensor product would provide a tight frame for a subspace for any 0 ≤ k < m. Moreover, there is a partial ordering on these tight frames for subspaces induced by the partial ordering of the subspaces they span. Next, we define a measure for average reconstruction performance when probabilities for erasures are known. 
To this end, we average the square of the reconstruction error with the distribution of erasures and input vectors. Here and herafter, we denote the expectation of any random variable η with respect to the underlying probability meaR sure P by E[η] = ηdP. Definition 3. Given Hilbert spaces H1 , H2 , . . . Hm and (i) tight frames F (i) = {fj }j∈Ji for each Hi , then the fam(1) (2) (m) 3. Erasures and the mean-square error A communication system is given by a frame F for a Hilbert space H, and an error model for the transmission of frame coefficients. Our main error model assumes memoryless erasures, that is, the values of randomly selected frame coefficients become unknown in the course of transmission, in accordance with the outcomes of Bernoulli trials. In brief, frame coefficients are erased, independently of each other, with a fixed probability q ≥ 0. Depending on the implementation of decoding, the performance of a frame can be measured in different ways; we generally distinguish active error correction and blind reconstruction. When actively correcting erasures, one tries to fill in the values for the erased coefficients, and aims for a high probability of successfully restoring all lost coefficients. When blind reconstruction is used, one sets the missing coefficients to zero and reconstructs always in the same way. In this case, the usual goal is obtaining a small error norm, such as the mean-square error or the worstcase error. In the present work we consider a combination of the two approaches. We measure the quality of error correction by the mean-square error that results from using the corrected coefficients with the possibly remaining, uncorrected erasures set to zero. The average in this mean-square error is taken over the random erasures and over random unitnorm input vectors. For simplicity, we consider input vectors which are independent of the erasures and uniformly distributed on the unit sphere of the Hilbert space. Definition 4. Let F = {f1 , f2 , . . . fn } be a Parseval frame for a real or complex Hilbert space H. The blind reconstruction error for an input vector x ∈ H and an erasure of frame coefficients with indices K = {j1 , j2 , . . . jm }, m ≤ n, is given by kV ∗ EV x − xk = k(V ∗ EV − I)xk where E is the diagonal n × n matrix with Ej,j = 1 if j 6∈ K and Ej,j = 0 else. If the positive operator V ∗ EV has a bounded inverse, then we say that the corresponding SAMPTA'09 erasure is correctible. Definition 5. Let {βj }j∈J be a family of binary ({0, 1}valued) random variables governed by a probability measure P, and let ∆ be the random diagonal matrix with entries ∆j,j = βj . Moreover, let ξ be a random variable with values in the unit sphere {x ∈ H : kxk = 1} which is independent of the family {βj }, and assume that the distribution of U ξ is identical to that of ξ for any fixed unitary U . Given a Parseval frame F for a Hilbert space H with analysis operator V , we define the mean-square error by σ 2 (V, β) = E[kV ∗ ∆V ξk2 ] . There is a simple expression for the mean square error as the square of a weighted Frobenius norm of the Grammian V V ∗. Lemma 1. Let {βj }j∈J be as above, assume the family is identically distributed with probability P(β1 = 1) = q, and assume the joint distribution is such that P(βj = βj ′ = 1) = r for all j 6= j ′ . Let ∆ be the random diagonal matrix with entries ∆j,j = βj . 
If V is the analysis operator of a Parseval frame F = {fj }j∈J containing n = |J| vectors in a Hilbert space of dimension d, then σ 2 (V, β) = n n  X X 1 (q − r) kfj k4 + r |hfj , fl i|2 . d j=1 j,l=1 4. Bounding the mean-square error for iterative decoding This section describes how product frames can be used to trade an increase in block length of encoding for better error correction capabilities. We first consider the simplest case in which H has two factors, H = H1 ⊗H2 . Also, as preparation for our main theorem, we first consider packet erasures [15] instead of erasures for single frame coefficients. This means, we have a frame F = F (1) ⊗ F (2) and a two-parameter family of random variables {βj,j ′ } which govern erasures of frame coefficients in such a way that either all coefficients belonging to some j ′ are erased or all of them are left intact. We compute the mean-square error for this error model. Proposition 1. Let H = H1 ⊗H2 and let V1 and V2 be the (1) analysis operators of Parseval frames F (1) = {fj }j∈J1 (2) 108 and F (2) = {fj ′ }j ′ ∈J2 for H1 and H2 having dimension d1 and d2 , respectively. Let {βj,j ′ : j ∈ J1 , j ′ ∈ J2 } be a two-parameter family of binary random variables which have probabilities P(βj,j ′ = 1) = q and are distributed (2) (2) such that there is a family {βj ′ }j ′ ∈J2 and βj,j ′ = βj ′ almost surely, regardless of j. The mean-square error for the frame F and this type of packet erasures reduces to that of F (2) , σ 2 (V1 ⊗ V2 , β) = σ 2 (V2 , β (2) ) . Next, we continue with three combinatorial lemmata. They prepare the main result which concerns the error correction capabilities of tight product frames. The main problem we wish to address with this result is the following: Given a fixed, sufficiently small erasure probability q, find frames such that their associated coding rate is bounded away from zero and the mean-square error remaining after error correction is applied decays fast in terms of the number of frame vectors. We show hereafter that product frames of the form F = F (1) ⊗ · · · ⊗ F (m) , for which each factor F (i) can correct up to two erased frame coefficients, satisfy the desired properties. Lemma 2. Let n1 ≥ 3 and let {β1 , β2 , . . . βn1 } be a family of independent, identically distributed random variables which take values Pn1 in {0, 1}. Suppose q0 = P(β1 = βj ≥ 3), then 1) and let q1 = P( j=1 q1 ≤ m Lemma 3. Let {ni }m i=1 be the sizes of index sets {Ji }i=1 , with ni ≥ 3 for all i ∈ {1, 2, . . . m}. Assume there is an m-parameter family of binary, independent identically distributed random variables {βj1 ,j2 ,...jm } and associated (1) (2) (m−1) families {βj2 ,j3 ,...jm }, {βj3 ,j4 ,...jm }, . . . {βjm } which (0) are iteratively defined by βj1 ,j2 ,...jm ≡ βj1 ,j2 ,...jm and ( Pn (k−1) 1, if jkk=1 βjk ,jk+1 ,...jm ≥ 3 , (k) βjk+1 ,jk+2 ,...jm = 0, else. (m−1) If P(β1,1,...1 = 1) = q0 , then the family {βj } is inde(m−1) pendent, identically distributed with qm−1 = P(βj 1) having the bound 1 m−1 P(γj = 1) ≤ 2 −1) 31 nm−1 n3m−2 · · · n13 m−1 q03 m−1 = P(γj1 = γj2 = 1) ≤ n2 q 4 . These lemmata allow us to formulate an error bound for the remaining mean-square error for blind reconstruction after the error correction protocol has been applied. Theorem 1. Let V = V1 ⊗ V2 ⊗ · · · ⊗ Vm be the analysis operator of a Parseval product frame F = F (1) ⊗ F (2) ⊗ · · · ⊗ F (m) for a Hilbert space H = H1 ⊗ H2 ⊗ · · · ⊗ Hm . Denote the dimension of each Hi by di and the number of frame vectors in F (i) by ni . 
Let {βj1 ,j2 ,...jm } be an m-parameter family of binary independent, identically (m−1) distributed random variables, define {βj } as above, Pnm (m−1) (m−1) ≥ 3 and if jm =1 βjm and let γj1 ,j2 ,...jm = βjm γj1 ,j2 ,...jm = 0 otherwise, then σ 2 (V, γ) ≤ The probability computed in the above lemma is the probability of an erased block after applying erasure correction iteratively. The next lemma considers what happens when the error correction is applied to packets at the final level. Here, we deviate from the strategy of only reconstructing nontrivially when at most two packets are missing. Instead, we correct for missing packets and compute the SAMPTA'09 probabilities for the residual mean-square error. nm X 1  (m) (qm − rm ) kfj k4 dm j=1 + rm nm X j,l=1 (m) |hfj (m) , fl i|2  with qm = 61−2·3 m−1 1 2 4·3 4·3 n4·3 n3m nm−1 m−2 · · · n1 and rm ≤ m−1 q04·3 m−1 6 qm . nm Corollary 1. If V = V1 ⊗ V2 ⊗ · · · ⊗ Vm and all Vi belong to equal-norm Parseval frames, then it is well known (i) that kfj k2 = ndii and by the Cauchy Schwarz inequality (i) (i) |hfj , fl i|2 ≤ d2i /n2i . Thus, we have σ 2 (V, γ) ≤ qm dm dm + rm dm ≤ 7qm nm nm with qm = 61−2·3 . 1 3 4 n q , 6 and for j1 6= j2 , we have 1 3 3 n q . 6 1 0 The probability estimated in this lemma is that of a packet of n1 coefficients remaining corrupted after an error correction protocol has been applied which can correct any two erased coefficients. By iteration, we obtain a simple consequence. qm−1 ≤ 6− 2 (3 Lemma 4. Let {β1 , β2 , . . . , βn }, n ≥ 1, be independent, identically distributed binary random variables with probability P(β1 = 1) = q. Let the P random variables n γ1 , γ2 , . . . , γn be defined by γj = βj if j=1 βj ≥ 3, and otherwise γj = 0 for all j ∈ {1, 2, . . . n}. Then, for any j, m−1 1 2 4·3 4·3 n3m n4·3 m−1 nm−2 · · · n1 m−1 q04·3 m−1 . Example 1. Assume that an equal-norm product frame F = F (1) ⊗ · · · ⊗ F (m) has F (i) with ni = i2 n1 vectors for each i ∈ {1, 2, . . . m} and n1 ≥ 3. Let the dimension of the Hilbert space Hi spanned by F (i) be dim(Hi ) = i2 n1 − 2 , and assume the frame can correct any two erased coefficients. Examples of such frames are the harmonic ones, 109 see e.g. [2]. The tensor product of these m Hilbert spaces, H = ⊗m i=1 Hi , has dimension dim(H) = (m!)2 nm 1 m  Y i=1 2  1− 2 . i n1 This means, the coding rate R is bounded, independently of m, by m  Y ∞ 2 X 1 2  2  ) 1 − > (1 − i2 n1 n1 n1 i=2 i2 i=1  2  π2 2  = (1 − ) 1− −1 . n1 6n1 6 R> 1− It is straightforward to check that n1 ≥ 3 ensures R > 0. The preceding theorem then states that after correcting erasures, the probability of an uncorrected block at the final level is qm ≤ m6 n31 61−2·3 m−1 q04·3 m−1 e4 Pm−1 k=1 3m−k ln(k2 n1 ) and upon estimating the sum in the exponent with Jensen’s inequality, 2 m−1 X k=1 3−k ln k ≤ 2 ∞ X k=1 3−k ln k ≤ ln 3 , 2 we have qm ≤ m6 n31 61−2·3 m−1 q04·3 m−1 e2(3 m −1) ln n1 4·3m ln e 3 2 . To achieve exponential decay of qm in 3m requires −2 ln 6 + 4 ln q0 + 6 ln n1 + 12 ln 3 < 0, 2 which amounts to 27 √ q0 n13/2 < 1 . 8 6 Since n1 = 3 is the smallest dimension to start the iteration, fast decay of the mean-square error needs q0 < √ 8 2/81 ≈ 0.14. The number of transmitted frame coefficients is (m!)2 nm 1 , 1 so by Stirling’s approximation O(e(m+ 2 ) ln m+m ln n1 ), whereas by the preceding corollary the decay of the meanm square error is of order O(e−c3 ), for a suitable c > 0. This implies that the mean-square error decays faster than any inverse power of the number of transmitted coefficients. 
Acknowledgment This work was partially supported by National Science Foundation grant DMS 08-07399 and by the Deutsche Forschungsgemeinschaft under Heisenberg Fellowship SAMPTA'09 KU 1446/8-1. References: [1] V. K. Goyal, M. Vetterli, and N. T. Thao, Quantized overcomplete expansions in Rn : analysis, synthesis, and algorithms. IEEE Trans. Inform. Theory, 44(1): 16–31, 1998. [2] V. K. Goyal, J. Kovačević, and J. A. Kelner, “Quantized frame expansions with erasures,” Appl. Comp. Harm. Anal., 10:203–233, 2001. [3] J. Kovačević, P. L. Dragotti, and V. K. Goyal, “Filter bank frame expansions with erasures,” IEEE Trans. Inform. Theory, 48:1439–1450, 2002. [4] P. Casazza and J. Kovačević, “Equal-norm tight frames with erasures,” Adv. Comp. Math., 18:387– 430, 2003. [5] G. Rath and C. Guillemot, Performance analysis and recursive syndrome decoding of DFT codes for bursty erasure recovery, IEEE Trans. on Signal Processing, 51 (5):1335–1350, 2003. [6] G. Rath and C. Guillemot, Frame-theoretic analysis of DFT codes with erasures, IEEE Transactions on Signal Processing, 52 (2):447–460, 2004. [7] R. Holmes and V. I. Paulsen, “Optimal frames for erasures,” Lin. Alg. Appl., 377:31–51, 2004. [8] B. G. Bodmann and V. I. Paulsen, “Frames, graphs and erasures,” Linear Algebra Appl., 404: 118–146, 2005. [9] D. Kalra, Complex equiangular cyclic frames and erasures, Linear Algebra Appl., 419:373–399, 2006. [10] M. Püschel and J. Kovačević, “Real, tight frames with maximal robustness to erasures”, Proc. Data Compr. Conf., Snowbird, UT, 63–72, March 2005. [11] P. G. Casazza and G. Kutyniok, Robustness of fusion frames under erasures of subspaces and of local frame vectors, Contemp. Math., 464, Amer. Math. Soc., Providence, RI, 149–160, 2008. [12] P. G. Casazza, G. Kutyniok, and S. Li, “Fusion Frames and Distributed Processing,” Appl. Comput. Harmon. Anal., 25:114–132, 2008. [13] G. Kutyniok, A. Pezeshki, A. R. Calderbank, and T. Liu, “Robust Dimension Reduction, Fusion Frames, and Grassmannian Packings,” Appl. Comput. Harmon. Anal., 26:64–76, 2009. [14] P. G. Casazza and G. Kutyniok, “Frames of subspaces,” in: “Wavelets, frames and operator theory,” Contemp. Math., 345, Amer. Math. Soc., Providence, RI, 87–113, 2004. [15] B. G. Bodmann, “Optimal linear transmission by loss-insensitive packet encoding,” Appl. Comput. Harmon. Anal., 22:274–285, 2007. [16] P. Elias, Error-free coding, IRE Trans. IT, 4:29–37, 1954. [17] O. Christensen, “An Introduction to Frames and Riesz Bases,” Birkhäuser, Boston, 2003. 110 Representation of operators by sampling in the time-frequency domain Monika Dörfler (1) and Bruno Torrésani(2) (1) ARI, Austrian Academy of Science, Wohllebengasse 12-14, A-1040 Vienna, Austria. (2) LATP, Centre de Mathématique et d’Informatique, 39 rue Joliot-Curie, 13453 Marseille cedex 13, France. Monika.Doerfler@oeaw.ac.at, Bruno.Torresani@cmi.univ-mrs.fr Abstract: Gabor multipliers are well-suited for the approximation of certain time-variant systems. However, this class of systems is rather restricted. To overcome this restriction, multiple Gabor multipliers allowing for more than one synthesis windows are introduced. The influence of the choice of the various parameters involved on approximation quality is studied for both classical and multiple Gabor multipliers. efficient, e.g. in the sense of sparsity, to use several side diagonals, but a lower redundancy in the Gabor system used. 
The aim of this contribution is the description of error estimates for the approximation of operators by generalized Gabor multipliers, based on the operator’s spreading function. From this description guidelines for the choice of good parameters for the approximation are deduced and illustrated by various numerical experiments. 1. 2. Introduction In a recent paper [1], the authors describe the representation of operators in the time-frequency domain by means of a twisted convolution with the operator’s spreading function. Although not suitable for direct discretization, the spreading representation provides a better understanding of certain operators’ behavior: it reflects the operator’s action in the time-frequency domain. This motivates an approach that uses the spreading representation of timefrequency multipliers [1], in order to optimize the parameters involved. More specifically, in the one-dimensional, continuous-time case, given an operator H with integral kernel κH and spreading function ηH : Z ∞ ηH (b, ν) = κH (t, t − b)e−2iπνt dt, −∞ we aim at modeling the operator by its action on the sampled short-time Fourier transform (STFT) or Gabor coefficients, given for any f ∈ L2 (R) by Vg f (mb0 , nν0 ) = hf, gmn i , m, n ∈ Z (1) where the gmn = Mnν0 Tmb0 g denote the Gabor atoms associated to g ∈ L2 (R) and the lattice constants b0 , ν0 ∈ R+ , see [3]1 . In the case of classical Gabor multipliers, the modification consists of a pure multiplication. Thus, the linear operator applied to the coefficients Vg f is diagonal, an approach that leads to accurate approximation for so-called underspread operators [5]. The restriction to diagonality may be relaxed in order to achieve better approximation for a wider class of operators at low cost. It also appears, that in certain approximation tasks it is more finite dimensional case H = CL is obtained similarly, replacing integrals with finite sums, and letting m = 0, . . . Nb − 1, n = 0, . . . Nν − 1, where Nb = L/b0 , Nν = L/ν0 and b0 , ν0 divide L. 1 The SAMPTA'09 Approximation in the time-frequency domain: the parameters Throughout this paper, H denotes a (finite or infinitedimensional) Hilbert space, equipped with an action of the Heisenberg group of time-frequency shifts. 2.1 Time-frequency multipliers Vg∗ Let denote the adjoint of Vg . A Gabor multiplier [4] is defined as M : f ∈ H 7−→ Mf = V2∗ (m · V1 f ). Here, m is the pointwise multiplication operator whose symbol, defined on the lattice Λ will also be denoted by m. We shall denote by Λo the adjoint lattice, o its fundamental domain, and Πo the corresponding periodization operator. In the infinite-dimensional situation H = L2 (R), and for a product lattice of the form Λ = b0 Z × ν0 Z, we have Λo = P t0 Z × ξ0 Z with t0 = 1/ν0 , ξ0 = 1/b0 , and Πo f (ζ) = λo ∈Λo f (ζ + λo ), ζ ∈ o . In a finitedimensional setting H = CL , with Λ = ZNb × ZNν , with Nb , Nν two divisors of L, we have Λo = ZNν × ZNb , and the obvious form for the periodization operator. In the definition of the multipliers, several parameters have to be fixed: the analysis and synthesis windows g and h, the lattice Λ, and the symbol m. For practical as well as theoretical reasons, the windows should be well-localized in time and frequency. As for the lattice, it is expected that denser lattices will lead to better results in approximation, but higher computational cost. However, it will be seen that too dense lattices are not suitable. Finally, the symbol m can be optimized to best approximate a given operator. 
In [1], an explicit expression for the best approximation was obtainned in the spreading domain, yielding a very efficient algorithm (compare [2]). 111 The spreading function of Gabor multipliers takes the form ηM (ζ) = M (ζ) · Vg h(ζ) , where M is the symplectic Fourier transform of m. Note, that this leads to a periodic function with period o . Hence, good approximation by a classical Gabor multiplier is possible, if the essential support of the spreading function is smaller than 1 and can then be contained in the fundamental domain o of the adjoint lattice for a dense enough lattice Λ. Also, to reduce aliasing as much as possible, the analysis and synthesis windows must be chosen such that Vg h is small outside o and positive on the support of the spreading function, also see Section 4.1. 2.2 Generalized Gabor multipliers Multiple Gabor multipliers are sums of Gabor multipliers with different synthesis windows. Definition 1 (Multiple Gabor Multiplier) Let g, h ∈ H denote two window functions. Let Λ be a time-frequency lattice. Let {µj , j ∈ J} denote a finite set of timefrequency shifts, and let {mj , j ∈ J} be a family of bounded functions on Λ. Set h(j) = π(µj )h, then the associated generalized Gabor multiplier M is defined, for f ∈ H, as XX Mf = m(λ, µj )hf, π(λ)giπ(λ)h(j) . λ∈Λ j∈J It is immediately obvious that in addition to the parameters mentioned above, the window h as well as the sampling points J must be chosen. 3. Error analysis in L2 (R) In [1], it was shown that the symbol m(λ, µj ) := mj (λ) of the best approximation of a Hilbert-Schmidt operator by a multiple Gabor multiplier with fixed sets Λ, J and windows, is given by the symplectic Fourier transform of the o -periodic functions Mj obtained via the vector equation M (ζ) = U(ζ)−1 · B(ζ) , ζ ∈ o , ∞ The finite-dimensional situation is similar, replacing the integral over o with a finite sum over the finite fundamental domain {0, . . . t0 − 1} × {0, . . . ξ0 − 1}. 4. Choosing the parameters For simlicity, we specialize the following discussion to the infinite-dimensional case H = L2 (R), and rectangular lattice Λ = b0 Z × ν0 Z. The finite-dimensional situation is handled similarly. 4.1 Gabor Multipliers If an operator with known spreading function is to be approximated by a Gabor multiplier, the lattice may be adapted to the eccentricity of the spreading function according to the error expression obtained in Proposition 1, which may be considerably simplified for the case of only one synthesis window, see [1]. In order to choose the eccentricity of the lattice accordingly and adapt the window to the chosen lattice as to avoid aliasing, assume, that we may find b0 , ν0 , with b0 · ν0 < 1, such that supp(ηH ) ⊆ Tz o , where o = [0, ν10 ] × [0, b10 ]. In this case, the error resulting from best approximation by a Gabor multiplier with respect to the lattice b0 Z × ν0 Z is bounded by Ce · kηH k22 , with |Vg h(t, ξ)|2 , 2 k,l |Vg h(t + kt0 , ξ + lξ0 )| Ce = 1 − inf o P t,ξ∈H (3) with oH = o ∩ Supp(ηH ), and becomes minimal for a window that is optimally concentrated inside o . Heuristically as well as from numerical experiments we know, that the tight window, [3], corresponding to the given lattice is usually a good choice to fulfill this requirement. (2) where the matrix and vector valued functions U and B are given by the Λo -periodizations     ′ Ujj ′ = Πo Vg h(j ) Vg h(j) , Bj = Πo ηH Vg h(j) , provided U is invertible a.e. The case of one synthesis windows may be immediately obtained from the above formula. 
Note that formula (2) allows for an efficient implementation of the otherwise expensive calculation of the best approximation by multiple Gabor multipliers. We may now give an expression for the error in the approximation given above, in the case H = L2 (R) Proposition 1 Let M denote the vector-valued function obtained as in (2) and set, for the Hilbert-Schmidt operator H, ΓH =PΠo (|ηH |2 ). Then the approximation error E = kηH − j Mj Vj k2 is given by P Z −1  )ij (ζ)Bi (ζ)Bj (ζ)  i,j (U E= |ΓH (ζ)| 1 − dζ |ΓH (ζ)| o SAMPTA'09 Notice that this covers the multiplier case obtained in [1]. Notice also that this immediately yields P −1 )ij Bi Bj i,j (U 2 E ≤ kηH k 1 − |ΓH | 4.2 Generalized Gabor Multipliers The main additional task in the generalized situation is the choice of the sampling points µj for the synthesis windows. A good choice will again be guided by the behavior of the spreading function. The relevant areas in the spreading domain should be covered as well as possible with the smallest possible overlap by the cross-ambiguity functions of the different synthesis windows with respect to a given reference-window localized at (0, 0) e.g. the Gaussian window. Motivated by the results from the Gabor multiplier situation, we choose a tight window with respect to the analysis lattice and look for the most appropriate sampling points for the synthesis windows. Examples will be given in Section 5.2. 5. Examples We now turn to numerical experiments, in the finite case H = CL . In the following examples, the relative approximation error for the best approximation H̃ of H is given 112 b0 = 2,3,4,5,6,9,10,12,15,18; ν0 = 2 Approximation error for redundancy 5 and different lattice eccentricities b0 = 2,3,4,5,6,9,10,12,15,18; ν0 = 3 −0.5 −2 −4 −6 −8 −10 −12 −5 −10 −15 −1 −1.5 b0 = 6,ν0 = 6 b0=4,ν0 = 9 −2 5 10 15 20 25 30 5 b0 = 2,3,4,5,6,9,10,12,15,18; ν0 = 5 10 15 20 25 b0=9,ν0 =4 30 b0=8,ν0 = 2 b0=2,ν0 =8 −2.5 b0 = 2,3,4,5,6,9,10,12,15,18; ν0 = 6 b0=18,ν0 = 2 −3 −1 −1 −2 5 −3 −2 −4 −3 10 15 20 25 30 5 b = 2,3,4,5,6,9,10,12,15,18; ν = 9 0 10 15 20 25 30 25 30 . b = 2,3,4,5,6,9,10,12,15,18; ν = 10 0 15 20 Support of Spreading function Figure 2: Approximation error for different lattice-eccentricity −5 5 10 0 Approximation error, b0 = ν0 = 6 0 −0.5 −0.5 −0.5 −1 −1 −1 −1.5 −1.5 −1.5 −2 5 10 15 20 25 Support of Spreading function 30 5 10 15 20 25 Support of Spreading function 30 Figure 1: Approximation error for different bandwidth of spreading function and different values of b0 , ν0 . 1 window 2 windows −2.5 −3 −3.5 5 by E = kH̃ − Hk/kHk , the logarithm of which is represented in the next plots. We display here the Fröbenius norm, the plots obtained with the operator norm are almost identical. 5.1 Classical Gabor Multipliers We generate operators with compact support in the spreading domain, in a square of side size between 3 and 61, symmetric about 0. The values are random, the signal length is L = 180. We then investigate the approximation quality for various pairs of lattice constants, with b0 varying between 2 and 18 and ν0 between 2 and 10. The results are presented in Figure 1. Note the two distinct regimes: the error grows exponentially up to a certain value of the support size, depending on the lattice density, and slower thereafter. 
A possible explanation for this effect, to be further investigated, is the fact, that the error (see the bound in (3)) is comprised of an aliasing error and the inherent inaccuracy of Gabor multiplier approximation, even for very high sampling density, of overspread operators. In order to emphasize the importance of lattice adaptation to eccentricity, we show the results for different lattice constants resulting in the same redundancy (5) in Figure 2. The solid lines show the results for b0 = ν0 = 6, leading to far better results than the lattice constants not adapted to the (symmetric) support of the spreading function. 5.2 Generalized Gabor Multipliers In order to illustrate the influence of additional synthesis windows on the approximation quality, we first consider SAMPTA'09 10 15 20 Support Spreading function 25 30 Figure 3: Spreading function of operator and best approximation with one or two synthesis windows, approximation error for growing support of spreading function. the same operators as in the previous section, but allow for one additional synthesis window. Here, and in the subsequent examples, one window will always be a window centered about 0, as above, with a time-shifted version of the original window as additional window. Hence, only the shift-parameter of the additional window has to be considered. Figure 3 shows the improvement in approximation quality for shift-parameters of the additional window between −5 and 5 (solid), as opposed to the single window approximation. Next, we investigate the following situation: an operator with two effectively disjoint components in the spreading domain is, again, approximated by a multiple Gabor multiplier with 2 synthesis windows. For better comparison, the two components are the component from the previous examples plus a shifted version (by 90 samples) thereof. Figure 4 shows the spreading functions of one of the operators and its best approximation with two synthesis windows, for the optimal additional window. Note the aliasing effect. In this situation, using two appropriate synthesis windows, the obtained results are similar to those in the case of one spreading function component and one synthesis window, as discussed in the previous section. In Figure 5, we display the results for 3 symmetric pairs of lattice constants, the optimal window’s result being represented by the solid line, while the dashed lines show the results of close but suboptimal synthesis windows. As the operator was generated by a translation by 90 samples, the 113 0 −80 −60 −60 −40 −40 −20 −20 0 0 20 20 40 40 60 60 80 80 0 0 0 0 0 −0.2 −0.4 Approximation error −80 −50 b =ν =6 Spreading function approximation, b = ν = 6 Spreading function operator, k = 12 −0.6 −0.8 −1 −1.2 −1.4 −1.6 −50 50 0 50 18 16 Figure 4: Spreading function of operator and best approxima- 14 tion. 12 Support Spreading function Approximation error, b = ν = 4 0 10 0 0 20 10 30 40 50 60 Translation parameter g2 −2 Figure 6: Approximation error for growing support of spread- −4 ing function and various additional synthesis windows. −6 2 4 6 8 10 12 14 16 18 20 22 16 18 20 22 10 12 14 16 Support Spreading function 18 20 22 Approximation error, b0 = ν0 = 6 −1 −2 −3 2 4 6 8 10 12 14 Approximation error, b0 = ν0 = 10 −0.2 −0.4 −0.6 −0.8 −1 −1.2 2 4 6 8 Figure 5: Approximation error for varying support of two com- going from |J| = 1 to larger index sets J involves inverting (generally small) matrices instead of computing a point-wise ratio. 
Higher redundancy of the Gabor system involved is more expensive in the sense of coefficients. In many cases, using an additional window may be more favorable in improving approximation quality than a denser lattice. Future work on this topic will include systematic numerical experiments as well as the analytical investigation of the approximation quality of generalized and classical Gabor multipliers. Another goal is the development of a method to determine an adapted sampling scheme for the synthesis windows from an operator’s spreading function. 7. Acknowledgments ponents of spreading function and two synthesis windows. tight window, shifted by 90 samples itself, is expected to be the optimal additional window. This is confirmed by the experiments. In a last experiment, the two components in the spreading domain are close and, for growing bandwidth, overlapping. Figure 6 shows, as before, the results of approximation for growing support of both spreading function components, with b0 = ν0 = 6 and various additional synthesis windows. The additional window with shift-parameter 0 is, of course, the original window and yields the approximation result obtained for a single synthesis window. For the optimal window, the result is close to the single window/single component case for the same lattice. 6. Discussion and conclusions The examples given in the previous section show that the choice of various parameters has considerable influence on the performance of approximation by (generalized) Gabor multipliers. While the situation is rather easily understood in the case of classical Gabor multipliers, it is much more intricate in the generalized case. It should be noted that, while yielding better results in the approximation, using a small number of additional synthesis windows does not dramatically increase the computational cost: in (2), SAMPTA'09 The first author was funded by project MA07-025 of WWTF Austria. The second author was partly supported by the CNRS programme PEPS/ST2I MTF&Sons. References: [1] Monika Dörfler and Bruno Torrésani. Representation of operators in the time-frequency domain and generalized Gabor multipliers. arXiv:0809.2698, 2008, to appear in Journal of Fourier Anal. and Appl. [2] Hans G. Feichtinger, Mario Hampejs, and Günther Kracher. Approximation of matrices by Gabor multipliers. IEEE Signal Proc. Letters, 11(11):883– 886, 2004. [3] Hans G. Feichtinger and Thomas Strohmer. Gabor Analysis and Algorithms. Theory and Applications. Birkhäuser, 1998. [4] Hans Georg Feichtinger and Kristof Nowak. A first survey of Gabor multipliers. In H. G. Feichtinger and T. Strohmer, editors, Advances in Gabor Analysis, Boston, 2002. Birkhauser. [5] Werner Kozek. Adaptation of Weyl-Heisenberg frames to underspread environments. In [3], 1998. 114 Operator Identification and Sampling Götz Pfander (1) and David Walnut (2) (1) School of Engineering and Science, Jacobs University Bremen, 28759 Bremen, Germany. (2) Dept. of Mathematical Sciences, George Mason University, Fairfax, VA 22030 USA. g.pfander@iu-bremen.de dwalnut@gmu.edu Abstract: Time–invariant communication channels are usually modelled as convolution with a fixed impulse–response function. As the name suggests, such a channel is completely determined by its action on a unit impulse. Time–varying communication channels are modelled as pseudodifferential operators or superpositions of time and frequency shifts. 
The function or distribution weighting those time and frequency shifts is referred to as the spreading function of the operator. We consider the question of whether such operators are identifiable, that is, whether they are completely determined by their action on a single function or distribution. It turns out that the answer is dependent on the size of the support of the spreading function, and that when the operators are identifiable, the input can be chosen as a distribution supported on an appropriately chosen grid. These results provide a sampling theory for operators that can be thought of as a generalization of the classical sampling formula for bandlimited functions. 1. Letting ηH (t, ν) = Hf (x) = = ZZ ZZ Z hH (t, x) e−2πiν(x−t) dx gives ηH (t, ν) e2πiν(x−t) f (x − t) dν dt ηH (t, ν) Tt Mν f (x) dν dt. ηH (t, ν) is the spreading function of H. If supp ηH ⊆ [0, a] × [−b/2, b/2] for some a, b > 0 then a is called the maximum time-delay and b the maximum Doppler spread of the channel. Z 2. Letting σH (x, ξ) = hH (t, x) e2πitξ dt gives Hf (x) = Z σH (x, ξ)fb(ξ) e2πixξ dξ. σH (x, ξ) is the Kohn-Nirenberg (KN) symbol of H and we have the relation ZZ ηH (t, ν) = σH (x, ξ) e−2πi(νx−ξt) dx dξ. The function hH (t, x) is referred to as the impulse response of the channel and is interpreted as the response of the channel at time x to a unit impulse at time x − t, that is, originating t time units earlier. If hH (t, x) = hH (t) then the characteristics of the channel are time-invariant and in this case the channel is modelled as a convolution operator. Such channels are identifiable since hH (t) can be recovered as the response of the channel to the input signal δ0 (t), the unit-impulse at t = 0. In other words, the spreading function ηH is the symplectic Fourier transform of the KN symbol of H. In 1963, T. Kailath [3, 4, 5] asserted that for time-variant communication channels to be identifiable it is necessary and sufficient that the maximum time-delay, a, and Doppler shift, b, satisfy ab ≤ 1 and gave an argument for this assertion based on counting degrees of freedom. In the argument, Kailath looks at the response of the channel to a train of impulses separated by at least a time units, so that in this sense the channel is being “sampled” by a succession of evenly-spaced impulse responses. The condition ab ≤ 1 allows for the recovery of sufficiently many samples of hH (t, x) to determine it uniquely. Kailath’s conjecture was given a precise mathematical framework and proved in [6]. The framework is as follows. Choose normed linear spaces D(R) and Y (R) of functions or distributions on R, and a normed linear space of bounded linear operators H ⊂ L(D(R), Y (R)). Each fixed element g ∈ D(R) induces a map Φg : H −→ Y (R), H 7→ Hg. If for some g ∈ D(R), Φg is bounded above and below, that is, there are constants 0 < A ≤ B such that for all H ∈ H, There are two representations of H that will be convenient for our purposes. AkHkH ≤ kHgkY ≤ B kHkH 1. Channel Models and Identification A communications channel is said to be measurable or identifiable if its characteristics can be determined by its action on a single fixed input signal. A general model for linear (time-varying) communication channels is as operators of the form Hf (x) = SAMPTA'09 Z hH (t, x) f (x − t) dt. 115 then we say that H is identifiable with identifier g ∈ D(R). 
Taking D = S0′ , Y = L2 , and HS = {H ∈ b HS(L2 ) : ηH ∈ S0 (R × R), supp ηH ⊆ S} where 2 b S ⊆ R × R, HS(L ) is the class of Hilbert-Schmidt operators, and S0 is the Feichtinger algebra (defined below), the following was proved in [6]. Theorem 1. If S = [0, a] × [−b/2, b/2] then HS is identifiable if and P only if ab ≤ 1. In this case an identifier is given by g = n δna . 2. Distributional Spreading Functions and Operator Sampling The requirement that ηH ∈ S0 excludes some very natural operators from consideration in this formalism, for example the identity operator (ηH (t, ν) = δ0 (t)δ0 (ν)), convolution operators (ηH (t, ν) = h(t)δ0 (ν) giving Hf = f ∗ h), and multiplication operators, (ηH (t, ν) = δ0 (t)m(ν) b giving Hf = m · f ). A more natural setting for operator identification is the modulation spaces (see [2] for a full treatment of the subject). For convenience we give the definitions below for modulation spaces on R, but all definitions and results can be extended to Rd . For ϕ ∈ S(R) define for f ∈ S ′ (R) the short-time Fourier transform (STFT) of f by Vϕ f (t, ν) = hf, Tt Mν ϕi Z = f (x) e−2πiν(x−t) ϕ(x − t) dx. For 1 ≤ p, q ≤ ∞ define the modulation space M p,q (R) by M p,q (R) = {f ∈ S ′ (R) : Vϕ f ∈ Lp,q (R)}, that is, for which kVϕ kLp,q = µZ µZ p |Vϕ f (t, ν)| dt ¶q/p ¶1/q is finite. The usual modifications are made if p or q = ∞. M p,q is a Banach space with respect to the norm kf kM p,q = kVϕ f kLp,q and different nonzero choices of ϕ ∈ S define equivalent norms. The space M 1,1 is the Feichtinger algebra denoted S0 and M ∞,∞ is its dual S0′ . The space S0′ contains the Dirac impulses δx : f 7→ f (x) for P x ∈ R as well as distributions of the form g = j cj δxj , xj ∈ R and {cj } ⊆ C a bounded sequence. In our next step toward operator sampling we observe that it is possible to take D = S0′ , Y = S0′ , and HS = {H ∈ L(D, Y ) : ηH ∈ S0′ , supp ηH ⊆ S} in the operator identification formalism. Indeed the following theorem was shown in [10]. Theorem 2. The operator class HS (defined above) is identifiable if S = [0, a] × [−b/2, b/2] and ab < 1, and is not identifiable if ab > 1. completely determined by its actions on a fixed input in terms of a norm inequality. The next step is to find an explicit reconstruction formula for the impulse response of the channel operator directly from its response to the identifier. Such formulas illustrate a connection between operator identification and classical sampling theory and lead to a definition of operator sampling. If, in the operator identification formalism described earlier, an operator Pclass H is identified by a distribution of the form g = j cj δxj , then we call {xj } a set of sampling for H and g a sampling function for the operator class H. In the results obtained so far, operator sampling is possible only for operators with compactly supported spreading function, and in order to interpret Theorem 1 in this context we make the following definition. Given a Jordan domain S ⊆ R2 , define the operator Paley-Wiener space OP W 2 (S) by OP W 2 (S) = {H ∈ HS(L2 ) : supp ηH ⊆ S}. OP W 2 is a Banach space with respect to the HilbertSchmidt norm kHkOP W 2 = kηH kL2 . Then Theorem 1 can be extended as follows ([8]). Theorem 3. Let Ω, T, T ′ > 0 with T ′ < T and ΩT < 1. 
′ Then OP W 2 ([0, P T ] × [−Ω/2, Ω/2]) is identifiable with identifier g = n δnT and moreover we have the formula X hH (t, x) = r(t) (Hg)(t + kT )ϕ(x − t − kT ) k∈Z unconditionally in L2 (R2 ), where r ∈ S(R) is such that r = 1 on [0, T ′ ] and vanishes outside a sufficiently small neighborhood of [0, T ′ ], and where ϕ ∈ S(R) is such that ϕ b = 1 on [−Ω/2, Ω/2] and vanishes outside a sufficiently small neighborhood of [−Ω/2, Ω/2]. In the more general modulation space setting we can define the operator Paley-Wiener space OP W p,q (S) by OP W p,q (S) = {H ∈ L(S0 , S0′ ) : supp ηH ⊆ S, σH ∈ M pq,11 } where σH (x, ξ) ∈ M pq,11 means that the twodimensional STFT of σH satisfies ¶q/p ¶1/p Z µZ µZ |Vϕ⊗ϕ σH (t1 , t2 , ν1 , ν2 )|p dt1 dt2 dν1 dν2 is finite. Here Vϕ⊗ϕ (t1 , t2 , ν1 , ν2 ) = hf, Tt1 Mν1 ϕ ⊗ Tt2 Mν2 ϕi. 3. A Theory of Operator Sampling OP W p,q is a Banach space with respect to the norm kHkOP W p,q = kσH kM pq,11 . In this case, Theorem 3 generalizes as follows ([8]). Theorem 4. Let 1 ≤ p, q ≤ ∞, Ω, T, T ′ > 0 with T ′ < ′ T and ΩT < 1. Then OP W p,q ([0, P T ] × [−Ω/2, Ω/2]) is identifiable with identifier g = n δnT and moreover we have the formula X hH (t, x) = r(t) (Hg)(t + kT )ϕ(x − t − kT ) In discussing identifiability of operators in various settings, we have been content to show that an operator is unconditionally in M 1p,q1 (R2 ) and in the weak-* sense if p or q = ∞, where r and ϕ are as in Theorem 3. SAMPTA'09 k∈Z 116 Example 1. If we take H to be ordinary convolution by hH (t), this means that hH (t, x) depends only on t, that is, hH (t, x) = hH (t). In this case H can be identified in principle by g = δ0 , the unit impulse, since Hg(x) = hH (x). Translating this into our operator sampling formalism results in something slightly different. Assume that h ∈ M 1,q is supported in the interval [0, T ′ ] and that T > T ′ , and Ω > 0 are chosen so that ΩT < 1. In this case, ηH (t, ν) = h(t) δ0 (ν) and σH (x, ξ) = b h(ξ). Therefore σH ∈ M ∞q,11 and H ∈ OP W ∞,q ([0, T ′ ] × {0}). P If g = n δnT then Hg is simply the T –periodized impulse response h(t), and it follows that X r(t) (Hg)(t + kT )ϕ(x − t − kT ) k∈Z = r(t) h(t) X ϕ(x − t − kT ) = h(t) area of the support of the spreading function. It is notable that Kailath also asserted something along these lines. This means that a time-variant channel whose spreading function has essentially arbitrary support is identifiable as long as the area of that support is smaller than one. Using ideas from [6], Bello’s conjecture was proved in [9]. Theorem 5. HS is identifiable if vol+ (S) < 1, and not identifiable if vol− (S) > 1. Here vol+ (S) is the outer Jordan content and vol− (S) the inner Jordan content of S. P In this case, the channel is identified by g = n cn δn/L where L ∈ N and the L–periodic sequence {cn } is chosen based on the geometry of S. We next present a generalization of Theorem 4 to this case. Before stating the result, a few preliminaries are required. Definition 1. Given L ∈ N, let ω = e−2πi/L and define the translation operator T on (x0 , . . . , xL−1 ) ∈ CL by k∈Z since r(t) = 1 on [0, T ′ ] and P vanishes outside a neighborhood of [0, T ′ ] and since k ϕ(x − t − kT ) = 1 by the Poisson Summation Formula and in consideration of the support b Indeed the theorem says that the P constraints on ϕ. sum k ϕ(x − t − kT ) converges to 1 in the M ∞,1 norm and in particular uniformly on compact sets. Example 2. 
If we take H to be multiplication by some fixed function m ∈ M p,1 with supp m b ⊆ [−Ω/2, Ω/2] then ηH (t, ν) = δ0 (t)m(ν), b h(t, x) = δ0 (t) m(x − t), and σH (x, ξ) = m(x). Therefore σH ∈ M p∞,11 and H ∈ OPP W p,∞ ({0} × [−Ω/2, Ω/2]). If g = n δnT , with PT > 0 chosen small enough that ΩT < 1, then Hg = n m(nT ) δnT , and it follows from Theorem 4 that δ0 (t) m(x − t) X = r(t) (Hg)(t + kT )ϕ(x − t − kT ) k∈Z = = r(t) X XX m(nT ) δ(n−k)T (t)ϕ(x − t − kT ) k∈Z n∈Z m(nT ) ϕ(x − nT ) n∈Z by support considerations on the function r(t). Therefore we have the summation formula X m(nT ) ϕ(x − nT ) m(x) = n where the sum converges unconditionally in M p,1 if 1 ≤ p < ∞ and weak-* if p = ∞, and moreover there are constants 0 < A ≤ B such that for all such f , Akf kM p,1 ≤ k{f (nT )}kℓp ≤ Bkf kM p,1 . Taking p = 2, this recovers the classical sampling formula when the sampling is above the Nyquist rate. 4. Spreading functions with nonrectangular support and Bello’s conjecture In 1969, P. A. Bello [1] argued that what is important for channel identification is not the product ab of the maximum time-delay and Doppler shift of the channel but the SAMPTA'09 T x = (xL−1 , x0 , x1 , . . . , xL−2 ), and the modulation operator M on CL by M x = (ω 0 x0 , ω 1 x1 , . . . , ω L−1 xL−1 ). Given a vector c ∈ CL the finite Gabor system with window c is the collection {T q M p c}L−1 q,p=0 . Note that the discrete Gabor system defined above consists of L2 vectors in CL so is necessarily overcomplete. Definition/Proposition 2. The Zak Transform is defined X for f ∈ S(R) by Zf (t, ν) = f (t − n) e2πinν . n Zf (t, ν) satisfies the quasi-periodicity relations Zf (t + 1, ν) = e2πiν Zf (t, ν) and Zf (t, ν + 1) = Zf (t, ν). Z can be extended to a unitary operator from L2 (R) onto L2 ([0, 1]2 ). If the spreading function of H, ηH (t, ν), is supported in b with vol+ (S) < a bounded Jordan region S ⊆ R × R 1, then by appropriately shifting and scaling ηH we can assume without loss of generality that for some L ∈ N, S ⊆ [0, 1] × [0, L] and that S meets at most L of the L2 rectangles Rq,m = ([0, 1/L] × [0, 1]) + (q/L, m), 0 ≤ q, m < L whose union is [0, 1] × [0, L]. We can further assume that S does not meet any of the rectangles Rq,m on the “edge” of the larger rectangle, specifically it does not meet Rq,m with q = 0, m = 0, q = L − 1 or m = L − 1. The P following Lemma connects the output Hg(x) where g = n cn δn/L to the spreading function ηH (t, ν). From this a reconstruction formula analogous to that in Theorem 4 can be derived. Lemma 1. Given a period-L sequence (cn ) and g = P c δ n n n/L , then for (t, ν) in a sufficiently small neighborhood of [0, 1/L] × [0, 1], e−2πiνp/L (Z ◦ H)g(t + p/L, ν) = L−1 X X L−1 (T q M m c)p e−2πiνq/L ηH (t + q/L, ν + m). q=0 m=0 In other words, the spreading function can be realized as coefficients on the vectors of a finite Gabor system. The system is in general underdetermined since there are L 117 equations and L2 unknowns. If, however, the support set S of the spreading function ηH (t, ν) satisfies vol+ (S) < 1 and since S meets at most L of the rectangles Rq,m , there are at most L nonzero unknowns in the above linear system. If the resulting L × L matrix is invertible, then ηH can be determined uniquely from Hg. The vector c must be chosen so that this matrix is invertible. It is shown in [7] that if L is prime then such a c always exists. We can prove the following theorem (cf. [8], [9]). Theorem 6. Let 1 ≤ p, q ≤ ∞. If vol− (S) > 1 then OP W p,q (S) is not identifiable. 
If vol+ (S) < 1 then OP W p,q (S) is identifiable via P operator sampling, and the identifier is of the form g = n cn δn/L where L ∈ N and (cn ) is an appropriately chosen period-L sequence. Moreover, we have the formula hH (t, x) = L−1 X j=0 rj (t) X vector spaces. J. Fourier Anal. Appl., 11(6):715–726, 2005. [8] G. Pfander and D. Walnut. On the sampling of functions and operators with an application to Multiple– Input Multiple–Output channel identification. In Manos Papadakis Dimitri Van De Ville, Vivek K. Goyal, editor, Proc. SPIE Vol. 6701, Wavelets XII, pages 67010T–1 – 67010T–14, 2007. [9] G.E. Pfander and D. Walnut. Measurement of time–variant channels. IEEE Trans. Inform. Theory, 52(11):4808–4820. [10] G.E. Pfander and D. Walnut. Operator identifcation and Feichtinger’s algebra. Sampl. Theory Signal Image Process., 5(2):151–168, 2006. bj,k (Hg)(t − qj /L + k/L) k∈Z × ϕj (x − t − qj /L − k/L) unconditionally in M 1p,q1 and in the weak-* sense if p = ∞ or q = ∞. For 0 ≤ j < L, the rectangles Rqj ,mj are precisely those that meet S. Also for each 0 ≤ j < L, rj (t)ϕ bj (ν) = 1 on Rqj ,mj and vanishes outside a small neighborhood of Rqj ,mj , and bj,k is a period-L sequence in k based on the inverse of the matrix derived from the discrete Gabor system that appears in Lemma 1. 5. Conclusion This paper contains a brief overview of some recent results on the measurement and identification of communication channels and the relation of these results to sampling theory. These connections provide explicit reconstruction formulas for identification of operators modelling timevariant linear channels. References: [1] P.A. Bello. Measurement of random time-variant linear channels. 15:469–475, 1969. [2] K. Gröchenig. Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston, MA, 2001. [3] T. Kailath. Sampling models for linear time-variant filters. Technical Report 352, Massachusetts Institute of Technology, Research Laboratory of Electronics, 1959. [4] T. Kailath. Measurements on time–variant communication channels. 8(5):229–236, Sept. 1962. [5] T. Kailath. Time–variant communication channels. IEEE Trans. Inform. Theory: Inform. Theory. Progress Report 1960–1963, pages 233–237, Oct. 1963. [6] W. Kozek and G.E. Pfander. Identification of operators with bandlimited symbols. SIAM J. Math. Anal., 37(3):867–888, 2006. [7] J. Lawrence, G.E. Pfander, and D. Walnut. Linear independence of Gabor systems in finite dimensional SAMPTA'09 118 Special session on Sampling and Industrial Applications Chair: Laurent FESQUET SAMPTA'09 119 SAMPTA'09 120 An Event-Based PID Controller With Low Computational Cost Sylvain Durand and Nicolas Marchand NeCS Project-Team, INRIA - GIPSA-lab - CNRS, Grenoble, France. sylvain.durand@inrialpes.fr, nicolas.marchand@gipsa-lab.inpg.fr Abstract: In this paper, some improvements of event-based PID controllers are proposed. These controllers, contrary to a time-triggered one which calculates the control signal at each sampling time, calculate the new control signal only when the measurement signal sufficiently changes. The contribution of this paper is a low computational cost scheme thanks to a minimum sampling interval condition. Moreover, we propose to reduce much more the error margin during the steady state intervals by adding some extra samples just after transients. 
A cruise control mechanism is used for simulations and a noticeable reduction of the mean control computation cost is finally achieved with similar closed-loop performances to the conventional time-triggered ones. 1. Introduction The classical so-called discrete time framework of controlled systems consists in sampling the system uniformly in the time with some constant sampling period hnom and in computing and updating the control law every time instants t = khnom . This field, denoted time-triggered (or synchronous in sense that all the signal measurements are synchronous), has been widely investigated [6] even in the case of sampling jitter or measure loss that can be seen as some asynchronicity. However, some works addressed more recently event-based sampling where the sampling intervals are event-triggered (also called asynchronous), as for example when the output crosses a certain level. Thus the term sampling period denotes a time interval between two consecutive level crossings and the sampling periods are hence not equidistant in time anymore. Event-triggered notion is taking more and more importance in the signal processing community with now various publications on this subject (see for instance [1] and the references therein). In the control community, very few works have been done. In [3], it is proved that such an approach reduces the number of sampling instants for the same final performance. In [8], it is shown that controlling an asynchronous sampled system or a continuous time system with quantized measurements and a constant control law over sampling periods are equivalent problems. Many reasons are motivating event-based systems and in particular because more and more asynchronous systems or systems with asynchronous needs are encountered. Ac- SAMPTA'09 tually, the demand of low power electronic components in all embedded applications encourages companies to develop asynchronous versions of the existing time-triggered components, where a significant power consumption reduction can be achieved by decreasing the samplings and consequently the CPU utilization: about four times less power than its synchronous counterpart for the 80C51 microcontroller of Philips Semiconductors in [12]. Note that the sensors and the actuators based on level crossing events also exist, rendering a complete asynchronous control loop now possible. But the most important contributions come from the real-time control community. Indeed, real-time synchronous control tasks are often considered as hard tasks in term of time synchronization, requiring strong real time constraints. Efforts are so carried on the co-design between the controller and the task scheduler in order to soften these constraints. The adopted approach is often either to change dynamically the sampling period related to the load [10, 11] or to use event-driven control where the events are generated with a mix of level crossings and a maximal sampling period [9, 2]. This maximal sampling period seems to be added for stability reasons in order to fulfill the condition of NyquistShannon sampling theorem: a new control signal is performed when the time elapsed since the last sample exceeds a certain limit. We first proposed in [7] to remove it because, thanks to the level detection, the NyquistShannon sampling condition is no more consistent. The CPU cost is hence considerably reduced without performance loss. 
We now focus on the improvement of eventbased control by reducing even more the computational cost with a controller based on a fully asynchronous level detection. The next two sections recall the conventional time-triggered structure and the existing event-based algorithms. The main contribution is developed in section 4 where an event-driven controller with low computational cost is detailed. All controllers are finally compared (in terms of performances and CPU needs) in section 5. Notations: e− will denote the value of e at the last sampling time. 2. Time-Based Control The textbook PID controller is given as follows:   1 E(s) + Td sE(s) U (s) = K E(s) + Ti s 121 This equation can be divided into a proportional, an integral and a derivative parts, i.e. Up , Ui and Ud respectively, which are then modified to improve performances [4]. First, set point weighting is applied on Up and Ud for a more flexible structure, giving the PID two dimensions of freedom. Moreover, a low-pass filter is added in the derivative term to avoid problems with high frequency measurement noise. Up (s) = K (βYsp (s) − Y (s)) K E(s) Ui (s) = Ti s KTd s Ud (s) = (γYsp (s) − Y (s)) 1 + Td s/N A discrete time controller is finally obtained: the proportional part is straightforward and the backward difference approximation is used for integral and derivative parts. 3. Event-Based Control The basic setup of an event-based PID controller, introduced in [2], consists of two parts: a time-triggered event detector used for level crossings and an event-triggered PID controller which calculates the control signal. The first part runs with the sampling period hnom (that is the same as for the corresponding conventional time-triggered PID) whereas the second part runs with the sampling interval hact which depends on the requests sent by the event detector when a new control signal has to be calculated. This is required either when the relative error between the measured signal and the desired one crosses a certain level, i.e. abs(e − e− ) > elim , or if the maximal sampling period is achieved, i.e. hact ≥ hmax . We proposed in [7] to remove this maximal sampling period underlying a primordial fact in asynchronous control that is that the Nyquist-Shannon sampling condition is no more consistent thanks to the level detection. However, the integral part, i.e. ui = u− i + K/Ti · hact · e, leads to important overshoots after the steady states since the sampling period hact becomes huge due to the absence of event. In fact, this time interval between the last sample before the steady state and the first sample of the transient can be divided into a “real” steady state interval which is equal to hact −hnom , plus the detection time period hnom . During the first part the error is very small (lower than elim else the steady state is not achieved) and so is the product he (lower than (hact − hnom ) elim ). As regards the second part, when the set point changes the error becomes large but only during the event detection and therefore the product is hnom e. From this observation, several control algorithms were proposed in [7] and we will use the hybrid one which gives good performances with the minimum of samplings. 
The hybrid algorithm is a mix between i) a controller with a saturation of he which is bounded in (hact − hnom ) · elim + hnom · e when hact ≥ hmax and ii) a controller with an exponential forgetting factor of hact to decrease its impact after a long steady state interval, with hiact = hact · exp (hnom − hact ) corresponding to the new sampling period used in the integral part. This mix leads to SAMPTA'09 bound the exponential forgetting factor: if hact ≥ hmax  he = hiact − hnom · elim + hnom · e else he = hact · e end ui = u− i + K/Ti · he (1) A first improvement could be obtained by changing the level crossing detection since only one level is really required. Indeed, the control signal needs to be calculated when the measurement is too far from the set point, i.e. as soon as abs(e) > elim . Of course, with this method the number of samples increases during the transients but, at least, the error between the system and the set point is now sure to be lower than elim during the steady state intervals, which was not the case before with the level detection of the relative error abs(e − e− ) > elim . A second improvement could be done on the timetriggered event detector which is currently a discrete time system: an event could only be detected at the time instants t = khnom thereof several levels could miss if they appear between two sampling instants. Thus we propose to use a continuous time event detector which is in fact closer to the real case since a sensor based on level crossing events will send a request as soon as a level is crossed. Afterwards, the hybrid controller with these improvements is called the asynchronous event-based controller. 4. Event-Based Control with Low Computational Cost The asynchronous event-based controller is interesting but the number of samples is still important during transients. Indeed, a new request is sent as soon as the error is upper than the detection limit, i.e. abs(e) > elim , which means (quasi)-continuously during the whole transient. To avoid that, we propose to add a minimum sampling interval condition to lighten the transients in order that a new control signal is performed only if a certain time was elapsed since the last sample, i.e. hact > hmin . This minimum sampling interval could be chosen as the discrete sampling period hnom corresponding to the conventional time-triggered controller or not, but it does have to satisfy the Nyquist-Shannon sampling condition. The choice hmin = hnom leads to a discrete-time event detector when the dynamics is important and to a continuous-time event detector when the dynamics is slow (quasi-steady state). Thus, when an event occurs after a steady state configuration, a new control signal is instantaneously computed. Whatever that may be the hmin value, an important reduction of the computational cost is achieved. Nevertheless, we propose to improve the event-based scheme again by adding a few number of samples more. The idea here is to decrease much more the error during the steady state intervals. Currently, one could assure that the error is lower than the limit elim but cannot know how much lower. Moreover, one could not know if the measured signal is going closer or moving away from the set point. 122 Therefore, we propose to add some extra samples after a transient while an event-based controller would do not do anything because the condition abs(e) > elim is wrong. 
Thus, an extra event is sent to the controller if nothing appends after the last time a control signal was calculated plus a certain sampling interval hextra . Then, this is repeated while the error is upper than a desired minimum level emin . One only needs to define his desired error margin and some extra samples will be added to achieve that. Note that the lower emin is chosen the higher the number of extra samples will be. 5. Simulation Results: Application to a Cruise Control Mechanism Event-based controller is a good solution, more especially for all the systems which do not need to be constantly controlled. We chose to illustrate our proposals with the cruise control mechanism depicted in [5] because the desired speed of the car is constant most of the time and a new control signal is so only required when the set point changes or when the load (i.e. the slope of the road) varies. The simulations run during 50s with the following test bench: at time 0 the set point is set to 25m/s (90km/h), then at time 2s it is changed to 30.6m/s (110km/h) and changed again to 36.1m/s (130km/h) at time 30s. The gear ratio is chosen accordingly to the speed range, i.e. n = 5, and no disturbance is applied, i.e. θ = 0. The first simulation results are shown on Figure 1 where the conventional time-triggered PI controller is compared to the asynchronous event-based one (see section 3). The top plot shows the set point and both measured signals, the bottom plot shows the sampling intervals (i.e. this signal changes each time the controller calculates a new control signal). The asynchronous event-based controller permits to obtain a system response as quick as the time-triggered one, by calculating a control signal about four time less only (with this benchmark). However, the number of samples remains important during the transients. Our proposal, i.e. the event-based PI controller with a low computational cost, avoids that since the number of samples is dropped by a ratio of 30, as shown on Figure 2. The equation of motion of the car (ν is the velocity) is: mν̇ = F − Fd The force F is generated by the engine, whose torque is proportional to a control signal 0 ≤ u ≤ 1 that controls the throttle position and depends on engine velocity too.  2 ! αn ν −1 F = αn uTm 1 − β ωm where αn depends on the gear ratio n. The disturbance force Fd has three major components due to the gravity Fg , to the rolling friction Fr and to the aerodynamic drag Fa . Figure 1: A conventional time-triggered PI controller (15000 sampling intervals) vs. the asynchronous eventbased one (3703 sampling intervals, that is 24.7%). Fd = Fg + Fr + Fa with Fg = mgsin(θ) Fr = mgCr sgn(ν) Fa = 12 ρCd Aν 2 where θ is the slope of the road, i.e. the disturbance. As regards the control law, an anti-windup mechanism is added to consider the saturation of the control signal u. Thus the integral part consists on the integral of the error plus a reset based on the saturation of the actuator (in order to prevent windup when the actuator is saturated). ui = u − i + K hact x− (u − usat ) Ti Ta where x = hact · e for the time-triggered controller and x = he defined by (1) for the event-based controllers. Parameter values are K = 0.8, Ti = 1.4 and Ta = 0.7. The nominal and maximal sampling intervals used for the hybrid algorithm are hnom = 0.1s and hmax = 0.5s and those used for the low computational cost and the extra samples ones are hmin = 0.1s and hextra = 0.5s. 
The detection levels are elim = 0.1 and emin = 0.01 for crossing events and for extra samples respectively. SAMPTA'09 Figure 2: The asynchronous event-based PI controller (3703 sampling intervals) vs. the one with a low computational cost (126 sampling intervals, that is 3.4%). Whatever the achieved gain with the low computational cost controller, we propose to improve the error during the steady state intervals by adding some samples just after the transients. Results are shown on Figure 3 where one could see that, by adding extra samples, the sampling number is 123 finally reduced and the steady state intervals are not oscillating anymore. These are thanks to a measurement signal closer to the set point during the steady state intervals. for more general types of control. 7. Acknowledgments This research has been supported by the GIPSA-lab, CNRS and INRIA in the FeedNetBack project context. The project aims to close the control loop over wireless networks by applying a co-design framework that allows the integration of communication, control, computation and energy management aspects in a holistic way. References: Figure 3: The asynchronous event-based PI controller with a low computational cost (126 sampling intervals) vs. the one with extra samples (120 sampling intervals). Finally the integral of the norm of the error are compared for the whole controllers to verify if the responses are not too far from the conventional time-triggered one. All measurements on Figure 4 have a similar behavior with some differences during the steady state intervals because of the allowed error margin elim . The final values are 74.67 for the reference, 78.2 for the asynchronous event-based controller, 78.63 for the low computational cost one and 77.12 for the extra samples one. Moreover, as regards the last one, it is possible to be much more closer to the time-based value by reducing the minimum value emin . Figure 4: Integral of the norm of the error. 6. Conclusions and Future Works In this paper we propose to improve the event-based PID controllers depicted in [2] and [7]. The first improvement consists on a minimum sampling interval condition used to decrease the number of samples during the transients. The second one comes from the wishing to reduce much more the error margin during the steady state intervals. Based on these ideas, event-based PID controllers with low computational cost and with extra samples are proposed. A cruise control mechanism is used to compare them (in simulation) with the conventional time-triggered and with the classical event-based controllers. Both proposals clearly give good performances with a minimum of sampling intervals and the controller with extra samples permits to reduce the error margin as low as desired to achieve a response very closed to the conventional one. Next steps in this research is naturally to test these controllers in practice and develop other event-based methods SAMPTA'09 [1] F. Aeschlimann, E. Allier, L. Fesquet, and M. Renaudin. Asynchronous FIR filters: towards a new digital processing chain. In Proceedings of the 10th International Symposium on Asynchronous Circuits and Systems, pages 198–206, 2004. [2] K-E Årzén. A simple event-based PID controller. In Preprints of the 14th World Congress of IFAC, 1999. [3] K.J. Åström and B. Bernhardsson. Comparison of Riemann and Lebesque sampling for first order stochastic systems. In Proceedings of the 41st IEEE Conference on Decision and Control, 2002. [4] K.J. Åström and T. Hägglund. 
PID controllers: theory, design, and tuning, 2nd Edition. The Instrumentation, Systems, and Automation Society, 1995. [5] K.J. Åström and R.M. Murray. Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press, 2008. [6] K.J. Åström and B. Wittenmark. Computer Controlled Systems, 3rd Edition. Prentice Hall, 1997. [7] S. Durand and N. Marchand. Further results on event-based PID controller. In Proceedings of the European Control Conference, 2009. [8] N. Marchand. Stabilization of Lebesgue sampled systems with bounded controls: the chain of integrators case. In Proceedings of the 17th IFAC World Congress, 2008. [9] J.H. Sandee, W. Heemels, and P.P.J. van den Bosch. Event-driven control as an opportunity in the multidisciplinary development of embedded controllers. In Proceedings of American Control Conference, pages 1776–1781, 2005. [10] O. Sename, D. Simon, and D. Robert. Feedback scheduling for real-time control of systems with communication delays. In Proceedings of the IEEE Conference on Emerging Technologies and Factory Automation, volume 2, 2003. [11] D. Simon, D. Robert, and O. Sename. Robust control/scheduling co-design: application to robot control. In Proceedings of the IEEE Symposium on RealTime and Embedded Technology and Applications, pages 118–127, 2005. [12] H. van Gageldonk, K. van Berkel, A. Peeters, D. Baumann, D. Gloor, and G. Stegmann. An asynchronous low-power 80C51 microcontroller. In Proceedings of the 4th International Sympsonium on Advanced Research in Asynchronous Circuits and Systems, pages 96–107, 1998. 124 A coherent sampling-based method for estimating the jitter used as entropy source for True Random Number Generators Boyan Valtchanov, Viktor Fischer, Alain Aubert Laboratoire Hubert Curien UMR CNRS 5516, Bât. F 18 Rue du Professeur Benoît Lauras , 42000 Saint Etienne, France. {boyan.valtchanov,fischer,alain.aubert}@univ-st-etienne.fr This paper was partially supported by the Rhône-Alpes Region and Saint-Etienne Métropole, France Abstract: The paper presents a method, which can be employed to measure the timing jitter present in periodic clock signals that are used as entropy source in true random number generators aimed at cryptographic applications in reconfigurable hardware. The method uses the principle of a coherent sampling and can be easily implemented inside the chip in order to test online the jitter source. The method was carefully validated in various simulations that have shown that the measured jitter size corresponds perfectly to that of the jitter injected to the model. While the primary aim of the proposed measuring technique was the evaluation of the quality of jitter as an entropy source in random number generators, we believe that the same principle can be used in order to characterize the jitter in fast communication links as well. 1. Introduction In the global communication era, more and more recent industrial applications need to secure data and communications. Many cryptographic primitives and protocols that are used to ensure confidentiality, integrity and authenticity use random number generators in order to generate confidential keys, initial vectors, nonces, padding values, etc. While random bit-stream generators can be easily implemented in analog or mixed-signal devices, the generation of random bit-streams is a challenging task when the generator should be implemented in a logic device like FPGAs (Field-Programmable Gate Arrays). 
Clearly, logic devices are well suited for algorithmic (pseudo) random number generators, but the true-random number generators need sources of randomness that are difficult to find and explore in logic devices. A mathematical model of the true random number generator (TRNG) is also a crucial element of the cryptographic application design since the final entropy of the generated random bit-stream could be characterized and thus certified if one is able to characterize the physical phenomenon that is used as the entropy source. If the model does not exist, there would be no guarantee that the final entropy of the output stream is true-random, pseudo-random or perhaps a mixture of random and pseudo random phenomena. Characterizing SAMPTA'09 and monitor the entropy source (the jitter) and proposing a mathematical model is the main motivation of the paper. 2. Jitter as an entropy source for TRNGs Many of the TRNGs known up to date [1], [4], [5], use the jitter present in clock signals (generated using ring oscillators, phase-locked loops or delay-locked loops) as a source of entropy. The quality of the generated random bits is related to the parameters of the clock jitter. In order to avoid jitter manipulations and attacks, it is important to measure these parameters on-line and, if possible, inside the device. The jitter can be defined as a short-term variation of an event from its ideal position [6]. In general, it is expressed as the variation in time of the zero crossing (rising or falling edge) of the clock signal. The jitter can be a good candidate for randomness generation, since its behavior is closely related to the thermal noise inside semiconductors [2]. The advantage of the thermal noise employed as a source of randomness is that it is relatively difficult to manipulate it in order to realize an attack on the TRNG. The method presented in this paper considers only a truerandom (Gaussian) jitter component and it does not take into account the deterministic behavior of the jitter at this stage of our research. For a deeper understanding of the jitter behavior we recommend to read [9]. 3. Principle and theoretical background Tro1 Tbeat D Counter Tro2 or Tvco Figure 1: Random jitter component measurement based on the coherent sampling. The proposed method allows to accurately quantify the random component of the jitter present in clock signals generated inside logic devices. Although the technique can be used to measure the jitter, it has been developed not for measurement or testing purposes, but rather for modeling a TRNG that uses the jitter as a source of randomness. 125 3.1 Figure 2: Principle of the coherent sampling. Measurement of the true-random jitter component Let us assume that the two clock signals are derived from two internal ring oscillators, and let Tro1Ideal and Tro2Ideal be the two ideal jitter-free periods. We need to achieve a small time period difference between Tro1Ideal and Tro2Ideal , namely: Tro2Ideal = Tro1Ideal + ∆Ideal . (1) This difference comes from the fact that even with the same number of delay elements the two ring oscillators differs due to process variations during manufacturing. With a careful placement, one can obtain ∆ of several tens of picoseconds. However the ∆ wont be reproducible from one chip to another. If a random jitter would be included in the previous equations, we obtain: Figure 3: Experimental TBeat signal example. 
The proposed measurement technique (see Figure 1) is based on a coherent sampling: the sampling of a periodic signal by another periodic signal featuring similar frequency [3]. The signal on the output of the sampler is called a beat signal and it is a low-frequency signal depending on the frequency difference ∆ between the two clock signals Tro1 and Tro2 . Figure 2 shows the principle of the coherent sampling using two (clock) signals having similar frequencies and the resulting beat signal TBeat , representing the image of Tro1 . An example of this TBeat signal captured on oscilloscope is given in Figure 3. Using the infinite persistence of the oscilloscope, we can clearly see the variations of the period of the beat signal. These variations are the consequence of the jitter present in Tro1 and Tro2 signals. Because of the coherent relationship between the two frequencies, each ”half-period” of the beat signal is an integer number of the clock period Tro2 . A counter clocked with this clock signal can thus be used in order to represent these variations. In the next section, we will discuss how we can compute the jitter present in Tro1 by observing the variations in a population of several TBeat periods. If the proposed technique would be used to measure precisely the jitter of the internal clock signal, one should use an accurate external low phase-noise VCO (Voltage Controlled Oscillator) as a sampling clock and accurately tune its period in relationship to the internal clock period in order to obtain a small ∆. Instead, in order to model the TRNG behavior and to measure the jitter inside the device, we have used two ring oscillators, implemented in the same FPGA. Both oscillators have the same number of inverters. In order to guarantee a small difference between clock periods (∆), the placement and routing have to be done manually. The final period difference is thus caused mainly by the different delays of the routing scheme selected by the placement and routing tool. Next, we will analyze the case, when only random (Gaussian) jitter component is present in the generated clock signals. SAMPTA'09 Tro1 = Tro1Ideal + N (0, σ1 ) = N (Tro1Ideal , σ1 ) (2) Tro2 = Tro2Ideal + N (0, σ2 ) = N (Tro2Ideal , σ2 ) (3) Where N (0, σ) denote a zero-mean Normal distribution with standard deviation σ. We can then express the difference ∆ by: ∆ = N (Tro2Ideal , σ2 ) − N (Tro1Ideal , σ1 ) (4) q ∆ = N (∆Ideal , σ12 + σ22 ) (5) If σ1 is the same as σ2 , what is the case when the two signals are derived from internal ring oscillators, we get √ (6) ∆ = N (∆Ideal , 2σ) Otherwise one should make precise characterization of the VCO used to match the frequencies in order to measure the σV CO . According to [8], we can express the length of TBeat as: s Tro1Ideal q 2 Tro1Ideal TBeat σ1 + σ22 ) (7) = N( , ∆Ideal ∆Ideal ∆Ideal which, if σ1 equals σ2 , simplifies to: s Tro1Ideal Tro1Ideal √ TBeat 2σ) = N( , ∆Ideal ∆ ∆Ideal (8) The length of the resulting beat signal, TBeat can be then expressed as a normal process: TBeat = N (µTBeat , σTBeat ) ∆Ideal with the mean and standard deviation: r Tro1Ideal TRoIdeal √ 2σ , σTBeat = µTBeat = ∆ ∆Ideal (9) (10) In consequence, if we measure the µTB eat and σTBeat using the principle presented in Figure 1, which is based on 126 the use of an 8-bit counter, we can precisely calculate the amount of the random jitter, expressed in 1σ ps, i.e. the RMS jitter (root mean square) present in the two clock signals using equation (11). ∆Ideal σT σ = q Beat TRoIdeal √ 2 ∆Ideal (11) 4. 
Simulation results In order to validate equation (11), we have used a simulation model presented in [8] and depicted in Figure 4. The random jitter is generated in text files using Matlab and then injected in VHDL simulation using the Textio package. We have injected different amounts of random jitter (RMS) to the clock signals and analyzed the obtained values of the counter. The Tro1Ideal was set to 5 ns (200Mhz) and ∆ to 40 ps. The results of the simulations and recalculated jitter values using equation 11 are presented in Table 4. As it can be seen, the measurement precision that can be achieved is close to 1 ps RMS. Figure 5 present the case for 7 ps RMS jitter present in both Tro1 and Tro2 signals. Figure 4: Simulation setup. Different Counter Values of TBeat Period 135 Histogram of TBeat: Mean=121.9694 TBeat Std dev=2.8127 1200 1000 130 800 125 600 120 400 115 110 200 0 2000 4000 6000 8000 0 115 120 125 130 Figure 5: Histogram of the simulated TBeat . Injected 1σ RMS jitter [ps] 10 9 8 7 6 Measured µT beat 121.93 121.98 121.97 121.97 121.98 Measured σT beat 4.03 3.64 3.24 2.81 2.47 Calculated 1σ RMS [ps] 10.19 9.20 8.19 7.10 6.24 Table 1: Simulation results of the random jitter quantification. 5. Discussion and conclusions We have proposed a jitter measurement technique that can be embedded in FPGA devices for evaluating and monitoring of the source of randomness employed in true random SAMPTA'09 number generators. The measurement technique can be used as well to characterize the jitter present in high-speed clock signals, if an external VCO (Voltage Controlled Oscillator) is used. The use of an external and precise clock source is necessary in order to closely match the period of the signal under test to the period of the reference clock signal. We have shown by simulation that the measurement error of the proposed method is less than 1 ps RMS of the random component of the jitter. However, in real world situations and especially inside FPGAs, the jitter can exhibit a non negligible deterministic component due to various factors (power supply variations, cross-talks, R-F interference, etc...). In this case, equation (11) cannot be used for random component jitter quantification and the deterministic jitter has to be considered, too. However, we believe that it is possible to integrate this deterministic behavior of the jitter in the proposed model. This integration is the objective of our current research. References: [1] V. Fischer, M. Drutarovsky, M. Simka, and N. Bochard. High performance True Random Number Generator in Altera Stratix FPLDs. Lecture notes in computer science, FPL’04, pages 555–564, 2004. [2] A. Hajimiri and TH Lee. A general theory of phase noise in electrical oscillators. Solid-State Circuits, IEEE Journal of, 33(2):179–194, 1998. [3] J.L. Huang and K.T. Cheng. An On-Chip Short-Time Interval Measurement Technique for Testing HighSpeed Communication Links. Proceedings of the 19th IEEE VLSI Test Symposium, page 380, 2001. [4] P. Kohlbrenner and K. Gaj. An embedded true random number generator for FPGAs. Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, pages 71–78. [5] B. Sunar, W.J. Martin, and D.R. Stinson. A Provably Secure True Random Number Generator with Built-In Tolerance to Active Attacks. IEEE TRANSACTIONS ON COMPUTERS, pages 109–119, 2007. [6] T. Technologies. Synchronous Optical Network (SONET) Transport Systems: Common Generic Criteria. Technical report, GR-253-CORE, 2000. [7] K.H. Tsoi, K.H. 
Leung, and P.H.W. Leong. Compact FPGA-based true and pseudo random number generators. Field-Programmable Custom Computing Machines, 2003. FCCM 2003. 11th Annual IEEE Symposium on, pages 51–61, 2003. [8] B. Valtchanov, A. Aubert, F. Bernard, and V. Fischer. Modeling and observing the jitter in ring oscillators implemented in FPGAs. In Design and Diagnostics of Electronic Circuits and Systems, 2008. DDECS 2008. 11th IEEE Workshop on, pages 1–6, 2008. [9] SW Wedge. Predicting random jitter-Exploring the current simulation techniques for predicting the noise in oscillator, clock, and timing circuits. Circuits and Devices Magazine, IEEE, 22(6):31–38, 2006. 127 SAMPTA'09 128 Orthogonal Exponential Spline Pulses with Application to Impulse Radio Masaru Kamada(1) , Semih Özlem(2) and Hiromasa Habuchi(1) (1) Ibaraki University, Hitachi, Ibaraki 316-8511, Japan. (2) Bogazici University, Bebek, Istanbul, Turkey. kamada@mx.ibaraki.ac.jp, semozl@gmail.com, habuchi@mx.ibaraki.ac.jp Abstract: With application to the impulse radio communications in mind, a locally supported and zero-mean pulse which is orthogonal to its shifts by integers is sought among the exponential splines having the knot interval 21 . An example pulse is obtained that complies with the regulation imposed by the US Federal Communications Commission and will potentially enable an impulse radio communications system as fast as 6G pulses per second. 1. Introduction The M-shaped linear spline  √ 0 ≤ t ≤ 12   √3t,    √3(2 − 3t), 12 ≤ t ≤ 1 3 M (t) = √3(3t − 4), 13 ≤ t ≤ 2    3(2 − t),  2 ≤t≤2  0, elsewhere advantage that they can be shaped through linear dynamical systems [5] . The pulse functions, if they are found, will work as practical pulses which carry information in the impulse radio communications. The problem is simple: we are to find a locally supported and zero-mean exponential spline q(t) with the knot interval 12 that satisfies { ∫ ∞ 1, k = 0 q(t)q(t − k)dt = (2) 0, k ̸= 0 −∞ for any ingeter k. This paper presents a procedure to find such a pulse function and its application to the impulse radio. (1) plotted in Fig. 1 is not a wavelet in the sense of muntiresolutional analysis because M (t) is not orthogonal to its contracted version M (2t). But it has three remarkable properties that (i) it is locally supported, (ii) its integration over the domain is zero, and (iii) its shifts by integers are orthogonal to one another [2]. Those properties are exactly what is required of pulses for the impulse radio communications [6]. The three properties are required (i) for the sake of real-time communications, (ii) for the pulse to be feasible as a radio waveform, and (iii) for pulse detection to be robust against noise in the sense of least-square estimation, respectively. We shall look for this kind of pulse functions in the broader family of exponential splines [4, 5] which have the 2. Construction of orthogonal pulses Any exponential spline can be represented by a linear combination of the exponential B-spline and its shifts [4, 5]. An exponential B-spline with the knot interval 21 is the output β(t) = S(b)(t) (3) of a linear dynamical system S having the transfer function G(s) = for the input being a series of delta functions b(t) = such that B(z) = n ∑ n ∑ l=0 bl δ(t − l/2) (5) l bl z − 2 1 = (1−z − 2 e 1 1 2 -1 Figure 1: M-shaped linear spline. 
SAMPTA'09 (4) l=0 M(t) 0 µn−1 sn−1 + · · · + µ1 s + µ0 (s − λ0 )(s − λ1 ) · · · (s − λn−1 ) λ0 2 1 )(1−z − 2 e λ1 2 1 ) · · · (1−z − 2 e λn−1 2 This exponential B-spline is locally supported as ( n) β(t) = 0, t ∈ / 0, . 2 ).(6) (7) In order to keep the splines zero-mean, instead of the original exponential B-spline β(t), we shall use ( ) 1 α(t) = β(t) − β t − (8) 2 129 n−1 by {cl }l=0 , and prepare time-reversed functions which has the zero mean ∫ ∞ α(t)dt = 0 (9) ã(t) = a(−t), c̃(t) = c(−t), q̃(t) = q(−t) (21) −∞ and the “mirror” system S̃ having the transfer function G(−s). Then we can express the correlation by and is locally supported as ) ( n+1 . α(t) = 0, t ∈ / 0, 2 r(k) = (q ∗ q̃)(k) = (S ◦ S̃)(a ∗ ã ∗ c ∗ c̃)(k), (10) Another representation of this α(t) is the output α(t) = S(a)(t) (11) where ∗ denotes the convolution integral, and we can write D(z) = C(z)C(z −1 ) in the form of S for the input C(z)C(z −1 ) = d0 + a(t) = n ∑ l=0 where A(z) = n+1 ∑ al z n−1 ∑ j j dj (z − 2 + z 2 ) (23) j=1 al δ(t − l/2), (12) which implies n−1 ∑ (c ∗ c̃)(t) = d0 δ(t)+ dj (δ(t−j/2) + δ(t+j/2)) .(24) − 2l j=1 l=0 1 (22) = (1−z − 2 e λ0 2 1 ) · · · (1−z − 2 e λn−1 2 1 )(1 − z − 2 ).(13) In the meantime, a locally supported exponential spline ϕ(x) = (S ◦ S̃)(a ∗ ã)(x) Let the desired pulse function be represented in the form q(t) = n−1 ∑ l=0 cl α(t − l/2). ϕ(x) = ϕ(−x). n−1 ∑ d0 ϕ(k)+ dj (ϕ(k − j/2) + ϕ(k + j/2)) (15) j=1 q(t)dt = 0. (16) −∞ The remaining request is that its autocorrelation ∫ ∞ q(t)q(t − x)dt r(x) = (17) −∞ should satisfy the orthogonality conditions { 1, k = 0 r(k) = 0, k = ±1, ±2, · · · (18) = { 1, k = 0 0, k = 1, 2, · · · , n − 1. We assume that (27) is solvable since we cannot proceed unless this is the case. Then, C(z)C(z −1 ) determined by n−1 (23) from {dj }j=0 can be factorized in the form 1 n−1 ∑ l=0 cl δ(t − l/2) and C(z) = SAMPTA'09 1 n−1 ∑ l=0 l that cl z − 2 (20) 1 1 · · · (z − 2 −γn−1 )(z 2 −γn−1 ). (28) Taking half the factors, we can find 1 1 1 √ C(z) = ± γ0 (z − 2 −γ1 )(z − 2 −γ2 ) · · · (z − 2 −γn−1 ) (29) n−1 that gives the sought coefficients {cl }l=0 by (20). Exciting the system S with the input series of delta functions v(t) = Now we have only to find the coefficients make (19) hold good. Define 1 C(z)C(z −1 ) = γ0 (z − 2 −γ1 )(z 2 −γ1 )(z − 2 −γ2 )(z 2 −γ2 ) reduced from (18) by (15) and the equality r(x) = r(−x). n−1 {cl }l=0 (27) n−1 Solvability of (27) for {dj }j=0 can be checked by numerical computation in practice. A simpler condition in terms of dynamical parameters is yet to be established. 1 with respect to shift by integers. Here the number n of n−1 {α(t − l/2)}l=0 employed for composing q(t) in (14) is chosen so that the number n of the unknown coefficients n−1 {cl }l=0 be the same as that of the essential conditions { 1, k = 0 (19) r(k) = 0, k = 1, 2, · · · , n − 1 c(t) = (26) By (22), (24), (25) and (26), we can reduce the orthogonality conditions (19) to the linear equations Then it is automatic that q(t) is locally supported as and has the zero mean ∫ ∞ associated with the composite system S ◦ S̃ satisfies (14) q(t) = 0, t ∈ / (0, n) (25) n−1 ∑ l=0 cl a(t − l/2), (30) we obtain the desired pulse function q(t) = S(v)(t) = n−1 ∑ l=0 cl α(t − l/2). (31) 130 In the case G(s) = 1s , the problem is trivial and the resulting pulse is the Haar function   1, 0 < t ≤ 12 −1, 12 < t ≤< 1 H(t) = (32)  0, elsewhere. as illustrated in Fig. 2. 
Since a good broadband antenna is d , the transmitted signal w(t) well approximated [6] by dt is differentiated once by the transmitter antenna to be the radio signal ∞ ∑ d d wl p(t − l) w(t) = dt dt 1 s2 yields M (t) of (1) as expected. BeThe case G(s) = √ cause it happens that M (t) = 3(H ∗ H)(t), we might speculate that the pulse associated with G(s) = s13 could be proportional to (H ∗H ∗H)(t). But that is not true since (H ∗ H ∗ H)(t) is not orthogonal to (H ∗ H ∗ H)(t − 2). It is interesting as well as disappointing that we obtain a complex-valued pulse in the case G(s) = s13 . A nice example pulse will appear in the next section in the context of its application to the impulse radio communications. 3. Application to Impulse Radio While the series of delta functions a(t) does not exist in the real world, its integration  ∫ t t<0 0, ∑l (33) a(τ )dτ = ak , 2l < t < l+1 2 , l = 0, 1, · · · , n ∑k=0 n+1 −∞ n+1 < t a = A(1) = 0, k=0 k 2 is a locally supported piecewise constant function that can be easily generated by electric current switches. The system S excited by the piecewise constant function u(t) = shapes the pulse ∫ t v(τ )dτ = −∞ n ∑ cl ∫ t−l/2 a(τ )dτ (34) −∞ l=0 and has the relationship ∫ t n ∫ t−l/2 ∑ p(t) = cl α(τ )dτ = q(τ )dτ. (36) (37) −∞ Besides the simple and practical system (35) to shape p(t) from the piecewise constant seed u(t), the pulse p(t) has the remarkable property ∫ ∞ 2 ∫ ∞ d p(t) p(t − k)dt = − q(t)q(t − k)dt 2 −∞ dt −∞ { −1, k = 0 = (38) 0, k = ±1, ±2, · · · which follows from (17), (18), (36), (37) and the partial integration formula. This property gives the foundation to transmission and detection of the pulse p(t) in the impulse radio communications. Given data bits {wl }, we transmit the waveform ( ∞ ) ∞ ∑ ∑ w(t) = S wl u(t − l) = wl p(t − l) (39) SAMPTA'09 2 d Correlating the received signal dt 2 w(t) with the template pulse p(t−k), which is the same as the transmission pulse, for its duration (k, k + n), we have the bit wk recovered by ∫ k+n 2 ∫ ∞ 2 d d w(t) p(t−k)dt = w(t) p(t−k)dt 2 2 dt k −∞dt ∫ ∞ 2 ∞ ∑ d p(t−l) p(t−k)dt = wl 2 dt −∞ l=−∞ = −wk (42) because of the property (38). It should be noted that, because of (38), the detection formula (42) virtually performs the least-squares approximad d tion of the radio waveform dt w(t) by dt p(t−k) = q(t−k) d w(t) to detect wk . Additive noises superimposed on dt will then be most suppressed in the sense of least-squares estimation. G(s) = p(t) = 0, t ∈ / (0, n) l=−∞ l=−∞ (35) which is locally supported as −∞ and again by the receiver antenna to arrive at the receiver as ∞ ∑ d2 d2 w(t) = (41) wl 2 p(t − l). 2 dt dt An example pulse associated with the transfer function p(t) = S(u)(t) l=0 (40) l=−∞ l=−∞ 1 (43) (s+18)(s+11.1i+10−13 )(s−11.1i+10−13 ) and its derivatives are plotted in Fig. 3. The correlation in Fig. 4 becomes 1 and 0 at the origin and at the other integers, respectively, to verify (38). The power spectral d density of the radio pulse dt p(t) = q(t) is plotted in Fig. 5 along with the spectral mask (plotted by the boxy line) for the indoor ultra-wideband communications systems [1] imposed by the US Federal Communications Commission received pulses w w dtd p d2 dt2 2 2 d dt w radio pulses w dtd p transmission pulses w wp -w5 -w S -w4 -w -w template pulses S -w w w w w4 w w5 S S receiver transmitter Figure 2: Schematic diagram of the transceiver. 
131 p q 0.2 0.2 0 1 3 3 2 0 1 q 3 2 d2 p dt2 x p x 1 -0.2 -0.2 x (a) pulse seed u(t) 1 3 2 0 (c) radio pulse d p(t) dt 1 2 3 (d) received pulse d2 p(t) dt2 Figure 3: Pulses for impulse radio. as the upper bound which no practical pulses are allowed to exceed. The frequency axis of the mask is scaled down by 6 GHz for the purpose of comparison, or equivalently, the pulse repetition rate is assumed to be 6 G pulses per second, which is much faster than the 1.32G pulses per second of the high speed direct sequence ultra-wideband protocol discussed in the IEEE 802.15.3a standard. The fast transmission is possible because the pulses are orthogonal even though they are densely overlapping. But dense pulses are prone to interfere with one another in the situation that several reflected pulses arrive with various delays. Multipath compensation by digital filtering is crucial in order to effectively exploit the dense pulses we obtained. Transmitting a sounder pulse and digitizing the observed correlations, we have the end-to-end impulse response of the multipath channel. Digital filtering by an FIR approximation of the inverse impulse response will work as a kind of rake receiver. This compensation requires an analog-to-digital converter and a digital filter that work at the pulse rate and thus costs more hardware. But this cost should be justified since all the pulse-based systems cannot be faster without having denser pulses in the first place. A detailed analysis of the multipath effects, channel modeling error, and pulse synchronization is available in [3]. We may ignore the multipath effects and channel modeling error in the extreme situation that antennas are inductively coupled at a very short distance less than one inch. TransferJet technology has been working in the same situation at the maximum transmission rate of 560Mbps since 2008. A faster system will hopefully be the first application of the dense pulses obtained in this paper. 4. Conclusions Inspired by the M-shaped orthogonal pulse, we derived a procedure to construct an exponential spline pulse with the knot interval 12 that is locally supported, has its mean zero, and is orthogonal to its shifts by integers. An exam- SAMPTA'09 1 2 3 0 -5 -1 0 -1 Figure 4: Correlation of the pulse. 5 1 0 -2 d2 p dt2 =q Power Spectral Density (dB) d p dt -3 (b) transmission pulse p(t) -10 -20 -30 -40 0 1 2 3 Relative Frequency Figure 5: Power spectral density of the pulse and the FCC spectral mask. ple pulse was obtained that will potentially enable an impulse radio communications system as fast as 6G pulses per second under the FCC regulation for the indoor ultrawideband communications. 5. Acknowledgment This work was partially supported by JSPS grant-in-aid No. 17560357. References: [1] Revision of Part 15 the Commission’s rule regarding ultra-wideband transmission systems. ET Docket No.98-153, Federal Communications Commission, Washington, D.C., 2002. [2] A. J. Jerri. Wavelets – Detailed Treatment with Applications. Exercises of Chapter 3. Sampling Publishing, Potsdam, NY, to appear in 2009. [3] M. Kamada S. Özlem and H.Habuchi. Construction of orthogonal overlapping pulses for impulse radio communications. IEICE Transactions on Fundamentals, E91-A(11):3121–3129, Nov. 2008. [4] M. Unser and T. Blu. Cardinal exponential splines: Part I—Theory and filtering algorithms. IEEE Transactions on Signal Processing, 53(4):1425–1438, April 2005. [5] M. Unser. Cardinal exponential splines: Part II— Think analog, act digital. 
IEEE Transactions on Signal Processing, 53(4):1439–1449, April 2005. [6] M. Z. Win and R. A. Scholtz. Impulse radio: how it works. IEEE Commun. Lett., 2(2):36–38, 1988. 132 Special session on Mathematical Aspects of Compressed Sensing Chair: Holger RAUHUT SAMPTA'09 133 SAMPTA'09 134 A short note on non-convex compressed sensing Rayan Saab (1) and Özgür Yılmaz(2) (1) Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, B.C. Canada V6T 1Z4. (2) Department of Mathematics, University of British Columbia, Vancouver, B.C. Canada V6T 1Z2. rayans@ece.ubc.ca, oyilmaz@math.ubc.ca Abstract: In this note, we summarize the results we recently proved in [14] on the theoretical performance guarantees of the decoders ∆p . These decoders rely on ℓp minimization with p ∈ (0, 1) to recover estimates of sparse and compressible signals from incomplete and inaccurate measurements. Our guarantees generalize the results of [2] and [16] about decoding by ℓp minimization with p = 1, to the setting where p ∈ (0, 1) and are obtained under weaker sufficient conditions. We also present novel extensions of our results in [14] that follow from the recent work of DeVore et al. in [8]. Finally, we show some insightful numerical experiments displaying the trade-off in the choice of p ∈ (0, 1] depending on certain properties of the input signal. 1. Introduction Let ΣN S be the set of all S-sparse vectors, N ΣN S := {x ∈ R : #supp(x) ≤ S}, and define, qualitatively, compressible vectors as vectors that can be “well approximated” in ΣN S . For p > 0, let σS (x)ℓp denote the best S-term approximation error of x in ℓp (quasi-)norm, i.e., σS (x)ℓp := min kx − vkp . v∈ΣN S We are interested in recovering x from its possibly noisy “encoding” b = Ax + e, (1) where A is an M × N matrix with M < N . Equivalently, we seek accurate, stable, and “implementable” decoders ∆ : RM 7→ RN such that k∆(Ax + e) − xk scales well with the noise level kek, and is small whenever x is compressible. In general, the problem of constructing decoders with such properties is non-trivial (even if e = 0) as A is overcomplete. However, if A ∈ RM ×N is in general position, it can be shown that there is a decoder ∆0 which satisfies ∆0 (Ax) = x for all x ∈ ΣN S whenever S < M/2 [10]. This ∆0 can be explicitly computed via the optimization problem ∆0 (b) := arg min kyk0 subject to b = Ay. y SAMPTA'09 (2) Unfortunately, (2) is combinatorial in nature, thus its complexity grows extremely quickly as N becomes much larger than M . Naturally, one then seeks to replace (2) with a more tractable optimization problem. 1.1 Decoding by ℓp minimization Define the decoders ∆ǫp (b) = arg min kxkp subject to kAx − bk2 ≤ ǫ, (3) x and ∆p (b) = arg min kxkp subject to Ax = b, (4) with 0 < p ≤ 1. [2, 4, 9, 10, 15], that in the noise-free setting ∆1 recovers x exactly if x is sufficiently sparse and if A has certain properties. Furthermore, one has error guarantees even when x is not “exactly” sparse and when the encoding is noisy, e.g., [2, 9]. In this note we focus on ∆p and ∆ǫp with 0 < p < 1. Early work by Gribonval and co-authors (e.g. [12, 13]) presents sufficient conditions for having ∆p (b) = ∆1 (b) = x and stability conditions to deal with noisy encoding. However, these conditions are pessimistic in the sense that they generally guarantee recovery of only very sparse vectors. 
Recently, Chartrand [5] showed that in the noise-free setting, a sufficiently sparse signal can be recovered perfectly with ∆p , where p ∈ (0, 1), under less restrictive requirements than those needed to guarantee perfect recovery with ∆1 . Moreover, in [6], Staneva and Chartrand showed that if A is an M × N Gaussian matrix, recovery of x in ΣN S is guaranteed provided M > C1 (p)S + pC2 (p)S log(N/K). In other words, the dependence on N of the required number of measurements M (that guarantees perfect recovery for all x ∈ ΣN S ) disappears as p approaches 0, unlike the case with p = 1. These results motivate a more detailed study of the stability and robustness properties of the decoders ∆p . In the remainder of the note, we summarize our recent results in [14] concerning the theoretical properties of ∆p and ∆ǫp . In addition, we present some extensions of our results on the instance optimality in probability of ∆p when the entries of A are drawn from any sub-Gaussian distribution. Finally, we present numerical results suggesting scenarios where using ∆p , p ∈ (0, 1), is better than using ∆1 . 135 2. Main Results We begin with the relevant notation. Let δS , the Srestricted isometry constants of A (see, e.g., [2]), be the smallest constants satisfying (1 − δS )kck22 ≤ kAck22 ≤ (1 + δS )kck22 2.2 for every c ∈ ΣN S . We say that a matrix satisfies RIP(S, δ) if δS < δ. It has been shown that if A is an M × N matrix the columns of which are i.i.d. random vectors with any sub-Gaussian distribution, then A satisfies RIP (S, δ) with S ≤ c1 M/log(N/M ), δ < 1 with probability > 1 − 2e−c2 M (see, e.g., [1], [3]). Following the notation of [16], we say that a decoder ∆ is (q, p) instance optimal if (5) k∆(Ax) − xkq ≤ CσS (x)ℓp /S 1/p−1/q holds for all x ∈ RN . Moreover, a decoder ∆ is said to be (q, p) instance optimal in probability if (5) holds for any x with high probability on the draw of A. Note that the stability results of Candès et al. [2] imply (2,1) instance optimality of the decoder ∆1 , while the results of Wojtaszczyk in [16] show that ∆1 is (2,2) instance optimal in probability if the entries of A are drawn from a Gaussian distribution or if its columns are drawn uniformly from the sphere. 2.1 Decoding with ∆p : stability and robustness We consider the scenario where x is arbitrary and σS (x)ℓp is its best S-term approximation error measured in ℓp (qausi)-norm. In particular, we are interested in controlling the error k∆ǫp (b) − xkp2 . Theorem 1 Let p ∈ (0, 1] and let x be arbitrary. Suppose that 2 2 (6) δkS + k p −1 δ(k+1)S < k p −1 − 1, for some k > 1, kS ∈ Z+ . Let b = Ax + e where kek2 ≤ ǫ. Then ∆ǫp (b) satisfies k∆ǫp (b) − xkp2 ≤ C (1) ǫp + C (2) σS (x)pℓp , S 1−p/2 (7) where C (1) and C (2) are given in [14]. Remark 2 This is a straightforward generalization of the results of [2] regarding the performance of ∆1 . In fact, by setting p = 1 in the above theorem, we obtain the main theorem of [2], with precisely the same constants. Remark 3 Using ǫ = 0 in the above theorem, we find that the decoder ∆p is (2, p) instance optimal. Similarly, ǫ assuming x ∈ ΣN S (hence σS (x)ℓp = 0), we see that ∆p can stably recover sparse signals. We can also compare Sp , the sparsity of vectors that are guaranteed to be recovered with ∆p and S1 , the sparsity of vectors that are guaranteed to be recovered with ∆1 . This helps illustrate the possible benefits of using ∆p over using ∆1 in recovering sparse signals. 
SAMPTA'09 Corollary 4 (relationship between S1 and Sp ) Suppose for some k and S1 , δ(k+1)S1 < k−1 k+1 . Then ∆1 recovers S1 -sparse vectors and ∆p recovers Sp -sparse vectors with   k+1 S . Sp ≥ 1 k p/(2−p) + 1 Instance optimality in probability of ∆p In [7], it was shown that no decoder, ∆ : RM 7→ RN , is (2, 2) instance optimal unless M ∼ N . In this section, we show that ∆p is (2, 2) instance optimal in probability. Our approach is similar to that of [16], which we summarize now. Denoting by BqK the unit ball of ℓq in K dimensions, a matrix A is said to possess the LQ1 (α) property if and only if A(B1N ) ⊃ αB2M . In [16], Wojtaszczyk shows that random Gaussian matrices of size M × N , as well as matrices whose columns are drawn uniformly q from the sphere posses the ) with high probLQ1 (α) property, α = µ log (N/M M √ ability. Here µ < 1/ 2 is a constant. Noting that such matrices also satisfy RIP ((k + 1)S, δ) with S < M c log(N/M ) with high probability, Wojtaszczyk proves that ∆1 , with these matrices, is (2,2) instance optimal in probability. Our proof of the analogous result for ∆p , p ∈ (0, 1), relies on the non-trivial generalization of the LQ1 property to an LQp (α) property with α =  (1/p−1/2) ) 1/Cp µ2 log (N/M . Specifically, we say that M a matrix A satisfies LQp (α) if and only if A(BpN ) ⊃ αB2M . Below, we will use Aω to denote matrices whose entries are drawn from a zero mean, normalized column variance Gaussian distribution and Ãω to denote matrices drawn uniformly from the sphere. The following lemma states that the matrices Aω and Ãω satisfy the LQp property with high probability. Lemma 5 Ãω and Aω satisfy the LQp (α) property with 1/p−1/2  ) with probability ≥ 1 − α = 1/Cp µ2 log (N/M M e−cM on the draw of the matrix. √ Here, Cp is a constant that depends only on p, µ < 1/ 2 is a constant, and c is a constant that depends on µ. Proving Lemma 5 is non-trivial and requires a result by [11], relating the distances of p-convex bodies to their convex hulls. On the other hand, this lemma provides the machinery needed to prove the following theorem, which extends an analogous result of Wojtaszczyk [16]. Theorem 6 Let Aω ∈ RM ×N , ω ∈ Ω, be a random matrix with entries drawn independently from a zero-mean, normalized column variance Gaussian distribution, and let (Ω, P ) be the associated probability space. There exists constants c1 , c2 , c3 > 0 such that for all S ≤ c1 M/ log (N/M ), the following are true. (i) ∃Ω1 , with P (Ω1 ) ≥ 1 − e−c2 M , such that ∀x ∈ RN , ∀e ∈ RM and ∀ω ∈ Ω1 k∆p (Aω (x)+e)−xk2 ≤ C(kek2 + σS (x)ℓp ), (8) S 1/p−1/2 136 (ii) ∀x ∈ RN , ∃Ωx , with P (Ωx ) ≥ 1 − e−c3 M , such that ∀e ∈ RM and ∀ω ∈ Ωx 1 0.9 0.8 k∆p (Aω (x)+e)−xk2 ≤ C (kek2 + σS (x)ℓ2 ) . (9) 0.7 p 0.6 eω . The statement also holds for A 0.5 0.4 0.3 0.2 Note that the constants above (both denoted by C) rely on the parameters of the particular LQp and RIP properties that the matrix satisfies, and are omitted for ease of exposition. For the proofs of Lemma 5 and Theorem 6 see [14]. Finally, we present the following extension of Theorem 6. 0.1 10 20 30 40 50 S (a) 1 Proposition 7 The conclusions of Theorem 6 also hold when the entries of A are i.i.d., drawn from a subGaussian distribution. 
0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 p 0.5 0.5 Our proof of the above proposition, which we omit here, relies on the recent work of [8] where the LQ1 (α) property was modified, allowing the authors to show the (2,2) instance optimality of ∆1 when the entries of the matrix A are drawn from any sub-Guassian distribution. 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 10 20 30 40 50 S (b) In this section, we present some numerical experiments to highlight important aspects of sparse recovery using ∆p , 0 < p ≤ 1. First, we are interested in the sufficient conditions under which decoding with ∆p can guarantee perfect recovery of signals in ΣN S for different values of p and S. Our goal is to show empirically that with smaller values of p ∈ (0, 1), ∆p allows recovery of less sparse signals than would have been possible with larger values of p, as Theorem 1 predicts. To that end, we generate a 100 × 300 matrix whose columns are drawn from a Gaussian distribution and estimate its RIP constants δS via Monte Carlo (MC) simulations. Under the assumption that the estimated constants are the correct ones (while in fact they are only lower bounds), Figure 1(a) shows the regions where (6) guarantees recovery for different (S, p)-pairs. On the other hand, Figure 1(b) shows the empirical recovery rates using the same matrix with fifty different instances of x ∈ ΣN S , and decoding by ∆p , where we choose the non-zero coefficients of x randomly from the Gaussian distribution. Here, we compute ∆p (Ax), as a solution to the ℓp optimization problem of (4) by using a projected gradient algorithm P on a smoothed version of kxkpp , namely i (x2i + ǫ2 )p/2 , where the solution to each subproblem, starting with a large ǫ is used as an initial estimate for the next subproblem with a smaller ǫ. Note that this approach is similar to the one described in [5]. Clearly, the empirical results show that ∆p is successful in a wider range of scenarios than those predicted by Theorem 1. This can be attributed to the fact that the conditions presented in this paper are only sufficient. Moreover, what is observed in practice is not necessarily a manifestation of uniform recovery. Rather, the practical results could be interpreted as success of ∆p with high probability on either x or A. In our second set of experiments, we wish to observe the instance optimality of ∆p , i.e., the linear growth of the SAMPTA'09 0.15 ||"p(Ax)−x||2 Numerical Experiments 0.1 p=0.4 p=0.6 p=0.8 p=1 0.05 0 0 0.02 0.04 0.06 0.08 0.1 0.06 0.08 0.1 ! (a) 0.4 ||"p(Ax)−x||2 3. Figure 1: For a Gaussian matrix A ∈ R100×300 , whose δS values are estimated via MC simulations, we generate the theoretical (a) and practical (b) phase-diagrams for reconstruction via ℓp minimization. The lighter shading indicates higher recoverability rates. . 0.3 p=0.4 p=0.6 p=0.8 p=1 0.2 0.1 0 0 0.02 0.04 ! (b) Figure 2: Reconstruction error with compressible signals, S = 5 (a), S = 35 (b). Observe the almost linear growth of the error for different values of p, highlighting the instance optimality in probability of the decoders. ℓ2 reconstruction error k∆p (Ax) − xk2 , as a function of σS (x)ℓ2 . To that end, we generate scenarios that allude to the conclusions of Theorem 6. We generate a signal composed of xT ∈ Σ300 S , supported on an index set T , and a signal zT c supported on T c = {1, 2, ..., 300}\T , where all the coefficients are drawn from the Gaussian distribution and kxT k2 = kzT c k2 = 1. 
We then set xλ = xT + λzT c with increasing values of λ starting from 0, i.e., xλ be- 137 comes less compressible as λ increases, and T is the “effective support” of xλ . Next, we choose our measurement matrix A ∈ R100×300 by drawing its columns uniformly from the sphere. For each value of λ we measure the reconstruction error k∆p (Axλ ) − xλ k2 , and we repeat the process 50 times while randomizing the index set T but preserving the coefficient values. We report the averaged results for different values of p with S = 5 in Figure 2(a) and S = 35 in Figure 2(b). Note that when S = 5, ∆1 provides the best performance, and the performance of ∆p degrades monotonically as p decreases. On the other hand, when S = 35, ∆p with p = 0.4 provides the best performance and the performance degrades as p increases. We investigate this observation further by examining the performance as a function of S ∈ {5, 10, ..., 35}. In Figure 3, we plot the value of an “empirical effective constant” which we calculate as the maximum of k∆p (Axλ ) − xλ k2 /λ as λ > 0 varies. This constant acts as a surrogate for C in (9) assuming that such a constant exists and that σS (x)ℓ2 = kλzT c k2 = λ. The behavior gradually changes from favoring p = 1 when S, the size of the effective support of xλ , is small to favoring p = 0.4 as S increases. A closer look at the explicit value of the constant in Theorem 6 sheds some light on this behavior. Below, we use the notation of [14]. The constant C in (9) behaves like (2C (2) )1/p /γp (where C (2) and γp are explicitly given in [14]). Specifically, 1/γp depends only on the matrix A and increases exponentially as p decreases, while C (2) , the constant in Theorem 1, depends on p, as well as k and δ(k+1)S (where k > 1 is a free parameter). When S is relatively small, the associated RIP constants remain small, which consequently implies that [C (2) ]1/p remains small provided p is isolated away from 0. In this case, the behavior of C is determined by that of γp , i.e., C is smallest when p = 1. On the other hand, when S is large, [C (2) ]1/p grows as p approaches 1 (this is a manifestation of the more restrictive RIP requirements for larger p as stated in (6)). This increase seems to be dominating the behavior of C, thus for larger S we get better effective constants with smaller p. Such a heuristic could be an interpretation of the behavior we observe in Figure 3. For a rigorous quantitative analysis, one needs to identify the s-restricted isometry constants of the matrix A for every s. Such a treatment is beyond the scope of this note. Effective Constant 2 1.8 1.6 p=0.4 p=0.6 p=0.8 p=1 1.4 1.2 1 5 10 15 20 S 25 30 35 Figure 3: The empirical effective constant as a function of S for different values of p. Note the gradual change favoring p = 1 when S is small to p = 0.4 as S increases. SAMPTA'09 References: [1] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A Simple Proof of the Restricted Isometry Property for Random Matrices. Constructive Approximation, 2008. [2] E. J. Candès, J. Romberg, and T. Tao. Signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2005. [3] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, 51(12):489–509, 2005. [4] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, 2006. [5] R. Chartrand. Exact reconstructions of sparse signals via nonconvex minimization. 
IEEE Signal Process. Lett., 14(10):707–710, 2007. [6] R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24(035020), 2008. [7] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society (to appear), 2008. [8] R. DeVore, G. Petrova, and P. Wojtaszczyk. Instance-optimality in probability with an ℓ1 minimization decoder. preprint, 2008. [9] D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006. [10] D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. USA, 100(5):2197–2202, 2003. [11] Y. Gordon and N.J. Kalton. Local structure theory for quasi-normed spaces. Bull. Sci. Math., 118:441– 453, 1994. [12] R. Gribonval, R. M. Figueras i Ventura, and P. Vandergheynst. A simple test to check the optimality of sparse signal approximations. EURASIP Signal Processing, special issue on Sparse Approximations in Signal and Image Processing, 86(3):496–510, 2006. [13] R. Gribonval and M. Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure. Appl. Comput. Harm. Anal., 22(3):335–355, May 2007. [14] R. Saab and O. Yilmaz. Sparse recovery by nonconvex optimization – instance optimality. CoRR, abs/0809.0745, 2008. [15] J.A. Tropp. Recovery of short, complex linear combinations via l1 minimization. IEEE Transactions on Information Theory, 51(4):1568–1570, April 2005. [16] P. Wojtaszczyk. Stability and instance optimality for gaussian measurements in compressed sensing. Preprint, 2008. 138 Orthogonal Matching Pursuit with random dictionaries P. Bechler, and P. Wojtaszczyk Institut of Applied Mathematics, University of Warsaw P.Bechler@mimuw.edu.pl, wojtaszczyk@mimuw.edu.pl Abstract: In this paper we investigatet the efficiency of the Orthogonal Matching Pursuit for random dictionaries. We concentrate on dictionaries satisfying Restricted Isometry Property. We introduce a stronger Homogenous Restricted Isometry Property which is satisfied with overwhelming probability for random dictionaries used in compressed sensing. We also present and discuss some open problems about OMP. 1. Introduction In this paper we investigate the efficiency of the Orthogo√ nal MatchinT = U T ∗ T g Pursuit for random dictionaries. Orthogonal Matching Pursuit is a well known greedy algorithm widely used in approximation theory, statistical estimations and compressed sensing (for the review of greedy algorithms see [6]). One of its main features is that it can be applied for arbitrary dictionary. However the efficiency of the algorithm depend very strongly on properties of the dictionary. We work in the context of a Hilbert space H (which may be assumed to be finite dimensional). The dictionary is a subset D ⊂ H such that span D = H. We usually assume that kxk is close to 1 for x ∈ D. Generally it is assumed that kxk = 1 for x ∈ D (see e.g. [6]). However for random dictionaries it is very rarely satisfied. On the other hand for such dictionary kxk is close to 1 with great probability. The Orthogonal Matching Pursuit algorithm with respect to the dictionary D obtains iteratively a sequence OMPn f ∈ H of approximants of a given element f ∈ H and a sequence d1 , . . . , dn ∈ D in the following way: define the set of m sparse vectors (with respect to the dictionary D) as m X ΣD = Σ = { aj dj : {dj }m m m j=1 ⊂ D}. 
(1) j=1 For a given f ∈ H we define its best error of m–term approximation as σm (f ) = inf{kf − zk : z ∈ Σm }. (2) Clearly we always have σm (f ) ≤ kf − OMPm (f )k = kfm k. Obviously when our dictionary is an orthonormal basis then σm (f ) = kf − OMPm (f )k for each f ∈ H. Unfortunately this is the only case when it is so. The fundamental, and still largely unanswered question is how close OMPm (f ) can get to this optimal rate of approximation given by σm (f ). It is to be expected that the answer to the above question must depend on some extra properties of the dictionary. 2. Dictionaries One of the commonly used quantitative parameters of the dictionary is its mutual coherence. It is defined as η= sup d1 6=d2 ∈D |hd1 , d2 i|. (3) Recently, especially in the context of compressed sensing, a restricted isometry property (RIP for short) became very useful. Let us recall the following well known definition (c.f. [1, 2]). Definition 1 The dictionary Φ = {φj }N j=1 has RIP(k, δ), 0 < δ < 1 if for any set T ⊂ {1, . . . , N } with #T = k and any sequence of numbers xj we have • Given OMPn−1 f and d1 , . . . , dn−1 ∈ D choose sX sX dn ∈ D such that X 2 ≤k (1 − δ) |x | x φ k ≤ (1 + δ) |xj |2 . j j j   j∈T j∈T j∈T |hf −OMP f, dn i| = sup |hf − OMP f, di : d ∈ D n−1 n−1 (4) • Define OMP0 f = 0. and define OMPn f as the orthogonal projection of f onto span{d1 , . . . , dn }. Generally we will write f − OMPs f := fs . The standard measure of approximation power of a dictionary is the error of the best m–term approximation. We SAMPTA'09 There are some easy relations between those notions. If the dictionary D has mutual coherence η then it satisfies RIP(k, 1−η) for k < η −1 . On the other hand if D satisfies RIP(k, δ) then it has mutual coherence ∼ δ. Usually dictionaries with RIP are exhibited as random dictionaries. To be more precise we define a dictionary in Rn 139 as Φ(ω) = {φj }N j=1 where φj = (γj,1 , . . . , γj,n ) and γj,i are idependent copies of a fixed subgaussian random variable normalised so that Ekφk k2 = 1. In this context it is known (see e.g. [1]) that for a fixed 0 < δ < 1 there exists c > 0 such that the dictionary Φ(ω) with overwhelming probability satisfies RIP(k, δ) with k = ⌊cn/ log N ⌋. On the other hand it is also known that such a dictionary with overwhelming probability has mutual coherence of order k −1/2 . It is clear that when we have two events each of them happening with big probability then they happen simultanously with big probability. This leads to the following definition: Definition 2 The dictionary Φ has homogenous restricted isometry property HRIP(k, δ), 0 < δ < 1 if for any l ≤ k p it satisfies RIP(l, δ l/k). Following standard reasoning we obtain Theorem 1 Suppose that integers n, N and numbers 0 < δ < 1 and a > 0 are given and suppose that the random dictionary Φ(ω) = {φ1 , . . . , φN } ⊂ Rn is as described above. Then there exist c, c1 > 0 which depend only on the subgaussian distribution involved, δ and a such that dictionary Φ(ω) satisfies HRIP(k, δ) for k = ⌊c1 n/ log N ⌋ with probability ≥ 1 − 3N −a Basically this tells us that unless we are very unlucky a randomly chosen dictionary satisfies HRIP, which is clearly stronger property than RIP. We believe that HRIP is a useful property. Theorem 4 is some indication of this. 3. Main Result Now we want to present a result on the approximation power of OMP for dictionaries satisfying RIP. For dictionaries with incoherence analogous results were obtained by D. Donoho, M. Elad and N.V. Temlyakov [3]. 
If we are interested p in random dictionaries results from [3] require S ≤ n/ log N while ours apply for the full range S ≤ n/ log N . Theorem 2 There exist constants C and c depending only on ǫ > 0 such that for the dictionary Φ satisfying RIP(2K, ǫ) and for 0 ≤ k ≤ S ≤ K we have 2 kfS k ≤ Ckfk k (σS−k (fk ) + Aǫkfk k) . (5) with A = c(1 + log K)). Note that in particular seting k = 0 we get kfS k2 ≤ Ckf k(σS (f ) + Aǫkf k). (6) The proof of Theorem 2 is rather complicated. It uses a lot of geometry of Hilbert space, theory of Riesz bases and ideas from [3] and [5]. The main new technical tool is the following lemma on norm of matrices. Lemma 1 Let 0 < ǫ < 1 and let A = [ai,j ] be an n × n upper triangular matrix such that for any x ∈ Rn (1 − ǫ)kxk ≤ kAxk ≤ (1 + ǫ)kxk SAMPTA'09 (7) and |ai,i | ≥ 1 − ǫ for i = 1, 2, . . . , n. Let B = [bi,j ] be the off diagonal part of A i.e. ( ai,j if i < j bi,j = 0 if j ≤ i. Then kBk ≤ 4ǫ⌈log2 n⌉. The above inequailties (5) and (6) have some merit only if ǫA < 1. Generally one would like to avoid the presence of kfk k (or kf k) inside the brackets in (5), (6). The most desirable would be to have direct estimates of the form kfs k ≤ Cσs (f ). Unfortunatelly in full generality such estimates are not true even when we replace the constant by a function of s. Pn Here is an appropriate example. Let x = √1n j=1 ej ∈ R2n so kxk = 1. Let us consider the dictionary consisting of vectors: e1 , . . . , en , ψj := kej + βn−1/2 xk−1 (ej + βn−1/2 x) for j = n + 1, . . . , n + s plus orthonormal vectors which are orthonormal to make a basis √ to all those √ in R2n . We take β = 4 n and s = ⌊ǫ n⌋ . Then the following are easy to check • The mutual coherence is ≤ n−1/2 . • The Riesz constant√of this basis is nary has RIP(2n, ǫ) √ ǫ so the dictio- • Orthogonal Matching Pursuit for vector x in first s iterations chooses vectors ψj and only later chooses vectors ej . Thus we see that σn (x) = 0 while x − OMPk (x) 6= 0 for k = n + s − 1. For dictionaries with mutual coherence η J. Tropp [7], slightly improving estimate from [4], have proved Theorem 3 If the dictionary has mutual coherence η then √ (8) kfm k ≤ 8 mσm (f ) for m < (3η)−1 . Using this we obtain Theorem 4 Let √ the dictionary Φ satisfies HRIP(k, δ). Then for m ≤ c/ k we have kf⌊m log m⌋ k ≤ Cσm (f ). (9) Let us give a sketch of a proof which follows arguments √ from [3]. We start with m ≤ c′ k for which (8) holds. We set ml = m(2l − 1) and we fix K ∼ k 3/4 . Using HRIP we get that dictionary Φ satisfies RIP(2K, ǫ) with Aǫ ≤ δk −1/8 ≤ βm−1/4 . From Theorem 2 and (8) we get kfm2 k2 ≤ Ckfm1 k(σm (f ) + Aǫkfm1 k) ≤ Ckfm1 k(σm (f ) + 8βm1/4 σm (f )) 2 (f ) ≤ 8Cm1/2 (1 + 8βm1/4 )σm ′ 3/4 2 ≤ C m σm (f ). √ Thus we get kfm2 k ≤ C ′ m3/8 σm (f ). Repeating this argument and carefully tracking constants we see that after at most µ ∼ log log m steps we get the claim. 140 Analogous result from [3] uses only √ mutual coherence and in our case gives (9) for m ≤ c 3 k. The main drawback of Theorem 4 is the limitation on m. It is clear from the above sketch that this restriction is inherited from Theorem 3. It is very unlikely that (8) can be substantially improved using only mutual coherence. We believe however that for dictionaries with RIP or HRIP one can prove more. So let us state the following conjecture Conjecture Assume that the dictionary satisfies HRIP(k, δ). There exist constants C, c, α and β (possibly depending on δ) such that for every f and for m logα m ≤ ck we have kf⌊m logα m⌋ k ≤ Cmβ σm (f ). 
Let us note that it follows from Theorem 3 that there exists a function ψ(k, δ) and constants C and β such that if the dictionary satisfies HRIP(k, δ) then for every f ∈ H kfm k ≤ Cmβ σm (f ). for m ≤ ψ(k, √ δ). (Clearly Theorem 3 gives β = 1/2 and to know if ψ can ψ(k, δ) ∼ k). It would be interesting √ grow significantly faster than k. References: [1] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A simple proof of the restricted isometry property for random matrices, Constr. Approx., 28 (2008), no. 3, 253–263. [2] E. Candés, The restricted isometry property and its implications for compressed sensing, Compte Rendus de l’Academie des Sciences, Paris, Series I, 346(2008), 589–592. [3] D.Donoho, M.Elad, V.N.Temlyakov, On Lebesguetype inequalities for greedy approximation J. Approx. Theory 147 (2007), no. 2, 185–195. [4] A.Gilbert, S.Muthukrishnan, M.Strauss, Approximation of functions over redundant dictionaries using coherence, Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, MD, 2003), 243–252, ACM, New York, 2003. [5] S.Kwapień, A. Pełczyński, The main triangle projection in matrix spaces and its applications Studia Math. 34 (1970) 43–68. [6] V.N.Temlyakov, Greedy approximation, Acta Numerica 17 (2008) 235–409 [7] J.Tropp, Greed is good: Algorithmic results for sparse approximation, IEEE Trans. Inform. Theory, 50 (2004), 2231-2242 SAMPTA'09 141 SAMPTA'09 142 Average Case Analysis of Multichannel Basis Pursuit Holger Rauhut (1) , Yonina C. Eldar (2) (1) Hausdorff Center for Mathematics, and Institute for Numerical Simulation, University of Bonn Endenicher Allee 62, 53115 Bonn, Germany. (2) Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa, Israel 32000. rauhut@hcm.uni-bonn.de, yonina@ee.technion.ac.il Abstract: We consider the recovery of jointly sparse multichannel signals from incomplete measurements using convex relaxation methods. Worst case analysis is not able to provide insights into why joint sparse recovery is superior to applying standard sparse reconstruction methods to each channel individually. Therefore, we analyze an average case by imposing a probability model on the measured signals. We show that under a very mild condition on the sparsity and on the dictionary characteristics, measured for example by the coherence, the probability of recovery failure decays exponentially in the number of channels. This demonstrates that most of the time, multichannel sparse recovery is indeed superior to single channel methods. 1. Introduction Recovery of sparse signals from a small number of measurements is a fundamental problem in many different signal processing tasks such as image denoising [3], analogto-digital conversion [21, 11], radar, compression, inpainting, and many more. The recent framework of compressed sensing (CS), founded in the works of Donoho [8] and Candes [3], studies acquisition methods as well as efficient computational algorithms that allow reconstruction of a sparse vector x from linear measurements y = Ax, where A is referred to as the measurement matrix. The key observation is that y can be relatively short, and still contain enough information to recover x. Determining the sparsest vector x consistent with the data y = Ax is generally an NP-hard problem [7]. To determine x in practice, a multitude of efficient algorithms have been proposed. The most extensively studied recovery method by now is the ℓ1 -minimization approach (Basis Pursuit). 
Greedy methods, such as simple thresholding [23] or orthogonal matching pursuit (OMP) [26], are faster in practice, but BP provides significantly better recovery guarantees [10, 22]. The BP principle as well as greedy approaches have been extended to the multichannel setup where the signal consists of several channels [29, 30, 15, 6, 5, 20, 12, 13, 18]. Here one assumes that each channel is sparse and in addition that the channels have a small common support set. In this situation the signals are called jointly sparse. A variety of theoretical recovery results have been established SAMPTA'09 already in this setting. In [5] a recovery result was derived for a mixed ℓp /ℓ1 program (multichannel BP) in which the objective is to minimize the sum of the ℓp -norms of the rows of the estimated matrix whose columns are the unknown vectors. Recovery results for the more general problem of blocksparsity were developed in [13] based on the restricted isometry property (RIP), and in [12] based on mutual coherence. In practice, multichannel reconstruction techniques perform much better than recovering each channel individually. However, the theoretical equivalence results predict no performance gain. The reason is that these recovery results apply to all possible input signals, and are therefore worst-case measurements. Clearly, if we input the same signal to each channel, then no additional information on the joint support is provided from multiple measurements. Therefore, in this worst-case scenario there is no advantage for multiple channels. In order to capture more closely the true underlying behavior of existing algorithms and observe a performance gain when using several channels, we consider an average analysis. In this setting, the inputs are considered to be random variables so that the case of identical inputs in all channels has zero probability. The idea is to develop conditions on the measurement matrix A such that the inputs can be recovered with high probability given a certain input distribution. Most existing recovery results focus on worst-case analysis. Recently, there have been several papers that consider random ensembles. In [25] random sub-dictionaries of A are considered and analyzed. This allows to obtain results for BP with a single input channel. In [23], average-case performance of single channel thresholding was studied. These ideas were then extended to two multichannel recovery algorithms: thresholding and simultaneous OMP (SOMP) [18, 17]. Under a mild condition on the sparsity and on the matrix A, it was shown that the probability of reconstruction failure decays exponentially with the number of channels. In the present paper we contribute to this line of research by adding an average-case analysis of multichannel BP, that is mixed ℓ2 /ℓ1 -minimization [30, 15, 13, 12]. We denote by AS the submatrix of A consisting of the columns indexed by S ⊂ 1, . . . , N , while X S is the submatrix of X consisting of the rows of X indexed by S. The ℓth column of A is denoted by aℓ or Aℓ . The ℓp -norm is denoted by k · kp while k · kF is the Frobenius norm. 143 2. Multichannel ℓ1 -minimization We consider multichannel signal recovery where our goal is to recover a jointly-sparse matrix X ∈ CN ×L from n linear measurements per channel. Here N denotes the signal length and L the number of channels, i.e., the number of signals. We assume that X is jointly k-sparse, meaning that there are at most k rows in the matrix X that are not identically zero. 
More formally, SL we define the support of the matrix X as supp X = ℓ=1 supp Xℓ , where the support of the ℓth column is supp Xℓ = {j, Xjℓ 6= 0}. Our assumption is that kXk0 = k where kXk0 is equal to the size of the support. The measurements are given by Y = AX, Y ∈ Cn×L , (1) where A ∈ Cn×N is a given measurement matrix. Each measurement vector Yℓ corresponds to a measurement of the corresponding signal Xℓ . The natural approach to determine X given Y is to solve the problem min kXk0 X such that AX = Y. N X j=1 kX j k2 , subject to AX = Y, (3) which promotes joint sparsity, as argued for instance in [15]. In the single channel case L = 1 this is the usual BP principle. 3. (1 − δ)kxk22 ≤ kAxk22 ≤ (1 + δ)kxk22 , for all vectors x that are 2k-sparse. Let X ∈ CN ×L , Y = AX, and let X be the minimizer of (3). Then kX − XkF ≤ Ck −1/2 kX − X̂ (k) k1,2 (2) However, (2) is NP hard in general [7]. Therefore, we consider instead the convex relaxation [30, 15, 13] defined by min kXk2,1 = Note that in both of the cited results the conditions do not depend on the number of channels. Indeed, under the same conditions as in Propositions 3..1 and 3..2, it is shown in [26] that BP will recover a single k-sparse vector. Therefore, if (4) holds, then instead of solving (3) we may as well use BP on each of the columns of Y . q N −n The coherence is lower bounded by µ ≥ n(N −1) √ [24]. The lower bound behaves like 1/ n for large N , which limits √ the Proposition 3..2 to maximal sparsities k = O( n). To improve on this we can generalize existing recovery results [3, 2] based on RIP to the multichannel setup. The next proposition follows from [13]: √ Proposition 3..3 Assume X ∈ Cn×N with δ2k < 2 − 1, where δ2k is the smallest constant δ such that p where C is a constant, kXkF = Tr(X ∗ X) is the PN j Frobenius norm of X, kXk1,2 = j=1 kX k2 , and X̂ (k) denotes the best k-term approximation of X, i.e., supp X̂ (k) consists of the indices corresponding to the k largest row norms kX ℓ k2 . In particular, recovery is exact if | supp X| ≤ k. It is well known that Gaussian and √ Bernoulli random matrices A ∈ Rn×N satisfy δ2k ≤ 2 − 1 with high probability as long as [1, 4] n ≥ Ck log(N/k). Worst Case Recovery Results Recovery results for the program (3) were considered in [5, 13, 12]. In particular, the lemma below is derived in [5] and follows also from [12]. Proposition 3..1 Let S ⊂ 1, . . . , N and suppose that kA†S aℓ k1 < 1 for all ℓ ∈ / S, (4) with A†S = (A∗S AS )−1 A∗S denoting the pseudo-inverse of AS . Then (3) recovers all X ∈ CN ×L with supp X = S from Y = AX. Assuming the columns of A are normalized, kaℓ k2 = 1, we can guarantee that (4) holds as long as the coherence µ of A is small enough, where [9] µ = max |haj , aℓ i|. j6=ℓ (5) The following result follows from Proposition 3..1 or from [12] by noting that the block coherence in this setting is equal to µ/d. Proposition 3..2 Assume that (2k − 1)µ < 1. (6) Then (3) recovers all X with kXk0 ≤ k from Y = AX. SAMPTA'09 (7) Therefore, Proposition 3..3 allows for a smaller number of measurements. However, there is still no dependency on the number of channels. Indeed, under the same RIP condition BP will recover a single k-sparse vector and therefore, as before, BP may as well be applied to each of the columns of Y individually. 4. Average Case Analysis Intuitively, we would expect multichannel sparse recovery to perform better than single channel recovery. However, in the worst case setting this is not true as already suggested by the results cited above. 
The reason is very simple. If each channel carries the same signal, Xℓ = x for ℓ = 1, . . . , L, then also the components of Y = AX are all the same and we do not have more information on the support of X than provided by a single component Yℓ . This can indeed be proven rigorously. Proposition 4..1 Suppose there exists a k-sparse vector x ∈ RN that ℓ1 -minimization is not able to recover from y = Ax. Then there exists a k-sparse multichannel signal X ∈ RN ×L for which mixed ℓ2 /ℓ1 -minimization fails on Y = AX. 144 For the simple proof we refer to the journal version [14]. Realizing that (3) is not more powerful than usual BP in the worst case, we seek an average-case analysis. This means that we impose a probability model on the k-sparse X. In particular, as in [18], we will assume that on the ksparse support set S the coefficients of X are independent and follow a normal distribution, X S = ΣΦ (8) where Σ = diag(σj , j ∈ S) ∈ Rk×k is an arbitrary diagonal matrix with non-zero diagonal elements σj , while Φ ∈ Rk×L is a Gaussian random matrix, i.e., all entries are independent standard normal random variables. Note that taking Σ to be the identity matrix results in a standard Gaussian random matrix, while taking arbitrary non-zero σj ’s on the diagonal of Σ allows for different variances. The following recovery condition is instrumental in proving average case recovery results for multichannel BP. It generalizes results of [27, 16] for the monochannel case. In order to introduce we need to introduce the sign sgn(X) of a signal matrix, ( Xℓj , kX ℓ k2 6= 0; kX ℓ k2 sgn(X)ℓj = 0, kX ℓ k2 = 0. Proposition 4..2 Let X ∈ CN ×L with supp X = S and assume AS to be non-singular. If k sgn(X S )∗ A†S aℓ k2 < 1 for all ℓ ∈ /S (9) then X is the unique minimizer of (3). Combining the above proposition with a concentration inequality for sums of independent random variables that are uniformly distributed on the sphere [19], we arrive at the following average case recovery result for multichannel BP. Theorem 4..3 Let S ⊂ {1, . . . , N } be a set of cardinality k and let X ∈ RN ×L with supp X ⊂ {1, . . . , N } such that the coefficients on S are given by (8) with some diagonal matrix Σ ∈ Rk×k . If kA†S aℓ k2 ≤ α < 1 for all ℓ ∈ / S, then with probability at least   L 1 − N exp − (α−2 − log(α−2 ) − 1) 2 (10) kA†S aℓ k2 ≤ δ <1 1−δ for all ℓ ∈ / S. Note that in contrast to the worst case result in Proposition 3..3 where a condition on δ2k is needed, we only require that δk+1 is small, which is clearly weaker. For random matrices A we have the following bound on kA†S aℓ k2 . Proposition 4..5 Let S ⊂ {1, . . . , N } be a set of cardinality k and suppose that A ∈ Rn×N is drawn at random according to a Gaussian or Bernoulli distribution. Then kA†S aℓ k2 ≤ δ for all ℓ ∈ /S with probability at least 1 − ǫ provided that n ≥ Cδ −2 [(k + 1) ln(1 + 12/δ) + ln(2N/ǫ)]. (12) The constant C is no larger than 162/7 ≈ 23.1. Note that the log-factor in (12) enters only as an additive term, while in (7) it appears as multiplicative factor. 5. Conclusion Our main result is that under mild conditions on the sparsity and measurement matrix, the probability of failure of multichannel BP (3) decays exponentially with the number of channels. To develop this result we assumed a probability model on the non-zero coefficients of a jointly sparse signal. This shows that multichannel BP outperforms single channel BP applied to each channel individually, on average. 
Proofs of our theorems, together with improved results for simple thresholding and numerical experiments will appear in [14]. 6. Acknowledgements The work of YE was supported in part by the Israel Science Foundation under Grant no. 1081/07 and by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract no. 216715). HR acknowledges funding by the Hausdorff Center for Mathematics, University of Bonn and the WWTF project SPORTS (MA 07-004). (11) (3) recovers X from Y = AX. The proof of the theorem will appear in the journal version [14]. For α < 1 we are guaranteed that the exponent has a negative argument, and therefore the error decays exponentially in L. We note that for the monochannel case L = 1, Theorem 4..3 is contained implicitly in [28, Theorem 13]. The appearance of the 2-norm in (10) instead of the 1-norm as in (4) makes the condition of the theorem weaker than worst-case estimates. Let us finally state conditions on the matrix A and the sparsity level k ensuring that kA†S aℓ k2 is small, which is needed in order to apply Theorem 4..3. SAMPTA'09 Proposition 4..4 Suppose A has restricted isometry constant δk+1 ≤ δ < 1/2. If S ⊂ {1, . . . , N } has cardinality k then References: [1] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 28(3):253–263, 2008. [2] E. J. Candès. The restricted isometry property and its implications for compressed sensing. Compte Rendus de l’Academie des Sciences, Paris, Serie I, 346:589–592, 2008. [3] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006. 145 [4] E. J. Candès and T. Tao. Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, 2006. [5] J. Chen and X. Huo. Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Processing, 54(12):4634–4643, Dec. 2006. [6] S. F. Cotter, B. D. Rao, K. Engan, and K. KreutzDelgado. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans. Signal Processing, 53(7):2477–2488, July 2005. [7] G. Davis, S. Mallat, and M. Avellaneda. Adaptive greedy approximations. Constr. Approx., 13(1):57– 98, 1997. [8] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006. [9] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions Info. Theory, 47(7):2845–2862, 2001. [10] David L. Donoho. For most large underdetermined systems of linear equations the minimal l1 solution is also the sparsest solution. Commun. Pure Appl. Anal., 59(6):797–829, 2006. [11] Y. C. Eldar. Compressed sensing of analog signals. submitted to IEEE Trans. Signal Processing. [12] Y. C. Eldar and H. Bölcskei. Block-sparsity: Coherence and efficient recovery. to appear in ICASSP09. [13] Y. C. Eldar and M. Mishali. Robust recovery of signals from a union of subspaces. submitted to IEEE Trans. Inf. Theory. [21] M. Mishali and Y. C. Eldar. Blind multi-band signal reconstruction: Compressed sensing for analog signals. IEEE Trans. Signal Process., 57(3):993–1009, Mar. 2009. [22] H. Rauhut. On the impossibility of uniform sparse reconstruction using greedy methods. Sampl. Theory Signal Image Process., 7(2):197–215, 2008. [23] K. Schnass and P. Vandergheynst. 
Average performance analysis for thresholding. IEEE Signal Processing Letters, 14(11):828–831, Nov. 2007. [24] T. Strohmer and R. W. Heath. Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal., 14(3):257–275, 2003. [25] G. Teschke. Multi-frame representations in linear inverse problems with mixed multi-constraints. Appl. Comput. Harmon. Anal., 22(1):43–60, 2007. [26] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231–2242, 2004. [27] J. A. Tropp. Recovery of short, complex linear combinations via l1 minimization. IEEE Trans. Inform. Theory, 51(4):1568–1570, 2005. [28] J. A. Tropp. On the conditioning of random subdictionaries. Appl. Comput. Harmon. Anal., to appear. [29] J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation: part I: Greedy pursuit. Signal Processing, 86(3):572 – 588, 2006. [30] J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation: part II: Convex relaxation. Signal Processing, 86(3):589 – 602, 2006. [14] Y. C. Eldar and H. Rauhut. Average case analysis for multichannel sparse recovery using convex relaxation. preprint, 2009. [15] M. Fornasier and H. Rauhut. Recovery algorithms for vector valued data with joint sparsity constraints. SIAM J. Numer. Anal., 46(2):577–613, 2008. [16] J. J. Fuchs. On sparse representations in arbitrary redundant bases. IEEE Trans. Inform. Theory, 50(6):1341–1344, 2004. [17] R. Gribonval, B. Mailhe, H. Rauhut, K. Schnass, and P. Vandergheynst. Average case analysis of multichannel thresholding. In Proc. IEEE Intl. Conf. Acoust. Speech Signal Process., 2007. [18] R. Gribonval, H. Rauhut, K. Schnass, and P. Vandergheynst. Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms. J. Fourier Anal. Appl., 14(5):655–687, 2008. [19] H. König and S. Kwapień. Best Khintchine type inequalities for sums of independent, rotationally invariant random vectors. Positivity, 5(2):115–152, 2001. [20] M. Mishali and Y. C. Eldar. Reduce and boost: Recovering arbitrary sets of jointly sparse vectors. IEEE Trans. Signal Process., 56(10):4692–4702, Oct. 2008. SAMPTA'09 146 Special session on Sampling Using Finite Rate of Innovation Principles Chairs: Pier-Luigi DRAGOTTI, Pina MARZILIANO SAMPTA'09 147 SAMPTA'09 148 Sampling of Sparse Signals in Fractional Fourier Domain Ayush Bhandari (1) and Pina Marziliano (2) (1)Temasek Labs @ NTU, 50 Nanyang Drive, Singapore - 637553 (2) School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore - 639798 {ayushbhandari,epina}@ntu.edu.sg Abstract: In this paper, we formulate the problem of sampling sparse signals in fractional Fourier domain. The fractional Fourier transform (FrFT) can be seen as a generalization of the classical Fourier transform. Extension of Shannon’s sampling theorem to the class of signals which are fractional bandlimited shows its association to a Nyquist-like bound. Thus proving that signals that have a non-bandlimited representation in FrFT domain cannot be sampled. We prove that under suitable conditions, it is possible to sample sparse (in time) signals by using the Finite Rate of Innovation (FRI) signal model. In particular, we propose a uniform sampling and reconstruction procedure for a periodic stream of Diracs, which have a nonbandlimited representation in FrFT domain. 
This generalizes the FRI sampling and reconstruction scheme in the Fourier domain to the FrFT domain. where  q 2 2 1−j cot θ j t +ω   e 2 2π def Kθ (t, ω) = δ(t − ω),   δ(t + ω), cot θ−jωt csc θ , θ 6= pπ θ = 2pπ θ + π = 2pπ (2) is the transformation kernel, parametrized by the fractional order θ ∈ R and p is some integer. The FrFT of a time-frequency representation e.g. Gabor Transform results in rotation of the plane by the fractional order of the FrFT [2]. Thus, we denote fractional order by θ and from now on, we will use fractional order and angle interchangeably. The inverse-FrFT with respect to angle θ is the FrFT at angle −θ, given by, Z ∞ x(t) = x bθ (ω) K−θ (t, ω) dω. (3) −∞ 1. Introduction Shannon’s sampling theorem [1] provides access to the digital world. Our understanding of this sampling theorem together with the reconstruction formula is solely based on the frequency content of the signal of interest. This is where the indispensable Fourier transform comes into the picture. Almeida [2] introduced the fractional Fourier transform or the FrFT–a generalization of the Fourier transform–to the signal processing community in 1994. The generalization of the Fourier transform by FrFT has several interesting consequences from the signal processing perspective. For instance, non-bandlimited signals in the Fourier domain can still have a compactly supported representation in FrFT domain [3], when dealing with non stationary distortions, the FrFT based filters can perform better than Fourier domain based filters (in sense of mean square error) [4] etc. To give the reader an idea about the growing popularity of FrFT, it would be worth mentioning that on at least eight occasions including, [3, 5, 6, 7, 8, 9, 10, 11], Shannon’s sampling theorem [1, 12] was independently extended to the class of fractional bandlimited signals. In [13], the FrFT of a signal or a function, say x(t), is defined by x bθ (ω) = FrFT{x(t)} = SAMPTA'09 Z x(t)Kθ (t, ω)dt (1) Whenever θ = π/2, (1) collapses to the classical Fourier transform definition. A direct consequence of the generalization of the Fourier transform by the FrFT results in a modification in the idea of bandlimitedness. Its impact is visible in the change that manifests in Shannon’s sampling theorem for fractional bandlimited signals [11], which is stated in Theorem 1. Theorem 1 (Shannon–FrFT). Let x(t) be a continuoustime signal. If the spectrum of x(t), i.e. x bθ (ω) is fractional bandlimited to ωm which means, x bθ (ω) = 0, when |ω| > ωm , then x(t) is completely determined by giving its ordinates at a series of equidistant points spaced T = ωπm sin θ seconds apart. This theorem has an equivalence to the Shannon’s sampling theorem for θ = π/2. The reconstruction formula for fractional bandlimited signals is given in [11], X x(t) = λ∗θ (t) λθ (nT ) x(nT )sinc ((t − nT ) ωm csc θ) n∈Z (4) is a domain independent chirp where λθ (·) = e modulation function and the ‘*’ in the superscript denotes complex conjugation. If x e (t) is the approximation of 2 x(t), then ke x (t) − x(t)k = 0 when ωm 6 ω2s sin θ–the Nyquist rate for FrFT–where ωs = 2π/T is the sampling frequency. Note that all the aforementioned results are equivalent to Shannon’s sampling theorem with respect to def θ j(·)2 cot 2 149 Fourier domain for θ = π/2. Theorem 1 (for FrFT) has a striking similarity with the Shannon’s sampling theorem (for FT), in that, sampling non-bandlimited signals is impossible. Consider Dirac’s delta function or δ(t). 
Using (2), we have, q (5) δbθ (ω) = FrFT {δ (t)} = 1−j2πcot θ λθ (ω) which is a non-bandlimited function (and least sparse when compared to the time-domain counterpart) and thus, Theorem 1 fails to answer the following question: If x(t) is a fractional non-bandlimited signal, then, how can we sample and reconstruct such a signal? To make this statement clear, we introduce the fractional convolution operator, which is denoted by ‘∗∗θ ’. Accordingly, filtering x(t) by a filter, h(t), in ‘fractional sense’ 1 is equivalent to [14], q 2. 2.1 2.2 Fractional Fourier Series (FrFS) Periodic signals can be expanded in FrFT domain as a fractional Fourier series or FrFS [19]. The FrFS of a periodic signal, say x(t), can be written as, x(t) = x bθ [m]Φθ (m, t) (8) where, We model our sparse signal as a periodic stream of K Diracs, i.e. K−1 X k=0 = r sin θ − j cos θ j t2 +(2πm sin θ/τ )2 2 e τ ck X δ(t − tk − nτ ) (7) n∈Z x bθ [m] = Z hτ i x(t)Φ∗θ (m, t)dt = hx, Φθ (m, ·)i (9) where hτ i denotes the integral width and ha, bi = R a(t)b∗ (t)dt denotes the inner product. The well-known Fourier series (FS) is just a special case of FrFS for θ = π2 . 3. Stream of Diracs in Fractional Fourier Domain In Fourier analysis, the Poisson summation formula (PSF) plays an important role. It is a well-known fact that a stream of Diracs (Dirac comb) in time-domain is another stream of Diracs in Fourier domain. In this subsection, we will derive the equivalent representation of Dirac comb in FrFT domain. This can be seen as a generalization of the Poisson summation formula for Dirac comb in FrFT domain. Theorem 2. Let P n∈Z δ(t − nτ ) be a Dirac comb, then FrFT δ (t − nτ ) ←→ −j 1 q 2π X b δθ [kω0 sin θ] e 1−j cot θ τ  t2 + (kω0 sin θ)2 2  cot θ+jkω0 t k∈Z where ω0 = 2π τ . def P Proof. Let s(t) = n∈Z δ(t − nτ ). The proof is done by expanding s(t) in FrFS basis or, 1 We adhere to this modified definition of convolution operator as it inherits the fractional Fourier duality property, in that, FrFT {x(t) ∗θ h(t)} = λ∗θ (ω) · x bθ (ω)b hθ (ω), which does not hold for the FrFT of x(t) ∗ h(t) unless θ = π2 . cot θ−j2πmt/τ constitutes the basis for FrFS expansion for a τ -periodic x(t). The FrFS coefficients are given by, n∈Z Sparse Signal Model x(t) = Φ∗θ (m, t) X Preliminaries SAMPTA'09 X m∈Z 1−j cot θ ∗ λθ (t)·([x(t)λθ (t)] 2π ∗ [h(t)λθ (t)]) (6) where ‘∗’ denotes the usual convolution operator. In light of this definition, we wish to address the problem of recovering parsimonious x(t) from the samples of its filtered version, i.e., y(nT ) = x(t) ∗θ h(t)|t=nT , n ∈ Z. This problem has a natural/strong link with that of sparse sampling [15, 16, 17]. The Heisenberg-Gabor uncertainty principle for the FrFT [18] (a generalization of the Fourier duality) asserts that the product of spreads of x bθ (ω) and 2 x(t) has a lower bound which is proportional to sin4 θ (assuming that kxk = 1). This implies that sparsity in one domain will lead to loss of compact support in canonically conjugate domain. Our contribution in this article is to propose a sampling and reconstruction scheme for signals which have a sparse representation in time domain and whose fractional spectrum is non-bandlimited. We model our sparse signal as a continuous periodic stream of Diracs which is being observed by an acquisition device which deploys a sincbased filter. The paper is organized as follows: We assume that the reader is familiar with basic ideas outlined in [12, 16, 17]. 
In Section II, we introduce our sparse signal model and the definition of the fractional Fourier series (FrFS). Using these as preliminaries, in Section III, we derive an equivalent representation of our signal in FrFT domain. In Section IV, we discuss the sampling theorem and its completeness and Section V is the conclusion. x(t)∗θ h(t) = K−1 with period τ , weights {ck }k=0 and arbitrary shifts, K−1 {tk }k=0 ⊂ [0, τ ). In sense of [16], the signal has 2K degrees of freedom per period and the rate of innovation being ρ = 2K τ . From now on, the signal x(t) will denote the stream of Diracs. s(t) = X k∈Z hs, Φθ i Φθ (k, t). | {z } (10) s bθ [k] 150 Note that x(t) is non-bandlimited, however, it can be completely described by the knowledge of p[m] which in turn can be expanded as a linear combination of K complex exponentials. The coefficients of this expansion are given by, (9) sbθ [k] = hs, Φθ (k, t)i tZ 0 +τ κ (θ) = √ τ s (t) Φ∗θ (k, t) dt, ∀t0 ∈ R 4. t0 Zτ /2 κ (θ) = √ τ 2 δ (t) ej (t +(kω0 sin θ)2 /2) cot θ−jkω0 t −τ /2 (since s(t + τ ) = s(t) and s(t) = δ (t) , t ∈ dt We assume that a sinc–based kernel is used to prefilter x(t). In particular, we let the sampling ker-  , τ2 )  −τ 2 cot θ inition in (6), prefiltering the input signal x(t) with the kernel/low-pass filter ϕ(−t) and sampling can be written as, y (nT ) = x(t) ∗θ ϕ(−t)|t=nT . The main result is in the form of the following theorem. Theorem 3. Let x(t) be a τ -periodic stream of Diracs K−1 K−1 weighted by coefficients {ck }k=0 and locations {tk }k=0 2K with finite rate of innovation ρ = τ . Let the sampling kernel/prefilter ϕ(t) be an ideal low-pass filter which has fractional bandwidth [−Bπ, Bπ], where B is chosen such that B ≥ ρ. If the filtered version of x(t), i.e. y(t) = x(t) ∗θ ϕ(−t) is sampled uniformly at locations t = nT , n = 0, . . . , N − 1 then the samples, k∈Z  For q sake of convenience, we will assume that the constant 1−j cot θ has been absorbed in τ . Note that at θ = π2 , 2π P 1 s(t) = τ k∈Z ejkω0 t which is the result of applying the PSF on s(t) in Fourier domain. Our immediate goal now is to derive the FrFS equivalent of x(t) in (7). Since x(t) is a linear combination of some s(t) delayed by some time shift tk , it will be useful to recall shift property of FrFT [2] which states that, FrFT {s (t − tk )} 1 y (nT ) = x(t) ∗θ ϕ (−t)|t=nT , n = 0, . . . , N − 1, are a sufficient characterization of x(t), provided that  N ≥ 2Mθ + 1 and Mθ = Bτ 2csc θ . Proof. Using the following FrFT pair, q FrFT 1−j cot θ ∗ ω λθ (ω) · rect( 2πB ) ←→ 2π (B csc θ) λ∗θ (t) sinc (Bt csc θ) 2 (12) = sbθ (ω − tk cos θ) ej 2 tk sin θ cos θ−jωtk sin θ . PK−1 Therefore, call x(t) = k=0 ck · sk (t) where sk (t) is the time-shifted version of s(t) with shift parameter tk . Using Theorem 2 and the shift-property of FrFT, we have, X δ(t − tk − nτ ) sk (t) = we define our sampling kernel as, ϕB (t − nT ) = λ∗θ (t) ϕ (B csc θ (t − nT )) which is compactly supported over [−Bπ, Bπ]. Prefiltering and sampling x(t) results in, y (nT ) = x(t) ∗θ ϕ (−t)|t=nT , n = 0, . . . , N − 1 λ∗ (nT ) X = θ p[m] τ m∈Z E D 2πm × ej τ t , (B csc θ) sinc ((B csc θ) (t − nT )) . n∈Z (8) = X FrFT{ δ(t − tk )} |ω=mω0 sin θ Φθ (m, t) m∈Z (12) = 2 2 cot θ 1X ej 2 (tk −t )+jmω0 (t−tk ) . 
m∈Z {z } |τ The inner product in the above step is further simplified using the Fourier integral, D 2πm E ej τ t , (B csc θ) sinc ((B csc θ) (t − nT )) = PSF for Dirac Comb in FrFT Having obtained the FrFT-version of sk (t), we can write, x(t) = K−1 X k=0 = K−1 X ck · ck k=0 X n∈Z X X 1 τ m∈Z SAMPTA'09 cot θ 2 j rect( Bτ m csc θ )e (t2k −t2 )+jmω0 (t−tk ) m∈Z θ −jt2 cot 2 =e δ(t − tk − nτ ) ej | K−1 X k=0 2 nel to be ϕn (t) = e−j 2 t sinc(t − nT ). Integer translates of ϕn (t) form an orthonormal basis and the FrFT of bθ (ω) =  ϕ(t)(= ϕ0 (t)) is given by ϕ q cot θ −j 2 ω 2 1−j cot θ rect(ω/2π). In light of the defe 2π 2 κ (θ) = √ ej ((kω0 sin θ) /2) cot θ τ r (5) κ (θ) 2π = √ δbθ [kω0 sin θ] (11) 1 − j cot θ τ √ where κ (θ) = sin θ − j cos θ. Back substitution of (11) in (10) results in, 1 q 2π s(t) = τ 1−j cot θ   (kω0 sin θ)2 X −j t2 + cot θ+jkω0 t 2 δbθ [kω0 sin θ] e × . This concludes the proof. Sampling and Reconstruction of Sparse Signals in Fractional Fourier Domain 2 θ j cot 2 (tk )−jmω0 tk ck e {z p[m] ! ej } 2πm τ t . 2πm τ (nT ) . We can therefore conclude that, λ∗ (nT ) X j 2πm τ (nT ) p[m] rect( Bτ m y (nT ) = θ csc θ )e τ m∈Z = λ∗θ (nT ) τ Mθ X m=−Mθ p[m]ej 2πm τ (nT ) , n = 0, . . . , N − 1 151 Figure 1: Sampling and reconstruction of periodic stream of Diracs in FrFT domain. where Mθ =  Bτ csc θ  2 . Signal reconstruction from its samples: Call p[m] = PK−1 m k=0 ak uk – a linear combination  of K-complex ex√ ω0 tk with weights ak = ponentials, uk = λ∗π/2 K−1 ck · λθ (tk ). The problem of calculating {ak }k=0 K−1 and {uk }k=0 is based on finding a suitable polyno QK−1 −1 mial A(z) = whose inverse zk=0 1 − uk z transform yields the annihilating filter coefficients, A[m] which annihilate p[m]. In matrix notation, finding A[m] is equivalent to finding a corresponding vector A that forms a null space of a suitable submatrix of p[m] i.e. P(2Mθ −K+1)×(K+1) – which is essentially the set  Null(P) = A ∈ RK+1 : P · A = 0 . For details of this computation, the reader is referred to (cf. Pg. 1427, [16]). Figure 1 shows the layout of this algorithm.  5. Conclusion We presented a scheme for sampling and reconstruction of sparse signals in fractional Fourier domain. A direct consequence of modeling our signal of interest as a Finite Rate of Innovation signal, is that, the outcome bears an acute resemblance with the results previously derived, for the Fourier domain case. This simplifies the problem to the extent that reconstruction strategy remains unchanged and as we have shown, one can obtain the precise locations and amplitudes of the stream of Diracs using the annihilating filter method. Since time and frequency domains are special cases of the FrFT domain, it turns out that the number of values (Mθ ) required for exact reconstruction of time domain signal depends on the chirp rate of transformation, i.e. θ. References: [1] C. E. Shannon. Communications in the presence of noise. Proc. of the IRE, 37:10–21, January 1949. [2] L. B. Almeida. The fractional Fourier transform and time-frequency representations. IEEE Trans. Signal Proc., 42(11):3084–3091, Nov 1994. [3] X. G. Xia. On bandlimited signals with fractional Fourier transform. IEEE Signal Proc. Letters, 3(3):72–74, Mar 1996. SAMPTA'09 [4] A. Kutay, H. M. Ozaktas, O. Ankan, and L. Onural. Optimal filtering in fractional Fourier domains. IEEE Trans. Signal Proc., 45(5):1129–1143, May 1997. [5] A. I. Zayed. On the relationship between the Fourier and fractional Fourier transforms. IEEE Signal Proc. Letters, 3(12):310–311, Dec 1996. 
[6] T. Erseghe, P. Kraniauskas, and G. Carioraro. Unified fractional Fourier transform and sampling theorem. IEEE Trans. Signal Proc., 47(12):3419–3423, Dec 1999. [7] A. I. Zayed and A. G. Garcı́a. New sampling formulae for the fractional Fourier transform. Signal Proc., 77(1):111– 114, 1999. [8] A. G. Garcı́a. Orthogonal sampling formulas: A unified approach. SIAM Rev., 42(3):499–512, 2000. [9] Ç. Candan and H. M. Ozaktas. Sampling and series expansion theorems for fractional Fourier and other transforms. Signal Proc., 83(11):2455–2457, 2003. [10] R. Torres, P. F. Pellat, and Y. Torres. Sampling theorem for fractional bandlimited signals: A self-contained proof. application to digital holography. IEEE Signal Proc. Letters, 13(11):676–679, Nov. 2006. [11] R. Tao, B. Deng, Z.-Q. Wei, and Y. Wang. Sampling and sampling rate conversion of band limited signals in the fractional Fourier transform domain. IEEE Trans. Signal Proc., 56(1):158–171, Jan. 2008. [12] M. Unser. Sampling-50 years after Shannon. Proc. IEEE, 88(4):569–587, 2000. [13] H. M. Ozaktas and M. A. Kutay. Introduction to the fractional Fourier transform and its applications. Academic Press, 1999. [14] P. Kraniauskas, G. Cariolaro, and T. Erseghe. Method for defining a class of fractional operations. IEEE Trans. Signal Proc., 46(10):2804–2807, Oct 1998. [15] P. Marziliano. Sampling innovations. PhD thesis, EPFL, Switzerland, 2001. [16] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans. Signal Proc., 50(6):1417–1428, Jun 2002. [17] T. Blu, P.-L. Dragotti, M. Vetterli, P. Marziliano, and L. Coulot. Sparse sampling of signal innovations. IEEE Signal Proc. Mag., 25(2):31–40, March 2008. [18] S. Shinde and V. M. Gadre. An uncertainty principle for real signals in the fractional Fourier transform domain. IEEE Trans. Signal Proc., 49(11):2545–2548, Nov 2001. [19] S. C. Pei, M. H. Yeh, and T. L. Luo. Fractional Fourier series expansion for finite signals and dual extension to discrete-time fractional Fourier transform. IEEE Trans. Signal Proc., 47(10):2883–2888, Oct 1999. 152 Estimating Signals With Finite Rate of Innovation From Noisy Samples: A Stochastic Algorithm Vincent Y. F. Tan and Vivek K Goyal Massachusetts Institute of Technology, Cambridge, MA 02139 USA vtan@mit.edu, vgoyal@mit.edu Abstract: As an example of the concept of rate of innovation, signals that are linear combinations of a finite number of Diracs per unit time can be acquired by linear filtering followed by uniform sampling. However, in reality, samples are not noiseless. In a recent paper, we introduced a novel stochastic algorithm to reconstruct a signal with finite rate of innovation from its noisy samples. Even though variants of this problem has been approached previously, satisfactory solutions are only available for certain classes of sampling kernels, for example kernels which satisfy the Strang–Fix condition. In our paper, we considered the infinite-support Gaussian kernel, which does not satisfy the Strang–Fix condition. Other classes of kernels can be employed. Our algorithm is based on Gibbs sampling, a Markov chain Monte Carlo (MCMC) method. This paper summarizes the algorithm and provides numerical simulations that demonstrate the accuracy and robustness of our algorithm. 1. Introduction The celebrated Nyquist–Shannon sampling theorem [4, 6] states that a signal x(t) known to be bandlimited to Ωmax Hz is uniquely determined by samples of x(t) spaced 1/(2Ωmax ) sec apart. 
The textbook reconstruction procedure is to feed the samples as impulses to an ideal lowpass (sinc) filter. Furthermore, if x(t) is not bandlimited or the samples are noisy, introducing pre-filtering by the appropriate sinc sampling kernel gives a procedure that finds the orthogonal projection to the space of Ωmax -bandlimited signals. Thus the noisy case is handled by simple, linear, time-invariant processing. Sampling has come a long way since the sampling theorem, but until recently the results have mostly applied only to signals contained in shift-invariant subspaces [9]. Moving out of this restrictive setting, Vetterli et al. [10] showed that it is possible to develop sampling schemes for certain classes of non-bandlimited signals that are not subspaces. As described in [10], for reconstruction from samples it is necessary for the class of signals to have finite rate of innovation (FRI). The paradigmatic example is the class of signals expressed as x(t) = X k SAMPTA'09 ck φ(t − tk ) (1) where φ(t) is some known function. For each term in the sum, the signal has two real parameters ck and tk . If the density of tk s (the number that appear per unit of time) is finite, the signal has FRI. It is shown constructively in [10] that the signal can be recovered from (noiseless) uniform samples of x(t)∗h(t) (at a sufficient rate) when φ(t)∗h(t) is a sinc or Gaussian function. Results in [2] are based on similar reconstruction algorithms and greatly reduce the restrictions on the sampling kernel h(t). In practice, though, acquisition of samples is not a noiseless process. For instance, an analog-to-digital converter (ADC) has several sources of noise, including thermal noise, aperture uncertainty, comparator ambiguity, and quantization [11]. Hence, samples are inherently noisy. This motivates our central question: Given the signal model (i.e. a signal with FRI) and the noise model, how well can we approximate the parameters that describe the signal and hence the signal itself? In this work, we address this question by developing a novel algorithm to reconstruct the signal from the noisy samples. The main contribution is to show that a stochastic approach can effectively circumvent the ill-conditioning of algebraic techniques. This paper is an abridged version of [7], where many additional details can be found. 2. Problem Definition and Notation The basic setup is shown in Fig. 1. As mentioned in the introduction, we consider a class of signals characterized by a finite number of parameters. In this paper, similar to [2, 3, 10], the class is the weighted sum of K Diracs x(t) = K X ck δ(t − tk ). (2) k=1 (The use of a Dirac delta simplifies the discussion. It can be replaced by a known pulse φ(t) and then absorbed into the sampling kernel h(t), yielding an effective sampling kernel φ(t) ∗ h(t).) The signal to be estimated x(t) is filtered using a Gaussian lowpass filter   t2 (3) h(t) = exp − 2 2σh with width σh to give the signal z(t). Even though h(t) does not have compact support, it can be well approximated by a truncated Gaussian, which does have compact 153 3. Presentation of the Gibbs Sampler z(t) x(t) h(t) z[n] C/D + T e[n] y[n] Figure 1: Block diagram showing our problem setup. x(t) is a signal with FRI given by (2) and h(t) is the Gaussian filter with width σh given by (3). e[n] is i.i.d. Gaussian noise with standard deviation σe and y[n] are the noisy samples. 
From y[n] we will estimate the parameters that describe x(t), namely {(ck , tk )}K k=1 , and σe , the standard deviation of the noise. support. The filtered signal z(t) is sampled at rate of 1/T Hz to obtain z[n] = z(nT ) for n = 0, 1, . . . , N − 1. Finally, additive white Gaussian noise (AWGN) e[n] is added to z[n] to give y[n]. Therefore, the whole acqui−1 sition process from x(t) to {y[n]}N n=0 can be represented by the model M M: y[n] = K X k=1   (nT − tk )2 + e[n] (4) ck exp − 2σh2 for n = 0, 1, . . . , N − 1. The amount of noise added is a function of σe . We define the signal-to-noise ratio (SNR) in dB as ! PN −1 2 |z[n]| △ dB. (5) SNR = 10 log10 PN −1n=0 2 n=0 |z[n] − y[n]| In the sequel, we will use boldface to denote vectors. In particular, y = c = t = [y[0], y[1], . . . , y[N − 1]]⊤ , ⊤ [c1 , c2 , . . . , cK ] , [t1 , t2 , . . . , tK ]⊤ . (6) (7) (8) We will be measuring the performance of our reconstruction algorithms by using the normalized reconstruction error R∞ 2 △ −∞ |zest (t) − z(t)| dt R∞ E= , (9) |z(t)|2 dt −∞ where zest (t) is the reconstructed version of z(t). By construction E ≥ 0 and the closer E is to 0, the better the reconstruction algorithm. The problem can be summarized as: Given y = {y[n] | n = 0, . . . , N − 1} and the model M, estimate the parameters {(ck , tk )}K k=1 . Also estimate the noise variance σe2 . Ideally, we would like to minimize E in (9) directly, but this does not seem to be tractable since the dependence of y[n] on {tk }K k=1 is highly nonlinear. Thus, we propose the use of a stochastic algorithm (known as the Gibbs sampler) for the maximum likelihood (ML) estimation of {tk }K k=1 . The Gibbs sampler is a proxy for minimizing E. This is followed by linear least squared error (LLSE) estimation of {ck }K k=1 as a tractable and effective means for approximate minimization of E. SAMPTA'09 The algorithm introduced in [7] is a stochastic optimization procedure based on Gibbs sampling to estimate θ = {c, t, σe }. Detailed derivations and a self-contained introduction to Gibbs sampling are given in [7], and code written in MATLAB can be found at http://web.mit.edu/∼vtan/frimcmc. Here, we merely summarize the main steps of the algorithm and the intuition behind Gibbs sampling. The overall procedure is given in Algorithm 1. The algorithm uses Gibbs sampling (Algorithm 2) to estimate the set of Dirac positions {tk }K k=1 . It then uses a least-squares procedure to estimate the weights {ck }K k=1 . The basic idea of Gibbs sampling is to exploit the fact that it is easier to compute samples drawn approximately according to the posterior distribution of the parameters given the data than it is to directly minimize E. This is true when one can analytically determine the conditional distribution of one parameter given the remaining parameters and the data. (The required derivations are presented in [7].) After a number of iterations Ib called the burn-in period, samples drawn through Gibbs sampling can be treated as if they are drawn from the true posterior. Thus, samples drawn in I additional iterations can be averaged to obtain a good approximation of the mean of the posterior distribution. Algorithm 1 Parameter Estimation and Signal Reconstruction Algorithm Require: Data y, Model M 1: Obtain estimates {t̂k }K k=1 and σ̂e using the Gibbs sampler detailed in Algorithm 2 with the data y and the model M given in (4). 2: Obtain estimates {ĉk }K k=1 using a linear least squares estimation procedure and {t̂k }K k=1 from the Gibbs sampler. 
3: Compute zest (t) = x̂(t) ∗ h(t) given the parameters {(ĉk , t̂k )}K k=1 and the known pulse h(t). 4: Compute reconstruction error E given in (9). Algorithm 2 The Gibbs Sampling Algorithm (0) Require: y, I, Ib , θ (0) = {c(0) , t(0) , σe } 1: for i ← 1 : I + Ib do (i) (i−1) (i−1) (i−1) (i−1) 2: c1 ∼ p(c1 |c2 , c3 , . . . , cK , t(i−1) σe ) (i) (i) (i−1) (i−1) (i−1) (i−1) 3: c2 ∼ p(c2 |c1 , c3 , . . . , cK , t σe ) .. .. 4: . ∼ . 5: 6: 7: 8: 9: 10: 11: 12: 13: (i) (i) (i) (i) (i−1) cK ∼ p(cK |c1 , c2 , . . . , cK−1 , t(i−1) , σe (i) t1 (i) t2 ∼ ∼ ) (i−1) (i−1) (i−1) (i−1) p(t1 |c , t2 , t3 , . . . , tK , σe ) (i) (i−1) (i−1) (i−1) (i) p(t2 |c , t1 , t3 , . . . , tK , σe ) (i) .. . . ∼ .. (i) (i) (i) (i) (i−1) tK ∼ p(tK |c(i) , t1 , t2 , . . . , tK−1 , σe ) (i) σe ∼ p(σe |c(i) , t(i) ) end for Compute θ̂MMSE using least squares return θ̂MMSE 154 Sampling ck . tion given by ck is sampled from a Gaussian distribu-   1 βk , , p(ck |θ−ck , y, M) = N ck ; − 2αk 2αk AF/RF (Fig. 2(a)) GS (Fig. 2(b))   N −1 1 X (nT − tk )2 αk = exp − , 2σe2 n=0 σh2 N 30 30 σe 10−6 2.5 SNR 137 dB 10.2 dB (10) Table 1: Parameter values for comparing annihilating filter and root-finding (AF/RF) against Gibbs sampling (GS). where △ K 5 5 (11)   N −1 (nT − tk )2 1 X exp − βk = 2 σe n=0 2σh2         K X  (nT − tk′ )2 ck′ exp − . (12) × − y[n]   2σh2   k′′ =1  △ k 6=k 4. Numerical Results and Experiments In this section, the annihilating filter and root-finding algorithm [10] provides a baseline for comparison. After exhibiting its instability, we provide simulation results to validate the accuracy of the algorithm we proposed in Section 3. More extensive experimentation, including comparisons to [3] and applications to an audio signal, is reported in [7]. It is easy to sample from Gaussian densities when the parameters (αk , βk ) have been determined. 4.1 Annihilating Filter and Root-Finding Sampling tk . form tk is sampled from a distribution of the " p(tk |θ−tk , y, M) ∝ exp − 1 2σe2 N −1 X γk n=0   #  (nT − tk )2 (nT − tk )2 × exp − + νk exp − σh2 2σh2 (13) where △ γk = c2k , (14)       (nT − tk′ )2 △ ′ νk = 2ck − y[n] . ck exp − 2   2σh   k′′ =1     K X k 6=k (15) It is not straightforward to sample from this distribution. (i) We used rejection sampling [5, 8] to generate samples tk from p(tk |θ−tk , y, M). The proposal distribution q̃(tk ) was chosen to be an appropriately scaled Gaussian, since it is easy to sample from Gaussians. Sampling σe . σe is sampled from the ‘Square-root Inverted-Gamma’ [1] distribution IG −1/2 (σe ; ϕ, λ), p(σe |θ−σe , y, M) = −(2ϕ+1) 2λϕ σe Γ(ϕ)   λ exp − 2 I[0,+∞) (σe ), σe N , 2 "  #2 K X (nT − tk )2 △ 1 λ= y[n] − ck exp − 2 2σh2 △ (17) (18) k=1 Thus the distribution of the variance of the noise σe2 is Inverted Gamma, which corresponds to the conjugate prior of σe2 in the expression of N (e; 0, σe2 ) [1] and thus it is easy to sample from. SAMPTA'09 4.2 Gibbs Sampling Algorithm Initial Demonstration. To demonstrate the evolution the Gibbs sampler, we performed an initial experiment with parameters as above, with the exception that the noise standard deviation was increased to σe = 2.5, giving an SNR of 10.2 dB. We plot the iterates of the most challenging parameters—the tk s—in Fig. 3. We observe that the sampler converges in fewer than 20 iterations for this run, even though the parameter values were initialized far from their optimal values. The true filtered signal z(t) and its estimate zest (t) are plotted in Fig. 2(b). 
Note the close similarity between z(t) and zest (t). (16) where ϕ= In [10], for signals of the form (2) and certain sampling kernels, the annihilating filter was used as a means to locate the tk values. Subsequently a least squares approach yielded the weights ck . It was shown that in the noiseless scenario, this method recovers the parameters exactly. In the same paper, a method for dealing with noisy samples is suggested. Unfortunately, this method seems to be inherently ill-conditioned. In Fig. 2, we show a pair of simulations with the parameters as tabulated in Table 1. We observe from Fig. 2(a) that (even with an oversampling factor of N/(2K) = 3) the annihilating filter and rootfinding method is not robust to even a miniscule amount of added noise. Further Experiments on Simulated Data. To further validate our algorithm, we performed extensive simulations on different problem sizes with different levels of noise [7]. These experiments support the conclusion that the Gibbs sampler algorithm is insensitive to initialization. It always finds approximately optimal estimates from any starting point because the Markov chain provably converges to the stationary distribution [8]. We also find that the noise standard deviation σe can be estimated accurately; this may be important in some applications. 155 5. Concluding Comments σe = 1e−6, E = 0.2721 20 z(t) zest(t) 15 10 5 0 −5 −20 −10 0 10 20 (a) The reconstruction using annihilating filter and rootfinding completely breaks down when noise of a small standard deviation σe = 10−6 (SNR = 137 dB) is added. 20 z(t) z (t) est 15 References: 10 5 0 −5 0 5 10 15 20 25 (b) The Gibbs sampling technique gives a much better reconstruction even at a higher noise level σe = 2.5 (SNR = 10.2 dB). Figure 2: Demonstration of the instability of annihilating filter/root-finding approach and the improvement from Gibbs sampling. t k 15 10 5 0 We addressed the problem of reconstructing a signal with FRI given noisy samples. We showed that it is possible to circumvent some of the problems of the annihilating filter and root-finding approach [3, 10]. We introduced the Gibbs sampling algorithm to find the locations and augmented with a least squares approach to find the weights. The success of the Gibbs sampling algorithm does not depend on the choice of kernel h(t), but rather the i.i.d. Gaussian noise assumption. The formulation of the Gibbs sampler does not depend on the specific form of h(t). In fact, we used a Gaussian sampling kernel to illustrate that our algorithm is not restricted to the classes of kernels considered in [2]. A natural extension to our work here is to assign structured priors to c, t and σe . These priors can themselves be dependent on their own set of hyperparameters, giving a hierarchical Bayesian formulation. In this way, there would be greater flexibility in the parameter estimation process. We can also seek to improve on the computational load of the algorithms introduced here and in particular the sampling of tk via rejection sampling. 0 20 40 60 Iteration 80 100 Figure 3: Evolution of the tk s in the GS algorithm. The true values are indicated by the broken red lines. SAMPTA'09 [1] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, 1st edition, 2001. [2] P. L. Dragotti, M. Vetterli, and T. Blu. Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang–Fix. IEEE Trans. Signal Processing, 55(5):1741–1757, 2007. [3] I. Maravic and M. Vetterli. 
Sampling and reconstruction of signals with finite rate of innovation in the presence of noise. IEEE Trans. Signal Processing, 53(8):2788–2805, 2005. [4] H. Nyquist. Certain topics in telegraph transmission theory. Trans. American Institute of Electrical Engineers, 47:617–644, April 1928. [5] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. New York: Springer-Verlag, 2nd edition, 2004. [6] C. E. Shannon. Communication in the presence of noise. Proc. Institute of Radio Engineers, 37(1):10– 21, January 1949. [7] V. F. Y. Tan and V. K. Goyal. Estimating signals with finite rate of innovation from noisy samples: A stochastic algorithm. IEEE Trans. Signal Process., 56(10):5135–5146, October 2008. [8] L. Tierney. Markov chains for exploring posterior distributions. Technical Report 560, School of Statistics, Univ. of Minnesota, March 1994. [9] M. Unser. Sampling–50 years after Shannon. Proc. IEEE, 88(4):569–587, 2000. [10] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans. Signal Processing, 50(6):1417–1428, 2002. [11] R. H. Walden. Analog-to-digital converter survey and analysis. IEEE J. Selected Areas of Communication, 17(4):539–550, April 1999. 156 The Generalized Annihilation Property A Tool For Solving Finite Rate of Innovation Problems Thierry Blu The Chinese University of Hong Kong, Shatin N.T., Hong Kong thierry.blu@m4x.org Abstract: We describe a property satisfied by a class of nonlinear systems of equations that are of the form F(Ω)X = Y. Here F(Ω) is a matrix that depends on an unknown Kdimensional vector Ω, X is an unknown K-dimensional vector and Y is a vector of N ≥ K) given measurements. Such equations are encountered in superresolution or sparse signal recovery problems known as “Finite Rate of Innovation” signal reconstruction. We show how this property allows to solve explicitly for the unknowns Ω and X by a direct, non-iterative, algorithm that involves the resolution of two linear systems of equations and the extraction of the roots of a polynomial and give examples of problems where this type of solutions has been found useful. At first sight, solving such a nonlinear system of equations is a daunting task. Fortunately, if the matrix F(Ω) satisfies a property that we shall call “Generalized Annihilation Property” (GAP), this reduces to solving two linear systems of equations sandwiching a nonlinear step that amounts to polynomial root extraction in practical cases. The filters ϕ(t) that satisfy the GAP are thus especially interesting, since the related FRI problems enjoy a straight non-iterative solution. 2. The Generalized Annihilation Property (GAP) We carry on with the previously identified general nonlinear problem, namely F(Ω) X = Y, 1. Introduction We consider the signal resulting from the convolution between a window ϕ(t) and the sum of K Diracs with amplitude xk located at time tk . Given the N uniform samples yn (T = sampling step ) yn = K X xk ϕ(nT −tk ) where n = 1, 2, . . . , N, (1) (3) where the unknowns are Ω = [ω1 , ω2 , . . . ωK ] and X = [x1 , x2 , . . . xK ], and where the measurements are Y = [y1 , y2 , . . . yN ]. This system is said to satisfy the Generalized Annihilation Property whenever there exist K + 1 constant matrices, Ak , and K + 1 scalar functions of Ω, hk (Ω), such that we have the identity k=1 then FRI problems (see [1, 2]) consist in retrieving the parameters tk and xk . 
Solving such problems is conceptually interesting because it shows how to break the standard Nyquist-Shannon bandlimitation rule for the exact reconstruction of signals from their uniform samples [3]. The system of consistent equations (1) can be expressed under the generic form of a nonlinear problem as shown in Fig. 1 (see next page), where the parameters Ω = [ω1 , ω2 , . . . ωK ] are related unambiguously to the unknowns tk ’s. Because of the variety of settings adapted to this general approach, it happens to be necessary to distinguish between the parameters ωk —which we shall call “abstract parameters”—and the locations tk : typically, the ωk ’s will be the zeros of some polynomial and from these ωk ’s, we will be able to retrieve the tk ’s using a functional relation of the form ωk = λ(tk ) for some invertible function λ(t). SAMPTA'09 K X hk (Ω) Ak F(Ω) = 0. (4) k=0 for any vector of parameters Ω. By right multiplying with X, the above equation implies that any solution Ω of (3) is also a solution of the (generalized) annihilation equation K X hk (Ω)Ak Y = 0. (5) k=0 This equation can be expressed in a matrix form AH = 0 where the unknown is H = [ h0 (Ω), h1 (Ω), . . . , hK (Ω) ]T   and the matrix A = A0 Y, A1 Y, . . . , AK Y . Thus, in order to solve (3) for Ω and X, the idea consists in finding the scalar coefficients hk (Ω) that satisfy (5), then retrieving ω1 , ω2 , . . . , ωK from the knowledge of hk (Ω), and finally finding X such that F(Ω) X = Y. Without elaborating on the conditions that make this solution unique, a 157      | ϕ(T − t2 ) ϕ(2T − t2 ) .. . ··· ··· ϕ(T − tK ) ϕ(2T − tK ) .. .  ϕ(N T − t1 ) ϕ(N T − t2 ) {z ··· ϕ(N T − tK ) xK } | {z } ϕ(T − t1 ) ϕ(2T − t1 ) .. .     x1 x2 .. .      =   X F(Ω)  y1 y2 .. .      (2) yN | {z } Y Figure 1: Algebraic equivalent of the consistency equations (1). minimal requirement is that the matrices Ak have at least K rows. In the simple case where the hk (Ω)’s are related to the ωk ’s through a polynomial relation K X hk (Ω)z −k = k=0 K Y (1 − ωk z −1 ), (6) k=1 solving (3) boils down to a three-step algorithm that can be summarized as follows: 1. Compute a solution H = [ 1, h1 , . . . , hK−1 , hK ]T of   A0 Y, A1 Y, . . . , AK Y H = 0; 2. Compute the roots ωk of the z-transform H(z) = PK −k ; k=0 hk z 3. Compute a solution X of F(Ω) X = Y. 3. The GAP is actually shared by many interesting filters that can be used in sampling schemes, resulting in easily solvable FRI problems. Among them, the first ones to be identified were the periodized sinc, the infinite (i-e., not periodized) sinc and the Gaussian kernels [1]. Even more interestingly, recent research indicates that this property may somewhat be related to the Strang-Fix conditions which makes a very intriguing connection with approximation theory [12], and considerably broadens the class of FRI-admissible kernels. In all cases investigated so far, the scalar coefficients hk (Ω) satisfy (6). 3.1   F(Ω) =   ω1 ω12 .. . ω2 ω22 .. . ··· ··· ω1N ω2N ···  ωK 2  ωK    N ωK where the frequencies to retrieve, fk , are related to ωk through ωk = ej2πfk . This problem satisfies the GAP for band-diagonal matrices Ak which are more precisely given by:   Ak = 0N −K,k IN −K 0N −K,K−k , where 0m,n is the m × n zero matrix and In is the n × n identity matrix. A minimal—yet not sufficient—condition for the unicity of the solution is N ≥ 2K. 
Since the Ak can be seen as shifting operators by k samples, the annihilation equation is analogous to a filtering equation—with an annihilating filter. The annihilation algorithm is then equivalent to Prony’s method [4]. Of course, spectral estimation in the presence of noise has been addressed by numerous researchers since the 1970’s [5, 6, 7, 8, 9, 10, 11]. SAMPTA'09 Periodized sinc (Dirichlet) filter Solving the FRI problem in the case of a periodic stream of Diracs is equivalent to considering (1) where ϕ is a periodized sinc kernel, e.g., a Dirichlet kernel Example—Spectral estimation problems boil down to a nonlinear problem of the form (3) involving the Vandermonde matrix:  Some GAP Kernels ϕ(t) = X sinc(B(t − k ′ τ )) = k′ ∈Z sin(πBt) Bτ sin(πt/τ ) where τ is the period of the Dirac stream and B some bandwith (chosen so that Bτ is an odd integer) [2]. This problem can be reformulated using the annihilation equation (4) by defining the following annihilation matrices   Ak = 0Bτ −K,k IBτ −K 0Bτ −K,Bτ −k W where W = [e−j2πmn/N ] for |m| ≤ ⌊Bτ /2⌋ and 1 ≤ n ≤ N , is the N -DFT submatrix of size Bτ × N . Then, the abstract parameters ωk are related to the locations tk through ωk = e−j2πtk /τ . This kernel has been found useful for the estimation of UWB channels [13] and for image superresolution [14]. 3.2 Infinite sinc filter The filter ϕ(t) is given by ϕ(t) = sinc Bt with B = 1/T .  When ϕ ∗ x (t) is sampled uniformly at frequency B, the nonlinear system of equations satisfies the GAP. The abstract parameters ωk are related to the locations tk through 158 ωk = tk and the annihilation matrices are given by  K   K K ··· 0 ··· K K−1 0  . . . ..  0 .. .. .. .  Ak =  . .. .. ..  . . . .  .   K K 0 ··· ··· ··· K K−1  1 0 ··· ···   0 2k . . .   . . . . 3k . . . ×  ..  . .. ..  . . .  . 0 ··· ··· 0 Additionally, there is a constraint on the minimal number  of samples N for the GAP to hold, which is that N be 0 larger than ⌈(S + maxk {tk })/T ⌉. ..  .     0  4. Conclusion  K 0  0 ..  .   ..   .    0  Nk 3.3 Gaussian filter The filter ϕ(t) is given by ϕ(t) = exp(−t2 /σ 2 ). When  ϕ ∗ x (t) is sampled uniformly at frequency T −1 , the nonlinear system of equations satisfies the GAP. The abstract parameters ωk are related to the locations tk through ωk = exp(2tk T /σ 2 ) and the annihilation matrices are given by   Ak = 0N −K,k IN −K 0N −K,K−k   T2 e σ2 0 ··· ··· 0   (2T )2 ..   .. 2  0  σ . . e  ×  ..  .. ..  .  . . 0   (N T )2 2 0 ··· ··· 0 e σ A version of this solution (actually, for a Gabor kernel) was used in Optical Coherence Tomography, showing the possibility to resolve slices of a microscopic sample below the coherence length of the illuminating reference light [15]. 3.4 Finite Support Strang-Fix filters Through linear combinations of its shifts, the finite support filter ϕ(t) is assumed to reconstruct polynomials up to some degree L − 1 (standard Strang-Fix condition [16]) or exponentials eal t where al − a0 is linear with l = 0, 1, . . . , L − 1. More precisely, in the standard StrangFix case, we denote by cl,n the coefficients such that X cl,n ϕ(nT − t) = tl where l = 0, 1, . . . , L − 1, n∈Z by T the sampling step, and by [0, S] the support of ϕ(t). Then, the abstract parameters ωk are related to the locations tk through ωk = tk and the annihilation matrices are given by    Ak =   ck,1 ck,2 ck−1,1 .. . ck−1,2 .. . ck−L+1,1 ck−L+1,2 SAMPTA'09 ··· ··· ··· ck,N ck−1,N .. . ck−L+1,N    . 
 We have shown how to unify the different techniques used in FRI signal reconstruction through an algebraic property that we call the Generalized Annihilation property. In essence, this property allows to solve nonlinear system of equations within two noniterative steps. We hope that this property can be used to solve other FRI problems (i.e, with new kernels) in particular in dimensions higher than 1 (for instance, like in [17]), and maybe to solve other types of problems not directly related to sampling. References: [1] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Transactions on Signal Processing, vol. 50, pp. 1417–1428, June 2002. [2] T. Blu, P.-L. Dragotti, M. Vetterli, P. Marziliano, and L. Coulot, “Sparse sampling of signal innovations,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 31–40, 2008. [3] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423 and 623–656, July and October 1948. [4] R. Prony, “Essai expérimental et analytique,” Annales de l’École Polytechnique, vol. 1, no. 2, p. 24, 1795. [5] P. Stoica and R. L. Moses, Introduction to Spectral Analysis. Upper Saddle River, NJ: Prentice Hall, 1997. [6] S. M. Kay, Modern Spectral Estimation—Theory and Application. Englewood Cliffs, NJ: Prentice Hall, 1988. [7] D. W. Tufts and R. Kumaresan, “Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood,” Proceedings of the IEEE, vol. 70, pp. 975–989, September 1982. [8] S. M. Kay and S. L. Marple, “Spectrum analysis—a modern perspective,” Proc. IEEE, vol. 69, pp. 1380– 1419, November 1981. [9] Special Issue on Spectral Estimation, Proceedings of the IEEE, vol. 70, September 1982. [10] V. F. Pisarenko, “The retrieval of harmonics from a covariance function,” Geophysical Journal, vol. 33, pp. 347–366, September 1973. [11] R. Roy and T. Kailath, “ESPRIT–estimation of signal parameters via rotational invariance techniques,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 984–995, July 1989. 159 [12] P.-L. Dragotti, M. Vetterli, and T. Blu, “Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang-Fix,” IEEE Transactions on Signal Processing, vol. 55, pp. 1741–1757, May 2007. Part 1. [13] I. Maravić, J. Kusuma, and M. Vetterli, “Lowsampling rate UWB channel characterization and synchronization,” Journal of Communications and Networks, vol. 5, no. 4, pp. 319–327, 2003. [14] L. Baboulaz and P.-L. Dragotti, “Exact feature extraction using finite rate of innovation principles with an application to image super-resolution,” IEEE Transactions on Image Processing, vol. 18, pp. 281– 298, February 2009. [15] T. Blu, H. Bay, and M. Unser, “A new highresolution processing method for the deconvolution of optical coherence tomography signals,” in Proceedings of the First IEEE International Symposium on Biomedical Imaging: Macro to Nano (ISBI’02), vol. III, (Washington DC, USA), pp. 777–780, July 7-10, 2002. [16] G. Strang and G. Fix, “A Fourier analysis of the finite element variational method,” in Constructive Aspects of Functional Analysis (G. Geymonat, ed.), pp. 793– 840, Rome: Edizioni Cremonese, 1973. [17] D. Kandaswamy, T. Blu, and D. Van De Ville, “Analytic sensing: reconstructing pointwise sources from boundary Laplace measurements,” in Proceedings of the SPIE Conference on Mathematical Imaging: Wavelet XII, (San Diego CA, USA), August 26August 30, 2007. To appear. 
SAMPTA'09 160 An “algebraic” reconstruction of piecewise-smooth functions from integral measurements Dima Batenkov, Niv Sarig, Yosef Yomdin Department of Mathematics, Weizmann institute of science, Rehovot, Israel. {dima.batenkov, niv.sarig, yosef.yomdin}@weizmann.ac.il 1. Introduction This paper presents some results on a well-known problem in Algebraic Signal Sampling and in other areas of applied mathematics: reconstruction of piecewise-smooth functions from their integral measurements (like moments, Fourier coefficients, Radon transform, etc.). Our results concern reconstruction (from the moments) of signals in two specific classes: linear combinations of shifts of a given function, and “piecewise D-finite functions” which satisfy on each continuity interval a linear differential equation with polynomial coefficients. Let us start with some general remarks and a conjecture. It is well known that the error in the best approximation of a C k -function f by an N -th degree Fourier polynomial is of order NCk . The same holds for algebraic polynomial approximation and for other basic approximation tools. However, for f with singularities, in particular, with discontinuities, the error is much larger: its order is only √CN . Considering the so-called Kolmogorow N -width of families of signals with moving discontinuities one can show that any linear approximation method provides the same order of error, if we do not fix a priori the discontinuities’ position (see [7], Theorem 2.10). Another manifestation of the same problem is the “Gibbs effect” - a relatively strong oscillation of the approximating function near the discontinuities. Practically important signals usually do have discontinuities, so the above feature of linear representation methods presents a serious problem in signal reconstruction. In particular, it visibly appears near the edges of images compressed by JPEG, as well as in the noise and low resolution of the CT and MRI images. Recent non-linear reconstruction methods, in particular, Compressed Sensing ([2, 3]) and Algebraic Sampling ([4, 12, 14, 6, 9]), address this problem in many cases. Both approaches appeal to an a priori information on the character of the signals to be reconstructed, assuming their “simplicity” in one or another sense. Compressed sensing assumes only a sparse representation in a certain (wavelets) basis, and thus it presents a rather general and “universal” approach. Algebraic Sampling usually requires more specific a priori assumptions on the structure of the signals, but it promises a better reconstruction accuracy. In fact, we believe that ultimately the Algebraic Sampling approach has a potential to reconstruct “simple signals with singularities” as good as smooth ones. In par- SAMPTA'09 ticular, the results of [5, 11, 8, 17, 14] strongly support (also apparently do not accurately formulate and prove) the following conjecture: There is a non-linear algebraic procedure reconstructing any signal in a class of piecewise C k -functions (of one or several variables) from its first N Fourier coefficients, with the overall accuracy of order NCk . This includes the discontinuities’ positions, as well as the smooth pieces over the continuity domains. At present there are many approaches available to a robust detection of discontinuities from Fourier data (see [8, 5, 11] and references therein). The remaining problem seems to be an accurate estimate of the accuracy of the solution of the nonlinear systems arising. 
Our results below can be considered, in particular, as a step in this direction. On the other hand, they have been motivated by the results in [4, 12, 14], and in [9, 6]. 2. Linear combinations of shifts of a given function Reconstruction of this class of signals from sampling has been described in [4, 12]. We study a rather similar problem of reconstruction from the moments. Our method is based on the following approach: we construct convolution kernels dual to the monomials. Applying these kernels, we get a Prony-type system of equations on the shifts and amplitudes. Let us restate a general reconstruction problem, as it appears in our specific setting. We want to reconstruct signals of the form F (x) = N X X (l) ai,j,l fi (x + xj ) (1) i=1 j,l where the fi ’s are known functions of x = (x1 , . . . , xd ), and the form (1) of the signal is known a priori. The parameters ai,j,l , xj = (xj1 , . . . , xjd ) are to be found from a finite number of “measurements”, i.e. of linear (usually integral) functionals like polynomial moments, Fourier moments, shifted kernels, evaluation over some grid and more. In this paper we consider only linear combinations of shifts of one known function f (although the method of “convolution dual” can be extended to several shifted functions and their derivatives - see [16]). First we consider general integral “measurements” and then restrict 161 ourselves to the moments and Fourier coefficients. In what follows x = (x1 , . . . , xd ), t = (t1 , . . . , td ), j is a scalar index, while k = (k1 , . . . , kd ), i = (i1 , . . . , id ) and n = (n1 , . . . , nd ) are multi-indices. Partial ordering of multiindices is given by k ≤ k ′ ⇔ kp ≤ kp′ , p = 1, . . . , d. So we have s X F (x) = aj f (x + xj ). (2) j=1 RLet the measurements µk (F ) be given by µk (F ) = F (t)ϕk (t)dt, for a certain (multi)-sequence of functions ϕk (t), k ≥ 0 = (0, . . . , 0). Given f and ϕ = {ϕk (t)}, k ≥ 0 we now try to find certain “triangular” linear combinations X ψk (t) = Ci,k ϕi (t) (3) 0≤i≤k forming, in a sense, some “f -convolution dual” functions (similar to a bi-orthogonal set of function) with respect to the system ϕk (t). More accurately, we require that Z (4) f (t + x)ψk (t) = ϕk (x). We shall call a sequence ψ = {ψk (t)} satisfying (3), (4) f - convolution dual to ϕ. Below we find convolution dual systems to the usual and exponential monomials. We consider a general problem of finding convolution dual sequences to a given sequence of measurements as an important step in the reconstruction problem. Notice that it can be generalized by dropping the requirement of a spePk cific representation (3): ψk (t)R = i=0 Ci,k ϕi (t). Instead we can require only that f (t)ψk (t) be expressible in terms of the measurements sequence µk . Also ϕk in (4) can be replaced by another a priori chosen sequence ηk . This problem leads, in particular, to certain functional equations, satisfied by polynomials and exponents (as well as exponential polynomials and some kinds of elliptic functions). Now we have the following result: Theorem 1. Let a sequence ψ = Pψk (t) be f -convolution dual to ϕ. Define Mk by Mk = 0≤i≤k Ci,k µi . Then the parameters aj and xj in (2) satisfy the following system of equations (“generalized Prony system”): s X aj ϕk (xj ) = Mk , k = 0, . . . . (5) j=1 P = Ci,k µi = Proof We have Mk 0≤i≤k R R P F (t) 0≤i≤k Ci,k ϕi (t)dt = F (t)ψk (t) = R Ps Ps j j f (t + x a a ϕ (x ). 
)ψ (t)dt = k j=1 j j=1 j k In specific examples we can find the minimal number of equations in (5) necessary to uniquely reconstruct the parameters aj and xj in (2). So here ϕn (x) = xn1 1 ∙ ∙ ∙ xnd d for each multi-index n = (n1 , . . . , nd ). We look for the dual functions ψn satisfying the convolution equation Z f (t + x)ψn (t)dt = xn (7) for each multi-index n. To solve this equation we apply Fourier transform to both sides of (7). Assuming that fˆ(ω) ∈ C ∞ (Rd ), fˆ(0) 6= 0 we find (see [16]) that there is a unique solution to (7) provided by X ϕn (x) = Cn,k xk , (8) k≤n where Cn,k "   1 n ∂ n−k n+k = √ (−i) ∂ω n−k ( 2π)d k 1 ω=0 fˆ(ω) # . This calculation is symbolic and works for more general cases. The actual calculation in our polynomial case is done using straightforward matrix calculations. We set the generalized polynomial moments as X Mn = Cn,k mk (9) k≤n and obtain, as in Theorem 1, the following system of equations: s X aj (xj )n = Mn , n ≥ 0. (10) j=1 This system can be solved explicitly in a standard way (see, for example, [13, 4, 15]). In one-dimensional case it goes as follows (see [13]): from (10) we get that for z = (z1 , . . . , zd ) the generalized moments generating function (d = 1 yet, notice that the formulas are still multidimensional) I(z) = X Mn z n = s X j=1 n∈Nd aj d Y l=1 1 1 − xjl zl (11) is a rational function. Hence its Taylor coefficients satisfy linear recurrence relation, which can be reconstructed through a linear system with the Hankel-type matrix formed by an appropriate number of the moments Mn ’s. This is, essentially, a procedure of the diagonal Padé approximation for I(z) (see [13]). The parameters aj , xj are finally reconstructed as the poles and the residues of I(z). For several variables, although the formulas are the same as above, the generalization of the solution of the Prony system is more involved and should be addressed separately. In one dimensional case with the derivatives f (l) included we have F (x) = r s X X aj,l f (l) (x + xj ). (12) j=1 l=0 2.1 Reconstruction from moments We are given a finite number of moments of a signal F as in (2) in the form Z mn = F (t)tn dt. (6) SAMPTA'09 The corresponding moment-generating function in this case takes the form s X l   r X X l (−1)q+l aj,l /(xj )l I(z) = . (13) q (1 − xj z)q+1 q=0 j=1 l=0 162 which is still a rational function (d-dimensional case with derivatives is similar). We would like to stress that in this case the dual polynomials ψk are not changed and they are given as in (8). Therefore also the formula for the generalized moments Mn is the same as in (9). 2.2 b Fourier case fˆ(k) −ikx = ϕ−k (x). e fˆ(k) (14) Here the triangular system of equations (3) is actually not 1 triangular any more but still since ψk (x) = fˆ(k) ϕ−k (x) we can express the generalized moments through the orig1 inal ones via Mk = fˆ(k) µ−k [F ]. Now exactly as before we can find a generalized Prony system in the form X X 1 aj e−ikxj = aj ρkj (15) µ−k [F ] = Mk = ˆ f (k) j j where ρj = e−ixj . In this case we get a rational exponential generating function and we can find its poles and residues on the unit complex circle as we did in the polynomial case. 2.3 Further extensions The approach above can be extended in the following directions: 1. Reconstruction of signals built from several functions or with the addition of dilations also can be investigated (a perturbation approach where the dilations are approximately 1 is studied in [15]). 2. 
Further study of “convolution duality” can significantly extend the class of signals and measurements allowing for a closed - form signal reconstruction. Reconstruction of piecewise D-finite functions from moments Let g : [a, b] → R consist of K+1 “pieces” g0 , . . . gK with K ≥ 0 jump points a = ξ0 < ξ1 . . . < ξK < ξK+1 = b D= j=0 i ai,j x i=0  dj dxj N X i=1 SAMPTA'09 αi,n ui (x), Piecewise D-finite Reconstruction Problem. Given N, {ki }, K, a, b and the moment sequence {mk } of a piecewise D-finite function g, reconstruct all the parameters {ai,j }, {ξi }, {αi,n }. Below we state some results (see [1] for detailed proofs) which provide explicit algebraic connections between the above parameters and the measurements {mk }. The first two theorems assume a single continuity interval (compare [10]). Theorem 2. Let K = 0 and D g ≡ 0 with D given by (16). Then the moment sequence {mk (g)} satisfies a linear recurrence relation  N N (E −a I) (E −b I) · kj N X X (i,j) ai,j Π  (k, E) mk = 0 j=0 i=0 (18) where E is the discrete forward shift operator and Π(i,j) (k, E) are monomials in E whose coefficients are (i+k)! polynomials in k: Π(i,j) (k, E) = (−1)j (i+k−j)! Ei−j . Theorem 3. Denote  def (i,j) def E(E) = (E −a I)N (E −b I)N , vk = E(E) · Π(i,j) (k, E) mk , ∞ dj def X (0,j) k def hj (z) = vk z , Gj (x) = E(x) j g(x) dx k=0 Assume the conditions of Theorem 2. Then (1) The vector of the coefficients a = (ai,j ) satisfies a linear homogeneous system   (0,0)  (1,0) (k ,N ) v0 . . . v0 N v0 a0,0  (0,0) (1,0) (kN ,N )   v1 . . . v1   a1,0  v1  .  Ha =  =0 . . . .   . . . . . . .   ..   . vc (1,0) M vc ... (k ,N ) M v cN akN ,N (19) c ∈ N. for all M (i,j) (aij ∈ R) (16) Each gn may be therefore written as a linear combination of functions {ui }N i=1 which are a basis for the space ND = {f : D f ≡ 0}: gn (x) = a We subsequently formulate the following (0,0) M Furthermore, let g satisfy on each continuity interval some linear homogeneous differential equation with polynomial coefficients: D gn ≡ 0, n = 0, . . . , K where kj N X X xk g(x)dx mk (g) = In the same manner as in section 2.1 we now choose 1 e−ikx . ϕk (x) = eikx . We get immediately ψk (x) = fˆ(k) Indeed, Z Z 1 ikt e dt = f (t + x)ψk (t)dt = f (t + x) ˆ f (k) 3. We term such functions g “piecewise D-finite”. Many real-world signals may be represented as piecewise Dfinite functions, in particular: polynomials, trigonometric functions, algebraic functions. The sequence {mk = mk (g)} is given by the usual moments Z n = 0, 1, . . . , K (17) (2) vk = mi+k (Gj (x)). Consequently, hj (z) is the moment generating function of Gj (x). Pkj ai,j xi . Then the functions (3) Denote pj (x) = i=0 Φ = {1, h0 (z), . . . hN (z)} are polynomially depen PN max kj dent: pj (z −1 ) = Q(z) where j=0 hj (z) z Q(z) is a polynomial with deg Q < max kj . The system of polynomials {z max kj pj (z −1 )} is called the Padé-Hermite form for Φ. 163 To handle the piecewise case, we represent ( the jump dis0 x<0 def continuities by the step function H(x) = and 1 x≥0 write g as a distribution g(x) = ge0 + K X n=1 gf n (x)H(x − ξn ) (20) Theorem 4. Let K > 0 and let g be as in (20) with operator D annihilating every piece gf n . Then the operator  Y K def N b D= (x − ξi ) I · D (21) n=1 annihilates the entire g as a distribution. Consequently, conclusions of Theorems 2 and 3 hold with D replaced by b as in (21). D Proposition 5. Let K ≥ 0 and {ui }N i=1 be a basis for the space ND , where D annihilates every piece of g. 
Assume Rξ (17) and denote cni,k = ξnn+1 xk ui (x) for n = 0, . . . , K. f ∈ N: A straightforward computation gives ∀M   α1,0    .   0 m0 c1,0 . . . c0N,0 . . . cK  ..   N,0   m1  .. .. .. ..    .. = .  α   . N,0 . . . .     ..   ..  K 0 c01,M . . . c . . . c  .  f f f N,M N,M mM f αN,K (22) The above results can be combined as follows to provide a solution of the Reconstruction Problem: (a) Let N, {ki }, K, a, b and {mk (g)} be given. If K > 0, b according to (21). replace D (still unknown) with D (b) Build the matrix H as in (19). Solve Ha = 0 and obtain the operator D∗ = Da which annihilates g. (c) If K > 0, factor out all the common roots of the polynomial coefficients of D∗ with multiplicity N . These are the locations of the jump points {ξn }. The remaining part is the operator D† which annihilates every gn . (d) By now D† and {ξn } are known. So compute the basis for ND† and solve (22). c and M f determine the minimal required The constants M size of the corresponding linear systems (19) and (22) in order for all the solutions of these systems to be also solutions of the original problem. It can be shown that: c without any ad1. There exists no uniform bound on M ditional information on the nature of the solutions. Explicit bounds may be obtained for simple function classes such as piecewise polynomials of bounded degrees or real algebraic functions. f=M f(D) 2. For every specific D, an explicit bound M may be computed for the system (22). The above algorithm has been tested on exact reconstruction of piecewise polynomials, piecewise sinusoids and rational functions. SAMPTA'09 References: [1] D.Batenkov, Moment inversion problem for piecewise D-finite functions, arXiv:0901.4665v2 [math.CA]. [2] E. J. Candes̀. Compressive sampling. Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006. Vol. III, 1433–1452, Eur. Math. Soc., Zurich, 2006. [3] D. Donoho, Compressed sensing. IEEE Trans. Inform. Theory 52 (2006), no. 4, 1289–1306. [4] P.L. Dragotti, M. Vetterli and T. Blu, Sampling Moments and Reconstructing Signals of Finite Rate of Innovation: Shannon Meets Strang-Fix, IEEE Transactions on Signal Processing, Vol. 55, Nr. 5, Part 1, pp. 1741-1757, 2007. [5] K. Eckhoff, Accurate reconstructions of functions of finite regularity from truncated Fourier series expansions, Math. Comp. 64 (1995), no. 210, 671–690. [6] M. Elad, P. Milanfar, G. H. Golub, Shape from moments—an estimation theory perspective, IEEE Trans. Signal Process. 52 (2004), no. 7, 1814–1829. [7] B. Ettinger, N. Sarig. Y. Yomdin, Linear versus non-linear acqusition of step-functions, J. of Geom. Analysis, 18 (2008), 2, 369-399. [8] A. Gelb, E. Tadmor, Detection of edges in spectral data II. Nonlinear enhancement, SIAM J. Numer. Anal. 38 (2000), 1389-1408. [9] B. Gustafsson, Ch. He, P. Milanfar, M. Putinar, Reconstructing planar domains from their moments. Inverse Problems 16 (2000), no. 4, 1053–1070. [10] V. Kisunko, Cauchy type integrals and a D-moment problem. C.R. Math. Acad. Sci. Soc. R. Can. 29 (2007), no. 4, 115–122. [11] G. Kvernadze, T. Hagstrom, H. Shapiro, Locating discontinuities of a bounded function by the partial sums of its Fourier series., J. Sci. Comput. 14 (1999), no. 4, 301–327. [12] I. Maravic and M. Vetterli, Exact Sampling Results for Some Classes of Parametric Non-Bandlimited 2D Signals, IEEE Transactions on Signal Processing, Vol. 52, Nr. 1, pp. 175-189, 2004. [13] E. M. Nikishin, V. N. 
Sorokin, Rational Approximations and Orthogonality, Translations of Mathematical Monographs, Vol 92, AMS, 1991. [14] P. Prandoni, M. Vetterli, Approximation and compression of piecewise smooth functions, R. Soc. Lond. Philos. Trans. Ser. A Math. Phys. Eng. Sci. 357 (1999), no. 1760, 2573–2591. [15] N. Sarig, Y. Yomdin, Signal Acquisition from Measurements via Non-Linear Models, C. R. Math. Rep. Acad. Sci. Canada Vol. 29 (4) (2007), 97-114. [16] N. Sarig and Y. Yomdin, Reconstruction of “Simple” Signals from Integral Measurements, in preparation. [17] E. Tadmor, High resolution methods for time dependent problems with piecewise smooth solutions. Proceedings of the International Congress of Mathematicians, Vol. III (Beijing, 2002), 747–757, Higher Ed. Press, Beijing, 2002. 164 Distributed Sensing of Signals Under a Sparse Filtering Model Ali Hormati , Olivier Roy , Yue M. Lu and Martin Vetterli Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. We consider the task of recovering correlated vectors at a central decoder based on fixed linear measurements obtained by distributed sensors. Two different scenarios are considered: In the case of universal reconstruction, we look for a sensing and recovery mechanism that works for all possible signals, whereas in the case of almost sure reconstruction, we allow to have a small set (with measure zero) of unrecoverable signals. We provide achievability bounds on the number of samples needed for both scenarios. The bounds show that only in the almost sure setup can we effectively exploit the signal correlations to achieve effective gains in sampling efficiency. In addition, we propose an efficient and robust distributed sensing and reconstruction algorithm based on annihilating filters. 1. Introduction Consider two signals that are linked by an unknown filtering operation, where the filter is sparse in the time domain. Such models can be used, e.g., to describe the correlation between the transmitted and received signals in an unknown multi-path environment. We sample the two signals in a distributed setup: Each signal is observed by a different sensor, which sends a certain number of non-adaptive and fixed linear measurements of that signal to a central decoder. We study how the correlation induced by the above model can be exploited to reduce the number of measurements needed for perfect reconstruction at the central decoder, but without any inter-sensor communication during the sampling process. Our setup is conceptually similar to the Slepian-Wolf problem in distributed source coding [6], which consists of correlated sources to be encoded separately and decoded jointly. While communication between the encoders is precluded, correlation between the measured data can be taken into account as an effective means to reduce the amount of information transmitted to the decoder. The main difference between our work and this classical distributed source coding setup is that we study a sampling problem and hence are only concerned about the number of sampling measurements we need to take, whereas the latter is about coding and hence uses bits as its “currency”. From the sampling perspective, our work is closely related to the problem of distributed compressed sensing, first introduced in [1] (see also [4, 5]). In that framework, jointly sparse data need to be reconstructed based on linear projections computed SAMPTA'09 M1 x1 A1 Dec h Abstract: x̂1 , x̂2 M2 x2 A2 Figure 1: Distributed sensing setup. 
Signals x1 and x2 are connected through an unknown sparse filter h. The ith sensor (i = 1, 2) provides a Mi -dimensional observation of the signal xi via a non-adaptive and fixed linear transform Ai to a central decoder. by distributed sensors. In this paper, we first introduce in Section 2. a novel correlation model for distributed signals. Instead of imposing any sparsity assumption on the signals themselves (as in [1]), we assume that the signals are linked by some unknown sparse filtering operation. Such models can be useful in describing the signal correlation in several practical scenarios (e.g. multi-path propagation and binaural audio recoding). In Section 3., we introduce two strategies for the design of the sampling system: In the universal strategy, we seek to successfully sense and recover all signals, whereas in the almost sure strategy, we allow to have a small set (with measure zero) of unrecoverable signals. We establish the corresponding achievability bounds on the number of samples needed for the two strategies mentioned above. These bounds indicate that the sparsity of the filter can be useful only in the almost sure strategy. Since the algorithms that achieves the bounds are computationally prohibitive, we introduce in Section 4., a concrete distributed sampling and reconstruction scheme that can recover the original signals in an efficient and robust way. Finally, Section 5. presents an application of our results in the area of binaural hearing aids. A preliminary version of this work was also presented at ICASSP 2009. In this paper, we add results on the achievability bound for the almost sure setup as well as a new section on applications. 2. The Correlation Model Consider two signals x1 (t) and x2 (t), where x2 (t) can be obtained as a filtered version of x1 (t). In particular, we assume that x2 (t) = (x1 ∗ h)(t) , (1) 165 x1 (t) h(t) x2 (t) ... (x2 [0], . . . , x2 [N − 1])T , linked to each other through a circular convolution ... ... ... x2 [n] = (x1 ⊛ h)[n] for n = 0, 1, . . . , N − 1, A/D A/D ... ... ... where h = (h[0], . . . , h[N − 1])T ∈ RN is an unknown K-sparse vector, that is, khk0 = K. ... windowing windowing (4) 3. Bounds 3.1 Universal Recovery x1 [n] x2 [n] h[n] Figure 2: The continuous-time sparse filtering operation and its discrete-time counterpart. P where h(t) = K k=1 ck δ(t − tk ) is a stream of K Diracs K with unknown delays {tk }K k=1 and coefficients {ck }k=1 . In this work, we study a finite-dimensional discrete version of the above model. As shown in Figure 2, we assume that the original continuous signal x1 (t) is bandlimited to [−σ, σ]. Sampling x1 (t) at uniform time interval T leads def to a discrete sequence of samples xs1 [n] = x1 (nT ), where the sampling rate 1/T is set to be above the Nyquist rate σ/π. To obtain a finite-length signal, we subsequently apply a temporal window to the infinite sequence xs1 [n] and get def x1 [n] = xs1 [n] wN [n], for n = 0, 1, ..., N − 1, where wN [n] is a smooth temporal window of length N . Note that when N is large enough, we can neglect the windowing effect, since w bN (ω)/(2π) approaches a Dirac function δ(ω) as N → ∞. Applying the above procedure to x2 (t) and using (1), we have   2πm 1 b2 ≈ X1 [m]H[m], (2) X2 [m] ≈ x T NT where def H[m] = K X ck e−j2πmtk /(N T ) . (3) k=1 The above relationship implies that the finite-length signals x1 [n] and x2 [n] can also be approximately modeled as the input and output of a discrete-time filtering operation1. 
In general, the location parameters {tk } in (3) can be arbitrary real numbers, and consequently, the discrete-time filter h[n] is no longer sparse (see Figure 2 for a typical impulse response of h[n]). However, when the sampling interval T is small enough, we can assume that the real-valued delays {tk } are close enough to the sampling grid, i.e., tk /T ≈ nk for some integers {nk }. We will follow this assumption2 throughout the paper. Definition 1 (Correlation Model) The signals of interest are two vectors x1 = (x1 [0], . . . , x1 [N − 1])T and x2 = 1 Note that in order to be unambiguous in the positions {t }, we need k to ensure that N T > max {tk }. k 2 We introduce this assumption (i.e. tk /T = nk for some nk ∈ Z) mainly for the simplicity it brings to the theoretical analysis in later parts of this paper. It is however not an inherent limitation of our work. SAMPTA'09 Let A1 and A2 be the sampling matrices used by the two sensors, and A be the block-diagonal matrix with A1 and A2 on the main diagonal. We first focus on finding those A1 and A2 such that every xT = (xT1 , xT2 ) is uniquely determined by its sampling data Ax. Here we denote by X the set of all stacked vectors x such that its components x1 and x2 satisfy (4) for some K-sparse vector h. Definition 2 (Universal Achievability) We say a sampling pair (M1 , M2 ) is achievable for universal reconstruction if there exists fixed measurement matrices A1 ∈ RM1 ×N and A2 ∈ RM2 ×N such that the set def B(A1 , A2 ) = {x ∈ X : ∃ x′ ∈ X with x 6= x′ (5) ′ but Ax = Ax } is empty. Intuition suggests that, due to the correlation between the vectors x1 and x2 , the minimum number of samples needed to perfectly describe all possible vectors x can made smaller than the total number of coefficients 2N . The following proposition shows that, surprisingly, this is not the case. Proposition 1 A sampling pair (M1 , M2 ) is achievable for universal reconstruction if and only if M1 ≥ N and M2 ≥ N. Proof Let us consider two stacked vectors xT = (xT1 , xT2 ) ′T and x′T = (x′T 1 , x2 ), each following the correlation model (4). They can be written under the form     IN I ′ x= x1 and x = N′ x′1 , C C where C and C ′ are circulant matrices with vectors h and h′ as the first column, respectively. It holds that    I −I N x1 x − x′ = N . C −C ′ x′1 Moreover, we have that   I −I N rank N = N + rank (C − C ′ ) . C −C ′ When C − C ′ is of full rank, the above matrix is of rank 2N . This happens, for example, when K = 1 with C = 2I N and C ′ = I N . In this case, x − x′ can take any possible values in R2N . Hence, a necessary (and sufficient) condition for the set (5) to be empty is that the blockdiagonal matrix A is a M × 2N -dimensional matrix of full rank, with M ≥ 2N . In particular, A1 and A2 must be full rank matrices of size M1 × N and M2 × N , respectively, with M1 , M2 ≥ N . Note that, in the centralized scenario, the full rank condition would still require to take at least 2N measurements. 166 M2 3.2 Almost Sure Recovery As shown in Proposition 1, universal recovery is a rather strong requirement to satisfy since we have to take at least N samples at each sensor, without being able to exploit the existing correlation. In many situations, however, it is sufficient to consider a weaker requirement, which aims at finding measurement matrices that permit the perfect recovery of almost all signals from X . 
Definition 3 (Almost Sure Achievability) We say a sampling pair (M1 , M2 ) is achievable for almost sure reconstruction if there exist fixed measurement matrices A1 ∈ RM1 ×N and A2 ∈ RM2 ×N such that the set B(A1 , A2 ), as defined in (5), is of probability zero. The above definition for the almost sure recovery depends on the probability distribution of the signal x1 and the sparse filter h. In what follows, it is sufficient to assume that the signal x1 and the non-zero coefficients of the filter h have non-singular3 probability distributions over RN and RK , respectively. The following proposition gives an achievability bound of the number of samples needed for the almost sure reconstruction. Proposition 2 A sampling pair (M1 , M2 ) is achievable for almost sure reconstruction if N 2K + 1 K +2 K + 2 2K + 1 N M1 Figure 3: Achievable sampling region for universal reconstruction (shaded area), sampling pairs achieved for almost sure reconstruction for K odd (solid line) and sampling pairs achieved for almost sure reconstruction by the proposed algorithm based on annihilating filters (dashed line). ⌊N/2⌋ − K K +1 X1 X2 Figure 4: Sensors 1 and 2 both send the first K + 1 DFT coefficients of their observation, but only complementary subsets of the remaining frequency components. M1 ≥ min {K + r, N } , M2 ≥ min {K + r, N } , and M1 + M2 ≥ min {N + K + r, 2N } , (6) where r = 1 + mod (K, 2). Proof Due to space limitations, we just provide the sketch of the proof which is constructive in nature. First, let the two sensors take the Fourier transform of their signals and send the first (K + r + 1)/2 frequency components to the central decoder. By dividing the two sets of measurements (Note that the denominator should not be zero, which is guaranteed almost surely), the decoder calculates the necessary Fourier elements of the K-sparse filter h in order to reconstruct it almost surely. Then, the sensors transmit complementary subsets of frequency indices up to the Nyquist frequency. Knowing the filter h and the frequency content of one of the signals at some index, the decoder computes the corresponding frequency content of the other signal using (4). Proposition 2 shows that, in contrast to the universal scenario, the correlation between the signals by means of the sparse filter provides a big saving in the almost sure setup, especially when K ≪ N . This is depicted as the solid line in Figure 3. Unfortunately, the algorithm that attains the bound in (6) is combinatorial in nature and thus, computationally prohibitive [1]. In the following, we propose a novel distributed sensing algorithm based on annihilating filters. This algorithm needs effectively K more measurements with respect to the achievability region for the almost sure reconstruction but exhibits polynomial complexity of O(KN ). 3 By a non-singular distribution, we mean any continuous distribution such that the probability that the random variables lie in a low-dimensional subspace is zero. SAMPTA'09 4. Distributed Sensing Algorithm The proposed distributed sensing scheme is based on a frequency-domain representation of the input signals. Let us denote by X 1 ∈ CN and X 2 ∈ CN the DFTs of the vectors x1 and x2 , respectively. The circular convolution in (4) can be expressed as X2 = H ⊙ X1 , (7) where H ∈ CN is the DFT of the filter h and ⊙ denotes the element-wise product. Our approach consists of two main steps: 1. Finding filter h by sending the first K + 1 (1 real and K complex) DFT coefficients of x1 and x2 . 2. 
Sending the remaining frequency indices by sharing them among the two sensors. The decoder first finds the filter h using only the first K + 1 DFT coefficients of x1 and x2 . To this end, the decoder first computes H[m] = X2 [m] X1 [m] and H[−m] = H ∗ [m] (8) provided that X1 [m] is non-zero for m = 0, 1, . . . , K. This happens almost surely if the distribution of x1 is, for example, non-singular. Then, it finds the K-sparse filter with an annihilating filter approach; see [7] for details. The senors also transmit complementary subsets (in terms of frequency indexes) of the remaining DFT coefficients of their signals (N − 2K − 1 real values in total). This is illustrated in Figure 4. The first K + 1 DFT coefficients allow to almost surely reconstruct the filter h. The missing frequency components of x1 (resp. x2 ) are then recovered from the available DFT coefficients of x2 (resp. x1 ) using the relation (7). 167 α αmax d t αmin αmax ω αmin (a) (b) Figure 5: Audio Experiment Setup. (a) A sound source travels at a distance of d meter in front of the head. (b) Angular position of the sound source with respect to time. relative delay between the two received signals, which can be used to localize the source. Figure 6 demonstrates the localization performance of the algorithm. Figure 6(a) shows the evolution of the original binaural impulse response over time. Figures 6(b)- 6(d) exhibits the sparse approximation to the filter, using different number of measurements. This clearly demonstrates the effect of the over-sampling factor on the robustness of the reconstruction algorithm. 0.8 0.8 2 0.6 1.5 0.4 1 0.2 0.5 0 −0.2 delay (msec) delay (msec) 0.4 Note that in order to compute X1 [m] from X2 [m], the frequency components of the filter H[m] should be nonzero. This is insured almost surely with our assumption that the nonzero elements of the filter h are chosen according to a non-singular distribution in RK . In terms of achievability, we have thus shown the following result. 0 −0.4 1 0.2 0.5 0 −0.2 0 −0.4 −0.5 −0.5 −0.6 −0.6 −1 0 2 4 6 8 −1 10 0 2 4 Time (s) 6 8 10 Time (s) (a) Original (b) L = 5 0.8 0.8 2 0.6 2 0.6 1.5 1.5 0.4 1 0.2 0.5 0 −0.2 0 −0.4 delay (msec) 0.4 delay (msec) Proposition 3 A sampling pair (M1 , M2 ) is achievable for almost sure reconstruction using the efficient annihilating filter method if 2 0.6 1.5 1 0.2 0.5 0 −0.2 0 −0.4 −0.5 M1 ≥ min {2K + 1, N } , −0.6 0 M2 ≥ min {2K + 1, N } , and M1 + M2 ≥ min {N + 2K + 1, 2N } . In the presence of noise or model mismatch, we add robustness to the system by sending L + 1 DFT coefficients of xi (i = 1, 2) with L ≥ K to the decoder. We denoise the measurements by using the denoising algorithm due to Cadzow; for details see [3]. Then the annihilating filter method uses the denoised measurements to estimate the sparse filter. −0.5 −0.6 −1 2 4 6 8 10 −1 0 Time (s) 2 4 6 8 10 Time (s) (c) L = 15 (d) L = 25 Figure 6: Tracking the binaural impulse response. Each column in the image corresponds to the binaural impulse response at the time mentioned on the x axis. (a) Original binaural filter. (b)-(d) Tracking the evolution of the main peak with different values of the oversampling factor L. 6. Conclusions 5. Application In a practical scenario, we consider the signals recorded by two hearing aids mounted on the left and right ears of the user. We assume that the signals of the two hearing aids are related thorough a filtering operation. We refer to this filter as binaural filter. 
In the presence of a single source in far field, and neglecting reverberations and the headshadow effect [2], the signal recorded at hearing aid 2 is simply a delayed version of the one observed at hearing aid 1. Hence, the binaural filter can be assumed to have sparsity factor K = 1. In the presence of reverberations and head shadowing, the filter from one microphone to the other is no longer sparse which introduces model mismatch. Despite this model mismatch, the transfer function between the two received signals should be approximately sparse, with the main peak indicating the desired relative delay. In our setup, a single sound source located at distance d = 1 meter from the head of a KEMAR mannequin, moves back and forth between two angles αmin = −45◦ and αmax = 45◦ . The angular speed of the source is ω = 18 deg/sec. The sound is recorded by the microphones of the two hearing aids, located at the ears of the mannequin. We want to retrieve the binaural filter between the two hearing aids at hearing aid 1, from limited data transmitted by hearing aid 2. Then, the main peak of the binaural filter indicates the SAMPTA'09 A general formulation of the distributed sensing problem has been proposed where the two signals are connected through an unknown sparse filter. In this context, both universal and almost sure reconstruction were addressed together with their corresponding achievable bounds. In addition, a distributed sensing scheme was presented, together with a method to make it robust to model mismatch. Our future research will focus on investigating more the applications of the proposed methods in the distributed sensing context. References: [1] D. Baron, M. B. Wakin, M. F. Duarte, S. Sarvotham, and R. G. Baraniuk. Distributed compressed sensing. Technical Report ECE-0612, Electrical and Computer Engineering Department, Rice University, Dec. 2006. [2] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, Cambridge, MA, 1997. [3] J. A. Cadzow. Signal enhancement – A composite property mapping algorithm. IEEE Trans. Acoust., Speech, Signal Process., 36(1):49– 67, Jan. 1988. [4] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52(2):489–509, Feb. 2006. [5] D. L. Donoho. Compressed sensing. IEEE Trans. Inf. Theory, 52(4):1289–1306, Apr. 2006. [6] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory, 19:471–480, Jul. 1973. [7] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans. Signal Process., 50(6):1417–1428, Jun. 2002. 168 A method for generalized sampling and reconstruction of finite-rate-of-innovation signals Chandra Sekhar Seelamantula and Michael Unser Biomedical Imaging Group Ecole polytechnique fédérale de Lausanne Switzerland {chandrasekhar.seelamantula, michael.unser}@epfl.ch signals that have a finite rate of innovation (FRI). SpecifiAbstract: cally, consider the stream of time-ordered Dirac impulses: We address the problem of generalized sampling and reconstruction of finite-rate-of-innovation signals. SpecifL X ically, we consider the problem of sampling streams of x(t) = aℓ δD (t − tℓ ), (1) Dirac impulses and propose a two-channel method that enℓ=1 ables fast, local reconstruction under some suitable condiwhere δD (·) denotes the Dirac impulse. The problem is tions. 
We also specify some acquisition kernels and give to compute the parameters {aℓ , tℓ ; 1 ≤ ℓ ≤ L} based the associated reconstruction formulas. It turns out that on some measurements on x(t). The parametric nature these kernels can also be combined into one kernel, which of the problem has resulted in the development of techcan be employed in the single-channel sampling scenario. niques that are quite different from those that sampling The single-kernel approach carries over all the advantages theorists have been familiar with. Typically, the reconof the two-channel counterpart. Simulation results are prestruction techniques developed by Vetterli et al. [5] and sented to validate the theoretical calculations. Dragotti et al. [6] have a flavor of parametric spectral estimation [7]. They also employ in a novel fashion spline 1. Introduction and prior art kernels [8, 9] that reproduce polynomials or exponentials. It is remarkable that these kernels, which play a vital role Sampling theory is the foundation on which digital sigin wavelet theory, are also quite useful for sampling FRI nal processing has been built. The popular flavor of the signals. sampling theory is due to Shannon [1] and deals excluIn the techniques mentioned above, the focus is exsively with bandlimited signals. Shannon’s theory was clusively on the single-channel case. Recently, some generalized in several ways, the most prominent one benew multichannel approaches have also been developed. ing the theory of multichannel sampling developed by PaKusuma and Goyal proposed a new technique for reconpoulis [3]—his theory is known as the Generalized Samstructing an unknown number of impulses over a finite inpling Theory. Papoulis’ formalism, however, deals only terval of time by using a successive approximation critewith bandlimited signals. To accommodate the more genrion [10]. Their technique can be implemented using a eral class of finite-energy signals, Unser and Zerubia [4] bank of integrators and B-splines. Baboulaz and Dragotti developed a theory, which does not rely on the bandlimproposed a distributed acquisition scheme for FRI sigiting constraint. Another important extension is the samnals and demonstrated applications to image registration pling and reconstruction of signals that lie in some shiftand super-resolution image restoration [11]. In [12], we invariant subspace spanned by the integer-shifted versions have proposed a two-channel sampling method for the of a generator kernel (see [2] and the references therein). FRI problem (cf. Fig. 1). We have employed first-order The specific case of bandlimited sampling corresponds to resistor-capacitor networks to sample streams of Dirac ima sinc kernel and is subsumed by this formalism. pulses and piecewise-constant functions. The reconstrucRecently, Vetterli et al. [5] extended sampling theory in tion technique boils down to solving a system of two equaa new direction to answer a question that has not been tions containing the unknown parameters in decoupled addressed before—that of sampling and reconstructing form. The key result in [12] is given below: streams of Dirac impulses and signals derived therefrom. These signals are not constrained to lie in the space of Proposition 1 The stream of Dirac impulses in (1) is finite-energy functions nor in the space of bandlimited uniquely specified by the samples yα (nT ) = (x∗hα )(nT ) functions. They may also not lie in some shift-invariant and yγ (nT ) = (x ∗ hγ )(nT ), n ∈ Z, where hα (t) = subspace generated by a kernel. 
Typically, such signals α e−α t u(t), hγ (t) = γ e−γ t u(t), and α 6= γ, provided are specified by a set of discrete parameters per time unit, that min {tℓ − tℓ−1 } ≥ T . 2≤ℓ≤L also known as their rate of innovation. We are interested in SAMPTA'09 169 Motivation for the present work The above proposition relies on causal exponential functions for sampling. Working with exponentials has the practical advantage that they can be easily generated by employing first-order resistor-capacitor circuits. From a mathematical viewpoint, however, exponentials are probably not the only class of functions that enable accurate reconstruction. The main motivation behind the present paper is the quest for alternative kernels hα (t) and hγ (t) that would fit into the framework of the above proposition (also cf. Fig. 1). To that end, we first reformulate the method proposed in [12] in a more general framework and specify some kernels that enable exact reconstruction. 2. Generalized sampling formulation Consider the two-channel sampling scenario shown in Fig. 1. Let hα (t) and hγ (t), α, γ ∈ C, denote the impulse responses of two causal linear shift-invariant systems, compactly supported on [0, T ] and nonzero over that interval. Consider the stream of Dirac impulses in (1), where the impulses are separated by at least T ; i.e., min {tℓ − tℓ−1 } ≥ T. 2≤ℓ≤L (2) Deviations from this condition shall be addressed later. The output of the system to the input x(t) is given by ∆ yα (t) = (x ∗ hα )(t) = L X aℓ hα (nT − tℓ ) δK [nT − r(tℓ )] , ℓ=1   where r(tℓ ) = tTℓ T is the operator that performs the ceiling of tℓ with respect to the sampling grid and δK denotes the Kronecker impulse. The sequence yα (nT ) comprises Kronecker impulses, each corresponding to a Dirac impulse in x(t) under the condition (2). Note that the sampling period T equals the support of the kernel. Similarly, corresponding to a system with impulse response hγ (t), γ 6= α, we have yγ (nT ) = L X aℓ hγ (nT − tℓ ) δK [nT − r(tℓ )] . ℓ=1 Note that these sampling instants correspond to the nonzero values in the sequences yα (nT ) and yγ (nT ) and are therefore known. Consider the ℓth nonzero samples in the sequences yα (nT ) and yγ (nT ): yα (r(tℓ )) = aℓ hα [r(tℓ ) − tℓ ] and (3) yγ (r(tℓ )) = aℓ hγ [r(tℓ ) − tℓ ]. (4) SAMPTA'09 hα (t) T ℓ=1 hγ (t) T {aℓ , tℓ } Figure 1: Two-channel sampling of a stream of dirac impulses. In (3) and (4), the indices r(tℓ ) and the values on the left hand side are known. The impulse responses hα and hγ are also known; their design shall be explained below. The amplitude and position parameters {tℓ , aℓ } are unknown and have to be determined. The amplitude of the ℓth Dirac impulse appears as a multiplicative factor. The position of the Dirac impulse is encoded in the amplitude of the Kronecker impulse. Dividing (3) by (4) eliminates aℓ and gives rise to an equation in the unknown tℓ , which can be computed if and only if (hα /hγ )(t) is invertible on its range. The value of tℓ thus obtained can then be substituted in (3) or (4) to obtain the value of aℓ . Some specific functions that fit into the above reconstruction paradigm are presented next. 3. Let us next consider the samples of yα (t) taken on a uniform grid with a sampling step T . Note that we have chosen the sampling period to be equal to the support of hα (t); otherwise, we are likely to miss some closelyspaced impulses as the following analysis shows. The samples of yα (t) are given by L X aℓ δD (t − tℓ ) Kernels for two-channel sampling aℓ hα (t − tℓ ). 
ℓ=1 yα (nT ) = L ! Parameter computation 1.1 We specify only the kernel hα (t); unless otherwise mentioned, hγ (t) is obtained by replacing α with γ; i.e., both kernels have the same functional form. The kernels involve gating by the B-spline of order zero, at scale T : β(t) = u(t) − u(t − T ), where u(t) is the unit step function. We specify the kernel definitions and give the expressions for {tℓ , aℓ } directly. The intermediate calculations are omitted but it is straightforward to supply them starting from the definition of the kernel. 1. Exponential spline (E-spline) kernels [9]: hα (t) = e−α t β(t), α ∈ R, where u(t) is the unit-step function. The parameters of ℓth impulse are given by   1 yα (r(tℓ )) tℓ = r(tℓ ) + log and α−γ yγ (r(tℓ ))    α yα (r(tℓ )) log . aℓ = yα (r(tℓ )) exp − α−γ yγ (r(tℓ )) This kernel choice has been analyzed in sufficient detail in [12]. The specific kernel given above is a first-order E-spline kernel. One could, in principle, also employ higher-order kernels. The advantage of first-order E-spline kernels over the higher-order ones, however, is that they always give rise to closedform solutions. The higher-order kernels exhibit this property only for certain values of the spline parameters. For further discussion on this issue, we refer the reader to [12]. 2. Power functions: hα (t) = tα β(t), α ∈ R. Corre- 170 spondingly, the parameters of x(t) are given by tℓ aℓ  1 yα (r(tℓ )) α−γ = r(tℓ ) − and yγ (r(tℓ ))   −α yα (r(tℓ )) α−γ = yα (r(tℓ )) . yγ (r(tℓ ))  For α ∈ Z+ , the power function becomes a monomial of degree α. Since B-splines of order α can reproduce polynomials (and naturally, monomials too) up to degree α, they are included as special elements of this class. Therefore, power functions, which play a vital role in moment-based sampling approaches [6, 11] for the FRI problem, are also useful in the generalized sampling approach. Also, note that fractional powers are admissible in the kernel definition. 2 3. Gaussian functions: hα (t) = e−α t β(t), where α ∈ R. Correspondingly, we have that s   1 yγ (r(tℓ )) log , and tℓ = r(tℓ ) − α−γ yα (r(tℓ ))    α yγ (r(tℓ )) log . aℓ = exp α−γ yα (r(tℓ )) 4. Complex E-splines: hα (t) = e−jα t β(t), α ∈ R. This kernel cannot be treated as a special case of the E-spline kernels with an imaginary parameter. The reason is that there is an issue related to parameter identifiability that deserves special attention. The potential problem is that this kernel may give rise to more than one solution for tℓ ; there is, however, no ambiguity in the solution for aℓ . We further explain this issue and also state a condition that helps overcome the non-uniqueness hurdle. The cause of ambiguity is essentially the quasiperiodicity of the complex exponential over the support [0, T ]: e−jα (r(tℓ )−tℓ ) = e−jα (r(tℓ )−tℓ + 2mπ α ), for m ∈ Z such that 0 ≤ (r(tℓ )−tℓ + 2mπ α ) ≤ T . The restriction on m is due to the fact that we are considering a truncated complex exponential. The inequality gives rise to multiple solutions for tℓ . The solution to this problem lies in tying up the choices of the values of α and T such that m = 0 is the only possibility in the above inequality. This amounts to requiring that the complex exponential have at maximum one 2π > T . Unperiod within a sampling interval; i.e., α der this condition, we have the reconstruction formulae:   yα (r(tℓ ))ejα r(tℓ ) , and tℓ = −j log yγ (r(tℓ ))ejγ r(tℓ ) aℓ = yα (r(tℓ )) exp (jα(r(tℓ ) − tℓ )) . 
Similarly, a truncated Fresnel kernel can be employed by considering purely imaginary parameters in the SAMPTA'09 definition of the Gaussian above. For complex parameters, the E-spline and Fresnel kernels have an exponential and Gaussian decay, respectively. 5. Hybrid sampling kernels: In the kernels considered above, we have enforced the same functional form for both hα (t) and hγ (t). By relaxing this property, we can make the reconstruction technique more efficient. For example, if we set one of the parameters (but not both), say α to zero, the kernel reduces to a causal B-spline of order 0: hα (t) = β(t). The second kernel can be taken from any of the choices listed above. The samples from the zeroth-order B-spline channel then directly yield aℓ = yα (r(tℓ )). Using the samples from the second channel, we can compute the positions of the Dirac impulses. For example, if we employ the truncated power function in the second  1 yγ (r(tℓ )) γ . channel, we have that tℓ = r(tℓ ) − aℓ Note that r(tℓ ) and yγ (r(tℓ )) are known. Having listed a few kernel choices, we reiterate that, in the present formalism, the condition stated in (2) is crucial for the super-resolution localization of impulses. If two successive Dirac impulses are spaced closer apart than the sampling period, then they give rise to overlapping Kronecker impulses and resolving them is not possible within the proposed formulation. The existing approaches [5, 6, 10, 11] do not suffer from this limitation. 4. Kernels for single-channel sampling The principal advantage offered by the two-channel method equipped with the choice of a proper kernel is the decoupling between the amplitudes and positions of the impulses. As shown next, this advantage can be carried over to the single-channel case by suitably integrating the previously listed kernels into a single function. For example, consider the kernel: hα,γ (t) = e−α t β(t) + e−γ (t−T ) β(t − T ), which has the same properties as the hybrid kernel in the two-channel case (kernel (1) in Section 3.). This choice would give rise to two nonzero samples per Dirac impulse, which can be used to solve for aℓ and tℓ . Again, if α = 0, the first sample would straightaway give the amplitude, which can then be used together with the second sample to compute the position. Thus, we have a similar algorithm as in the two-channel case, the only difference being that, in the two-channel case, these samples are acquired one per channel whereas in the onechannel case, they are acquired in the same channel—the overall sampling rate, however, is the same in both the cases. In general, the kernels for the single-channel case can be defined as: hα,γ (t) = hα (t) + hγ (t − T ). Since the support of the kernel hα,γ (t) is double that of hα (t)  or hγ (t), impulses that are farther apart by at least 2T i.e.,  min {tℓ − tℓ−1 } ≥ 2T only can be resolved. The ker- 2≤ℓ≤L nels defined in this paper are shown in Fig. 2. 171 1 1 0 0 1 Time 0.5 0 2 Complex exponential spline Amplitude 1 0.5 0 1 Time 5 Truncated Gaussian 1.5 0.5 0 2 Hybrid kernel (B!spline & E!spline) 1.5 Amplitude Truncated power function 1.5 Amplitude Amplitude Exponential spline 1.5 0 1 Time 0 2 Hybrid kernel (Two E!splines) !1 1 Amplitude Amplitude Imaginary part !5 0 0.5 (a) 1.5 1 0.2 0.4 0.6 0.8 Time (seconds) 1 0.2 0.4 0.6 0.8 Time (seconds) 1 1 0.5 5 1 Real part !1 0 1 2 0 Time 0 1 Time 2 0 0 1 Time 2 Figure 2: Sampling kernels. The parameters α = 2, γ = 1, and T = 1, are chosen for the sake of illustration. Amplitude 0 0 !5 5. 
Simulations We next validate the theoretical findings by numerical experiments. We simulate the two-channel sampling of nine Dirac impulses shown in Fig. 3(a); the amplitudes and positions are chosen for the purpose of illustration. The minimum spacing between two impulses is 0.0076 seconds. The sampling period T is chosen to be 0.0038 seconds to ensure that (2) is satisfied. The impulses are sampled using the power function kernels with parameters α = 3, γ = 2, and T = 0.0038 seconds. These values are chosen for the purpose of illustration. The reconstructed stream of Dirac impulses is shown in Fig. 3(b). The reconstruction is accurate to numerical precision. Identical results were obtained with the other kernel choices. 6. Conclusions We have extended the results developed in [12] and proposed new kernels for both single-channel and twochannel sampling scenarios. The kernels are built using functions known in system theory such as the exponential, power function, Gaussian, etc. The main advantage of the proposed formulation is that, under the condition of minimum separation between consecutive impulses, a fast local reconstruction algorithm can be developed. This advantage, however, comes with the shortcoming that impulses spaced farther apart than the sampling period only can be resolved. It would be a challenging task to develop local reconstruction algorithms without imposing constraints on the minimum/average separation between impulses or groups thereof. Acknowledgments This work was supported by the Swiss National Science Foundation (SNSF) Grant 200020-101821. References: [1] C. E. Shannon, “Communication in the presence of noise,” Proc. IRE, vol. 37, no. 1, pp. 10-21, Jan. 1949. SAMPTA'09 (b) Figure 3: (a) Ground truth, (b) Reconstructed signal. [2] M. Unser, “Sampling—50 years after Shannon,” Proc. IEEE, vol. 88, no. 4, pp. 569-587, Apr. 2000. [3] A. Papoulis, “Generalized sampling expansion,” IEEE Trans. Circuits Syst., vol. 24, no. 11, pp. 652– 654, 1977. [4] M. Unser and J. Zerubia, “A generalized sampling theory without band-limiting constraints,” IEEE Trans. Circuits Syst. II, Analog and Digit. Signal Process., vol. 45, no. 8, pp. 959–969, Aug. 1998. [5] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Trans. Signal Process., vol. 50, no. 6, pp. 1417–1428, Jun. 2002. [6] P.L. Dragotti, M. Vetterli, and T. Blu, “Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang-Fix,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 1741–1757, May 2007, Part 1. [7] P. Stoica and R. Moses, Introduction to Spectral Analysis, Englewood Cliffs, NJ: Prentice-Hall, 2000. [8] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE Signal Process. Mag., vol. 16, no. 6, pp. 22–38, Nov. 1999. [9] M. Unser and T. Blu, “Cardinal exponential splines: Part I—Theory and filtering algorithms,” IEEE Trans. Signal Process., vol. 53, no. 4, pp. 1425–1438, Apr. 2005. [10] J. Kusuma and V. K. Goyal, “Multichannel sampling of parametric signals with a successive approximation property,” in Proc. IEEE Intl. Conf. on Imag. Proc., 2006, pp. 1265–1268. [11] L. Baboulaz and P. L. Dragotti, “Distributed acquisition and image super-resolution based on continuous moments from samples,” in Proc. IEEE Intl. Conf. on Imag. Proc., 2006, pp. 3309–3312. [12] C. S. Seelamantula and M. Unser, “A generalized sampling method for finite-rate-of-innovation-signal reconstruction,” IEEE Signal Process. Lett., vol. 15, pp. 
813-816, 2008. 172 MULTICHANNEL SAMPLING OF TRANSLATED, ROTATED AND SCALED BILEVEL POLYGONS USING EXPONENTIAL SPLINES Hojjat Akhondi Asl and Pier Luigi Dragotti Imperial College London Department of Electrical and Electronic Engineering hojjat.akhondi-asl@imperial.ac.uk, p.dragotti@imperial.ac.uk ABSTRACT Recently there has been an interest in single and multichannel sampling of certain parametric signals based on rate of innovation using exponential reproducing kernels. In [5] it was shown that, using exponential reproducing kernels, we can achieve a fully symmetric multichannel sampling system where different channels receive translated versions of the input signal. For the case of bilevel polygons as the input signal considered in [5], having only translations is not practical and one may want to look at the cases of more complicated geometric transformations, such as rotation and scaling. In this paper we present a sampling theorem for multichannel sampling of translated, rotated and scaled bilevel polygons using Radon projections and generalized exponential splines. 1. INTRODUCTION Recently, it was shown [1, 2] that it is possible to sample and perfectly reconstruct some classes of non-bandlimited signals using suitable sampling kernels. Signals that can be reconstructed using this framework are called signals with Finite Rate of Innovation (FRI) as they can be completely defined by a finite number of parameters. Stream of weighted Dirac impulses and bilevel polygons are some examples of FRI signals. There has been a recent interest in sampling FRI signals using exponential spline [3] (E-spline) kernels. Dragotti et al. [2] showed that E-splines can be used as the sampling kernel to sample and perfectly reconstruct 1-D FRI signals. Extensions to the multidimensional case were considered in [5, 14] where we proposed sampling theorems for a stream of 2-D Dirac impulses (based on the ACMP algorithm [11]) and bilevel polygons (based on Radon projections [10]). Apart from the sampling kernels used in [5, 14], the reconstruction algorithms are also different from the ones used in the conventional multidimensional sampling theories [12, 13]. An advantage of E-spline sampling kernels over polynomial reproducing kernels such as B-splines is that, they can be employed in a fully symmetric multichannel sampling environment. By symmetric sampling, we mean that the sampling SAMPTA'09 process can be evenly distributed between different acquisition devices. The inspiration and development of multichannel sampling of FRI signals is very recent and it has been looked at in [5, 6, 7, 8]. In [6] Seelamantula and Unser, by using simple RC filters, propose a simple acquisition and reconstruction method within the framework of multichannel sampling, where 1-D FRI signals such as an infinite stream of nonuniformly-spaced Dirac impulses and piecewise-constant signals can be sampled and perfectly reconstructed. In [7] Kusuma and Goyal proposed new ways of sampling 1-D Dirac impulses using a bank of integrators or B-splines. Their proposed scheme is closely related to previously known cases [1, 2] but provides a successive approximation property, which could be useful for detecting undermodelling when the number of Dirac impulses are unknown. In [8] Baboulaz and Dragotti use a multichannel sampling setup for sampling FRI signals and utilize that for image registration based on continuous moments and image super-resolution. 
In [5] we illustrate that symmetric multichannel sampling of bilevel polygons can be achieved with the geometric transformations being a 2-D translation between the different signals. In practice, this is usually not the case, and in this paper we want to look at the cases of more complicated geometric transformations, such as rotation and scaling. The paper is organised as follows: In Section II we will briefly discuss the sampling setup needed for sampling 2-D FRI signals (single channel) and based on that we describe our multichannel sampling setup. In Section III we present our algorithm for sampling and perfectly reconstructing translated, rotated and scaled bilevel polygons with the use of generalized E-splines and Radon projections. In Section IV we provide simulation results to support our proposed theory. 2. MULTICHANNEL SAMPLING SETUP Before describing the multichannel sampling framework, let us first, for the sake of clarity, show how a general 2-D sampling setup (single channel) for FRI signals is represented. Figure 1 shows a general 2-D sampling setup for FRI signals 173 where g(x, y) represents the input signal, ϕ(x, y) the sampling kernel, sj,k the samples and T x, T y are the sampling intervals. From the setup shown in Figure 1, the samples sj,k Fig. 1. 2-D sampling setup are given by: Z ∞ Z sj,k = −∞ ∞ g(x, y) ϕ( −∞ x y − j, − k) dx dy (1) Tx Ty where the kernel ϕ(x, y) is the time reversed version of the filter response h(x, y). ϕ(x, y) can easily be produced by the tensor product between ϕ(x) and ϕ(y), that is ϕ(x, y) = ϕ(x) ⊗ ϕ(y). As mentioned before, ϕ(x, y) is chosen to be an exponential reproducing kernel. The theory of exponential reproducing kernels is quite recent and is based on the notion of exponential splines (E-splines) [3]. A function βα~ (x) with Fourier transform β̂α~ (ω) = N Y 1 − eαn −jω jω − αn n=0 is called E-spline of order N where α ~ = (α0 , α1 , . . . , αN ) can be real or complex. The produced spline has a compact support and can reproduce any exponential in the subspace spanned by (eα0 t , eα1 t , . . . , eαN t ) which is obtained by successive convolutions of lower order E-splines ((N+1)-fold convolution). Exponential spline kernels can therefore reproduce, with their shifted versions, real or complex exponentials. That is, in 2-D form, any kernel satisfying: X X m,n cj,k ϕ(x − j, y − k) = eαm x eβn y (2) j∈Z k∈Z is an E-spline for a proper choice of the coefficients cm,n j,k . Here, m = 0, 1, . . . , M , n = 0, 1, . . . , N , αm = α0 + mλ1 and βn = β0 + nλ2 . The values of (α0 , β0 ) and (λ1 , λ2 ) can be chosen arbitrarily, but too small or too large values could lead to unstable results for the reproduction of exponentials. E-splines are biorthogonal functions and the coefficients cm,n j,k can be found using the dual of βα~ (x). An important property of E-splines is that they are a generalized version of B-splines. This is because, if the α ~ parameters are set to zero, then the produced spline would result in a B-spline, a polynomial reproducing spline. This property will be used to estimate the transformation parameters in Section III. The reader can refer to [5, 14] for sampling theories on single-channel sampling and perfect reconstruction of 2-D Dirac impulses and bilevel polygons using exponential splines. SAMPTA'09 We can now describe our multichannel sampling setup. A multichannel sampling system can be thought of multiple acquisition devices observing an input signal. 
In order to perfectly reconstruct the input signal using only one acquisition device, we normally require expensive acquisition devices with high sampling rates. By using a bank of acquisition devices (filters) and synchronizing the different channels exactly, we are able to reduce the number of samples needed from each device, resulting in a cheaper and more efficient sampling system. To model our multichannel system, consider a bank of E-spline filters to acquire FRI signals where each filter has access to a geometrically transformed version of the input signal. Figure 2 shows the described multichannel sampling scenario where the bank of filters ϕ1 (x, y), ϕ2 (x, y), . . . , ϕN −1 (x, y) receive different versions of the input signal g0 (x, y). Here, the geometric transformations (e.g. translation, rotation and scaling ) are denoted by T1 , T2 , . . . , TN −1 . Fig. 2. Multichannel sampling setup In [4] Baboulaz considered the use of E-splines for sampling a stream of 1-D Dirac impulses in a multichannel sampling setup described in Figure 2. He showed that if two 1-D signals are just shifted version of the other, then by setting one parameter to be common between the exponents of the E-spline sampling kernels for the two signals, one can not only estimate the shifts between the two signals, but also can linearly relate the exponential moments of the two signals (the reader can refer to [4, 5, 14] for more detailed discussion). Because of the direct relationship between the exponential moments of the two signals, we can achieve perfect reconstruction of the reference signal with fewer exponential moments required. Since less moments are required from each signal, a lower order E-spline sampling kernel would be needed, which in turn less samples from each signal are required to achieve perfect reconstruction. This is because, from [2] we know that a stream of Dirac impulses is uniquely determined from the samples if there are at most K Dirac impulses in an interval size of 2KLT where L is the support of the sampling kernel. Since the support of the sampling kernels is reduced in the multichannel case, we can achieve the same performance with a smaller sampling rate T . 174 3. ALGORITHM Unfortunately we can not estimate the more complicated geometric transformations like the way it was done for the simple translation case in [5] with exponential reproducing kernels. Also, even if we assume that the transformation parameters are known and given, we still can not use the sampling algorithm shown in [5] for the multichannel framework. This is because introducing more complicated transforms such as rotation and/or scaling for example, would result in a non-linear relationship between the exponential moments of the different signals. The first question we need to answer is that, assuming an oracle gives us the values of the transformation parameters, can we sample and perfectly reconstruct translated, rotated and scaled bilevel polygons in a symmetric multichannel framework? It is known that for an N-sided bilevel polygon, with N+1 projections, perfect reconstruction of the polygon can be achieved. That is points that have N+1 line intersections from the N+1 back-projections correspond to the N vertices of the polygon [9]. We also know that a Radon projection at an angle φ of a rotated image with respect to its reference image with an angle θ, is the same projection, but scaled and translated, on the reference image at the angle φ + θ. 
Therefore, if all the transformation parameters are known, and assuming that the rotation angle is not zero that is, θ 6= 0, then the N + 1 projections needed could be separated between the different channels, in order to sample and perfectly reconstruct the reference image in a symmetric manner. The next question would be, how can we estimate the transformation parameters? We know that with the use of polynomial reproducing kernels, we can obtain the geometric moments of a signal, and geometric moments up to order 2 from two signals are enough to estimate translation, rotation and scaling parameters between the two signals. We also know that, as E-splines are a generalized version of B-splines [3], we can reproduce a combination of polynomials and exponentials from E-splines. From the polynomials moments up to order 2, we can estimate all the transformation parameters. 4. RESULTS As an example, in [5] we showed that to achieve perfect reconstruction for a 4-sided bilevel polygon, a 2-D E-spline order of 12 is required to produce 5 projections at the angles 0, 45, 90, tan−1 (2) and tan−1 ( 21 ). With 2-D E-spline order of 7 however we can produce 3 projections at the angles 0, 45, 90 on the reference signal, and a 2-D E-spline order of 7 on the second signal would give 3 projections for the reference signal at the angles θ, 45 + θ, 90 + θ where θ is the rotation parameter. Assuming θ is not zero, we would have enough projections to perfectly reconstruct the reference signal. Therefore an spline order of 7+2 = 9 (2 is needed for es- SAMPTA'09 timating the transformation parameters) on each signal would give us enough projections to perfectly reconstruct the reference signal. An example for a 4-sided bilevel polygon with two acquisition devices is shown in Figure 3 where the reference signal, its translated, rotated and scaled version, their samples, the E-spline sampling kernel, and the reconstructed reference signal are all shown. 5. CONCLUSION In this paper we showed that with the use of Radon projections and generalized E-splines, symmetric multichannel sampling of translated, rotated and scaled bilevel polygons can be achieved. For estimating the geometrical transformations, we showed that as E-splines are a generalized version of B-splines, we can reproduce combination of polynomials and exponentials from E-splines. Therefore from the polynomial moments up to order 2, we can estimate all the unknown transformation parameters. For symmetric multichannel sampling of geometrically transformed bilevel polygons, we illustrated that the N+1 Radon projections needed for perfect reconstruction of an N-sided bilevel polygon, can be separated between the different channels, assuming that the rotation parameter is not zero. Our sampling and reconstruction algorithm is based on noise-free communication between the transmitter and receiver which is rather not very practical. The future research of this work is to test the stability and performance of our method in the presence of noise. 6. REFERENCES [1] M. Vetterli, P. Marziliano and T. Blu, “Sampling Signals with Finite Rate of Innovation”, IEEE Transactions on Signal Processing, vol. 50, pp. 1417-1428, June 2002. [2] P.L. Dragotti, M. Vetterli and T. Blu, “Sampling Moments and Reconstructing Signals of Finite Rate of Innovation: Shannon meets Strang-Fix”, IEEE Transactions on Signal Processing, vol. 55, pp. 1741-1757, May 2007. [3] M. Unser and T. 
Blu, “Cardinal Exponential Splines: Part I - Theory and Filtering Algorithms”, IEEE Transactions on Signal Processing, vol. 53, pp. 1425, 2005. [4] L. Baboulaz, “Feature Extraction for Image Superresolution using Finite Rate of Innovation Principles”, PhD thesis, Department of Electrical and Electronic Engineering, Imperial College London, 2008. URL: http://www.commsp.ee.ic.ac.uk/~lbaboula/ [5] H. Akhondi Asl and P.L. Dragotti, “Single and Multichannel Sampling of Bilevel Polygons Using Exponential Splines”, To Appear on IEEE International Conference on Acoustics, Speech, and Sig- 175 nal Processing, Taipei, Taiwan, April 2009. URL: http://cspserver2.ee.ic.ac.uk/~Hojakndi/ [6] C. S. Seelamantula and M. Unser, “A Generalized Sampling Method for Finite-Rate-of-Innovation-Signal Reconstruction”, IEEE Signal Processing Letters, vol.15, pp. 813-816, August 2008. (a) (b) [7] J. Kusuma and V. K. Goyal, "Multichannel Sampling of Parametric Signals with a Successive Approximation Property," IEEE International Conference on Image Processing, pp. 1265-1268, October 2006. [8] L. Baboulaz and P. L. Dragotti, "Distributed Acquisition and Image Super-Resolution Based on Continuous Moments from Samples," IEEE International Conference on Image Processing, pp. 3309-3312, October 2006. (c) (d) [9] I. Maravic and M. Vetterli, “Exact sampling results for some classes of parametric non-bandlimited 2-D signals", IEEE Transactions on Signal Processing, vol.52, no.1, pp. 175-189, January 2004ation Principles”, PhD thesis, Imperial College London, 2008. [10] G. T. Herman, “Image Reconstruction from Projections: The Fundamentals of Computerized Tomography”, Academic Press, New York, 1980. (e) [11] F. Vanpoucke, M. Moonen and Y. Berthoumieu, “An Efficient Subspace Algorithm for 2-D Harmonic Retrieval", IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.4, pp. 461-464, April 1994. [12] P. Shukla and P.L. Dragotti, “Sampling Schemes for Multidimensional Signals with Finite Rate of Innovation”, IEEE Transactions on Signal Processing, vol. 55, pp. 3670-3686, July 2007. [13] I. Maravic and M. Vetterli, “Exact sampling results for some classes of parametric non-bandlimited 2-D signals", IEEE Transactions on Signal Processing, vol.52, no.1, pp. 175-189, January 2004. (f) Fig. 3. Symmetric multichannel sampling of translated, rotated and scaled bilevel polygons using E-spline sampling kernels. (a) The reference signal in a frame data size of 256 × 256. (b) The translated (△x = −100, △y = 150), rotated (θ = 35) and scaled (a = 1.1) ver- [14] H. Akhondi Asl, “Single and Multichannel Sampling of Signals With Finite Rate of Innovation Using E-Splines”, MPhil to PhD Transfer Report, Department of Electrical and Electronic Engineering, Imperial College London, 2008. URL: http://cspserver2.ee.ic.ac.uk/~Hojakndi/ sion of the reference signal. (c) & (d) The 16 × 16 samples of both signals. (e) 2-D generalized E-spline of order 9 (f) The reconstructed vertices of the reference signal with 6 back-projections, the crosses are the actual vertices of the polygon. [Not to scale] SAMPTA'09 176 Special session on Sampling and Quantization Chair: Özgür Yilmaz SAMPTA'09 177 SAMPTA'09 178 Quantization for Compressed Sensing Reconstruction John Z. Sun and Vivek K Goyal Massachusetts Institute of Technology, Cambridge, MA 02139 USA johnsun@mit.edu, vgoyal@mit.edu Abstract: Quantization is an important but often ignored consideration in discussions about compressed sensing. 
This paper studies the design of quantizers for random measurements of sparse signals that are optimal with respect to mean-squared error of the lasso reconstruction. We utilize recent results in high-resolution functional scalar quantization and homotopy continuation to approximate the optimal quantizer. Experimental results compare this quantizer to other practical designs and show a noticeable improvement in the operational distortion-rate performance. parallel, we present only fixed rate. To concentrate on the central ideas, we choose signal and sensing models that obviate discussion of quantizer overload. 2. Background In our notation, a random vector is always lowercase and in bold. A subscript then indicates an element of the vector. Also, an unbolded vector y corresponds to a realization of the random vector y. 2.1 Distributed functional scalar quantization 1. Introduction In practical systems where information is stored or transmitted, data must be discretized using a quantization scheme. The design of the optimal quantizer for a given stochastic source has been well studied and is surveyed in [6]. Here, optimal means the quantizer minimizes the error as measured by some distortion metric. In this paper, we explore optimal quantization for an emerging nonadaptive compression paradigm called compressed sensing (CS) [1, 4]. Several authors have studied the asymptotic reconstruction performance of quantized random measurements assuming a mean-squared error (MSE) distortion metric [3, 5]. Other previous work presented modifications to existing reconstruction algorithms to mitigate distortion resulting from standard quantizers [3, 7] or modified quantization that can be viewed as the binning of quantizer output indexes [10]. Our contribution is to reduce distortion due to quantization through design of the quantizer itself. The key observation is simply that the random measurements are used as arguments in a nonlinear reconstruction function. Thus, minimizing the MSE of the measurements is not equivalent to minimizing the MSE of the reconstruction. We use the theory for high-resolution distributed functional scalar quantization (DFSQ) recently developed in [9] to design optimal quantizers for random measurements. To obtain concrete results, we choose a particular reconstruction function (lasso [11]) and distributions for the source data and sensing matrix. However, the general principle of obtaining improvements through the use of DFSQ theory holds more generally, and we address the conditions that must be satisfied for sensing and reconstruction. Also, rather than develop results for fixed and variable rate in SAMPTA'09 In standard fixed-rate scalar quantization [6], one is asked to design a quantizer Q that operates separably over its components and minimizes MSE between a probabilistic source vector y ∈ RM and its quantized representation ŷ = Q(y). The resulting optimization is   min E ky − Q(y)k2 , Q subject to the constraint that the maximum number of codewords or quantization levels for each yi is less than 2Ri . We can use high-resolution theory to find the quantizer point density of the optimal quantizer. In DFSQ [9], the goal is to create a quantizer that minimizes distortion for some scalar function g(y) of the source vector y rather than the vector itself. Hence, the optimization is now   min E |g(y) − g(Q(y))|2 Q such that the maximum number of codewords or quantization levels representing each yi is less than 2Ri . 
To apply the following model, we need g(·) and fy (·) to satisfy certain conditions: C1. g(y) is smooth and monotonic for each yi . C2. The partial derivative gi (y) = ∂g(y)/∂yi is defined and bounded for each i. C3. The joint pdf of the source variables fy (y) is smooth and supported in a compact subset of RM . For valid g(·) and fy (·) pairs, we define a set of functions  h i1/2 γi (t) = E |gi (y)|2 | yi = t . (1) We call γi (t) the sensitivity of g(y) with respect to the source variable yi . The optimal point density is then 1/3 λi (t) = C γi2 (t)fyi (t) , (2) 179 for some normalization constant C, which leads to a total operational distortion-rate  2  X γi (yi ) 2−2Ri E . (3) D({Ri }) = 12λ2i (yi ) i The sensitivity γi (t) serves to reshape the quantizer, giving better resolution to regions of yi that have more impact on g(y), thereby reducing MSE. Similar results for variable-rate quantizers are also presented in [9]. However, we will only consider the fixedrate case in this paper. The theory of DFSQ can be extended to a vector of functions, where xj = g (j) (y) for 1 ≤ j ≤ N . Since the cost function is additive in its components, we can show that the overall sensitivity for each component yi is N 1 X (j) γi (t) = γ (t), N j=1 i (4) (j) where γi (t) is the sensitivity of the function g (j) (y) with respect to yi . 2.2 Compressed Sensing CS refers to estimation of a signal at a resolution higher than the number of data samples, taking advantage of sparsity or compressibility of the signal and randomization in the measurement process [1, 4]. We will consider the following formulation. The input signal x ∈ RN is K-sparse in some orthonormal basis Ψ, meaning the transformed signal u = Ψ−1 x ∈ RN contains only K nonzero elements. Consider a length-M measurement vector y = Φx, where Φ ∈ RM×N with K < M < N is a realization of Φ. The major innovation in CS (for the case of sparse u considered here) is that recovery of x from y via some computationally-tractable reconstruction method can be guaranteed asymptotically almost surely. Many reconstruction methods have been proposed including a linear program called basis pursuit [2] and greedy algorithms like orthogonal matching pursuit (OMP) [12]. In this paper, we focus on a convex optimization called lasso [11], which takes the form  x̂ = arg min ky − Φxk22 + µkΨ−1 xk1 . (5) x As one sample result, lasso leads to perfect sparsity pattern recovery with high probability if M ∼ 2K log(N − K) + K under certain conditions on Φ, µ, and the scaling of the smallest entry of u [13]. Unlike in [5], our concern in this paper is not how the scaling of M affects performance, but rather how the accuracy of the lasso computation (5) is affected by quantization of y. A method for understanding the set of solutions to (5) is the homotopy continuation (HC) method [8]. HC considers the regularization parameter µ at an extreme point (e.g., very large µ so the reconstruction is all zero) and slowly varies µ so that all sparsities and the resulting reconstructions are obtained. It is shown that there are N values of µ where the lasso solution changes sparsity, or equivalently N + 1 intervals over which the sparsity does SAMPTA'09 Figure 1: A compressed sensing model with quantization of measurement vector y. The vector ynl denotes the noiseless random measurements. not change. For µ in the interior of one of these intervals, the reconstruction is determined uniquely by the solution of a linear system of equations involving a submatrix of Φ. 
In particular, for a specific choice µ∗ and observed random measurements y, 2ΦTJµ∗ ΦJµ∗ x̂ + µ∗ v = 2ΦTJµ∗ y, (6) where v = sgn(x̂) and ΦJµ∗ is the submatrix of Φ with columns corresponding to the nonzero elements Jµ∗ ⊂ {1, 2, . . . , N } of x̂. 3. Problem Model Figure 1 presents a CS model with quantization. Assume without loss of generality that Ψ = IN and hence the (random) signal x = u is K-sparse. Also assume a random matrix Φ is used to take measurements, and additive Gaussian noise perturbs the resulting signal, meaning the continuous-valued measurement vector is y = Φx + η. The sampler wants to transmit the measurements with total rate R and encodes y into a transmittable bitstream by using encoder Q. Next, a decoder Q̂ produces a quantized signal ŷ from by . Finally, a reconstruction algorithm G outputs an estimate x̂. The function G is a black box that may represent lasso, OMP or another CS reconstruction algorithm. We now present a probabilistic model for the input source and sensing matrix. It is chosen to guarantee finite support on both the input and measurement vectors, and prevent overload errors for quantizers with small R. However, we emphasize that the following theory is general, and other choices for x and Φ are possible for large enough R. Assume the K-sparse vector x has random sparsity J chosen uniformly from all possibilities, and each nonzero component xi is distributed iid U(−1, 1). Also assume the additive noise vector η is distributed iid Gaussian with zero mean and variance σ 2 . Finally, let Φ correspond to random projections such that each column φj ∈ RM has unit energy (kφj k2 = 1). The columns of Φ thus form a set of N random vectors chosen uniformly on the unit (M − 1)-hypersphere. Since y = Φx, yi = N X j=1 Φij xj = X j∈J Φij xj . | {z } zij The distribution of each zij is found using derived distributions. The resulting pdfs can be shown to be iid fz (z), where z is a scalar random variable that is identical in distribution to each zij . The distribution of yi is then the K − 1 convolution cascade of fz (z) with itself. Thus, fy (y) is smooth and supported for {|yi | ≤ K}, satisfying 180 compute γcs (·). To simplify our notation, let A = ΦJµ∗ . The resulting differentials can be expressed as 3 2.5 ∂G(j) (y, Φ) h T −1 T i = A A A . ∂yi ji 2 We now present the sensitivity through the following theorem: 1.5 1 0.5 0 −5 0 t 5 Figure 2: Distribution fyi (t) for (K, M, N ) = (5, 71, 100). The support of yi is the range [−K, K], where K is the sparsity of the input signal. However, the probability is only non-negligible for small yi . condition C3 for DFSQ. Figure 2 illustrates the distribution of yi for a particular case. The reconstruction algorithm G is a function of the measurement vector y and sampling matrix Φ. We will show that if G(y, Φ) is lasso with a proper relaxation variable µ, then conditions C1 and C2 are met. Using HC, we see G(y, Φ) is a piecewise smooth function that is also piecewise monotonic with every yi for a fixed µ. Moreover, for every µ the reconstruction is an affine function of the measurements through (6), so the partial derivative with respect to any element yi is piecewise defined and smooth (constant in this case). Conditions C1 and C2 are therefore satisfied. 4. Optimal Quantizer Design We now pose the optimal fixed-rate quantizer design as a DFSQ problem. For a given noise variance σ 2 , choose an appropriate µ∗ to form the best reconstruction x̂ from the unquantized random measurements y. 
We produce M quantizers to transmit the elements of y such that the decoded message ŷ will minimize the distortion between x̃ = G(y, Φ) and x̂ = G(ŷ, Φ) for a total rate R. Note G can be visualized as a set of N scalar functions x̂j = G(j) (ŷ, Φ) that are identical in distribution due to symmetry in the randomness of Φ. Since the sparse input signal is assumed to have uniformly distributed sparsity and Φ distributes energy uniformly to all measurements yi in expectation, we argue by symmetry that each measurement is allotted the same number of bits and that every measurement’s quantizer is the same. Moroever, since the functions representing the reconstruction are identical, we argue using (4) that the overall sensitivity γcs (·) is the same as the sensitivity of any G(j) (ŷ, Φ). Computing (2) yields the point density λcs (·). This is when the homotopy continuation method becomes extremely useful. For a given realization of Φ and η, we can use HC to determine how many elements in the reconstruction are nonzero for µ∗ , denoted Jµ∗ . Equation (6) is then used to find ∂G(j) (y, Φ)/∂yi , which is needed to SAMPTA'09 (7) Theorem 1 Let the noise variance be σ 2 and choose an appropriate µ∗ . Define y\i to be all the elements of a vector y except yi . The sensitivity of each element yi , (j) which is denoted γi (t), can be written as    21 fyi |Φ (t|Φ) h T −1 T i EΦ,y\i A A A | yi = t , fyi (t) ji where A is the submatrix of Φ as described in HC for µ∗ and some observation y. Moreover, for any Φ and its corresponding J, fyi |Φ (t|Φ) is the convolution cascade of {zj ∼ U(−Φij , Φij )} for j ∈ J. By symmetry arguments, (j) γcs (t) = γi (t) for any i and j. This expectation is difficult to calculate but can be approached through L Monte Carlo trials on Φ, η, and x. For each trial, we can compute the partial derivative using (7). We denote the Monte Carlo approximation to that (L) function to be γcs (·). Its form is L (L) (t) = γcs 1X L ℓ=1  fyi |Φ (t|Φℓ ) h T −1 T i2 Aℓ Aℓ Aℓ fyi (t) ji  12 , (8) with i and j arbitrarily chosen. By the weak law of large numbers, the empirical mean of L realizations of the random parameters should approach the true expectation for L large. We now substitute (8) into (2) to find the Monte Carlo approximation to the optimal quantizer for compressed sensing. It becomes 1/3  (L) (t)f (t) , λ(L) (t) = C γ y cs cs i (9) for some normalization constant C. Again by the weak p (L) law of large numbers, λcs (t) − → λcs (t) for L large. 5. Experimental Results We compare the CS-optimized quantizer, called the “sensitive” quantizer, to a uniform quantizer and “ordinary” quantizer λord (t) which is optimized for the distribution of y. This means the ordinary quantizer would be best if we want to minimize distortion between y and ŷ, and hence has a flat sensitivity curve over the support of y. The sensitive quantizer λcs (t) is found using (9) and the uniform quantizer λuni (t) = c, where c is a normalization constant. Using 1000 Monte Carlo trials, we estimate γcs (t). The resulting point density functions for the three quantizers are illustrated in Figure 3. Experimental results are performed on a Matlab testbench. Practical quantizers are designed by extracting codewords 181 1.5 Sensitive Ordinary Uniform λ(t) 1 0.5 0 −5 0 t 5 Figure 3: Estimated point density functions λcs (t), λord (t), and λuni (t) for (K, M, N ) = (5, 71, 100). 
10 Sensitive Ordinary Uniform i log D(R ) 5 0 −5 −10 2 3 4 5 component rate Ri 6 Figure 4: Results for distortion-rate for the three quantizers with µ = 0.01 and σ 2 = 0.3. We see that the sensitive quantizer has the least distortion. from the cdf of the normalized point densities. In the approximation, the ith codeword is the point t such that Z t λcs (t′ )dt′ = −∞ i − 1/2 , 2Ri where Ri is the rate for each measurement. The partition points are then chosen to be the midpoints between codewords. We compare the sensitive quantizer to uniform and ordinary quantizers using the parameters µ = 0.1 and σ 2 = 0.3. Results are shown in Figure 4. We find the sensitive quantizer performs best in experimental trials for this combination of µ and σ 2 at sufficiently high rates. This makes sense because λcs (t) is a high-resolution approximation and should not necessarily perform well at very low rates. 6. Conclusion We present a high-resolution approximation to an optimal quantizer for the storage or transmission of random measurements in a compressed sensing system with lasso re- SAMPTA'09 construction. Using DFSQ and HC, we find a sensitivity function γcs (·) that determines the optimal point density function λcs (·) of such a quantizer. Experimental results show that the operational distortion-rate is best when using this so called “sensitive” quantizer. We conclude that proper quantization in compressed sensing is not simply a function of the distribution of the random measurements themselves (using either a high-resolution approximation or practical algorithms like Lloyd-Max). Rather, quantization adds a non-constant effect, called functional sensitivity [9], on the distortion between the the lasso reconstructions of the random measurements and its quantized version. A significant amount of work can still be done in this area. Parallel developments could be made for variablerate quantizers. Also, this theory can be extended to other probabilistic signal and sensing models, and CS reconstruction methods. References: [1] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006. [2] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comp., 20(1):33–61, 1999. [3] W. Dai, H. Vinh Pham, and O. Milenkovic. Quantized compressive sensing. arXiv:0901.0749v2 [cs.IT]., 2009. [4] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006. [5] V. K. Goyal, A. K. Fletcher, and S. Rangan. Compressive sampling and lossy compression. IEEE Sig. Process. Mag., 25(2):48–56, 2008. [6] R. M. Gray and D. L. Neuhoff. Quantization. IEEE Trans. Inform. Theory, 44(6):2325–2383, 1998. [7] L. Jacques, D. K. Hammond, and M. J. Fadili. Dequantized compressed sensing with non-Gaussian constraints. arXiv:0902.2367v2 [math.OC]., 2009. [8] D. M. Malioutov, M. Cetin, and A. S. Willsky. Homotopy continuation for sparse signal representation. In Proc. IEEE ICASSP, pp. 733–736, 2006. [9] V. Misra, V. K. Goyal, and L. R. Varshney. Distributed functional scalar quantization: High-resolution analysis and extensions. arXiv:0811.3617v1 [cs.IT]., 2008. [10] R. J. Pai. Nonadaptive lossy encoding of sparse signals. Master’s thesis, Massachusetts Inst. of Tech., Cambridge, MA, 2006. [11] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc., Ser. B, 58(1):267– 288, 1996. [12] J. A. Tropp. 
Greed is good: Algorithmic results for sparse reconstruction. IEEE Trans. Inform. Theory, 50(10):2231–2242, 2004. [13] M. J. Wainwright. Sharp thresholds for highdimensional and noisy recovery of sparsity. Department of Statistics, UC Berkley, Tech. Rep 709, 2006. 182 Finite Range Scalar Quantization for Compressive Sensing Jason N. Laska (1) , Petros Boufounos (2) , and Richard G. Baraniuk(1) (1) Rice University, 6100 Main St., Houston, TX 77005 (2) Mitsubishi Electric Research Laboratories, 201 Broadway Cambridge, MA 02139 laska@rice.edu, petrosb@merl.com, richb@rice.edu Abstract: Analog-to-digital conversion comprises of two fundamental discretization steps: sampling and quantization. Recent results in compressive sensing (CS) have overhauled the conventional wisdom related to the sampling step, by demonstrating that sparse or compressible signals can be sampled at rates much closer to their sparsity rate, rather than their bandwidth. This work further overhauls the conventional wisdom related to the quantization step by demonstrating that quantizer overflow can be treated differently in CS and by exploiting the tradeoff between quantization error and overflow. We demonstrate that contrary to classical approaches that avoid quantizer overflow, a better finite-range scalar quantization strategy for CS is to amplify the signal such that the finite range quantizer overflows at a pre-determined rate, and subsequently reject the overflowed measurements from the reconstruction. Our results further suggest a simple and effective automatic gain control strategy which uses feedback from the saturation rate to control the signal gain. 1. Introduction Analog-to-digital converters (ADCs) are an essential part of most modern sensing and communications systems. They are the interface between the analog physical world and the digital processing world that extracts the information we are interested in. Ever-increasing demands for information has pushed the requirements on ADCs to their current physical limits. Fortunately, recent theoretical developments in the area of compressive sensing (CS) enable us to significantly extend the capabilities of current ADCs to keep pace with demand. CS is a framework that allows signals that have sparse representation, i.e., few non-zero elements, or few non-zero coefficients in some basis, to be sampled at a rate close to the sparsity rate, rather than the Nyquist rate. CS employs linear measurement systems and a non-linear reconstruction algorithms to acquire and recover sparse signals. Most of the CS literature to-date focuses on one particular aspect of ADCs, namely sampling. In this paper we reexamine the other significant aspect, quantization. Specifically, we show that the core tenets of CS enable us to reduce the error due to quantization by allowing the quantizer to saturate more often than usual and removing the SAMPTA'09 saturated measurements from the reconstruction process. The organization of this paper is as follows. Section 2. presents a brief background on analog-to-digital conversion, compressive sampling, and finite-range quantization. Section 3. presents a brief analysis of finite-range quantization for CS. We show that CS measurements and the quantization error are i.i.d. Gaussian, and analyze the proposed reconstruction strategy. Section 4., presents numerical results that validate our analysis. We conclude with a brief discussion in Sec. 5. 2. 
2.1 Background Analog-to-digital conversion Analog-to-digital conversion consists of two discretization steps: sampling, which converts an analog signal to a set of discrete measurements, and quantization, which converts each real-valued measurement to a discrete one chosen from a pre-determined set. Although both steps are necessary to represent a signal in the discrete digital world, classical results due to Shannon and Nyquist demonstrate that the sampling step is information preserving if a sufficient number of samples, i.e., measurements, are obtained. On the other hand quantization always degrades the signal. The system design to goal is to take enough measurements such that the signal does not alias, and to acquire enough bits to limit the quantization distortion. 2.2 Finite-range quantization Scalar quantization is the process of converting the continuous value of the measurements to one of several discrete values through a non-invertible function R(·). In this paper we focus on uniform quantizers with quantization interval ∆. Thus, the quantization points are qk = q0 + k∆, and every scalar a is quantized to the nearest quantization point R(a) = argminqk |a − qk |. For an infiniterange quantizer this implies that the quantization error is bounded by |a − R(q)| ≤ ∆/2. In practice quantizers have finite range, dictated by hardware constraints such as the voltage limits of the devices and the finite bit-rate of the quantized representation. Without loss of generality we assume a midrise Bbit quantizer that represents a symmetric range of values |a| < T , where T > 0 is the quantization threshold. The corresponding quantization points are at qk = 183 ∆/2 + k∆, k = −2B−1 , . . . , 2B−1 − 1. This assumption implies a quantization interval ∆ = 2−B+1 T . Any measurement with magnitude greater than T saturates the quantizer and “clips” to magnitude T , i.e., it quantizes to the quantization point T − ∆/2. Most classical quantization error analysis assumes that the measurements are scaled such that the quantizer never clips. This is a sensible quantization strategy for classical approaches using linear reconstruction. In that context, saturation events cause significant signal distortion and are undesirable. For that reason, extreme attention is often devoted to pre-ADC automatic gain control (AGC) systems to ensure that the quantizer saturates only rarely. Under this assumption the analysis of a finite or an infinite range quantizer is equivalent in terms of the quantization error. Thus, an infinite-range quantizer is often assumed for its mathematical simplicity. 2.3 Compressive sampling (CS) The theory of compressive sampling (CS) overhauls the conventional wisdom on the sampling process. Specifically, [2] and the references therein show that the number of measurements that are sufficient to exactly reconstruct a sampled signal are significantly fewer than the ShannonNyquist rate as long as the signal is sparse, i.e., can be represented with very few non-zero components in some basis. The key components of CS are randomized measurements and non-linear reconstruction. 
Specifically, a Nyquistrate sampled discrete-time signal x can be sampled at a lower rate by using a random matrix Φ, of dimension M × N: y = Φx, (1) and reconstructed exactly, if the signal is K-sparse, i.e., only has K non-zero components in some basis and the matrix Φ satisfies the Restricted Isometry Property (RIP) [2]: p p 1 − δ2K kxk2 ≤ kΦxk2 ≤ 1 + δ2K kxk2 (2) for all 2K-sparse signals x, where δ2K is the RIP constant of Φ. RIP guarantees that the norm of the measurements does not deviate significantly from the norm of the K-sparse signal x. b from y+n, where n is noise with knk2 = To reconstruct x η, we perform the optimization b = Ψb α (3) α b = min kαk1 s.t. kΦΨα − yk2 < η, x α P where Ψ is a basis and kαk1 = i |αi | is the ℓ1 norm of the coefficient vector. Reconstructing using (3) guarantees that the norm of the reconstruction error is bounded by cη, where c is a system-dependent constant [2]. In this paper we use the two key components of CS, namely randomized measurements and non-linear reconstruction, to overhaul the conventional wisdom on scalar quantization. In the next sections we demonstrate that the CS measurement process makes the quantization error a white noise process. We use that result demonstrate that in the context of non-linear reconstruction it is advantageous to scale the signal such that the quantizer saturates at a positive rate and reject the saturated measurements from the reconstruction. SAMPTA'09 3. Finite-range quantization for CS The non-linear reconstruction methods used in CS and the democratic nature of the measurements, suggests that with only a small performance penalty, we can choose to ignore measurements. Specifically, in this work we choose to deliberately saturate the quantizer and ignore the measurements that saturated. In the analysis that follows we demonstrate the advantages of this approach compared to scaling the measurements such that they do not saturate or incorporating the saturated measurements in the reconstruction. The analysis is based on three distinct results: 1. CS measurements approximately follow an i.i.d. Gaussian distribution, making the quantization error a well characterized white noise process. 2. Clipping without quantization followed by dropping the saturated measurements preserves the signal norm and the RIP. 3. Once quantization is introduced, the signal-toquantization noise ratio can be minimized by selecting a positive saturation rate and rejecting the saturated measurements. The subsequent sections state and sketch the proofs for these results and their consequences. Due to space limitations, we defer complete proofs and extended analysis to future publications. 3.1 Distribution of CS measurements We assume the measurement matrix Φ in (1) is randomly generated using a zero-mean sub-Gaussian distribution with variance 1/M P . Under this assumption, all the measurements yi = j (Φ)i,j xj are i.i.d. zero-mean random variables with variance kxk22 /M . Using the Lyapunov variant of the Central Limit Theorem, it is also straightforward to show that as the dimension N of the signal x increases the yi become normally distributed. The statement becomes non-asymptotic if the elements of Φ are themselves distributed as a Gaussian. Our initial experiments show that commonly used CS matrix families reach asymptotic behavior even for small N . The implications of this statement are threefold: 1. 
The expected number of measurements exceeding in √ magnitude a threshold T kxk2 / M is 2Q(T ), where R +∞ 2 Q(x) = √12π x e−t /2 dt is the tail integral of the standard Gaussian distribution. √ 2. The ratio of T kxk2 / M determines the saturation rate. Thus, scaling the signal such that a specific saturation rate is achieved provides a very effective gain control strategy. 3. The quantization error is a white process, although it is correlated to the measurements. We √ should note that in the sequel only the ratio T M /kxk2 is relevant. This ratio is the √ threshold we select by varying the parameter T . The M factor reflects that in practical systems the variance of the elements of the measurement matrix is not a function of the number of measurements. The normalization by kxk2 reflects that in practice automatic gain control or prior signal knowledge is used to determine the proper gain in the input. 184 3.2 Analysis of finite-range CS measurements In this √section we introduce clipping at threshold T kxk2 / M , without quantization. We reject the clipped measurements and demonstrate that if the remaining meae , are sufficient in number, the surements, denoted using y measurement process still satisfies the RIP and preserves f to the norm of K-sparse signals. We use the notation (·) denote the relevant quantities after the saturated measuref is the number of remaining meaments are dropped: M e surements and Φ the mutilated measurement matrix corresponding to the remaining measurements. Assuming the result of Sec. 3.1, the expected number of f saturated measurements is 2M Q(T ). The remaining M measurements follow a truncated Gaussian distribution:  (  kxk2 T kxk2 N yi ; 0, M 2 , |yi | < √M 2 (4) yei ∝ 0, otherwise. e is equal to: Thus, the expected norm of y E{ke yk22 } = M (1 − 2Q(T ))σT2 , (5) 2 where σT is the variance of (4). Thus, the scaled system e Ge y = GΦx (6)   1/2 kxk22 G= (7) M (1 − 2Q(T ))σT2 !1/2 √ 2π = √ (8) 2π(1 − 2Q(T )) − 2T e−T 2 /2 preserves the expected value of the norm of the signal. It is also straightforward to demonstrate that the density of the norm of the signal concentrates around its expected value with very high probability, in manner similar to [1, 3]. e It is also possible to demonstrate that the resulting Φ, which is now signal-dependent, preserves the RIP for all f = O(K log (N/K)), or K-sparse signals, as long as M equivalently M = O(K log (N/K)/(1 − 2Q(T )). The proof is beyond the scope of this paper [5]. However, it is important since it guarantees recovery of the signal, and the robustness to noise we need in the next section. 3.3 Quantization noise In this section we quantize the thresholded measurements √ using quantization interval ∆ = 2−B+1 T kxk2 / M : e + ǫf (9) R(e y) = y Q, where ǫf Q is the vector of the quantization error. From the results of Sec. 3.1 and the distribution of the measurements after thresholding it follows that ǫQ is a white random vector with elements distributed as a wrapped truncated Gaussian random variable and bounded by ±∆/2. For small quantization intervals the distribution is well approximated by a uniform distribution in the same interval, with variance ∆2 /12 [6]. Assuming a unit norm input x the expected squared norm of the quantization error is: (10) E{kf ǫQ k22 } = M (1 − 2Q(T ))∆2 /12 = 2−2B (1 − 2Q(T ))T 2 /3. (11) It can also be shown that for large M the measure of this norm concentrates around its mean. 
When properly scaled with the G in (8), the√quantization error becomes: 2π2−2B T2 , (12) E{kGf ǫQ k22 } = √ −T 2 /2 3 2π − T e (1−2Q(T )) SAMPTA'09 which suggests an optimal threshold T that minimizes the error. If the RIP is guaranteed, the norm of reconstruction error can be bounded by ckGǫeq k22 with very high probability [2]. For most practical applications, the minimizing T in (12) is not sufficient to guarantee RIP, and therefore we select the smallest T that does. A similar analysis can be performed if we keep all the saturated measurements. In this case the RIP always holds and the measurement error is equal to: (13) E{kǫQ k22 } =   2 2 2Q(T )kxk2 2 ∆ + σtrunc , (14) = M (1 − 2Q(T )) 12 M   2−2B 2 , (15) = kxk22 (1 − 2Q(T )) + 2Q(T )σtrunc 3 2 where σtrunc is the variance of the tail distribution for a standard Gaussian random variable, as truncated by the saturation. Detailed analysis of this can be found in [4]. At T decreases, both σtrunc and Q(T ) increases, which means the error due to the saturated measurements increases at the error due to the unsaturated measurements decreases. The optimal T in this case minimizes (15). The two strategies can be compared to select the optimal given the operating conditions. Especially in low-bit conditions, reducing the quantization interval pays off in terms of the error. However, the tail effects cause a significant penalty if we keep the measurements, and the better strategy is to discard them. As we discuss in the next section in our extensive simulations under a large variety of practical conditions discarding the measurements performs better than using them. 4. 4.1 Experimental validation Experimental setup Signal model: We study the performance of our approach using signals sparse in the frequency domain: in each trial K non-zero Fourier coefficients αn are drawn from an i.i.d. Gaussian distribution, normalized to have unit norm, and randomly assigned to K frequency bins out of the N dimensional space. The sampled signal x is the DFT of the generated Fourier coefficients. Beyond quantization we do not include additional noise sources. In addition to exactly sparse signals, we have performed extensive simulations with compressible signals and confirmed similar results. However, compressible signals are beyond the scope of this paper. Measurement matrix: For each trial a measurement matrix is generated using a Rademacher distribution: each element is drawn independently to be +1 or −1 with equal probability. Our extended experimentation, not shown here in the interest of space, shows that our results are robust to large variety of measurement matrix classes. Reconstruction metric: We report the reconstruction signal-to-noise ratio (SNR) in decibels (dB):   kxk22 , (16) SNR , 10 log bk22 kx − x b denotes the reconstructed signal. where x 185 SNR (dB) 30 30 25 25 20 20 15 15 10 10 M/N=1/16 M/N=3/16 M/N=5/16 30 25 Keep 20 M/N=7/16 M/N=9/16 M/N=11/16 Discard 15 M/N=3/16 M/N=15/16 10 M/N=13/16 5 5 0 0 0 0 0.05 0.1 0.15 Quantizer Threshold (T) (a) M/N=15/16 5 0 0 0.05 0.1 0.15 Quantizer Threshold (T) (b) 0.05 0.1 0.15 Quantizer Threshold (T) (c) Figure 1: Reconstruction SNR (dB) vs. quantizer saturation threshold (T ) using a 4-bit quantizer and downsampling rate M N = ... when (a) the saturated measurements are used for reconstruction and (b) the saturated measurements are discarded before 3 = 16 and 15 : by lowering the threshold T and rejecting saturated reconstruction. 
(c) Side-by-side comparison of (a) and (b) for M N 16 measurements, we achieve the highest reconstruction SNR. 1 16 13 16 4.2 Experimental results We performed extensive simulations with a variety of signal parameters. Due to space limitations, we present here the results for N = 2048, K = 60, and B = 4 which are typical of the system performance. In our experiments 1 15 we vary M such that M N = 16 . . . 16 and the threshold T in the range [0, 0.18]. For each parameter combination we repeat 100 trials, each trial with a different signal x and matrix Φ as described in Sec. 4.1. For each trial we quantize the measurements using a finiterange quantizer and use them to reconstruct the signal (a) by incorporating the saturated measurements in the reconstruction and (b) by discarding the saturated measurements before reconstruction. Both cases use the linear program (3) with the appropriate value for η. We denote the bdiscard , respectively. bkeep and x reconstructed signal with x The results are shown in Fig. 1, which plots the average reconstruction SNR versus the quantizer dynamic range T for a variety of M N . In particular, Figs. 1 (a) and (b) display bdiscard , respectively. Figure 1 (c) bkeep and x the SNR of x compares the two approaches for the two extreme cases of M 3 M 15 N = 16 and N = 16 . The plots demonstrate that lowering the threshold T such that the saturation rate is non-zero achieves a higher reconstruction SNR compared to scaling such that no measurements clip. Furthermore, rejecting saturated measurements performs better than incorporating them in the reconstruction. This is best illustrated in Fig. 1 (c): the optimal point on the dashed line, which corresponds to discarding saturated measurements, exhibits better SNR than the optimal point on the solid line, which corresponds to incorporating saturated measurements. As expected, the curves coincide when the saturation rate is effectively zero. We also performed this experiment for larger values of K and B. As expected with higher B, we achieve less performance gain. As B grows, the quantization error goes down and thus reducing the quantization interval by dropping measurements is less effective. As K increases, rejecting measurements remains an optimal strategy. However, when K is large enough such that the non-saturated measurements do not satisfy RIP, our method performs worse than incorporating the saturated measurements. SAMPTA'09 5. Discussion Our results demonstrate that CS overthrows the conventional wisdom on finite range quantization. Specifically the common practice of scaling the signal such that the ADC does not overflow is not optimal in light of the nonlinear reconstruction. Our results demonstrate that allowing the signal to saturate is advantageous because it decreases the quantization interval in the unsaturated measurements. The non-linear reconstruction methods allow us to discard saturated measurements and prevent the saturation error from affecting the reconstruction process. Our results further suggests a simple automatic gain control (AGC) strategy, in which the deviation of the average clipping rate from the desired one is used as a feedback to modify the gain. Since the desired clipping rate is nonzero, the feedback is symmetric and increases the gain if the clipping rate is too low. In comparison, classical AGC systems rely on the clipping rate only when the gain is too high and should be reduced. 
Since in such systems a zero clipping rate is the desired behavior, the AGC needs to rely on other signal features to ensure the gain is sufficient to provide a good signal-to-quantization noise ratio. 6. Acknowledgments The work was supported by grants NSF CCF-0431150, CCF-0728867, CNS-0435425, and CNS-0520280, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936, N00014-08-1-1067, N00014-08-1-1112, and N00014-08-1-1066, AFOSR FA9550-07-1-0301, ARO MURI W311NF07-1-0185, and the Texas Instruments Leadership University Program. References: [1] R. G. Baraniuk, M. A. Davenport, R. DeVore, and M. Wakin. A simple proof of the Restricted Isometry Property for random matrices. In Constructive Approximation, volume 28(3), pages 253–263, Dec 2008. [2] E. Candes. Compressive sampling. In Int. Congress of Mathematics, volume 3, pages 1433–1452, 2006. [3] S. Dasgupta and A. Gupta. An elementary proof of the JohnsonLindenstrauss lemma. In U.C. Berkeley Tech. Rep., volume TR-99006, 1999. [4] G. A. Gray and G. W. Zeoli. Quantization and saturation noise due to analog-to-digital conversion. In IEEE Trans. on Aerospace and Electronic Systems, pages 222–223, Jan 1971. [5] J. N. Laska, P. Boufounos, M. A. Davenport, and R. G. Baraniuk. Democracy in action: finite-range scalar quantization for compressive sensing. In To be submitted, 2009. [6] A. B. Sripad and D. L. Snyder. A necessary and sufficient condition for quantization errors to be uniform and white. In IEEE Trans. on Acoustics, Speech, and Signal Processing, volume ASSP-25, pages 442 – 448, 1977. 186 Special session on Sampling and (In)Painting Chair: Massimo FORNASIER SAMPTA'09 187 SAMPTA'09 188 Report on Digital Image Processing for Art Historians Bruno Cornelis (1,2) , Ann Dooms (1,2) , Ingrid Daubechies (2,3) and Peter Schelkens (1) (1) Dept. of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB) Interdisciplinary Institute for Broadband Technology (IBBT), Pleinlaan 2, B-1050 Brussels, Belgium. (2) Computational and Applied Mathematics Program (CAMP), VUB (3) Princeton University, Program in Applied and Computational Mathematics, Princeton, NJ 08544 bruno.cornelis@vub.ac.be Abstract: As art museums are digitizing their collections, a crossdisciplinary interaction between image analysts, mathematicians and art historians is emerging, putting to use recent advances made in the field of image processing (in acquisition as well as in analysis). An example of this is the Digital Painting Analysis (DPA) initiative [2], bringing together several research teams from universities and museums to tackle art related questions such as artist authentication, dating, etc. Some of these questions were formulated by art historians as challenges for the research teams. The results, mostly on van Gogh paintings, were presented at two workshops. As part of the Princeton team within the DPA initiative we give an overview of the work that was performed so far. 1. Introduction - Penetrating the art world Determining the authenticity of a painting can be a daunting task for art historians, requiring extensive art historical research as well as the analysis of pigments, fabrics etc. However much insight chemical analysis yields [4], it requires the destruction of a sample from the painting and is therefore seldom allowed by conservators. Digital image processing for analyzing paintings could thus prove a useful addition to the art experts’ toolbox, even beyond the purpose of authentication. 
We expect that art historians will gradually learn to use and trust these tools; a similar emergence and eventual success took place in the medical world in the mid 80s, with the advent of computed tomography. Subsequently, reconstruction algorithms played a significant role in creating other medical imaging technologies, including MRI, PET and SPECT. To stimulate the interaction between the art historical world and branches of digital image processing, the Digital Painting Analysis (DPA) initiative organized two workshops in Amsterdam (IP4AI or Image Processing for Artist Identification) and a symposium (celebrating the inauguration of TiCC, Tilburg centre for Creative Computing) to facilitate a dialog between the two communities. The Van Gogh Museum (Amsterdam) and the Kröller Müller Museum (Otterlo) made it possible for participating teams to work with high resolution digital images of paintings (mostly van Goghs) in their collections. SAMPTA'09 2. Challenges - Convincing the art expert To jumpstart the IP4AI workshops, art historians formulated challenges for the research teams, asking them to provide convincing arguments in favor of digital image processing. These included the following: • Authentication: distinguish an original van Gogh painting from a copy or forgery. This was the main focus of the first workshop; preliminary results of the participating research teams can be found in [7]. • Dating: classify works by van Gogh that were either painted in his early Paris phase (1886 − 1888) or in his later Arles period. Art historians noticed changes in van Gogh’s way of painting throughout his career. Small brushstrokes seem to be more prominent in his Paris period while broader ones prevail in Arles. • Identifying distinguishing features: can an artist’s hand be characterized and features be found that distinguish him from other painters? • Image enhancement: fuse information obtained by different modalities (x-ray, infrared, visual, etc.) to (virtually) enhance damaged paintings, or underpaintings. A first challenge here is detailed and precise registration. • Inpainting: digitally reconstruct missing pieces from a painting when only limited data is at hand. The purpose of this paper is to provide an overview of the tools and general methodology used by the Princeton research team in order to tackle these challenges. Detailed results can be found in [6, 10]. 2.1 Classification - Authentication, dating and identifying features For the analysis of paintings it is crucial to extract distinguishing features/statistics that truly characterize the style of an artist. It is obvious that simple image statistics such as mean or variance of an image will not suffice by themselves. To take an extreme example: reordering by increasing grayscale the pixels in every row of a digital image of a natural scene, and then doing the same in every column, produces an image with same mean and variance as the original, but bereft of (almost) all other information. More complex models that provide additional information 189 are needed. The approach taken for the first three challenges built such models. The analysis consists of three main steps: transform, modeling and classification. Transform. A multiresolution transform is performed on patches of the image. We used the Dual-Tree Complex Wavelet Transform (DTCWT) [9]; it provides approximate shift invariance and directional selectivity (properties standard wavelet transforms lack). 
The DTCWT uses two parallel filter banks and produces six subbands of coefficients that let us analyze changes in the image in six directions (±15◦ , ±45◦ and ± 75◦ ) at different scales. Modeling. A large number of pixels, and thus also of transform coefficients, combined with noise on the pixel values (due to the acquisition process) impose robust dimensionality reduction and feature extraction techniques. We used Hidden Markov Trees (HMT) [3]. It is possible to describe the wavelet coefficients for a large class of images in terms of two key properties [11]: • 2Population: smooth image regions are represented by wavelet coefficients with a narrow probability distribution function (pdf); edges, ridges or other singularities by wavelet coefficients with a wide pdf. • Persistence: the classification into narrow/wide pdfcoefficients tends to propagate across scales. performed with WEKA [1], a collection of machine learning algorithms for data mining tasks. 2.1.1 Authentication challenge results The authentication challenge was the main research topic for the first IP4AI workshop [7]. To validate their earlier results the Princeton team asked Dutch art conservation student Charlotte Caspers to make original paintings on different materials, with different kinds of paint and brushes, and to create a faithful copy for each of these originals. The dataset provided ground truth: we knew which paintings were original and which ones were copies. We considered both HMT features and thresholding features [10]. The aim was to recognize the difference between a fluid and a more hesitant (copying) stroke through machine learning. For this kind of classification problem the SVM with polynomial kernel machine learning algorithm was the best classifier. The images were subdivided into patches, some of which were used for training the machine learning algorithm. The best results were obtained by using only patches from the painting under investigation and its copy (see Figure 1). The results can be found in Table 1; they show that when both soft and hard brushes are used, the algorithm achieves a succes rate similar to that obtained by state-ofthe-art authentication algorithms for handwriting. These two properties are used to design a statistical model to represent images. Due to the multiresolution nature of the wavelet transform, the wavelet coefficients can be arranged into a quadtree (one coefficient from a coarser scale corresponds to four wavelet coefficients at the next finer scale). At each scale, hidden variables control the wavelet coefficients. They can have two states: L (large, for edge-like structures) and S (small, for smooth regions). The wavelet coefficients are modeled as samples from a mixture of two Gaussian distributions, one with a large variance for the coefficients corresponding to an edge and one with a small variance for coefficients from a smooth region. HMT model the statistical dependencies between wavelet coefficients at different scales. The parameters of the HMT we used as features are: • αT : a 2 × 2 transition probability matrix, that depicts the probabilities that a child node is in a particular state, given the state of the parent node. • µi : the means of the narrow and wide Gaussian distribution (i = 1, 2) for each subband. • σi : variance of the narrow and wide Gaussian distribution (i = 1, 2) for each subband. 
For example, if we apply a 4-level DTCWT transform on a patch of an image, then the features extracted from that patch would be the following: 6 × 4 × 2 means, 6 × 4 × 2 variances and 6 × 4 × (2 × 2) probabilities, adding up to a total of 192 features. These HMT features are grouped into a model parameter vector and are determined using the expectation maximization algorithm. Classification. The model parameters vectors extracted in the previous step are used as the input for classification algorithms. We used several types of machine learning algorithms: Support Vector Machines, Adaboost, Decision Stump and Random Forest. All experiments were SAMPTA'09 Figure 1: Four sets of patches without overlap. 2.1.2 Dating challenge results For the dating challenge a set of 66 high resolution paintings (90 pixels per linear inch) were put at the disposal of all teams. All the classifiers listed above were trained with 256 × 256 patches using 10-fold cross validation. As can be seen in Table 2, the Random Forest (RF) classifier was the most accurate. Three paintings for which art historians are not sure when they were painted needed to be attributed to one of two periods. Figure 2 shows the resulting classification success rate for patches of paintings from the training set. The RF algorithm was then used on the patches of the three paintings to be attributed, and a majority vote of the patches was determined. 190 Pair 1 2 3 4 5 6 7 Ground CP Canvas CP Canvas Smooth CP Board Bare linen canvas Chalk and Glue CP Canvas Smooth CP Board Paint Oils Acrylics Oils Oils Oils Acrylics Oils Brushes Soft&Hard Soft&Hard Soft&Hard Soft Soft Soft Soft Style TI TI TI Sm,Bl Total 78% 72% 78% 75% 50% 38% 55% Copy 67% 55% 78% 50% 0% 75% 22% Original 89% 89% 78% 100% 100% 0% 88% Table 1: Accuracy for each test on the Caspers data set. Abbreviations: Sm=Smooth, Bl=Blended, TI=Thick Impasto. SVM 61.2% AB 63.2% DS 63.1% RF 70.5% Table 2: Accuracy of different classifiers. Abbreviations: SVM=Support Vector Machines, AB= AdaBoost, DS=Decision Stump, RF=Random Forest. Figure 3: Distinguishing feature challenge. Left: “Still Life: Vase with Gladioli” by V. van Gogh. Right: “Vase with Flowers” by G. Jeannin. 2.2 Arles Tie Paris Figure 2: Classification results for three paintings. 2.1.3 Extracting Distinguishing Features results The test set consisted of floral still lifes painted by van Gogh, Monticelli and other contemporary artists. The goal was to quantify to what extent van Gogh and Monticelli share features, in their brushwork and color schemes, absent in the style of the others. The purpose here was thus to distinguish styles instead of painters (as in authentication). The same methodology described above was used. Results show that wavelet coefficients in direction −45◦ , scale 6 characterize the style of van Gogh and Monticelli whereas wavelet coefficients in the 15◦ , scale 4 subband are more prominent in the other paintings. Examples of these distinguishing features are highlighted in Figure 3. More detailed results are in [6]. SAMPTA'09 Using Different Image Acquisitions Art museums typically have x-ray and infrared photographs in their collections, which can reveal much about what is below the visible surface of a painting. These can also be digitized (or acquired digitally, in the future), and be studied with digital image processing tools. 
In order to combine the different modes of image acquisition, the first task is to register the images (we used x-ray, infrared and color images of the same painting) to enhance and detect hidden features. Figure 4 shows a woman’s face emerging (horizontally) from underneath the grass in the painting “Patch of Grass”. Because x-rays and photographs are acquired by different modalities, the matching is not as straightforward as it seems initially. Both images were divided into patches and reference points in both images were picked in order to define a smooth warping that gave acceptable results. Another example is the counting of threads/inch in the canvas, visible on x-rays, to determine a painting’s authenticity and date [8]. 2.3 Inpainting An important aspect for art historians and conservators is the preservation of works of art. When paintings become damaged, all the available information (grayscale photographs, low resolution color photographs, ektachromes, 191 4. Acknowledgments We would like to thank Sina Jafarpour, Gungor Polatkan, Andrei Brasoveanu, Eugene Brevdo and Shannon M. Hughes for letting us report briefly on some of their results, and the Van Gogh and Kröller Müller Museums for letting us use their high resolution data set. Special thanks go to Massimo Fornasier for his help with the inpainting challenge. Research was partially supported by The Fund for Scientific Research Flanders (project G.0206.08 and postdoctoral fellowship of Peter Schelkens). References: Figure 4: Registered x-ray on “Patch of Grass”. . . . ) is called upon to help art conservators in their reconstruction or restoration. In [5] techniques were proposed to mathematically reconstruct the original colors of frescoes (reduced to rubble in a wartime bombing) by making use of the information given by preserved fresco fragments and gray level pictures of the intact frescoes taken before the damage occurred. We investigated whether such techniques would also work on van Gogh pictures. With the help of M. Fornasier, one of the authors of [5], we applied these algorithms to a high resolution color image of the “Lemons on a Plate” painting. A patch of 200 × 200 pixels was digitally removed; Figure 5 shows its mathematical reconstruction, using only a low resolution color image (with faithful colors) and a high resolution grayscale image of that painting. The results are quite satisfying and prove that these techniques could be used for restoration purposes. Figure 5: Inpainting. 3. Conclusions The results obtained for the first and second IP4AI workshop in Amsterdam were promising. It is clear however, that these digital techniques on their own are not sufficient to provide conclusive answers to questions of interest to art historians. Nevertheless, they will likely be a worthy addition to the toolbox of art historians and conservators; they have the great advantage of not being invasive. There is also still room for improvement in the different steps of the analysis of paintings. It is worth pointing out, however, that in order to apply such techniques, the quality of the acquired dataset (i.e. high resolution images) is of utmost importance. Only images of equal quality can be compared with each other. SAMPTA'09 [1] http://www.cs.waikato.ac.nz/ml/weka/. [2] http://www.digitalpaintinganalysis.org/. [3] Matthew Crouse, Robert Nowak, and Richard Baraniuk. Wavelet-based statistical signal processing using hidden markov models. IEEE Transactions on Signal Processing, 46:886–902, 1997. 
[4] Joris Dik, Koen Janssens, Geert Van Der Snickt, Luuk van der Loeff, Karen Rickers, and Marine Cotte. Visualization of a lost painting by vincent van gogh using synchrotron radiation based x-ray fluorescence elemental mapping. Anal. Chem., 80:6436– 6442, 2008. [5] Massimo Fornasier, Ronny Ramlau, and Gerd Teschke. A comparison of joint sparsity and total variation minimization algorithms in a real-life art restoration problem. to appear in Advances in Computational Mathematics, 2008. [6] S. Jafarpour, G. Polatkan, E. Brevdo, S. Hughes, A. Brasoveanu, and I. Daubechies. Stylistic analysis of paintings using wavelets and machine learning. submitted to EUSIPCO 2009. [7] C. R. Johnson, Ella Hendriks, Igor Berezhnoy, Eugene Brevdo, Shannon Hughes, Ingrid Daubechies, Jia Li, Eric Postma, and James Z. Wang. Image processing for artist identification - computerized analysis of vincent van gogh’s painting brushstrokes. IEEE Signal Processing Magazine, July 2008. [8] D. H. Johnson, L. Sun, J. Guo, C. R. Johnson Jr., and E. Hendriks. Matching canvas weave patterns from processing x-ray images of master paintings. submitted to 16th IEEE International Conf. on Image Processing, 25:37–48, November 2009. [9] Nick Kingsbury. Complex wavelets for shift invariant analysis and filtering of signals. Applied and Computational Harmonic Analysis, 10(3):234–253, May 2001. [10] G. Polatkan, S. Jafarpour, A. Brasoveanu, S. Hughes, and I. Daubechies. Detection of forgery in paintings using supervised learning. Submitted to IEEE International Conference on Image Processing 2009. [11] Justin K. Romberg, Hyeokho Choi, Richard G. Baraniuk, and Nick Kingsbury. A hidden markov tree model for the complex wavelet transform. IEEE Transactions on Signal Processing, pages 133–136, 2001. 192 Edge Orientation Using Contour Stencils Pascal Getreuer (1) (1) Department of Mathematics, University of California Los Angeles getreuer@math.ucla.edu Abstract: Many image processing applications require estimating the orientation of the image edges. This estimation is often done with a finite difference approximation of the orthogonal gradient. As an alternative, we apply contour stencils, a method for detecting contours from total variation along curves, and show it more robustly estimates the edge orientations than several finite difference approximations. Contour stencils are demonstrated in image enhancement and zooming applications. 1. structure tensor J(∇u) = ∇u ⊗ ∇u. The structure tensor satisfies J(−∇u) = J(∇u) and ∇u is an eigenvector of J(∇u). The structure tensor takes into account the orientation but not the sign of the direction, thus solving the antipodal cancellation problem. As developed by Weickert [9], let Jρ (∇uσ ) = Gρ ∗ J(Gσ ∗ u) where Gσ and Gρ are Gaussians with standard deviations σ and ρ. The eigenvector of Jρ (∇uσ ) associated with the smaller eigenvalue is called the coherence direction, and is an effective approximation of edge orientation. Introduction 2. A fundamental and challenging problem in image processing is estimating edge orientations. Accurate edge orientations are important for example in edgeoriented inpainting methods [2], and optical character recognition features [8]. 1.1 (1) ∇u⊥ for Estimating Edge Orientation A starting point to edge orientation estimation is to approximate ∇u⊥ with finite differences. 
Finite difference estimation alone is typically too noisy to be reliable, especially near edges, so the gradient is often regularized by a convolution ∇u ≈ ∇(G ∗ u) where G is for example a Gaussian. However, there is a serious problem in that ∇u⊥ and −∇u⊥ both describe the same edge orientation, so linear smoothing tends to cancel the desired edge information. Introduced by Bigün and Granlund [1] and Forstner and Gulch [3], a better approach is to use the 2 × 2 SAMPTA'09 Contour Stencils Numerical implementation of J(∇u) yet involves estimating ∇u. Since numerical estimates of ∇u are sensitive to noise and unreliable near edges, significant amounts of smoothing is still needed for acceptable results. We abandon ∇u⊥ and approach the estimation of edge orientation from an entirely different principle. Given a smooth curve C and a parameterization γ : [0, T ] → C, consider measuring the total variation of u along C, Z T  ∂t u γ(t) dt. TV(C) = (2) 0 Edge orientations can be estimated by comparing TV(C) with various candidate curves. Contour stencils [4, 5] is a numerical implementation of this idea. Let u : Λ → R be a discrete image. Denote by ui,j , (i, j) ∈ Λ, the value of u at the (i, j)th pixel, and let xi,j ∈ R2 denote its spatial location. 193 +i +j S(α, β) = 8 1 α = (i, j), β = (i − 1, j + 1), > > > >1 α = (i, j), β = (i + 1, j − 1), > < 4 α = (i, j + 1), β = (i + 1, j), 1 α = (i + 1, j + 1), β = (i, j + 2), > > > > >1 α = (i + 1, j + 1), β = (i + 2, j), : 0 otherwise 1 (i, j) 1 4 1 1 1 Figure 1: An example contour stencil S for detecting a 45◦ orientation. A contour stencil is a function S : Λ × Λ → R+ describing weighted edges between pixels (see Figure 1). These edges approximate several parallel curves localized over a small neighborhood. As a discretization of (2), the total variation of S is X 1 S(α, β) |uα − uβ | , (3) TV(S) := |S| α,β∈Λ P and |S| := α,β S(α, β) |xα − xβ |. For the contour √ stencil in Figure 1, |S| = (1 + 1 + 4 + 1 + 1) 2 and TV(S) = 1 |S| |ui,j − ui−1,j+1 | + |ui,j − ui+1,j−1 | + 4 |ui,j+1 − ui+1,j | 1 1 1 2 1 2 2 1 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 2 4 2 1 1 2 1 1 2 1 1 2 1 1 2 2 1 1 2 1 2 1 2 2 1 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 2 1 1 2 1 2 1 1 2 1 2 1 2 1 Figure 3: A node-centered stencil set. candidate stencil, and then determining the best-fitting stencil S ∗ . For efficient implementation, define H = |vi,j − vi+1,j | , Di,j V Di,j = |vi,j − vi,j+1 | , A Di,j = |vi,j − vi+1,j+1 | , B Di,j = |vi,j+1 − vi+1,j | , then the TV(S) can be computed as sums of these differences, and the differences may be reused between successive cells. For the proposed stencil sets, contour stencils cost a few dozen operations per pixel [4]. Input Estimated Orientations 1 1 1 2 1 1 1 2 1 1 1 1 Figure 2: The proposed cell-centered contour stencils. The contours of u are estimated by finding a stencil with low total variation, S ∗ = arg min TV(S) S∈Σ 2 1 1 1 1 1 1 2 1 2 1 4 1 1 1 1 2 1 1 2 2 1 1 2 1 2 2  + |ui+1,j+1 − ui,j+2 | + |ui+1,j+1 − ui+2,j | . 1 1 (4) Figure 4: Edge orientation estimation with contour stencils (using the cell-centered stencils in Figure 2). Contour stencils extend naturally to nonscalar data by replacing the absolute value in (3) with a metric. On color images for example, a suitable choice is the ℓ1 vector norm in YCb Cr color space. where Σ is a set of candidate stencils (see Figures 2 and 3). The best-fitting stencil S ∗ provides a model of the underlying contours. 3. 
In summary, contour stencil orientation estimation is done by first computing the TV estimates (3) for each Here we compare contour stencils and several finite difference methods for estimating edge orientation. SAMPTA'09 Comparison 194 As a test image with fine orientations, we use a small image of straw (Figure 5). u square whose corners correspond to ui,j , ui+1,j , ui,j+1 , ui+1,j+1 . Cell-centered methods compute orientation estimates logically located in the center of the cells. With node-centered methods, the edge orientation estimates are centered on the pixels. Let Dx+ denote the forward difference operator Dx+ ui,j = ui+1,j − ui,j and similarly in the other coordinate Dy+ . An estimate of ∇u symmetric over the cell is  +  (Dx ui,j + Dx+ ui,j+1 )/2 . (5) ∇ui,j ≈ (Dy+ ui,j + Dy+ ui+1,j )/2 Figure 6 compares ∇u⊥ estimated using (5) with contour stencils using the cell-centered stencil set shown in Figure 2. Figure 5: The test image. Sobel filter (6) As is done with coherence direction (1), any orientation field θ~ can be smoothed by filtering its tensor ~ But for easier comparison, all product: Gρ ∗ (θ~ × θ). methods are shown without smoothing. ∇u⊥ with (5) Contour Stencils (Σ as in Figure 3) Contour Stencils (Σ as in Figure 2) Figure 7: Comparison of node-centered methods. Figure 6: Comparison of cell-centered methods. We consider two categories of methods: cell-centered and node-centered. Define the (i, j)th cell as the SAMPTA'09 The Sobel filter [7] is a node-centered approximation of ∇u,   −1 0 1 ∂x u ≈ −2 0 2 ∗ u (6) −1 0 1 and similarly for ∂y u. Figure 7 compares the Sobel filter with contour stencils using the node-centered stencil set from Figure 3. 195 4. Applications Input Zooming (4×) Contour stencils are useful in applications where edges are significant. Input Contour Stencil Enhancement Figure 9: (This is a color image.) Edge-adaptive zooming using contour stencils [5]. References: Figure 8: Simultaneous sharpening and denoising using contour stencils [4]. Contour stencils can be useful in discretizing image diffusion processes. Figure 8 demonstrates image enhancement using a combination of the Rudin-Osher shock filter [6] and TV-flow that has been discretized with contour stencils. As another application, Figure 9 shows an image zooming result using contour stencils. The method approaches zooming as an inverse problem using a least-squares graph regularization. The regularization is adapted according to the edge orientations estimated from the contour stencils. 5. Conclusions [1] J. Bigün and G. H. Granlund. Optimal orientation detection of linear symmetry. In IEEE First International Conference on Computer Vision, pages 433–438, London, Great Britain, June 1987. [2] Folkmar Bornemann and Tom März. Fast image inpainting based on coherence transport. J. Math. Imaging Vis., 28(3):259–278, 2007. [3] W. Förstner and E. Gulch. A fast operator for detection and precise location of distinct points, corners, and centers of circular features. pages 281– 305, 1987. [4] Pascal Getreuer. Contour stencils for edgeadaptive image interpolation. volume 7257, 2009. [5] Pascal Getreuer. Image zooming with contour stencils. volume 7246, 2009. [6] S. J. Osher and L. I. Rudin. Feature-oriented image enhancement using shock filters. SIAM Journal on Numerical Analysis, 27:919–940, 1990. [7] Irwin Sobel and Jerome A. Feldman. A 3x3 isotropic gradient operator for image processing. Presented at a talk at the Stanford Artificial Project in 1968. [8] Øivind Due Trier, Anil K. 
Jain, and Torfinn Taxt. Feature-extraction methods for characterrecognition: A survey. Pattern Recognition, 29(4):641–662, April 1996. [9] Joachim Weickert. Anisotropic diffusion in image processing. ECMI Series, Teubner-Verlag, Stuttgart, Germany, 1998. Contour stencils provide reliable orientation estimates at low computational cost, enabling better results in image processing applications. SAMPTA'09 196 Smoothing techniques for convex problems. Applications in image processing. Pierre Weiss (1) , Mikaël Carlavan (2) , Laure Blanc-Féraud (2) and Josiane Zerubia (2) (1) Institute for Computational Mathematics, Kowloon Tong, Hong Kong. (2) Projet Ariana - CNRS/INRIA/UNSA, 2004 route des Lucioles, 06902 Sophia-Antipolis, France. (1) pierre.armand.weiss@gmail.com, (2) firstname.lastname@sophia.inria.fr Abstract: In this paper, we present two algorithms to solve some inverse problems coming from the field of image processing. The problems we study are convex and can be expressed simply as sums of lp -norms (p ∈ {1, 2, ∞}) of affine transforms of the image. We propose 2 different techniques. They are - to the best of our knowledge - new in the domain of image processing and one of them is new in the domain of mathematical programming. Both methods converge to the set of minimizers. ¡ ¢Additionally, we show that they converge at least as O N1 (where N is the iteration counter) which is in some sense an “optimal” rate of convergence. Finally, we compare these approaches to some others on a toy problem of image super-resolution with impulse noise. 1. Introduction Many image processing tasks like reconstruction or segmentation can be done efficiently by solving convex optimization problems. Recently these models received considerable attention and this led to some breakthrough. Among them are the new sampling theorems [5] and the impressive results obtained using sparsity or regularity assumptions in image reconstruction (see e.g. [4]). These results motivate an important research to accelerate the convergence speed of the minimization schemes. In the last decade, many algorithms like iterative thresholding or dual approaches were reinvented by the “imaging community” (see for instance [2, 3] for old references). Recently, the “mathematical programming community” got interested in those problems and it led to some drastic improvements. As examples let us cite the papers by Y. Nesterov [9, 10] and M. Teboulle [1] which improve by one order of magnitude most first order approaches. In this paper, we mainly follow the lines of Y. Nesterov [9]. We consider the problem of minimizing the sum of lp -norms (p ∈ {1, 2, ∞}) of affine transforms of the image. The general mechanism of the algorithms we propose consists in smoothing the problem and solve it with an efficient first order scheme. Our contribution is mainly to extend the results of [9] to a more general setting and to propose a dual variant which behaves better in all problems we tested. We also give convergence rates for the proposed algorithms. We believe, this gives some insight on the important factors that influence the algorithms efficiency and helps designing solvable problems. SAMPTA'09 2. The problems considered In this paper, we consider the following seminal model of image deterioration: u0 = Du + b (1) where u is an original, neat image, D : Rn → Rm is some known linear transform, b ∈ Rm is some additive noise and u0 ∈ Rm is a given observed image. This simple formalism actually models many real situations. 
For instance, D can be an irregular sampling and a convolution. In this case recovering u from u0 is a super-resolution problem [7]. Other applications include image inpainting, compression noise reduction, texture+cartoon decompositions, reconstruction from noisy indirect measurements... Finding u from the observation u0 is an inverse problem. There exists many ways to solve it. In this paper, we concentrate on two variational models. The first one consists in solving the following convex problem:     min ||Bx||1 + λ||Dx − u0 ||p  . | {z } x∈X (2) Ψ(x) The second one consists in solving: ¡ ¢ min ||y||1 + λ||DB ∗ y − u0 ||p . y∈Y (3) In both problems, B : Rn → Ro is a linear transform, || · ||p denotes the standard lp -norm and X and Y are simple convex sets (like Rn or [0, 1]n ). The interpretation of the first model is as follows: we look for an image x which minimizes ||Bx||1 such that Dx is close to u0 . The function x 7→ ||Bx||1 can be seen as a regularity a priori on the image. For instance, if B is the discrete gradient, then it corresponds to the total variation. If B is some wavelet transform, it is equivalent to a Besov semi-norm [6]. p must be chosen depending on the statistics of the additive noise. For instance, p should be equal to 2 for Gaussian noise, to 1 for impulse noise and to ∞ for uniform noise. The interpretation of the second model is the following: we look for a decomposition y of the restored image in some dictionary B ∗ such that its reconstruction B ∗ y is close to u0 . Minimizing the l1 -norm of y is known to favor sparse structures. The underlying assumption is thus that the original image u is sparse in the dictionary B ∗ . 197 From a numerical point of view, both problems are very similar. However, the first one is slightly more general and complicated than the second. We will thus give a detailled analysis of its resolution and only provide numerical results for the second one. The remaining of the paper is as follows. We first present an algorithm based on a regularization of the primal problem (2). Then we present a technique to regularize a dual version of (2). Finally we propose theoretical and numerical comparisons of both techniques on a problem of image super-resolution. Due to space limitations, we only provide the main ideas in this paper. We refer the reader to [12] (in French), for the proofs of the propositions. 3. Smoothing of the primal problem In this section, we propose a method to minimize (2). Its principle is exactly the same as the method proposed by Y. Nesterov in [9]: 1. Smooth the non-differentiable terms in (2). 2. Solve the regularized problem using an accelerated gradient method. The only difference is that we do not require the set X to be bounded, which requires a slightly different analysis. Now let us present some details of this approach. A key observation to solve (2) is that it can be rewritten as a so called Let p′ denote the conjugate of ´ ³ min-max problem. p i.e. p1′ + p1 = 1 . We can rewrite problem (2) as follows: µ ¶ ¡ ¢ min max hBx, y1 i + λhDx − u0 , y2 i (4) x∈X y∈Y   =     min max (hAx − h, yi) x∈X  y∈Y  | {z } (5) Ψ(x) where h·, ·i denotes the canonical scalar product, ¸ · · ¸ 0 B , h= A= and λu0 λD (6) Y = {y = (y1 , y2 ) ∈ Ro ×Rm , ||y1 ||∞ ≤ 1, ||y2 ||p′ ≤ 1}. (7) The function Ψ is a conjugate function and the set Y is bounded. It can thus be smoothed using a Moreau regularization. Let us denote: ´ ³ µ (8) Ψµ (x) = max hAx − h, yi − ||y||22 . 
y∈Y 2 This function can be shown to be L-Lipschitz differentiable: ||∇Ψµ (x1 ) − ∇Ψµ (x2 )||2 ≤ L||x1 − x2 ||2 with L = |||A|||2 µ and |||A||| = max x∈Rn ,||x||2 ≤1 (9) (||Ax||2 ). Furthermore, it is a good uniform approximation of Ψ in the following sense: µ 0 ≤ Ψ(x) − Ψµ (x) ≤ D. (10) 2 SAMPTA'09 where D = µ ¶ ¢ ¡ max ||y||22 . Thus, we can make the dify∈Y ference between Ψ and Ψµ as small as desired by decreasing µ. The approximation Ψµ is actually very common in image processing. For instance, when p = 1, it corresponds to the approximation of the absolute value by a Huber function. When p = ∞ it is slightly more difficult, but it can still be computed in closed form. The smoothed problem writes: min (Ψµ (x)) . x∈X (11) It consists in minimizing a differentiable function over a simple set. We can thus apply projected gradient like algorithms to solve it. Unfortunately, µ has to be chosen small in order to get a good approximate solution. This requires to use small step sizes in the gradient descent and thus results in a very slow convergence rate. The main observation of Y. Nesterov in [9] is that using an accelerated version of the projected gradient methods can actually compensate the approximation error. This results ¡ ¢ in a convergence rate in O N1 (where N is the iteration counter), while other first order approaches ³ ´ like projected 1 subgradient descents converge as O √N . Now let us write down the complete algorithm to solve (11). Let x∗µ denote a solution of (11) (it is not unique in general). We propose the following algorithm: Algorithm 1 (Primal) Choose a number of iterations N . Set a starting point x0 (as close as possible to x∗µ ). |||A|||·||x0 −x∗ ||2 µ Set µ = µ(N ) = N 0 Set A = 0, g = 0 and x = x . for k = 0 toq N do 1 a = L + L12 + L2 A ¡ ¢ v = ΠX x0 − g y = Ax+av A+a ³ ´ ∇Ψ (y) x = ΠX y − Lµ . g = g + a∇Ψµ (x) A=A+a end for Set xN = x. Our main convergence results are as follows. Let x∗ denote a solution of (2). Proposition 1 xN converges to the set of minimizers of (2). Proposition 2 The worst case convergence rate is: √ 2|||A||| · ||x0 − x∗µ ||2 D N ∗ . (12) Ψ(x ) − Ψ(x ) ≤ N Note that the distance ||x0 − x∗µ ||2 is unknown in general, so that Algorithm 1 might not seem implementable. In the case where X is a compact set, this quantity can be bounded above by the diameter of X. When X is not bounded, it actually suffices to choose µ of order |||A||| to N 198 ¡ ¢ get a precision of order O N1 . Algorithm (1) is thus im¡ ¢ plementable and converges as O N1 . This convergence rate is neatly sublinear and might seem bad at first sight. Actually, it is somehow optimal. Indeed, A. Nemirovski shows in [8] that some instances of problems like (5) ¡can-¢ not be solved with a better rate of convergence than O N1 using first order methods. 4. Smoothing of the dual problem In this section, we propose an approach alternative to the previous one. Its flavor is similar to a proximal-method. One way to understand this scheme is that we smooth the “dual” problem instead of the primal problem. Note that the min and the max in equation (5) cannot be inverted as we do not suppose X to be compact. So we cannot use properly speaking - the term dual problem. Instead of solving (2), we solve: ´ ³ ² min ||Bx||1 + λ||Dx − u0 ||p + ||x − x0 ||22 x∈X 2 (13) 0 where ² ∈ R+ ∗ and x should be chosen close to the set of minimizers of (2). It can be shown that as ² goes to 0, the unique solution of (13) converges to the Euclidean projection of x0 onto the set of minimizers of (2). 
We can rewrite (13) as a min-max problem: ¶ ² 0 2 min max (hAx − h, yi) + ||x − x ||2 (14) x∈X y∈Y 2   µ  ´ ³ ²   = max min hAx − h, yi + ||x − x0 ||22 (15) . y∈Y x∈X 2  | {z } Algorithm 2 (Dual) Choose a number of iterations N . Set a point x0 (as close as possible to X ∗ ). Set a starting point y 0 (as close as possible to y²∗ ). |||A|||·||x0 −x∗ ² ||2 Set ² = ²(N ) = . N Set A = 0, g = 0, x̄ = 0 and y = y 0 . for k = 0 toq N do a = L1 + L12 + L2 A ¡ ¢ v = ΠY y 0 − g z = Ay+av A+a ´ ³ y = ΠY z + ∇ΨL² (z) x̄ = x̄ + ax(y) (cf. equation (17)) g = g − a∇Ψ² (y) A=A+a end for x̄ Set x̄N = A . This algorithm can be shown to have the following properties. Proposition 3 x̄N converges to the projection of x0 onto the set of minimizers of (2). Proposition 4 The worst case convergence rate is: √ 2|||A||| · ||x0 − x∗² ||2 D N ∗ Ψ(x̄ ) − Ψ(x ) ≤ . (18) N Rate (18) is actually very similar to (12). It is thus natural to wonder if there is an interest in using this dual approach. Let us present some interesting aspects of this scheme: • In the dual approach, the solution of the regularized problem is unique. This guarantees a certain stability of the iterates. Ψ² (y) Note that we can invert the min and the max only because the term 2² ||x − x0 ||22 makes the problem coercive in x. Now, the important observation is that the function Ψ² is the conjugate of a strongly convex function. It is thus concave and Lipschitz differentiable: ||∇Ψ² (y1 ) − ∇Ψ² (y2 )||2 ≤ L||x1 − x2 ||2 • We can show an additional convergence rate in norm to the regularized solution. Namely, for a fixed ², we have for all k: ||x̄k − x∗² ||22 ≤ (16) ³ ´ ² x(y) = arg min hAx − h, yi + ||x − x0 ||22 . (17) 2 x∈X Actually, a slight modification of Nesterov’s scheme (an ergodic version) can be shown to ensure convergence of xN with the desired convergence rate. In the following, we detail briefly our main results. Let x∗² denote the solution of (13) and y²∗ denote a solution of (15). Let X ∗ denote the set of minimizers of (2) and let us consider the following algorithm: SAMPTA'09 (19) • In practical experiments, model (13) with a small ² leads to slightly better SNR than model (2) for some restoration purposes in image processing. 2 ∀(y1 , y2 ) ∈ Y × Y with L ≤ |||A||| . Problem (15) ² consists in maximizing a Lipschitz differentiable concave function over a convex set. It thus seems interesting to use a scheme similar to Algorithm 1 on this problem. Unfortunately we will get a convergence rate on the dual variable y and not on the variable of interest: D|||A||| ² · k2 • The practical convergence rates of the dual approach were better than those of the primal approach in all our experiments. To conclude the theoretical part of this paper, let us precise that problem (3) can be solved with the same algorithms. However, it is preferable not to regularize the term y 7→ ||y||1 which can be minimized using accelerated softthresholding algorithms [1, 10, 12]. 5. Numerical results In this section we present some comparisons for a problem of image zooming with impulse noise. To solve this problem, we simply set: 199 4 10 3 10 2 Ψ(xk) − Ψ(x*) 10 1 10 0 10 (a) (b) −1 10 Nesterov Primal Nesterov Dual −2 10 Projected Gradient Primal Projected Gradient Dual −3 10 0 500 1000 1500 2000 2500 3000 Number of iterations 3500 4000 4500 5000 Figure 1: Cost function w.r.t. the number of iterations. (c) • D: convolution by a low-pass filter followed by a down sampling of factor d in the x and y directions. • p = 1 (which is adapted to impulse noise). 
• B: a redundant wavelet transform. We set B to be the Dual-Tree Complex Wavelet Tranform (DTCW) [11]. In that case |||A|||2 can be computed explicitly. For the general case, let us point out that iterated power algorithms provide good approximations of |||A∗ A||| = |||A|||2 . In Figure 1, we chose ² = 0.045 and µ = 10−5 . This ensures that both methods lead to the same asymptotic accuracy (measured in terms of objective function). We can see that the dual approach seems to have a better behavior. For this problem reducing Ψ(x0 ) − Ψ(x∗ ) by a factor 103 is enough for visual purposes. The dual Nesterov approach requires 450 low cost iterations. The smoothing method proposed by Y. Nesterov requires 1700 iterations. The classical Cauchy steps requires much more than 5000 iterations to reach this goal. We can thus see the major improvement of Y. Nesterov’s scheme on these problems. We carried out many other experiments which led to the same conclusion. Figure 2 shows the solution of the problem. The DTCW transform allows to retrieve thin details but slightly blurs the image. Further investigations will be led to address this issue. 6. Acknowledgments The authors would like to thank the CS Compagny in Toulouse (France) for partial funding of this research work. References: [1] A. Beck and M. Teboulle. Fast iterative shrinkagethresholding algorithm for linear inverse problems. SIAM J. on Imaging Science, to appear. [2] A. Bermudez and C. Moreno. Duality methods for solving variational inequalities. Comp. and Maths. with Appls., 7:43-58, 1981. SAMPTA'09 Figure 2: Restoration of a down-sampled and noised image. (a) Original image, (b) down-sampled (by a factor 2) and noised image by 10% of ”Salt & Pepper” noise and finally (c) result of the Nesterov dual approach. [3] R.J. Bruck. On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in hilbert space. J. Math. Anal. Appl, 61:159-164, 1977. [4] J.F. Cai, R. Chan, Z.W. Shen, and L.X. Shen. Convergence analysis of tight framelet approach for missing data recoverys. Advances in Computational Mathematics, to appear. [5] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Inf. Theory, 2006. [6] A. Chambolle, R. Devore, N.Y. Lee, and B.J. Lucier. Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Processing, 7:319-335, 1998. [7] G. Facciolo, A. Almansa, J.-F. Aujol, and Vicent Caselles. Irregular to regular sampling, denoising and deconvolution. SIAM Journal on Multiscale Modeling and Simulation, in press. [8] A. Nemirovski. Information-based complexity of linear operator equations. Journal of Complexity, 8:153-175, 1992. [9] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127-152, 2005. [10] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76, 2007. [11] I. W. Selesnick, R. G. Baraniuk, and N. G. Kingsbury. The dual-tree complex wavelet transform. IEEE Signal Processing Magazine, 22(6), Nov. 2005. [12] P. Weiss. Algorithmes rapides d’optimisation convexe. Applications à la restauration d’images et à la détection de changements. PhD thesis, Université de Nice Sophia Antipolis, Dec. 2008. 
200 Image Inpainting Using a Fourth-Order Total Variation Flow Carola-Bibiane Schönlieb∗ Andrea Bertozzi† Martin Burger‡ Lin He§ April 19, 2009 Abstract i.e., λ(x) = λ0 >> 1 in Ω \ D and 0 in D. The corresponding steepest descent for the total variation inpainting model reads We introduce a fourth-order total variation flow for image ut = −p + λ(f − u), p ∈ ∂ |Du| (Ω), (1) inpainting proposed in [5]. The well-posedness of this new inpainting model is discussed and its efficient numerical real- where p is an element in the subdifferential of the total variization via an unconditionally stable solver developed in [15] ation ∂ |Du| (Ω). The steepest-descent approach is used to is presented. numerically compute a minimizer of J , whereby it is iteratively solved until one is close enough to a minimizer of J . For the numerical computation an element p of the subdifferential 1 Introduction is approximatedpby the anisotropic diffusion ∇ · (∇u/|∇u|ǫ ), where |∇u|ǫ = |∇u|2 + ǫ. An important task in image processing is the process of filling in missing parts of damaged images based on the information Now, TV inpainting, while preserving edge information in obtained from the surrounding areas. It is essentially a type of the image, fails in propagating level lines (sets of image points interpolation and is referred to as inpainting. Given an image with constant grayvalue) smoothly into the damaged domain, f in a suitable Banach space of functions defined on Ω ⊂ R2 , and in connecting edges over large gaps in particular. In an an open and bounded domain, the problem is to reconstruct attempt to solve these issues from second order image diffuthe original image u in the damaged domain D ⊂ Ω, called sions, a number of third and fourth order diffusions have been inpainting domain. In the following we are especially inter- suggested for image inpainting, e.g., [7, 9]. In this paper we present a fourth-order variant of total variested in so called non-texture inpainting, i.e., the inpainting ation inpainting, called TV-H−1 inpainting. The inpainted of structures, like edges and uniformly colored areas in the image u of f ∈ L2 (Ω), shall evolve via image, rather than texture. In the pioneering works of Caselles et al. [6] (with the term disocclusion instead of inpainting) and Bertalmio et al. [2] partial differential equations have been first proposed for digital non-texture inpainting. In subsequent works variational models, originally derived for the tasks of image denoising, deblurring and segmentation, have been adopted to inpainting. The most famous variational inpainting model is the total variation (TV) model, cf. [8, 10, 13, 14]. Here, the inpainted image u is computed as a minimizer of the functional ut = ∆p + λ(f − u), p ∈ ∂T V (u), (2) with ( |Du| (Ω) T V (u) = +∞ if |u(x)| ≤ 1 a.e. in Ω otherwise. (3) This inpainting approach has been proposed by Burger, He, and Schönlieb in [5] as a generalization of the sharp interface limit of Cahn-Hilliard inpainting [3, 4] to grayvalue images. The L∞ bound in the definition (3) of the total variation 1 2 functional T V (u) is motivated by this sharp interface limit J (u) = |Du| (Ω) + kλ(f − u)kL2 (Ω) , 2 and is part of the technical setup, which made it easier to where |Du| (Ω) is the total variation of u (cf. [1]), and λ is the derive rigorous results for this scheme. 
A similar form of this indicator function of Ω \ D multiplied by a (large) constant, higher-order TV approach already appeared in the context ∗ Department of Applied Mathematics and Theoretical of decomposition and restoration of grayvalue images, see for Physics (DAMTP), Centre for Mathematical Sciences, Wilber- example [12]. In the following we shall recall the main rigorforce Road, Cambridge CB3 0WA, United Kingdom. Email: ous results obtained in [5], present an unconditionally stable c.b.s.schonlieb@damtp.cam.ac.uk solver for (2), and show a numerical example emphasizing the † Department of Mathematics,UCLA (University of California Los Angeles), 405 Hilgard Avenue, Los Angeles, CA 90095-1555, USA. superiority of the fourth-order TV flow over the second-order Email: bertozzi@math.ucla.edu one. ‡ Institut für Numerische und Angewandte Mathematik, Fachbereich Mathematik und Informatik, Westfälische Wilhelms Universität (WWU) Münster, Einsteinstrasse 62, D 48149 Münster, Germany. Email: martin.burger@wwu.de § Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenbergerstrasse 69, A4040 Linz, Austria. Email: lin.he@oeaw.ac.at SAMPTA'09 2 Well-Posedness of the Scheme In contrast to its second-order analogue, the well-posedness of (2) strongly depends on the L∞ bound introduced in (3). 1 201 Acknowledgments This is because of the lack of maximum principles which, in the second-order case, guarantee the well-posedness for all smooth monotone regularizations of p. The existence of a steady state for (2) is given by the following theorem. CBS acknowledges the financial support provided by project WWTF Five senses-Call 2006, Mathematical Methods for Image Analy- sis and Processing in the Visual Arts, by the Wissenschaftskolleg (Graduiertenkolleg, Ph.D. program) of the Faculty for Mathematics at Theorem 1 [5, Theorem 1.4] Let f ∈ L2 (Ω). The stationary equation ∆p + λ(f − u) = 0, p ∈ ∂T V (u) (4) the University of Vienna (funded by FWF), and by the FFG project Erarbeitung neuer Algorithmen zum Image Inpainting, projectnumber 813610. Further, this publication is based on work supported by Award No. KUK-I1-007-43, made by King Abdullah University of Science and admits a solution u ∈ BV (Ω). Technology (KAUST). For the hospitality and the financial support dur- Results for the evolution equation (2) are a matter of future research. In particular it is highly desirable to achieve asymptotic properties of the evolution. Note that additionally to the fourth differential order, a difficulty in the convergence analysis of (2) is that it does not follow a variational principle. ing parts of the preparation of this work, CBS thanks IPAM (UCLA), the US ONR Grant N000140810363, and the Department of Defense, NSF Grant ACI-0321917. The work of MB has been supported by the DFG through the project Regularization with singular energies. References 3 Unconditionally Stable Solver [1] L. Ambrosio, N. Fusco, and D. Pallara, Functions of Bounded Variation and Free Discontinuity Problems, Mathematical Monographs, Oxford University Press, 2000. [2] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, Image inpainting, Computer Graphics, SIGGRAPH 2000, July, 2000. [3] A. Bertozzi, S. Esedoglu, and A. Gillette, Inpainting of Binary Images Using the Cahn-Hilliard Equation. IEEE Trans. Image Proc. 16(1) pp. 285-291, 2007. [4] A. Bertozzi, S. Esedoglu, and A. 
Gillette, Analysis of a twoscale Cahn-Hilliard model for image inpainting, Multiscale Modeling and Simulation, vol. 6, no. 3, pages 913-936, 2007. [5] M. Burger, L. He, C. Schönlieb, Cahn-Hilliard inpainting and a generalization for grayvalue images, UCLA CAM report 0841, June 2008. [6] V. Caselles, J.-M. Morel, and C. Sbert, An axiomatic approach to image interpolation, IEEE Trans. Image Processing, 7(3):376386, 1998. [7] T.F. Chan, S.H. Kang, and J. Shen, Euler’s elastica and curvature-based inpainting, SIAM J. Appl. Math., Vol. 63, Nr.2, pp.564–592, 2002. [8] T. F. Chan and J. Shen, Mathematical models for local nontexture inpaintings, SIAM J. Appl. Math., 62(3):10191043, 2001. [9] T. F. Chan and J. Shen, Non-texture inpainting by curvature driven diffusions (CDD), J. Visual Comm. Image Rep., 12(4):436449, 2001. [10] T. F. Chan and J. Shen, Variational restoration of non-flat image features: models and algorithms, SIAM J. Appl. Math., 61(4):13381361, 2001. [11] D. Eyre, An Unconditionally Stable One-Step Scheme for Gradient Systems, Jun. 1998, unpublished. [12] S. Osher, A. Sole, and L. Vese. Image decomposition and restoration using total variation minimization and the H -1 norm, Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal, Vol. 1, Nr. 3, pp. 349-370, 2003. [13] L. Rudin and S. Osher, Total variation based image restoration with free local constraints, Proc. 1st IEEE ICIP, 1:3135, 1994. [14] L.I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, Vol. 60, Nr.1-4, pp.259-268, 1992. [15] C.-B. Schönlieb, and A. Bertozzi, Unconditionally stable schemes for higher order inpainting, in preparation. Motivated by the idea of convexity splitting schemes, e.g., [11], Bertozzi and Schönlieb propose in [15] the following timestepping scheme for the numerical solution of (2): Uk+1 −Uk ∆t + C1 ∆∆Uk+1 + C2 Uk+1 = C1 ∆∆Uk ∇Uk )) + C2 Uk + λ(f − Uk ), −∆(∇ · ( |∇U k| (5) ǫ with C1 > 1/ǫ, C2 > λ0 . Here, Uk is the kth iterate of the time-discrete scheme, which approximates a solution u of the continuous equation at time k∆t, ∆t > 0. It can be shown that (5) defines a numerical scheme that is unconditionally stable, and of order 2 in time, cf. [15]. 4 Numerical Results In Figure 1 a result of the TV-H −1 inpainting model computed via (5) and its comparison with the result obtained by the second order TV-L2 inpainting model for a crop of the image is presented. The superiority of the fourth-order TVH −1 inpainting model to the second order model with respect to the desired continuation of edges into the missing domain is clearly visible. Figure 1: First row: TV-H−1 inpainting (2): u(1000) with λ0 = 103 . Second row: (l.) u(1000) with TV-H−1 inpainting, (r.) u(5000) with TV-L2 inpainting (1) SAMPTA'09 2 202 Image Segmentation Through Efficient Boundary Sampling Alex Chen(1) , Todd Wittman(1) , Alexander Tartakovsky(2) , Andrea Bertozzi(1) (1) Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095 (2) Department of Mathematics, University of Southern California, 3620 S. Vermont Ave., KAP 108, Los Angeles, CA 90089 achen@math.ucla.edu, wittman@math.ucla.edu, tartakov@math.usc.edu, bertozzi@math.ucla.edu Abstract: This paper presents a combined geometric and statistical sampling algorithm for image segmentation inspired by a recently proposed algorithm for environmental sampling using autonomous robots [1]. 1. 
Introduction Segmentation is one of the most important problems in image processing. Partitioning an image into a small number of homogeneous regions highlights important features, allowing a user to analyze the image more easily. Applications include medical imaging, computer vision, and geospatial target detection. Image segmentation methods can be subdivided into region-based vs. edge-based methods. Region-based methods include the Mumford-Shah [2] and related Chan-Vese [3] methods which both involve energy minimization with a least squares fit of the data and a partition, between regions, whose length is minimized. Edge-based methods include the well-known image snakes [4] and Canny edge detector [5]. Other approaches to segmentation have also been effective. Statistical methods such as region competition rely on the fact that images have repetitive features that can be learned and exploited to obtain a segmentation [6]. A more recent fast statistical method called DistanceCut [7] is semisupervised (the user identifies segments in each region) and is based on weighted distances and kernel density estimation. All of these methods involve, at some level, sampling all the pixels in an image. For applications involving highdimensional or large data sets, it makes sense to subsample the image. This is especially important for high resolution data where it can be prohibitive to perform calculations on every pixel in the image. The proposed segmentation method is designed for this kind of application and is based on ideas for cooperative environmental sampling with robotic vehicles. The UUV-gas algorithm [8] utilizes robots that “walk” in a sinusoidal path along the boundary between two regions, changing directions as they cross from one region into another. This tracking method theoretically utilizes only those points that are near the boundary in question, resulting in substantial savings in run-time. The sinusoidal pattern has also been suggested as an efficient method for atomic force microscopy scanning [9]. Interestingly, SAMPTA'09 the same idea of tracking is behind the sinusoidal walking pattern in ants following pheromone trails [10]. As with curve evolution methods in image processing, noise can cause problems, since the tracking is done as a local search. It was proposed [1] that the use of a changepoint detection algorithm, e.g., Page’s cumulative sum (CUSUM) algorithm [11] could improve tracking performance in noisy images. Testbed implementations of the boundary tracking algorithm exploiting change-point detection methods suggested that robots can indeed track boundaries efficiently in the presence of moderately intense noise [12]. We propose to adapt the above tracking algorithms to the problem of segmentation, with the goal of computational efficiency. Further improvements can be made that are not practical in the environmental tracking case. Many of these improvements are based on hypothesis testing for two regions, with the use of the CUSUM algorithm as a special case. 2. A two level sampling algorithm The algorithm has two levels, namely a global searching method, which locates a boundary point, and a local sampling algorithm, which tracks the boundary using the global method as an initial point. Occasionally, if the tracker strays too far from the boundary, additional uses of the global algorithm are needed. We briefly discuss several options for the global search and then focus on the local sampling algorithm. 
2.1 Global searching method – Locate an edge Initialized at some point, the global search looks for some instance of the boundary. This can be done in a few ways. One method is simply to move out in a spiral pattern until a boundary point is detected (see Figure 1). However, if the boundary is small and far away from the initial point, it may be positioned between revolutions of the spiral and missed. Other options include deterministic paths that do not have the tendency to miss boundaries or stochastic paths using a random walk. These searching methods assume no prior knowledge of the boundary location, but they can be easily modified when some information is known. Another possibility is to implement a coarse segmentation of the data first and use the resulting boundary detection as an initialization for the local sampling. More 203 details on the last option are given later. Once a boundary point has been detected, the local sampling algorithm begins. 2.2 Local sampling algorithm – Track an edge In the environmental tracking problem [1, 8], a robot tracks the boundary between two regions. The local sampling step is initialized at a point near the boundary, obtained from the global search. The robot then steers using a bang-bang steering controller, travelling in circular arcs, changing its direction of movement when it crosses into a different region. It is relatively straightforward to adapt the algorithm for an image with domain Ω. As before, the problem is to find the boundary B between two regions, which will be labelled Ω1 and Ω2 , so that Ω = Ω1 ∪ Ω2 ∪ B and Ω1 ∩ Ω2 = ∅. Define an initial starting point ~x0 = (x10 , x20 ) for the boundary tracker and an initial value θ0 , representing the angle from the +x1 direction, so that the initial direction vector is (cos θ0 , sin θ0 ). Also define the step size V and angular increment ω, which depend on estimates for image resolution and characteristics of the edge to be detected. In general, V is chosen smaller for greater detail, and ω is chosen smaller for straighter edges. A decision function between Ω1 and Ω2 must also be specified and has the following form:  if ~x ǫ Ω1 ,  1, 0, if ~x ǫ B, (1) d(~x) =  −1, if ~x ǫ Ω2 . The simplest example is thresholding of the image intensity I(~x) at a given spatial location ~x (in the case of a grayscale image):  if I(~x) > T ,  1, 0, if I(~x) = T , (2) d(~x) =  −1, if I(~x) < T , where T is a fixed threshold value. Later in this section we use statistical information about prior points sampled along the path to modify d(~x). At each step k, the direction θk and current location ~xk are updated recursively. Specifically, ~xk = ~xk−1 + V ∗ (cos θk−1 , sin θk−1 ) and θk is updated according to the location of the new tracking head ~xk . A simple update for θ is the bang-bang steering controller, defined by θk = θk−1 + ωd(~xk ). (3) An angle-correction modification [1] can be used for (3) if step k is a region crossing: θk = θk−1 + d(~xk )(tω − 2θref )/2, (4) where t is the number of steps since the last region crossing, and θref is a small fixed reference angle chosen based on the expected curvature of the edge being tracked. One stopping condition for the tracking of finite regions is termination if the tracker comes within some range of the first boundary point detected, given some minimum number of iterations. Midpoints of line segments formed from region crossings are labelled boundary points. SAMPTA'09 Figure 1: Left: Global search via a spiral-like pattern. 
The initial point is in blue, the final point (after a few iterations of local sampling) is in green, and the path is in red. Right: Basic procedure for the boundary tracking (local sampling) algorithm. The object is in cyan, the path of the tracking head is in red, and the detected boundary points are in yellow. Each small square represents one pixel. The tracker travels at fractional spatial values but samples at integral values. While the local sampling method works well for clean images, it is susceptible to unavoidable errors in noisy images. Averaging readings from nearby pixels can minimize errors in the decision due to noise. In particular, sequential change-point detection methods are well-suited for detecting and tracking image edges in noise. 2.3 Decision algorithm Change-point problems deal with detecting anomalies or changes in statistical behavior of data. The observations are obtained sequentially and, as long as their behavior is consistent with the normal state, one is content to let the process continue. If the state changes, then one is interested in detecting the change as soon as possible while minimizing false detections. More specifically, given a sequence of independent observations s1 = I(x1 ), . . . , sn = I(xn ) and two probability density functions (pdf) f (pre-change) and g (post-change), determine whether there exists N such that the pdf of si is f for i < N and g for i ≥ N . One of the most efficient change-point detection methods is the CUSUM algorithm proposed by Page in 1954 [11]. Write Zk = log[g(sk )/f (sk )] for the log-likelihood ratio and define recursively Uk = max (Uk−1 + Zk , 0) , U0 = 0 (5) the CUSUM statistic and the corresponding stopping time τ = min{k | Uk ≥ U }, where U is a threshold controlling the false alarm rate. Then τ is a time of raising an alarm. In our applications, assuming that f is the pdf of the data in Ω1 and g is the pdf in Ω2 , the value of τ may be interpreted as an estimate of the actual change-point, i.e., the boundary crossing from Ω1 to Ω2 . Note that if the pre-change and post-change densities f and g are completely specified, then the CUSUM algorithm performs optimally with respect to certain performance metrics [14]. However, in our applications these densities are usually unknown (while a Gaussian approximation may work well in certain scenarios). For this reason, the log-likelihood ratio Zk in (5) should be replaced 204 Figure 2: A 100 × 100 image was corrupted with additive Gaussian noise, N(0,0.5). Left: Boundary tracking without a change-point detection modification. Middle: Boundary tracking with the CUSUM algorithm. Right: Threshold dynamics [13]. Figure 3: A hybrid level set – boundary tracking segmentation on a 1000 × 1000 image. Left: Initial segmentation by threshold dynamics. The image is subsampled by a factor of 10 on each axis. Right: Final segmentation by boundary tracking, using points from the connected components of the initial segmentation as starting points for trackers. by a score function Gk sensitive to expected changes. Since we expect a change in the mean value, the appropriate score is Gk = sk − (θ1 + θ2 )/2, where θj is the mean of the previous observations si in Ωj . The resulting score-based CUSUM test is not guaranteed to be optimal anymore. 
Note, however, that this score is optimal for Gaussian distributions (i.e., when sensor noise and residual clutter may be well approximated by Gaussian processes) and can be easily adjusted to cover any member of the exponential family of distributions (Bernoulli, Poisson, double exponential, etc.). For further details, see [15]. Changes from Ω2 to Ω1 can also be tracked in this manner. Analogously to (5) define recursively the decision statistic Lk = max(Lk−1 − Gk , 0), L0 = 0 and the stopping time τ = min{k | Lk ≥ L}, where Gk is the score introduced above, which is taken to be equal to Zk if the distributions are known and where L is a threshold associated with a given false detection rate. Only one of the statistics Uk or Lk is used at a time, i.e., when the tracking head is in Ω1 , the change-detection statistic Uk is used for detecting a transition to Ω2 . Similarly, when the tracking head is in Ω2 , only Lk is used for detecting a change to Ω1 . Once the tracking head enters a new region, the other statistic is used, initialized at 0. Note that we have implicitly assumed that the intensity values on the path are independent observations. This assumption of independence is not entirely accurate, since SAMPTA'09 the samples are taken from the tracking path, which is not a random sampling of an area. However, if noise levels are large, independence of observations is a relatively accurate assumption due to the spatial independence of noise, while if noise levels are small, the use of a change-detection algorithm is less important. Furthermore, the proposed score-based CUSUM tests are robust with respect to prior assumptions, including independence. 3. Boundary Tracking Examples As mentioned above, one option for the global search is to run a coarse segmentation on a subsampled version of the image to obtain an initialization for the objects to be segmented. This “hybrid” method has an additional benefit of being able to detect mutiple objects and of giving a priori estimates for parameters in the decision function. The proposed two-stage hybrid boundary tracking algorithm that combines the UUV-gas algorithm with the CUSUM-based change-point detection identifies the true boundaries of an object accurately even in high levels of noise, as seen from Figure 2. The run-time and storage costs are minimal, compared to most other segmentation methods. An example of a noisy image is shown in Figure 3. The original image is 1000 × 1000. Threshold Dynamics [13] was first applied to a heavily subsampled version (100 × 100) of the image. Then one pixel from each connected 205 component was taken as the starting point for a boundary tracker. An example using multispectral data is shown in Figure 4. The hybrid method may be applied to more complicated images, but some problems arise. In the first step, when a coarse segmentation is applied to a subsampled image, small features may not be detected accurately. These small features will thus not be located by the boundary tracker either. Similarly, if some features are close in space, they may be placed in the same connected component class. In the boundary detection step, only one feature will thus be tracked. One solution is to use multiple initial points for each connected component. This will result in a decrease in efficiency but allow more objects to be tracked. Another problem is that different objects in the image may require different parameters to be chosen in the change-point detection algorithm. 
While some objects are detected accurately with certain parameters, often, some objects are not detected completely. Multichart CUSUM tests can be used effectively for this purpose. Figure 4: Boundary tracking of the San Francisco Bay coastline. A threshold of the Normalized Difference Vegetation Index (NDVI), commonly used for water detection [16], was taken as the decision function. 4. Discussion The boundary tracking algorithm provides a fast alternative to many traditional segmentation methods due to its local nature. With the addition of a change-point detection method, the combined hybrid algorithm allows for accurate boundary tracking and, therefore, segmentation even in highly noisy images. Furthermore, the algorithm can operate efficiently even in data of large size or high resolution, scaling only with the size of the boundary rather than the size of the image. While presented as a novel segmentation method, the boundary tracking algorithm can also be used in conjunction with other segmentation methods in a two-stage algorithm. Acknowledgments The authors thank C. Bachmann, Z. Hu and V. Meija. This work was supported by the Department of Defense, ONR SAMPTA'09 grant N000140810363, NSF ACI-0321917, ARO MURI 50363-MA-MUR. References: [1] Z. Jin and A. Bertozzi, “Environmental boundary tracking and estimation using multiple autonomous vehicles,” 2007 46th IEEE Conference on Decision and Control, pp. 4918–4923, December 2007. [2] D. Mumford and J. Shah, “Optimal approximation by piecewise smooth functions and associated variational problems,” Communications on Pure and Applied Math, vol. XLII, no. 5, pp. 577–684, July 1989. [3] T. Chan and L. Vese, “Active contours without edges,” IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, February 2001. [4] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988. [5] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 679–714, 1986. [6] S. C. Zhu and A. L. Yuille, “Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation,” IEEE Trans. on PAMI, vol. 18, no. 9, pp. 884–900, 1996. [7] X. Bai and G. Sapiro, “Distancecut: Interactive segmentation and matting of images and videos,” IEEE ICIP, vol. 2, pp. 249–252, 2007. [8] M. Kemp, A. L. Bertozzi, and D. Marthaler, “MultiUUV perimeter surveillance,” in Proceedings of 2004 IEEE/OES Workshop on Autonomous Underwater Vehicles, 2004, pp. 102–107. [9] P. I. Chang and S. B. Andersson, “Smooth trajectories for imaging string-like samples in AFM: A preliminary study,” in 2008 American Control Conference, Seattle, Washington, June 2008. [10] I. D. Couzin and N. R. Franks, “Self-organized lane formation and optimized traffic flow in army ants,” P. Roy. Soc. Lond. B. Bio., vol. 270, pp. 139–146, 2003. [11] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1-2, pp. 100–115, June 1954. [12] A. Joshi, T. Ashley, Y. Huang, and A. Bertozzi, “Experimental validation of cooperative environmental boundary tracking with on-board sensors.” preprint. [13] S. Esedoglu and Y. R. Tsai, “Threshold dynamics for the piecewise constant Mumford-Shah functional,” J. Comput. Phys., vol. 211, no. 1, pp. 367–384, 2006. [14] G. Moustakides, “Optimal stopping times for detecting changes in distributions,” Annals of Statistics, vol. 14, pp. 1379–1387, 1986. [15] A. G. Tartakovsky, B. L. 
Rozovskii, R. Blažek, and H. Kim, “Detection of intrusions in information systems by sequential change-point methods,” Statistical Methodology, vol. 3, no. 3, pp. 252–340, 2006. [16] J. Rouse, R. Hass, J. Schell, and D. Deering, “Monitoring vegetation systems in the grain plains with ERTS,” in Third ERTS Symposium, NASA SP-351 I, 1973, pp. 309–317. 206 SAMPTA'09 General Sessions SAMPTA'09 207 SAMPTA'09 208 The Class of Bandlimited Functions with Unstable Reconstruction under Thresholding Holger Boche and Ullrich J. Mönich Technische Universität Berlin, Heinrich-Hertz-Chair for Mobile Communications, Einsteinufer 25, 10578 Berlin, Germany. {holger.boche,ullrich.moenich}@mk.tu-berlin.de Abstract: The reconstruction of PW 1π -functions by sampling series is not possible in general if the samples are disturbed by the non-linear threshold operator which sets all samples whose absolute value is smaller than some threshold to zero. In this paper we characterize the set of functions for which the sampling series diverges as the threshold goes to zero and show that this set is a residual set. 1. Notation Before we start our discussion, we introduce some notations and definitions [4]. Let fˆ denote the Fourier transform of a function f , where fˆ is to be understood in the distributional sense. Lp (R), 1 ≤ p < ∞, is the space of all measurable, pth-power Lebesgue integrable functions on R, with the usual norm k · kp , and L∞ (R) is the space of all measurable functions for which the essential supremum norm k · k∞ is finite. For σ > 0 and 1 ≤ p ≤ ∞ we denote by PW pσ the Paley-Wiener space R σ of functions f with a representation f (z) = 1/(2π) −σ g(ω) eizω dω, z ∈ C, for some g ∈ Lp (−σ, σ). If f ∈ PW pσ then g(ω) = fˆ(ω). The norm for PW pσ , 1 ≤ p < ∞, is given by kf kPW pσ = Rσ (1/(2π) −σ |fˆ(ω)|p dω)1/p . Furthermore, we need the threshold operator. For complex numbers z ∈ C, the threshold operator κδ , δ > 0, is defined by ( z |z| ≥ δ κδ z = 0 |z| < δ. For continuous functions f : R → C, we define the threshold operator Θδ , δ > 0, pointwise, i.e., (Θδ f )(t) = κδ f (t), t ∈ R. 2. A well known fact [1, 2, 3] about the convergence behavior of the Shannon sampling series for f ∈ PW 1π is expressed by the following theorem. Theorem (Brown). For all f ∈ PW 1π and T > 0 fixed we have max N →∞ t∈[−T,T ] SAMPTA'09 f (t) − (Aδ f )(t) = N X k=−N f (k) sin(π(t − k)) = 0. π(t − k) (1) ∞ X f (k) k=−∞ |f (k)|≥δ sin(π(t − k)) . π(t − k) (2) Since f ∈ PW 1π we have limt→∞ f (t) = 0 by the Riemann-Lebesgue lemma, and it follows that the series in (2) has only finitely many summands, which implies Aδ f ∈ PW 2π ⊂ PW 1π . In general, Aδ f is only an approximation of f , and we want the function Aδ f to be close to f if δ is sufficiently small. The operator Aδ has several properties which complicate its analysis. Aδ , δ > 0, is non-linear. Furthermore, for each δ > 0, the operator Aδ : (PW 1π , k · kPW 1π ) → (PW 1π , k · k∞ ) is discontinuous. This implies that Aδ : (PW 1π , k · kPW 1π ) → (PW 1π , k · kPW 1π ) is discontinuous. For some f ∈ PW 1π , the operator Aδ is also discontinuous with respect to δ. Of course (2) can be written as ∞ X (Θδ f )(k) k=−∞ Motivation lim This theorem plays a fundamental role in applications, because it establishes the uniform convergence on compact subsets of R for a large class of functions, namely PW 1π , which is the largest space within the scale of Paley-Wiener spaces. 
The truncation of the series in (1) is done in the domain of the function f because only the samples f (k), k = −N, . . . , N are taken into account. In contrast, it is also possible to control the truncation of the series in the codomain of f by considering only the samples f (k), k ∈ Z, whose absolute value is larger than or equal to some threshold δ > 0. This leads to the approximation formula sin(π(t − k)) , π(t − k) (3) where Θδ denotes the threshold operator. Wireless sensor networks are one possible application where the threshold operator Θδ and the series (3) are important. The sensors sample some bandlimited signal in space and time and then transmit the samples to the receiver. In order to save energy, it is common to let the sensors transmit only if the absolute value of the current sample exceeds some threshold δ > 0. Thus, the receiver has to reconstruct the signal by using only the samples whose absolute value is larger than or equal to the threshold δ. 209 In addition to the sensor network scenario, the threshold operator can be used to model non-linearities in many other applications. For example, due to its close relation to the quantization operator, the threshold operator can be employed to analyze the effects of quantization in analog to digital conversion. 3. Problem Formulation and Main Result Since the series in (2) uses all “important” samples of the function, i.e., all samples that are larger than or equal to δ, one could expect Aδ f to have an approximation behavior similar to the Shannon sampling series. In particular the approximation error should decrease as the threshold δ goes to zero. But, we will see that Aδ f exhibits a significantly different behavior. In this paper we are interested in the structure of the set and the set D2 = {f ∈ PW 1π : lim sup|(Āδ f )(t)| = ∞ ∀ t ∈ R \ Z}. δ→0 Both threshold operators and thus Aδ and Āδ are meaningful in practical applications, and one would expect the difference being not important. However, as we will see, Āδ can be analyzed more easily. For t̂ ∈ R \ Z we furthermore define the sets D1 (t̂) = {f ∈ PW 1π : lim sup|(Aδ f )(t̂)| = ∞} δ→0 and D2 (t̂) = {f ∈ PW 1π : lim sup|(Āδ f )(t̂)| = ∞}. δ→0 Lemma 1 shows that we do not have to distinguish between the sets D1 and D1 (t̂) and between D2 and D2 (t̂). D1 = {f ∈ PW 1π : lim sup|(Aδ f )(t)| = ∞ ∀ t ∈ R \ Z}, Lemma 1. For all t̂ ∈ R \ Z we have D1 = D1 (t̂) and D2 = D2 (t̂). i.e., in the structure of the set of functions for which the approximation error |f (t) − (Aδ f )(t)| grows arbitrarily large for all t ∈ R \ Z as δ → 0. Proof. The inclusion D1 ⊂ D1 (t̂) is obvious. It remains to show that D1 (t̂) ⊂ D1 . Let f ∈ D1 (t̂). For all t1 ∈ R \ Z and δ > 0 a short calculation shows that Remark 1. The analysis of the operator Aδ is difficult because it is non-linear and discontinuous, and therefore the standard theorems of functional analysis, like the BanachSteinhaus theorem, cannot be used. 1 1 (Aδ f )(t1 ) − (Aδ f )(t̂) sin(πt1 ) sin(π t̂) ∞ |t̂ − t1 | X 1 = C1 (t1 , t̂, f ), ≤ kf kPW 1π π |t − k||t̂ − k| k=−∞ 1 δ→0 For the further discussion we need the following concepts from metric spaces [5]. A subset M of a metric space X is said to be nowhere dense in X if the closure [M ] does not contain a non-empty open set of X. M is said to be of the first category (or meager) if M is the countable union of sets each of which is nowhere dense in X. M is said to be of the second category (or nonmeager) if is not of the first category. 
The complement of a set of the first category is called a residual set. Sets of first category may be considered as “small”. According to Baire’s theorem [5] we have that in a complete metric space, the residual set is dense and a set of the second category. One property that shows the richness of residual sets is the following: The countable intersection of residual sets is always a residual set. In particular we will use the following fact in our proof. In a complete metric space an open and dense set is a residual set because its complement is nowhere dense. Theorem 1 will show that the set D1 it is a residual set. Thus the threshold operator destroys the good reconstruction behavior of the Shannon sampling series for “almost all” functions in PW 1π . 4. Proof of the Main Result In addition to the threshold operator that sets all samples whose absolute value is smaller than δ to zero, we consider the threshold operator that sets all samples whose absolute value is smaller than or equal to δ to zero. This operator gives rise to the sampling series (Āδ f )(t) := ∞ X k=−∞ |f (k)|>δ SAMPTA'09 sin(π(t − k)) f (k) π(t − k) (4) where C1 (t1 , t̂, f ) < ∞ is a constant that only depends on t1 , t̂, and f . It follows that |(Aδ f )(t1 )| ≥ |(Aδ f )(t̂)| sin(πt1 ) − C2 (t1 , t̂, f ). (5) sin(π t̂) Taking the limit superior on both sides of (5) gives lim sup|(Aδ f )(t1 )| = ∞. (6) δ→0 Since (6) is valid for all t1 ∈ R \ Z, it follows that f ∈ D1 . The same calculation shows that D2 = D2 (t̂). According to Lemma 1 it is sufficient to restrict the analysis to the sets D1 (t̂) and D2 (t̂) for some t̂ ∈ R \ Z. Furthermore, we can concentrate on one of both sets, because of the following lemma. Lemma 2. We have D1 = D2 . Proof. Let f ∈ D2 (t̂) be arbitrary but fixed. By the definition of D2 (t̂), we have lim supδ→0 |(Āδ f )(t̂)| = ∞. Thus, for every M > 0 there exists a δM > 0 such that |(ĀδM f )(t̂)| > M . Let T (M ) = {k ∈ Z : |f (k)| > δM } and f M = mink∈T (M ) |f (k)|. Then it follows that f M > δM . Moreover, for all δ with f M > δ > δM we have (Aδ f )(t̂) = ∞ X f (k) sin(π(t̂ − k)) π(t̂ − k) f (k) sin(π(t̂ − k)) = (ĀδM f )(t̂). π(t̂ − k) k=−∞ |f (k)|≥δ = ∞ X k=−∞ |f (k)|>δM 210 and consequently Consequently, sup|(Aδ f )(t̂)| > M. (7) |(Āδ̃M f )(t̂)| ≥ |(Āδ̃M f1 )(t̂)| − ǫ̃|T (M )| > M, δ>0 Since (7) is valid for all M > 0, it follows that supδ>0 |(Aδ f )(t̂)| = ∞, and, as a consequence, lim supδ→0 |(Aδ f )(t̂)| = ∞, because |(Aδ f )(t̂)| < ∞ for all δ > 0. This shows that f ∈ D1 (t̂), which implies that D2 (t̂) ⊂ D1 (t̂). The converse D2 (t̂) ⊃ D1 (t̂) is shown similarly. Hence D1 (t̂) = D2 (t̂), and the statement D1 = D2 follows from Lemma 1. In order to prove our main result, we need the important Lemma 3. Lemma 3. For all M ∈ N and t̂ ∈ R \ Z, D2 (t̂, M ) = {f ∈ PW 1π : sup|(Āδ f )(t̂)| > M } where the last inequality is due to (9). Therefore sup|(Āδ f )(t̂)| > M, δ>0 i.e., f ∈ D2 (t̂, M ), for all f ∈ PW 1π with kf1 − f kPW 1π < ǫ̃. Second, we show that D2 (t̂, M ) is dense in PW 1π . Let f ∈ PW 1π be arbitrary. We have to show that for every ǫ > 0 there exists a fǫ ∈ D2 (t̂, M ) such that kf − fǫ kPW 1π < ǫ. Let ǫ > 0 be arbitrary but fixed. Since PW 2π is dense in (1) PW 1π , there exists a fǫ ∈ PW 2π with δ>0 kf − fǫ(1) kPW 1π < is a residual set. Proof. Let M ∈ N and t̂ ∈ R \ Z be arbitrary but fixed. First, we show that D2 (t̂, M ) is an open set. Let f1 ∈ D2 (t̂, M ) be arbitrary. 
We have to show that there exists an ǫ > 0 such that, given any f ∈ PW 1π with kf − f1 kPW 1π < ǫ, f ∈ D2 (t̂, M ). By assumption, there exists a δM > 0 such that |(ĀδM f1 )(t̂)| > M. Furthermore, let T (M ) = {k ∈ Z : |f1 (k)| > δM } and f 1,M = mink∈T (M ) |f1 (k)|. Next, we choose δ̃M = δM + (f 1,M − δM )/2. Then we have that {k ∈ Z : |f1 (k)| > δ̃M } = T (M ). (2) . (9) For all f ∈ PW 1π with kf1 −f kPW 1π < ǫ̃ we have |f1 (k)− f (k)| < ǫ̃, k ∈ Z. It follows, for all k ∈ Z with |f (k)| > δ̃M , that |f1 (k)| ≥ |f (k)| − |f (k) − f1 (k)| > δ̃M − ǫ̃ > δM , i.e., k ∈ T (M ). Conversely, k ∈ T (M ) implies f1 (k) ≥ f 1,M , and it follows that 2L−1 X h(t, η, L) := g(t, η, L) :=h(t, η, L) − − k=−∞ |f (k)|>δ̃M ≤ X k∈T (M ) SAMPTA'09 N X (−1)k sin(π(t − k)) | | π(t − k) {z (1 − η) =:u1 } (−1)k sin(π(t − k)) . π(t − k) {z } =:u2 Note that g(k, η, L) = 0 for |k| ≤ N . We have ∞ sin(π(t̂ − k)) sin(π(t̂ − k)) X f1 (k) − π(t̂ − k) π(t̂ − k) k=−∞ |f1 (k) − f (k)| −1 X k=−N (10) |(Āδ̃M f )(t̂) − (Āδ̃M f1 )(t̂)| f (k) −2L < k < −L, −L ≤ k < 0, 0 ≤ k ≤ L, L < k < 2L, and Moreover, using (8) and (10), we obtain that ∞ X sin(π(t − k)) , π(t − k)  (−1)k (2(1 − η)+ 1−η  L k),   (−1)k (1 − η), h(k, η, L) =  (−1)k ,    (−1)k (2 − L1 k), > f 1,M − δ̃M + δM = δ̃M . = h(k, η, L) k=0 Thus we have (12) where |f (k)| ≥ |f1 (k)| − |f (k) − f1 (k)| > f 1,M − ǫ̃ {k ∈ Z : |f (k)| > δ̃M } = T (M ). ǫ . 3 Let N denote the smallest natural number such that N > t̂ (2) and fǫ (k) = 0 for all |k| > N . Furthermore, let T2 = (2) (2) = mink∈T2 |fǫ (k)|. {k ∈ Z : |fǫ (k)| 6= 0} and f (2) ǫ For 0 < η < 1 and L ∈ N, L > N , consider the functions h and g defined by k=−2L+1 |(Āδ̃M f1 )(t̂)| − M , δ̃M − δM |T (M )| (2) kfǫ(1) − fǫ(2) kPW 1π < We choose ǫ̃ < min (11) Moreover, there exists a fǫ ∈ PW 2π such that fǫ (k) 6= 0 only for finitely many k ∈ Z and (8) ! ǫ . 3 kg(t, η, L)kPW 1π ≤ kh( · , η, L)kPW 1π + ku1 kPW 1π + ku2 kPW 1π . (13) |f1 (k)|>δM sin(π(t̂ − k)) ≤ ǫ̃|T (M )| π(t̂ − k) The norm ku1 kPW 1π is upper bounded by ku1 kPW 1π < π + log(N + 1), 2 (14) 211 Observing that N − t̂ > 0, we obtain because 1 2π ku1 kPW 1π = Z Z N X π e−iωk (−1)k dω −π k=0 iω(N +1) e Z 1 π sin( N2+1 ω) 1− 1 = dω = dω 2π −π 1 − eiω π 0 sin( ω2 ) Z N +1 Z π sin( N2+1 ω) sin( π2 ω) dω = dω ≤ ω ω 0 0 Z Z 1 N +1 sin( π2 ω) π 1 dω + dω < + log(N + 1). ≤ ω ω 2 0 1 π A similar calculation gives ku2 kPW 1π < (15) kh( · , η0 (L), L)kPW 1π < 4. (16) Combining (13)–(16) gives, that for all L ∈ N, L > N there exists an 0 < η0 (L) < 1 such that kg( · , η0 (L), L)kPW 1π < 4 + π + 3 log(N + 1) =: C3 . It is important that the constant C3 does not depend on L. Next, we analyze Gǫ (t, L) = + µg(t, η0 (L), L), where µ > 0 is some real number that satisfies µ < ). By the choice of µ we have min(ǫ/(3C3 ), f (2) ǫ kfǫ(2) − Gǫ ( · , L)kPW 1π = µC3 < ǫ 3 (17) for all L > N . Combining (11), (12), and (17), we see that kf − Gǫ ( · , L)kPW 1π < ǫ (18) for all L > N , i.e., Gǫ ( · , L) lies in the ǫ-ball around f . Furthermore, for any L > N we can find a δ0 (L) that fulfills     1 µ < δ0 (L) < µ. max (1 − η0 (L))µ, 1 − L Since δ0 (L) < f (2) , by the definition of µ, it follows that ǫ (Āδ0 (L) Gǫ ( · , L))(t̂) N X = Gǫ (k, L) k=−N |Gǫ (k,L)|>δ0 (L) L X + Gǫ (k, L) k=N +1 |Gǫ (k)|>δ0 (L) = N X fǫ(2)(k) k=−N SAMPTA'09 sin(π(t̂ − k)) π(t̂ − k) sin(π(t̂ − k)) π(t̂ − k) L X (−1)k sin(π(t̂ − k)) sin(π(t̂ − k)) +µ π(t̂ − k) π(t̂ − k) k=N +1 = fǫ(2) (t̂) + µ L sin(π t̂) X 1 . 
π t̂ − k k=N k=N L−N X 1 1 = t̂ − k k + N − t̂ k=0 L−N X Z k+1 1 ≥ dτ τ + N − t̂ k k=0 Z L−N +1 1 = dτ τ + N − t̂ 0   L − t̂ , > log N − t̂ and consequently π + log(N ). 2 In addition we have kh( · , 0, L)kPW 1π ≤ 3, which can be proven easily, and limη→0 kh( · , η, L) − Therefore, there exists an h( · , 0, L)kPW 1π = 0. 0 < η0 (L) < 1 such that fǫ(2) (t) L X |(Āδ0 (L) Gǫ ( · , L))(t̂)| ≥µ |sin(π t̂)| log π  L − t̂ N − t̂  − |fǫ(2) (t̂)|. (19) The right-hand side of (19) can be made arbitrarily large by choosing L large. Let L1 > N be the smallest L such that the right hand side of (19) is larger than M . It follows that fǫ (t) = Gǫ (t, L1 ) is the desired function, because supδ>1 |(Āδ fǫ )(t̂)| ≥ |(Āδ0 (L1 ) fǫ )(t̂)| > M , i.e., fǫ ∈ D2 (t̂, M ), and because kf − fǫ kPW 1π < ǫ, according to (18). Theorem 1. D1 and D2 are residual sets. Proof. Since D2 = D1 , by Lemma 2, it is sufficient to show that D2 is a residual set. Let t̂ ∈ R \ Z be arbitrary but fixed. We have \ D2 (t̂, M ). D2 (t̂) = M ∈N From Lemma 3 we know that all D2 (t̂, M ), M ∈ N, are residual sets. It follows that D2 (t̂) is a residual set, because the countable intersection of residual sets is a residual set. The application of Lemma 1 completes the proof. References: [1] J. L. Brown, Jr. On the error in reconstructing a nonbandlimited function by means of the bandpass sampling theorem. Journal of Mathematical Analysis and Applications, 18:75–84, 1967. Erratum, ibid, vol. 21, 1968, p. 699. [2] P. L. Butzer, W. Splettstößer, and R. L. Stens. The sampling theorem and linear prediction in signal analysis. Jahresber. d. Dt. Math.-Verein., 90(1):1–70, January 1988. [3] P. L. Butzer and R. L. Stens. Sampling theory for not necessarily band-limited functions: A historical overview. SIAM Review, 34(1):40–53, March 1992. [4] John R. Higgins. Sampling Theory in Fourier and Signal Analysis – Foundations. Oxford University Press, 1996. [5] Kôaku Yosida. Functional Analysis. Springer-Verlag, 1971. 212 On Subordination Principles for Generalized Shannon Sampling Series Andi Kivinukk (1) and Gert Tamberg (2) (1) Dept. of Math., Tallinn University, Narva Road 25, 10120 Tallinn, Estonia (2) Dept. of Math., Tallinn University of Technology, Ehitajate tee 5 19086 Tallinn, Estonia andik@tlu.ee, gert.tamberg@mail.ee Abstract: This paper provides some subordination equalities and their applications for the generalized Shannon sampling series. Concerning some direct (Jackson-type) approximation theorems we present certain subordination equalities, which show that the sampling operators, like Rogosinski, Zygmund, and Hann, are in some sense basic. 1. 2. Introduction For the uniformly continuous and bounded functions f ∈ C(R) the generalized Shannon sampling series (see [3] and references cited there) are given by (t ∈ R; W > 0) (SW f )(t) := ∞ X k=−∞ f( k )s(W t − k), W (1) where the condition for the operator SW : C(R) → C(R) to be well-defined is that for the kernel function s = s(t) we assume ∞ X |s(u − k)| < ∞ (u ∈ R). k=−∞ Let be given an even window function λ ∈ C[−1,1] , λ(0) = 1, λ(u) = 0 (|u| > 1,) then in our approach the kernel function will be defined by the equality s(t) := sλ (t) := Z1 λ(u) cos(πtu) du. (2) Many window functions have been used in applications (see, e.g. [1], [2], [4], [8]), in Signal Analysis in particular. Next window functions are important for our subordination equalities. 
1) λ(r) (u) = 1 − ur , r ≥ 1 defines the Zygmund (or Riesz) kernel, denoted by zr = zr (t), which special case r = 1, the Fejér (or Bartlett, see [8]) kernel sF (t) = 21 sinc 2 2t , is well-known; the special case r = 2 is called also as the Welch [8] kernel; 2) λj (u) := cos π(j + 1/2)u, j = 0, 1, 2, . . . defines the Rogosinski-type kernel (see [5]) in the form j 3) λH (u) := cos2 kernel (see [6]) (−1) (j + 1/2) cos πt ; π (j + 1/2)2 − t2 πu 2 (3) = 21 (1 + cos πu) defines the Hann 1 sinct . sH (t) := 2 1 − t2 SAMPTA'09 Subordination equalities state some relations between two sampling operators. 2.1 Subordination by the Rogosinski-type sampling series Let consider the Rogosinski-type sampling operators RW,j defined by the kernel functions rj in (3). These kernel functions are deduced by the window functions λj (u) := cos π(j + 1/2)u, (j ∈ N ) and as a family of functions it forms an orthogonal system on [0, 1]. Therefore, we may represent a quite arbitrary window function λ by its Fourier series. But the Fourier representation allows us to prove for a given kernel function s the sampling series ∞ X s(t) = 2 s(j + 1/2) rj (t). j=0 0 rj (t) := Subordination equalities (4) Bσp In following stands for the Bernstein class, it consists of those bounded functions f ∈ Lp (R) (1 6 p 6 ∞), which can be extended to an entire function f (z) (z ∈ C) of exponential type σ. For s ∈ Bπ1 the sampling series above is absolutely convergence and by (1) we get formally the equalities SW f = 2 ∞ X s(j + 1/2)RW,j f, j=0 f − SW f = 2 ∞ X s(j + 1/2)(f − RW,j f ), j=0 calling as the subordination equalities, since the approximation properties of the general sampling operators (1) can be described via the approximation properties of the Rogosinski-type sampling operators RW,j : C(R) → C(R). We have proved that [5] kRW,j k = 2j 2 4X 1 = log(j + 1) + O(1), π 2ℓ + 1 π ℓ=0 213 2.3 Subordination by the Zygmund sampling series thus the subordination equalities are valid, when ∞ X |s(j + 1/2)| log(j + 1) < ∞. j=0 Similar subordination equalities can be deduced for some interpolating sampling series, i.e. for which the equation k k (S̃W f )( W ) = f(W ) (k ∈ Z) is valid. In [7] we have proved that the interpolating sampling operators will be defined by (1) using the kernel s̃(t) := 2s(2t), where the kernel s is generated by (2) with a window function λ for which λ(u) + λ(1 − u) = 1 (u ∈ [0, 1]). α Let the operator SW : C(R) → C(R) be defined by the 1 kernel sα := α s(α·) ∈ Bαπ (0 < α ≤ 2), where s ∈ 1 α Bπ , and the modified Hann operator HW,j is defined by the kernel (5) Then here we have (see [7], Th. 2.3 and 2.4) α SW f =4 ∞ X λ(u) = 1 − α s(2j + 1)HW,j f, ∞ X cj uj . j=r Then the formal subordination equalities are in the shape SW f = ∞ X j cj ZW f, j=r f − SW f = 2 (2j + 1) α sα sinc(αt). H,j (t) := 2 (2j + 1)2 − (αt)2 r The Zygmund sampling operator ZW will be defined by the window function λ(r) (u) = 1 − ur , r ≥ 1. Let us consider the kernel s in (2), for which the corresponding window function has the power series representation ∞ X j cj (f − ZW f ). j=r Several other subordination equalities and their applications will be presented. 3. Acknowledgments j=0 α f − SW f =4 ∞ X α f ). 
s(2j + 1)(f − HW,j j=0 2.2 Subordination by the Rogosinski-type sampling series: 2D case The two-dimensional generalized sampling series has the form (SW f )(x, y) ∞ X k l f ( , )s(W x − k, W y − l), := W W k,l=−∞ in particular, the multiplicative Rogosinski-type sampling series we define as (RW ;i,j f )(x, y) ∞ X k l := f ( , )ri (W x − k)rj (W y − l), W W k,l=−∞ where the Rogosinski-type kernel rj is defined by (3). Here our subordination equalities read as SW f = 4 ∞ X s(i + 1/2, j + 1/2)RW ;i,j f, i,j=0 f − SW f = 4 ∞ X s(i + 1/2, j + 1/2)(f − RW ;i,j f ), i,j=0 provided ∞ X |s(i + 1/2, j + 1/2)| log i log j < ∞. i,j=1 By given subordination equalities we see that the nonmultiplicative sampling series may be studied by the multiplicative Rogosinski-type sampling series. SAMPTA'09 This research was partially supported by the Estonian Sci. Foundation, grants 6943, 7033, and by the Estonian Min. of Educ. and Research, projects SF0132723s06, SF0140011s09. References: [1] H. H. Albrecht. A family of cosine-sum windows for high resolution measurements. In IEEE International Conference on Acoustics, Speech and Signal Processing, Salt Lake City, Mai 2001, pages 3081–3084. Salt Lake City, 2001. [2] R. B. Blackman and J. W. Tukey. The measurement of power spectra. Wiley-VCH, New York, 1958. [3] P. L. Butzer, G. Schmeisser, and R. L. Stens. An introduction to sampling analysis. In F Marvasti, editor, Nonuniform Sampling, Theory and Practice, pages 17–121. Kluwer, New York, 2001. [4] F. J. Harris. On the use of windows for harmonic analysis. Proc. of the IEEE, 66:51–83, 1978. [5] A. Kivinukk and G. Tamberg. On sampling series based on some combinations of sinc functions. Proc. of the Estonian Academy of Sciences. Physics Mathematics, 51:203–220, 2002. [6] A. Kivinukk and G. Tamberg. On sampling operators defined by the Hann window and some of their extensions. Sampling Theory in Signal and Image Processing, 2:235–258, 2003. [7] A. Kivinukk and G. Tamberg. Interpolating generalized Shannon sampling operators, their norms and approximation properties. Sampling Theory in Signal and Image Processing, 8:77–95, 2009. [8] E. H. W. Meijering, W. J. Niessen, and M. A. Viergever. Quantitative evaluation of convolutionbased methods for medical image interpolation. Medical Image Analysis, 5:111–126, 2001. 214 Linear Signal Reconstruction from Jittered Sampling Alessandro Nordio (1) , Carla-Fabiana Chiasserini (1) and Emanuele Viterbo (2) (1) Dipartimento di Elettronica, Politecnico di Torino1 , I-10129 Torino, Italy. (2) DEIS, Università della Calabria, via P. Bucci, Cubo 42C, 87036 Rende (CS), Italy alessandro.nordio@polito.it, carla.chiasserini@polito.it, viterbo@deis.unical.it Abstract: This paper presents an accurate and simple method to evaluate the performance of AD/DA converters affected by clock jitter, which is based on the analysis of the mean square error (MSE) between the reconstructed signal and the original one. Using an approximation of the linear minimum MSE (LMMSE) filter as reconstruction technique, we derive analytic expressions of the MSE. Through asymptotic analysis, we evaluate the performance of digital signal reconstruction as a function of the clock jitter, number of quantization bits, signal bandwidth and sampling rate. 1. Introduction A significant problem in Analog Digital Conversion (ADC) of wide-band signals is clock jitter and its impact on the quality of signal reconstruction. 
Indeed, even small amounts of jitter can measurably degrade the performance of analog to digital and digital to analog converters. Clock jitter is typically detrimental because the analog to digital process relies upon a sample clock to indicate when a sample or snapshot of the analog signal is taken. The sample clock must be evenly spaced in time; any deviation will result in a distortion of the digitization process. If one had a perfect ADC and a perfect DAC and used the same clock to drive both units, then jitter would not have any impact on the reconstructed signal. In a real world system, however, a digitized signal travels through multiple processors, usually it is stored on a disk or piece of tape for a while, and then goes through more processing before being converted back to analog. Thus, during reconstruction, the clock pulses used to sample the signal are replaced with newer ones with their own subtle variations. Jitter may have different probability distributions which may have different effects on the quality of the reconstructed signal. While several results are available in the literature on jittered sampling [4, 5] as well as on experimental measurements and instruments performance [1, 3, 6, 7], an analytical methodology for the performance study of the AD/DA conversion is still missing. In this paper we fill this gap and propose a method for evaluating the performance of AD/DC converters affected by This work was supported by Regione Piemonte through the VICSUM project. SAMPTA'09 jitter, which is based on the analysis of the mean square error (MSE) between the reconstructed signal and the original one [7]. As reconstruction technique, we consider linear filtering methods, which typically have low complexity and are used in a wide variety of fields. If jitter were known exactly, the linear minimum MSE (LMMSE) reconstruction technique would be optimal, since it minimizes the MSE of the reconstructed signal. In practice this is not the case, hence we apply a reconstruction filter with the same structure of the LMMSE filter, where we let the jitter vanish. Then, we apply asymptotic analysis to derive analytical expressions of the MSE on the quality of the reconstructed signal. We then show that our asymptotic expressions provide an excellent approximation of the MSE even for small values of the system parameters, with the advantage of greatly reducing the computation complexity. We apply our method to study the performance of the AD/DA conversion system as a function of the clock jitter, number of quantization bits, signal bandwidth and sampling rate. 2. System model Throughout the paper we use the following notations. Column vectors are denoted by bold lowercase letters and matrices are denoted by bold upper case letters. The (k, q)th entry of the generic matrix Z is denoted by (Z)k,q . The n × n identity matrix is denoted by In , while I is the generic identity matrix. (·)T is the transpose operator, while (·)† is the conjugate transpose operator. We denote by fx (z) the probability density function (pdf) of the generic random variable x, and by E[·] the average operator. 2.1 Signal sampling and reconstruction We consider an analog signal s(t) sampled at constant rate fs = 1/Ts over the finite interval [0, M Ts ). Ts is the sample spacing. When observed over a finite interval, s(t) admits an infinite Fourier series expansion. 
Let N ′ denote the largest index of the non-negligible Fourier coefficients, then N ′ /Ts can be considered as the approximate onesided bandwidth of the signal. We therefore represent the signal by using a truncated Fourier series with N = 2N ′ + 215 2.2 1 complex harmonics as ′   N t 1 X aℓ exp j2πℓ , s(t) = √ M Ts N ℓ=−N ′ (1) 0 ≤ t < M Ts . The vector a = [a−N ′ , . . . , a0 , . . . , aN ′ ]T represents the complex discrete spectrum of the signal. Observe that the signal representation given in (1) includes sine waves of any fractional frequency f0 = fs N ′ /M (when aℓ = 0 for −N ′ < ℓ < N ′ and a−N ′ = a∗N ′ ), which are frequently used as reference signal for calibration of ADC [1, 2]. We note that when the signal s(t) is observed in the frequency domain through its M samples, the spectral resolution is given by ∆f = 1/(M Ts ). Therefore, considering the expression in (1), the signal = 2MNTs . By defining bandwidth is given by B = N ∆f 2 the parameter M β= (2) N as the oversampling factor of the signal s(t) with respect to the Nyquist rate, we can also write: B= fs /2 β (3) In this work, we consider that sampling locations suffer from jitter, i.e., the m-th sampling location is given by tm = mTs + dm , (4) m = 0, . . . , M − 1, where dm is the associated independent random jitter whose distribution is denoted by fd (z). Typically, we have |dm | ≪ Ts . Let the signal samples be s = [s0 , . . . , sM −1 ]T where sm = s(tm ), 0 ≤ m ≤ M − 1. Using (1), the set of signal samples can be written as s = V† a where V is an N × M random Vandermonde matrix defined as   1 tm (V)ℓ,m = √ exp −j2πℓ (5) M Ts N ℓ = −N ′ , . . . , N ′ , and m = 0, . . . , N − 1. Note that V accounts for the jitter in the AD/DA conversion process, and that the parameter β defined in (2) also represents the aspect ratio of matrix V. Furthermore, in addition to jittered sampling, we assume that signal samples are affected by some additive noise and are therefore given by y =s+n where n is a vector of M noise samples, modeled as zero mean i.i.d. random variables. In practice, the dominant additive noise error is due to the n-bit quantization process [10]. In order to reconstruct the signal we consider a reconstruction technique that provides an estimate â of the discrete spectrum a. The reconstruction ŝ(t) of s(t) obtained from â is given by   N′ t 1 X âℓ exp j2πℓ ŝ(t) = √ M Ts N ′ ℓ=−N SAMPTA'09 Reconstruction error We consider as performance metric of the AD/DA conversion process the mean square error (MSE) associated to the estimate. The MSE, evaluated in the observation interval [0, M Ts ), can be equivalently computed in both time and frequency domains as: # "Z   M Ts E ka − âk2 2 MSE = E |s(t) − ŝ(t)| dt = N 0 More specifically, we consider the MSE relative to the signal average power, i.e., J= MSE σa2 which can be thought of as a noise to signal ratio and will be plotted using a dB scale in our results. Among the possible techniques that can be applied to reconstruct the original signal, we focus on linear filters that provide an estimate of a through the linear operation â = By where B is an N × M matrix. 3. 
Jittered AD/DA conversion with linear filtering Let us assume kak2 = σa2 N and E[nn† ] = σn2 I, then we define the signal to noise ratio (SNR) in absence of jitter as σ2 γ = a2 σn Under the assumption that E[aa† ] = σa2 I, the linear filter that provides the best performance in terms of MSE is the linear minimum mean square error (LMMSE) filter, which is given by  −1 1 Bopt = VV† + I V (6) γ In [8], it has been shown that, by applying the LMMSE filter, we obtain: h n   −1 oi 1 J = 2 E ka − âk2 = E tr γVV† + I σa N where tr{·} is the normalized matrix trace operator and the average is over the randomness in V. Note, however, that the filter in (6) cannot be employed in practice, since the jitters dm (hence the matrix V) are unknown (see the definition of V in (5)). We therefore resort to an approximation of the optimum filter Bopt , based on the assumption that jitter has a zero mean. In particular, we approximate V with the matrix F defined as, F = V|dm =0 with the generic√element of F given by, (F)ℓ,m =  m / N , ℓ = −N ′ , . . . , N ′ , and m = exp −j2πℓ M 0, . . . , N − 1. We observe that F is such that: FF† = βI and it is related to the discrete Fourier transform matrix. Substituting the approximation of V in (6), we obtain: −1  1 F (7) B= β+ γ 216 Notice that the filter in (7) is the LMMSE filter adapted to the linear model y = F† a + n. By letting ω = (β + 1/γ)−1 , the noise to signal ratio J provided by the approximate filter (7) is given by   1 2 = E ka − ωFyk σa2 N       † † 2 † = tr ω E FV VF − 2ωℜ{E FV } J where, from (3), we used the fact that 1/βTs = fs /β = 2B. Similarly, we define ′ µ2  2 N 1 X 2πℓ = lim Cd N,M →+∞ N M Ts ′ ℓ=−N β = Z 1/2 2 |Cd (4πBx)| dx (11) −1/2 d d (8) By using (10) (11), and (9), the asymptotic expression of J is given by where the operator E[·] averages over the random jitters (β,γ) J∞ = 1+ω 2 β(1+1/γ)−2ωβµ1 +ω 2 β(β−1)µ2 (12) 2 +1 + ω β γ d dm , m = 0, . . . , M − 1. Assuming the jitters to be independent [1] and with characteristic function Cd (w) = E[exp(jwz)], the first two d terms in (8) are given by It is worth mentioning that for large SNRs (i.e., in absence (β,γ) of measurement noise), J∞ reduces to   1 1 (β) (β,γ) J∞ = lim J∞ = 1 + − 2µ1 + 1 − µ2 (13) γ→∞ β β ′   N   2πℓ β X † Cd tr E FV = N M Ts d ′ Equation (13) provides us with a floor that represent the best quality of the reconstructed signal (minimum MSE) we can hope for. ℓ=−N ′   N   2πℓ (β − 1) X † † C FV VF = β + β E d N M Ts d ′ 2 4.1 ℓ=−N Hence, we can write: J =  1 1+ω β 1+ γ 2 +ω 2 β  (β − 1) N ′   N β X 2πℓ − 2ω Cd N M Ts ′ ℓ=−N N′ X Cd ℓ=−N ′  2πℓ M Ts  2 (9) In order to reduce the complexity of the computation of the reconstruction error and provide simple but accurate analytical tools, in the next section we let the parameters N and M go to infinity, while the ratio β = M/N is kept constant. We therefore derive an asymptotic expression of J, which we will show well approximates the expression in (9) even for small N and M . 4. Asymptotic analysis When N and M grow to infinity while β is kept constant, we define the asymptotic noise to signal ratio J as: (β,γ) J∞ = lim N,M →+∞ β (β,γ) ′ =   N 1 X 2πℓ Cd N,M →+∞ N M Ts ′ = Z lim β SAMPTA'09 ℓ=−N 1/2 −1/2 Let us now assume the jitter to be uniformly distributed with pdf given by  1 −dmax ≤ z ≤ dmax 2dmax fd (z) = 0 elsewhere where dmax is the maximum jitter, independent of the sampling frequency fs . In this case, the characteristic function of the jitter is given by Cd (w) = sin(dmax w)/(dmax w). 
Then, µ1 = and µ2 = Si(2πηu ) 2πηu cos2 (2πηu ) + 2πηu Si(4πηu ) − 1 4π 2 ηu2 where Si(·) is the integral sine function and ηu = dmax B is a dimensionless parameter which relates maximum jitter and signal bandwidth. 5. Results J In [8], it has been shown that J∞ provides an excellent approximation of MSE/σa2 even for small values of N and M , with the advantage of greatly simplifying the computation. In the limit N, M → ∞ with constant β, we compute µ1 Example: uniform jitter distribution Cd (4πBx) dx (10) For the ease of representation, we assume that the dominant component of the additive noise is due to quantization, and we express the SNR in absence of jitter, γ, as a function of the number of quantization bits n of the ADC [9]: (γ)dB = 6.02n + 1.76 Then, in the following plots we show the value of J as a function of γ or, equivalently, of the number of quantization bits n. Figure 1 compares the value of J obtained through its asymptotic expression against the performance of a system with finite parameters values (i.e., the value of J computed using (9)). The results are derived for ηu = 217 10−1 , 10−2 , 10−3 , and β = 10. Solid lines refer to the asymptotic expression (12), while markers represent the values of J computed through (9), with N ′ = 100. We observe an excellent matching between our approximation (β,γ) of J∞ and the results computed through (9), even for small values of N and M . We point out that this tight match can be observed for any β > 1 and ηu ≪ 1. We also notice that J shows a floor, whose expression is given by (13). This floor is due to the mismatch between the filter F employed in the reconstruction and the matrix V characterizing the sampling system. 10 20 30 40 γ [dB] 50 60 0 80 90 Approx. -1 ηu = 10 ηu = 10-2 ηu = 10-3 Floor -10 -20 J [dB] 70 Conclusions We studied the performance of AD/DA converters, in presence of clock jitter and quantization errors. We considered that a linear filter approximating the LMMSE filter is used for signal reconstruction, and evaluated the system performance in terms of MSE. Through asymptotic analysis, we derived analytical expressions of the MSE which provide an accurate and simple method to evaluate the behavior of AD/DA converters as clock jitter, number of quantization bits, signal bandwidth and sampling rate vary. We showed that our asymptotic approach provides an excellent approximation of the MSE even for small values of the system parameters. Furthermore, we derived the MSE floor, which represents the best reconstruction quality level we can hope for and gives useful insights for the design of AD/DA converters. References: -30 -40 -50 -60 -70 2 4 6 8 10 12 14 16 n [bit] Figure 1: Comparison between the reconstruction error J (β,γ) derived through (9), the approximation of J∞ and the (β) floor J∞ in (13). Furthermore, in the case of unknown jitter, and, thus, of a floor in the behavior of J, there exists a number of quantization bits n = n∗ beyond which a further increase in the ADC precision does not provide a noticeable decrease in the reconstruction error J. The relation between ηu , β, and n∗ is shown in Figure 2. Note that n∗ is lightly affected by an increase of β, provided that β > 1, and a good compromise for choosing the oversampling rate is β = 5. -2 10 -3 10 10-4 ηu 6. 10-5 -6 10 β=1 β=2 β=5 β=10 β=100 10-7 10-8 6 8 10 12 14 16 18 20 22 24 [1] Project DYNAD, SMT4-CT98, Draft Standard Version 3.4, Jul. 12, 2001. [2] IEEE Standard for Terminology and Test Methods for Analog-to-Digital Converters, IEEE Std. 
1241, 2000. [3] P. Arpaia, P. Daponte, and S. Rapuano, “Characterization of digitizer timebase jitter by means of the Allan variance,” Computer Standards & Interfaces, Vol. 25, pp. 15–22, 2003. [4] B. Liu, and T. P. Stanley, “Error bounds for jittered sampling,” IEEE Transactions on Automatic Control, Vol. 10, No. 4, pp. 449–454, Oct. 1965. [5] J. Tourabaly, and A. Osseiran, “A jittered-sampling correction technique for ADCs,” IEEE International Workshop on Electronic Design, Test and Applications, pp. 249–252, Los Alamitos, CA, USA, 2008. [6] E. Rubiola, A. Del Casale, and A. De Marchi, “Noise induced time interval measurement biases,” 46th IEEE Frequency Control Symposium, pp. 265–269, May 1992. [7] J. Verspecht, “Accurate spectral estimation based on measurements with a distorted-timebase digitizer,” IEEE Trans. on Instrumentation and Measurement Vol. 43, pp. 210–215, Apr. 1994. [8] A. Nordio, C.-F. Chiasserini, and E. Viterbo “Performance of linear field reconstruction techniques with noise and uncertain sensor locations,” IEEE Trans. on Signal Processing, Vol. 56, No. 8, pp. 3535–3547, Aug. 2008. [9] G. Gielen, “Analog building blocks for signal processing,” ESAT-MICAS, Leuven, Belgium, 2006. [10] S. C. Ergen, and P. Varaiya, “Effects of A-D conversion nonidealities on distributed sampling in dense sensor networks,” IPSN ’06, Nashville, Tennessee, Apr. 2006. * n [bit] Figure 2: Minimum number of bits n∗ required to reach (β,γ) the floor of J∞ as a function of β and ηu . SAMPTA'09 218 Uniform Sampling and Reconstruction of Trivariate Functions Alireza Entezari E301 CSE Building, University of Florida, Gainesville, FL, USA. entezari@cise.ufl.edu Abstract: The Body Centered Cubic (BCC) and Face Centered Cubic (FCC) lattices have been known to outperform the commonly-used Cartesian sampling lattice due to their improved spectral sphere packing properties. However, the Cartesian lattice has been widely used for sampling of trivariate functions with applications in areas such as biomedical imaging, scientific data visualization and computer graphics. The widespread use of Cartesian lattice is partly due to the availability of tensor-product approach that readily extend the univariate reconstruction methods to trivariate setting. In this paper we report on recent advances on non-separable reconstruction algorithms, based on box splines, for reconstruction of data sampled on the BCC and FCC lattices. It turns out that these box spline reconstructions are faster than the corresponding tensorproduct B-spline reconstructions on the Cartesian lattice. This suggests that not only the BCC and FCC lattices are more accurate sampling patterns, their respective reconstruction methods are also more computationally efficient than the tensor-product reconstructions – a fact which is contrary to the common assumption among practitioners. 1. Introduction Sampling and reconstruction play a vital role in visualization and computer graphics. Various volume rendering algorithms rely on accurate reconstruction as a key step since the quality and fidelity of the rendered image heavily depends on reconstruction. In image processing reconstruction is used in resampling, resizing, conversion, and manipulation of sampled data. In the realm of sampling, the term regular is often used to refer to the case that the sampling grid is uniform. 
Although there has been significant research, recently, in non-uniform sampling (e.g., sparse sampling, compressed sensing), the regular sampling is the most commonly-used sampling scheme in practice [21]. When it comes to sampling multivariate functions, the tensor-product of uniform sampling, which forms a Cartesian lattice, is almost always the choice. The simple structure of the Cartesian lattice and its separable nature allows one to readily apply a tensor-product paradigm to many problems in a multi-dimensional setting. The power of the dimensionality reduction will remain the major reason that the Cartesian lattice is the preferred tool in numerical SAMPTA'09 algorithms. The other attraction of the Cartesian lattice is that it simply exists in any dimension and often tools and theory extend to problems in a higher dimensional setting in a trivial manner. However, the Cartesian lattice has been known to be an inefficient lattice from the sampling-theoretic point of view. Miyakawa [12] and then Petersen and Middleton [16] were among the first people to discover the superiority of sphere-packing and sphere-covering lattices for sampling multivariate functions. In particular they have demonstrated that Cartesian lattice is very inefficient for sampling multivariate functions. 2. Optimal Sampling Lattices When sampling a multivariate function with a lattice, generated by (integer linear combinations of the columns of) a sampling matrix, M , the spectrum of the signal is contained in the Brillouin zone. Brillouin zone is the Voronoi cell of the reciprocal lattice. The reciprocal lattice to the lattice M is generated by the columns of the matrix 2πM −⊤ . The multivariate version of the Nyquist frequency is the boundary of the Brillouin zone. Without a priori knowledge when sampling multivariate functions, one often assumes that the underlying function has features possibly in all directions. Therefore, without knowledge about particular orientations of high-frequency features, we need to capture an isotropic spectrum during the sampling process. Therefore, the objective of optimal sampling is to maximize the isotropic content of the Brillouin zone. In other words, the sampling lattice whose Brillouin zone has the largest inscribing (hyper) sphere is the best sampling lattice. Therefore, the optimal sampling lattice in any dimension is the lattice whose reciprocal lattice allows for the densest packing of spheres. In the bivariate setting the hexagonal lattice is the best sampling lattice since its reciprocal lattice, which happens to be the dual hexagonal lattice, allows for the best packing of 2-D with disks. When compared to the commonlyused Cartesian lattice with the same sampling density, the hexagonal lattice allows for about 14% more information to be captured in the spectrum of the underlying signal. This is illustrated in Figure 1 as the area of inscribing disc to the Brillouin zone of the hexagonal lattice (i.e., hexagon) is larger than the area of inscribing disc to the Brillouin zone of the Cartesian lattice (i.e., square), even 219 rH rH rC Figure 1: A square and a hexagon with unit area corresponding to the Brillouin zone of Cartesian and hexagonal sampling. The area of inscribing disk to a square is about 14% less than the area of the inscribing disk to the hexagon. though the two Brillouin zones have the same area. In the trivariate setting, the optimal sampling lattice is the BCC lattice whose reciprocal lattice (i.e., the FCC lattice) is the densest sphere packing lattice. 
The sampling efficiency of the BCC lattice, when compared to the commonly-used Cartesian lattice is about 30% higher. Appendix A in [6] presents a thorough comparison of the Brillouin zone of the Cartesian, BCC and FCC lattices. The FCC lattice, is also superior to the Cartesian lattice as its efficiency compared to the Cartesian lattice is about 27% higher. Although among the FCC and BCC lattices the BCC wins, by a small margin, for optimal sampling, the FCC lattice appears to have good resistance to aliasing. This can be justified since its reciprocal lattice (i.e., the BCC lattice) allows for the best sphere covering of the space. The best covering of the space translates to replication of isotropic spectrum with minimal overlap between them– minimizing the aliasing for that sampling resolution. These facts about comparison of the Cartesian, BCC and FCC lattices together with their higher-dimensional counter parts are discussed for sampling stationary isotropic random processes [10]. The arguments of the optimal sampling (BCC) and resilience to aliasing (FCC) is generalized to the notion that the reciprocal lattice for optimal sphere-packing lattice is the best choice for sampling functions at relatively high resolutions, while the spherepacking lattice is the best option for sampling functions at relatively low resolutions [10]. 3. Reconstruction There is abundant research on reconstruction (i.e., interpolation or approximation) of data based on univariate filtering methods [15]. Various 1-D filters have a low-pass behavior and approximate the ideal kernel (i.e., sinc) for reconstruction into the space of band-limited functions. Bsplines, offer a framework for representation of piecewise polynomial functions and thus are widely used in reconstruction of univariate functions [3]. There are two common methods for extending the univariate reconstruction ‘kernels’ to multivariate setting. The separable approach builds the multivariate kernel by a simple tensor-product of univariate kernels. The separable approach is obviously suitable for reconstruction of data on the Cartesian lattice since the lattice itself is also separable. The radial basis approaches construct the multivariate reconstruction kernel by spherical extension of SAMPTA'09 univariate kernel. Due to the spherical extension, the radial basis approach ignores the underlying geometry of the sampling lattice and is often used for scattered data interpolation/approximation. Splines have been widely accepted for image processing [20]. In the context of image processing, splines are often constructed as a tensor-product of two univariate splines. Mitchell and Netravali [11], demonstrated the advantages of using splines for image processing. Recently, Van De Ville [22], developed the so called Hexsplines that are used for reconstruction of hexagonal images. Hex-splines can not be constructed as a tensorproduct of univariate splines. Due to the non-separable structure of hexagonal lattice, the tensor-product splines can not be applied for processing of hexagonal data. 3.1 Reconstruction of trivariate functions In the visualization community reconstruction filters have received a lot of attention since accurate reconstruction of trivariate functions and their gradients is crucial in fidelity of rendering algorithms [14, 1, 5, 13]. Similar to image processing, in volume visualization algorithms, often the tensor-product approach is used for reconstruction of Cartesian sampled data. 
Theußl [18] introduced the BCC sampling in volume rendering. However, since the BCC lattice is a nonseparable lattice, various ad-hoc tensor-product [17] and radial basis [18] algorithms fail to provide satisfactory reconstruction algorithms and they exhibit blurry artifacts. Csébfalvi [2] proposed a global pre-processing algorithm (based on generalized interpolation [19]) that reconstructs the BCC lattice based on its two Cartesian sub-lattices. This approach is computationally inefficient and does not guarantee approximation order. The author’s recent work in this area establishes the relationship between box splines and the above-mentioned sampling lattices. The box splines have been developed as a generalization of B-splines to the multivariate setting. While box splines have been considered as non-separable basis functions for approximation based on their shifts on the Cartesian lattice [4], here their shifts on BCC and FCC lattices are considered. The interesting fact about these box splines is that while their shifts on the Cartesian lattice do not form a linearly independent set of functions, their shifts on the FCC and BCC lattices are linearly independent – a rare and useful property for the spline space! 3.2 Four direction box splines on BCC The relation of box splines with the BCC lattice was established based on the fact that the immediate neighborhood of a lattice point on the BCC pattern forms a rhombic dodecahedron (see Figure 2). This polyhedron has the special property that is a projection of a four-dimensional hypercube (tesseract). This makes it a perfect match to be the support of a box spline since the geometric definition of box splines precisely amounts to projecting hypercubes (i.e., box) down to lower dimensional spaces. Generally, the class of polytopes that are the shadow of higher dimensional hypercubes are referred to as zonotopes. This linear box spline is defined by the four direction and is 220 z z y y x x Figure 5: The neighborhood of a FCC lattice point forms a truncated octahedron. This polyhedron is another zonohedron which is the support of a six-direction box spline. Figure 2: The neighborhood of a BCC lattice point forms a rhombic dodecahedron. This polyhedron is a zonohedron which is the support of a linear box spline. Figure 3: Benchmark example dataset. The CT dataset of a carp fish at a high resolution of 256 × 256 × 256. a C 0 kernel. The shifts of this box spline on the BCC lattice generate a spline space whose approximation order is two. By convolving this box spline by itself, one obtains a smoother, C 2 , quintic box spline that is specified by a repetition of the four principal directions. The shifts of this box spline generate a spline space whose approximation order is four [7, 8]. This smoothness and approximation order match that of the tricubic B-spline on the Cartesian lattice and hence we compare the two on a Carp fish dataset in first row in Figure 4. The piecewise polynomial representation of these box splines along with efficient evaluation methods can be found in [8]. 3.3 The six direction box spline on FCC Unlike the BCC lattice, the immediate neighborhood in the FCC lattice is not a zonohedron. However, by enlarging the neighborhood one finds the truncated octahedron which is a zonohedron Figure 5. This polyhedron is a projection of a six-dimensional hypercube and the corresponding box spline is a cubic six-direction box spline [6]. 
The spline space that is generated by shifts of this cubic box spline on the FCC lattice is a C 1 space whose approximation order is three. These characteristics match the triquadratic B-spline on the Cartesian lattice which is the base for our comparisons in second row in Figure 4. The piecewise polynomial representation of the cubic box spline along with efficient spline evaluation method on the FCC lattice is demonstrated in [9]. SAMPTA'09 3.4 Computational advantages Once efficient evaluation algorithms are derived for the four-direction box splines [8] and the six direction box spline [9], one can compare these box spline reconstructions to the commonly-used tensor-product B-spline reconstructions on the Cartesian lattice. For the C 2 , fourth-order method the tricubic B-spline uses a neighborhood of 4 × 4 × 4 = 64 points for reconstruction, while the quintic box spline only uses a total of 32 points for reconstruction. Therefore as documented in [8] the BCC non-separable box spline approach outperforms the comparable tensor-product B-spline approach by a factor of two. Similarly the triquadratic B-spline uses a neighborhood of 3 × 3 × 3 = 27 Cartesian data points, while the cubic box spline only requires a total of 16 FCC data points for the reconstruction. Therefore, the non-separable box spline reconstruction outperforms the comparable tensor-product B-spline approach as documented in [9]. 4. Conclusions The recent research on optimal sampling lattices suggests that not only the FCC and BCC lattices offer higherfidelity sampling schemes, but also their reconstruction algorithms outperform the corresponding tensor-product reconstructions on the traditionally-popular Cartesian lattice. These encouraging results are crucial for acceptance of these efficient lattices in practical applications. 5. Acknowledgments The author would like to thank Dimitri Van De Ville, Torsten Möller and Carl de Boor for valuable insight and advice at various stages of the work. References: [1] I. Carlbom. Optimal Filter Design for Volume Reconstruction and Visualization. In Proc. IEEE Conf on Visualization, pages 54–61, October 1993. [2] B. Csébfalvi. Prefiltered gaussian reconstruction for high-quality rendering of volumetric data sampled 221 Cartesian, C 2 , fourth order BCC, C 2 , fourth order Cartesian, C 1 , third order FCC, C 1 , third order Figure 4: The Carp dataset at 6% resolution on Cartesian, BCC and FCC subsampled from the ground truth volume data of Figure 3. Top row: the Cartesian dataset is reconstructed by the tricubic B-spline and the BCC dataset is reconstructed by the quintic box spline. Bottom row: the Cartesian dataset is reconstructed with the triquadratic B-spline, while the FCC dataset is reconstructed with the cubic box spline. Superiority of the FCC and the BCC sampling is demonstrated since their images offer more accurate reconstruction than the Cartesian specially on the ribs and tail area. [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] on a body-centered cubic grid. In IEEE Visualization, pages 311–318, 2005. C. de Boor. A practical guide to splines, volume 27 of Applied Mathematical Sciences. Springer-Verlag, New York, revised edition, 2001. C. de Boor, K. Höllig, and S. Riemenschneider. Box Splines. Springer Verlag, 1993. S. C. Dutta Roy and B. Kumar. Handbook of Statistics, volume 10, chapter Digital Differentiators, pages 159–205. Elsevier Science Publishers B. V., N. Holland, 1993. A. Entezari. Optimal Sampling Lattices and Trivariate Box Splines. 
PhD thesis, Simon Fraser University, Vancouver, Canada, July 2007. A. Entezari, R. Dyer, and T. Möller. Linear and Cubic Box Splines for the Body Centered Cubic Lattice. In Proceedings of the IEEE Conference on Visualization, pages 11–18, October 2004. A. Entezari, D. Van De Ville, and T. Möller. Practical box splines for volume rendering on the body centered cubic lattice. IEEE Trans. on Visualization and Comp Graphics, 14(2):313 – 328, 2008. M. Kim, A. Entezari, and J. Peters. Box Spline Reconstruction on the Face Centered Cubic Lattice. IEEE Trans. on Visualization and Computer Graphics, 14(6):1523–1530, 2008. HR Kunsch, E. Agrell, and FA Hamprecht. Optimal lattices for sampling. Information Theory, IEEE Transactions on, 51(2):634–647, 2005. D. P. Mitchell and A. N. Netravali. Reconstruction Filters in Computer Graphics. In Computer Graphics (Proceedings of SIGGRAPH 88), volume 22, pages 221–228, August 1988. H. Miyakawa. Sampling theorem of stationary stochastic variables in multidimensional space. Journal of the Institute of Electronic and Communication Engineers of Japan, 42:421–427, 1959. SAMPTA'09 [13] T. Möller, R. Machiraju, K. Mueller, and R. Yagel. A Comparison of Normal Estimation Schemes. In Proceedings of the IEEE Conference on Visualization, pages 19–26, October 1997. [14] T. Möller, K. Mueller, Y. Kurzion, R. Machiraju, and R. Yagel. Design of Accurate and Smooth Filters for Function and Derivative Reconstruction. Proceedings of the Symposium on Volume Visualization, pages 143–151, Oct 1998. [15] A.V. Oppenheim and R.W. Schafer. Discrete-Time Signal Processing. Prentice Hall Inc., Englewoods Cliffs, NJ, 1989. [16] D. P. Petersen and D. Middleton. Sampling and Reconstruction of Wave-Number-Limited Functions in N -Dimensional Euclidean Spaces. Information and Control, 5(4):279–323, December 1962. [17] T. Theußl, O. Mattausch, T. Möller, and E. Gröller. Reconstruction schemes for high quality raycasting of the body-centered cubic grid. TR-186-2-02-11, Institute of Computer Graphics and Algorithms, Vienna University of Technology, December 2002. [18] T. Theußl, T. Möller, and E. Gröller. Optimal Regular Volume Sampling. In Proc of the IEEE Conf on Visualization, pages 91–98, Oct 2001. [19] P. Thévenaz, T. Blu, and M. Unser. Interpolation revisited. IEEE Transactions on Medical Imaging, 19(7):739–758, July 2000. [20] M. Unser. Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine, 16(6):22–38, November 1999. IEEE Signal Processing Society’s 2000 magazine award. [21] M. Unser. Sampling—50 Years after Shannon. Proceedings of the IEEE, 88(4):569–587, April 2000. [22] D. Van De Ville, T. Blu, M. Unser, W. Philips, I. Lemahieu, and R. Van de Walle. Hex-Splines: A Novel Spline Family for Hexagonal Lattices. IEEE Trans. on Img Proc., 13(6):758–772, June 2004. 222 1 An Efficient Algorithm for the Discrete Gabor Transform using full length Windows Peter L. Søndergaard Abstract—This paper extends the efficient factorization of the Gabor frame operator developed by Strohmer in [17] to the Gabor analysis/synthesis operator. The factorization provides a fast method for computing the discrete Gabor transform (DGT) and several algorithms associated with it. The factorization algorithm should be used when the involved window and signal have the same length. An optimized implementation of the algorithm is freely available for download. I. 
I NTRODUCTION The finite, discrete Gabor transform (DGT) of a signal f of length L is given by c (m, n, w) = L−1 X l=0 f (l, w)g (l − an)e−2πiml/M . II. D EFINITIONS We shall denote the set of integers between zero and some number L by hLi = 0, . . . , L − 1. (2) The Discrete Fourier Transform (DFT) of a signal f ∈ CL is defined by (1) Here g is a window (filter prototype) that localizes the signal in time and in frequency. The DGT is equivalent to a Fourier modulated filter bank with M channels and decimation in time a, [2]. Efficient computation of a DGT can be done by several methods: If the window g has short support (consists of relatively few filter taps), a filter bank based approach can be used. We shall instead focus on the case when g and f are equally long. The main advantage of the algorithm presented is its ease of use: The running time is guaranteed to be small even for long windows. This allows for the practical use of non-compactly supported windows like the Gaussian and its tight and dual windows without truncating them. In the case when the window and signal have the same length, a factorization of the frame operator matrix was found by Zibulski and Zeevi in [19]. The method was initially developed in the L2 (R) setting, and was adapted for the finite, discrete setting by Bastiaans and Geilen in [1]. They extended it to also cover the analysis/synthesis operator. A simple, but not so efficient, method was developed for the Gabor analysis/synthesis operator by Prinz in [15]. Strohmer [17] improved the method and obtained the lowest known computational complexity for computing the Gabor frame operator. This paper extends Strohmer’s method to also cover the Gabor analysis and synthesis operators. The advantage of the method developed in this paper as compared to the one developed in [1], is that it works with FFTs of shorter length, and does not require multiplication by complex exponentials caused by the quasi-periodicity of the Zak transform. The two methods have the same asymptotic complexity, O (N M log M ), where M is the number of channels and N is the number of time steps. A more accurate flop count is presented later in the paper. SAMPTA'09 We shall study the DGT applied to multiple signals at once. This is for instance a common subroutine in computing a multidimensional DGT. The DGT defined by (1) works on a multi-signal f ∈ CL×W , where W ∈ N is the number of signals. (FL f ) (k) = L−1 1 X √ f (l)e−2πikl/L . L l=0 (3) We shall use the · notation in conjunction with the DFT to denote the variable over which the transform is to be applied. To denote all elements indexed by a variable we shall use the : notation. As an example, if C ∈ CM ×N then C:,1 is a M × 1 column vector, C1,: is a 1 × N row vector and C:,: is the full matrix. This notation is commonly used in Matlab and FORTRAN programming and also in some prominent textbooks, [8]. The convolution f ∗ g of two functions f, g ∈ CL and the involution f ∗ is given by (f ∗ g) (l) = L−1 X k=0 f (k) g (l − k) , f ∗ (l) = f (−l), l ∈ hLi l ∈ hLi . (4) (5) It is well known how convolution can be computed efficiently using the discrete Fourier transform. We shall use a variant of this result  √ −1  LFL (FL f ) (·) (FL g) (·) (l) . (6) (f ∗ g ∗ ) (l) = The Poisson summation formula in the finite, discrete setting is given by ! b−1 X √ FM g(· + kM ) (m) = b (FL g) (mb), (7) k=0 where g ∈ CL , L = M b with b, M ∈ N. 
A family of vectors ej , j ∈ hJi of length L is called a frame if constants 0 < A ≤ B exist such that 2 A kf k ≤ J−1 X j=0 2 2 |hf, ej i| ≤ B kf k , ∀f ∈ CL . 223 (8) 2 Algorithm 1 Window factorization WFAC (g, a, M ) 1) for r = hci k = hpi, l = hqi 2) for s = hdi 3) tmp (s) ← g (r + c · (k · q − l · p + s · p · q mod d · p · q)) 4) end for 5) P hi (r, k, l, :) ←DFT(tmp) 6) end for 7) return Phi The constants A and B are called lower and upper frame bounds. If A = B, the frame is called tight. If J > L, the frame is redundant (oversampled). Finite- and infinite dimensional frames are described in [4]. A finite, discrete Gabor system (g, a, M ) is a family of vectors gm,n ∈ CL of the following form gm,n (l) = e2πilm/M g (l − na) , l ∈ hLi (9) for m ∈ hM i and n ∈ hN i where L = aN and M/L ∈ N. A Gabor system that is also a frame is called a Gabor frame. The analysis operator Cg : CL 7→ CM ×N associated to a Gabor system (g, a, M ) is the DGT given by given by (1). The Gabor synthesis operator Dγ : CM ×N 7→ CL associated to a Gabor system (γ, a, M ) is given by f (l) = N −1 M −1 X X n=0 m=0 c (m, n) e2πiml/M γ (l − an) . (10) In (1), (9) and (10) it must hold that L = N a = M b for some M, N ∈ N. Additionally, we define c, d, p, q ∈ N by c = gcd (a, M ) , d = gcd (b, N ) , (11) a b M N p= = , q= = , (12) c d c d where GCD denotes the greatest common divisor of two natural numbers. With these numbers, the redundancy of the transform can be written as L/ (ab) = q/p, where q/p is an irreducible fraction. It holds that L = cdpq. The Gabor frame operator Sg : CL 7→ CL of a Gabor frame (g, a, M ) is given by the composition of the analysis and synthesis operators Sg = Dg Cg . The Gabor frame operator is important because it can be used to find the canonical dual window g d = Sg−1 g and −1/2 g of a Gabor frame. the canonical tight window g t = Sg The canonical dual window is important because Dgd is a left inverse of Cg . This gives an easy way to construct an inverse transform of the DGT. Similarly, then Dgt is a left inverse of Cgt . For more information on Gabor systems and properties of the operators C, D and S see [9], [6], [7]. III. T HE ALGORITHM We wish to make an efficient calculation of all the coefficients of the DGT. Using (1) literally to compute all coefficients c (m, n, w) would require 8M N LW flops. To derive a faster DGT, one approach is to consider the analysis operator Cg as a matrix, and derive a faster algorithm SAMPTA'09 Algorithm 2 Discrete Gabor transform DGT (f, g, a, M ) 1) P hi =WFAC(g, a, M ) 2) for r = hci 3) for k = hpi, l = hqi, w = hW i 4) for s = hdi 5) tmp (s) ← f (r + (k · M + s · p · M − l · ha · a mod L) , w) 6) end for 7) P sitmp (k, l + w · q, ·) ←DFT(tmp) 8) end for 9) for s = hdi 10) G ← P hi (:, :, r, s) 11) F ← P sitmp (:, :, s) 12) Ctmp (:, :, s) ← GT · F 13) end for 14) for u = hqi, l = hqi, w = hW i 15) tmp ←IDFT(Ctmp (u, l + w · q, :)) 16) for s = hdi 17) coef (r + l · c, u + s · q − l · ha mod N, w) ← tmp (s) 18) end for 19) end for 20) end for 21) for n = hN i,w = hW i 22) coef (:, n, w) ←DFT(coef (:, n, w)) 23) end for 24) return coef through unitary matrix factorizations of this matrix. This is the approach taken by [17], [16]. Unfortunately, this approach tends to introduce many permutation matrices and Kronecker product matrices. Another approach is the one taken in [1] where the Zak transform is used. This approach has the downside that values outside the fundamental domain of the Zak transform require an additional step to compute. 
In this paper we have chosen to derive the algorithm by directly manipulating the sums in the definition of the DGT. To find a more efficient algorithm than (1), the first step is to recognize that the summation and the modulation term in (1) can be expressed as a DFT:   √ c (m, n, w) = LFL f (·, w)g (· − an) (mb) . (13) We can improve on this because we do not need all the coefficients computed by the Fourier transform appearing in (13), only every b’th coefficient. Therefore, we can rewrite by the Poisson summation formula (7): c (m, n, w) ! b−1 X √ = M FM f (· + m̃M, w)g (· + m̃M − an) (m) m̃=0 = (FM K (·, n, w)) (m) , (14) where K (j, n, w) = √ M b−1 X m̃=0 f (j + m̃M, w) g (j + m̃M − na) , (15) 224 3 for j ∈ hM i and n ∈ hN i. From (14) it can be seen that computing the DGT of a signal f can be done by computing K followed by DFTs along the first dimension of K. To further lower the complexity of the algorithm, we wish to express the summation in (15) as a convolution. We split j as j = r + lc with r ∈ hci, l ∈ hqi and introduce ha , hM ∈ Z such that the following is satisfied: c = hM M − ha a. (16) The two integers ha , hM can be found by the extended Euclid algorithm for computing the GCD of a and M . Using (16) and the splitting of j we can express (15) as K (r + lc, n, w) b−1 √ X = M f (r + lc + m̃M, w) × (18) K (r + lc, n − lha , w) b−1 √ X = M f (r + lc + (m̃ − lhM ) M, w) × m̃=0 ×g (r + m̃M − na) (19) b−1 √ X M f (r + m̃M + l (c − hM M ) , w) × = m̃=0 (20) We split m̃ = k + s̃p with k ∈ hpi and s̃ ∈ hdi and n = u + sq with u ∈ hqi and s ∈ hdi and use that M = cq, a = cp and c − hM M = −ha a: K (r + lc, u + sq − lha , w) p−1 X d−1 X M f (r + kM + s̃pM − lha a, w) × = √ k=0 s̃=0 ×g (r + kM − ua + (s̃ − s) pM ) (21) After having expressed the variables j, m̃, n using the variables r, s, s̃, k, l, u we have now indexed f using s̃ and g using (s̃ − s). This means that we can view the summation over s̃ as a convolution, which can be efficiently computed using a discrete Fourier transform. Define Ψfr,s (k, l + wq) = Fd f (r + kM + ·pM − lha a, w) , (22) √ Φgr,s (k, u) = M Fd g (r + kM + ·pM − ua) , (23) Using (6) we can now write (21) as K (r + lc, u + s̃q − lha , w) p−1   √ X d Fd−1 Ψfr,· (k, l + wq) Φgr,· (k, u) (s̃) (24) = k=0 = √ dFd−1 p−1 X k=0 SAMPTA'09 ! [1] Alg. 2 L 8L ag + 4N M log2 (M ) ” “ ” q + 4L 1 + pq log2 N + 4M N log2 (M ) p “ ” L (8q) + 4L 1 + pq log2 d + 4M N log2 (M ) “ L 8q + 1 + Flop counts for 4 different way of computing the DGT: By the linear algebra definition (1), by the method based on Poisson summation (14), by the method of Bastiaans and Geilen from [1] and by Algorithm 2. The term Lg denotes the length of the window used so Lg /a is the overlapping factor of the window. Note for comparison that log2 N = log2 d + log2 q IV. RUNNING TIME We substitute m̃ + lhM by m̃ and n + lha by n and get ×g (r + m̃M − na) Eq. (14) Flop count 8M N L (17) m̃=0 ×g (r + (m̃ + lhM ) M − (n + lha ) a) Alg.: Eq. (1) If we consider Ψfr,s and Φgr,s as matrices for each r and s, the sum over k in the last line can be written as matrix products. Algorithm 2 follows from this. m̃=0 ×g (r + l (hM M − ha a) + m̃M − na) b−1 √ X M f (r + lc + m̃M, w) × = Table I F LOP COUNTS Ψfr,· (k, l + wq) Φgr,· (k, u) (s̃) (25) When computing the flop count of the algorithm, we will assume that a complex FFT of length M can be computed using 4M log2 M flops. A nice review of flop counts for FFT algorithms is presented in [14]. 
Table I shows the flop count for Algorithm 2 and compares it with the definition of the DGT (1), with the algorithm for short windows using Poisson summation (14) and with the algorithm published in [1]. The algorithm by Prinz presented in [15] has the same computational complexity as the Poisson summation algorithm. For simplicity we assume that both the window and signal are complex valued. In the common case when both f and g are real-valued, all the algorithms will see a 2 to 4 times speedup. The flop count for definition (1) is that of a complex matrix multiplication. All the other algorithms share the 4M N log2 M term coming from the application of an FFT to each ’block’ of coefficients and only differ in how the application of the window is performed. The Poisson summation algorithm is very fast for a small overlapping factor Lg /a, but turns into an O L2 algorithm for a full length window. In this algorithms have an advantage. The term   case the other L 8q + 1 + pq in the [1] algorithm comes from calculation   of the needed Zak-transforms, and the 4L 1 + pq log2 N term comes from the transform to and from the Zak-domain. Compared to (22) and (23) this transformation uses longer FFTs. Algorithm 2 does away with the multiplication with complex exponentials in the [1] algorithm, and so the first term reduces to L (8q). Both the Poisson summation based algorithm and Algorithm 2 can do a DGT with L ≈ 2000000 in less than 1 second on a standard PC at the time of writing. We have not created an efficient implementation of the algorithm from [1] in C so therefore we cannot reliably time it. V. E XTENSIONS The algorithm just developed can also be used to calculate the synthesis operator Dγ . This is done by applying Algorithm 225 4 Algorithm 3 Canonical Gabor dual window GABDUAL (g, a, M ) 1) P hi =WFAC(g, a, M ) 2) for r = hci, s = hdi 3) G ← P hi (:, :, r, s) −1 ·G 4) P hid (:, :, r, s) ← G · GT 5) end for  6) g d =IWFAC P hid , a, M 7) return g d 2 in the reverse order and inverting each line. The only lines that are not trivially invertible are lines 10-12, which becomes 10) Γ ← P hid (:, :, r, s) 11) C ← Ctmp (:, :, s) 12) P sitmp (:, :, s) ← Γ · C where the matrices P hid (:, :, r, s) should be left inverses of the matrices P hi (:, :, r, s) for each r and s. The matrices P hid (:, :, r, s) can be computed by Algorithm 1 applied to a dual Gabor window γ of the Gabor frame (g, a, M ). It also holds that all dual Gabor windows γ of a Gabor frame (g, a, M ) must satisfy that P hid (:, :, r, s) are left inverses of the matrices P hi (:, :, r, s). This criterion was reported in [11], [12]. A special left-inverse in the Moore-Penrose pseudo-inverse. Taking the pseudo-inverses of P hi (:, :, r, s) yields the factorization associated with the canonical dual window of (g, a, M ), [3]. This is Algorithm 3. Taking the polar decomposition of each matrix in Φgr,s yields a factorization of the canonical tight window (g, a, M ). For more information on these methods, as well as iterative methods for computing the canonical dual/tight windows, see [13]. VI. S PECIAL CASES We shall consider two special cases of the algorithm: The first case is integer oversampling. When the redundancy is an integer then p = 1. Because of this we see that c = a and d = b. This gives (16) the appearance a = hM qa − ha a, (26) indicating that hM = 0 and ha = −1 solves the equation for all a and q. The algorithm simplifies accordingly, and reduces to the well known Zak-transform algorithm for this case, [10]. 
The second case is the short time Fourier transform. In this case a = b = 1, M = N = L, c = d = 1, p = 1, q = L and as in the previous special case hM = 0 and ha = −1. In this case the algorithm reduces to the very simple and well known algorithm for computing the STFT. VII. I MPLEMENTATION The reason for defining the algorithm on multi-signals, is that the multiple signals can be handled at once in the matrix product in line 12 of Algorithm 2. This is a matrix product of two matrices size q × p and p × qW , so the second matrix grows when multiple signals are involved. Doing it this way reuses the Φgr,s matrices as much as possible, and this is an SAMPTA'09 advantage on standard, general purpose computers with a deep memory hierarchy, see [5], [18]. The benefit of expressing Algorithm 2 in terms of loops (as opposed to using the Zak transform or matrix factorizations) is that they are easy to reorder. The presented Algorithm 2 is just one among many possible algorithms depending on in which order the r, s, k and l loops are executed. For a given platform, it is difficult a priory to estimate which ordering of the loops will turn out to be the fastest. The ordering of the loops presented in Algorithm 2 is the variant that uses the least amount of extra memory. Implementations of the algorithms described in this paper can be found in the Linear Time Frequency Toolbox (LTFAT) available from http://ltfat.sourceforge.net. The implementations are done in both the Matlab/Octave scripting language and in C. A range of different variants of Algorithm 2 has been implemented and tested, and the one found to be the fastest on a small range of computers is included in the toolbox. R EFERENCES [1] M. J. Bastiaans and M. C. Geilen. On the discrete Gabor transform and the discrete Zak transform. 49(3):151–166, 1996. [2] H. Bölcskei, F. Hlawatsch, and H. G. Feichtinger. Equivalence of DFT filter banks and Gabor expansions. In SPIE 95, Wavelet Applications in Signal and Image Processing III, volume 2569, part I, San Diego, july 1995. [3] O. Christensen. Frames and pseudo-inverses. J. Math. Anal. Appl., 195:401–414, 1995. [4] O. Christensen. An Introduction to Frames and Riesz Bases. Birkhäuser, 2003. [5] J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Software, 16(1):1– 17, 1990. [6] H. G. Feichtinger and T. Strohmer, editors. Gabor Analysis and Algorithms. Birkhäuser, Boston, 1998. [7] H. G. Feichtinger and T. Strohmer, editors. Advances in Gabor Analysis. Birkhäuser, 2003. [8] G. H. Golub and C. F. van Loan. Matrix computations, third edition. John Hopkins University Press, 1996. [9] K. Gröchenig. Foundations of Time-Frequency Analysis. Birkhäuser, 2001. [10] A. J. E. M. Janssen. The Zak transform: a signal transform for sampled time-continuous signals. Philips Journal of Research, 43(1):23–69, 1988. [11] A. J. E. M. Janssen. On rationally oversampled Weyl-Heisenberg frames. pages 239–245, 1995. [12] A. J. E. M. Janssen. The duality condition for Weyl-Heisenberg frames. In Feichtinger and Strohmer [6], chapter 1, pages 33–84. [13] A. J. E. M. Janssen and P. L. Søndergaard. Iterative algorithms to approximate canonical Gabor windows: Computational aspects. J. Fourier Anal. Appl., published online, 2007. [14] S. Johnson and M. Frigo. A Modified Split-Radix FFT With Fewer Arithmetic Operations. IEEE Trans. Signal Process., 55(1):111, 2007. [15] P. Prinz. Calculating the dual Gabor window for general sampling sets. IEEE Trans. 
Signal Process., 44(8):2078–2082, 1996. [16] S. Qiu. Discrete Gabor transforms: The Gabor-gram matrix approach. J. Fourier Anal. Appl., 4(1):1–17, 1998. [17] T. Strohmer. Numerical algorithms for discrete Gabor expansions. In Feichtinger and Strohmer [6], chapter 8, pages 267–294. [18] R. C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimization of software and the ATLAS project. Technical Report UTCS-00-448, University of Tennessee, Knoxville, TN, Sept. 2000. [19] Y. Y. Zeevi and M. Zibulski. Oversampling in the Gabor scheme. IEEE Trans. Signal Process., 41(8):2679–2687, 1993. 226 Nonstationary Gabor Frames Florent Jaillet (1) , Peter Balazs (1) and Monika Dörfler (1) (1) Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, A-1040 Vienna,Austria florent@kfs.oeaw.ac.at, peter.balazs@oeaw.ac.at, monid@kfs.oeaw.ac.at Abstract: Introduction Gabor analysis [7] is widely used for applications in signal processing. For some of these applications, which include processing of signals using Gabor frame multipliers [6, 1], the rigid construction of the Gabor atoms results in important limitations on the signal analysis and processing ability of the associated schemes. The Gabor transform uses time-frequency atoms built by translation over time and frequency of a unique prototype function, leading to a signal decomposition having a fixed time-frequency resolution over the whole time-frequency plane. This can be very restricting when dealing with signals with characteristics changing over the time-frequency plane. For example, this led some people to prefer the use of alternative decompositions with time-frequency resolution evolving with frequency in some applications, to better fit the feature of interest of the signal. Examples of such decompositions are the wavelet transform [5] or the decompositions using filter banks based on perceptive frequency scales for processing of audio signals, as for example gammatone filters [9]. A case for which the limitation induced by the constant time-frequency resolution of the Gabor transform can be seen is shown on the didactic example of Figure 1. On this figure, two spectrograms of the same glockenspiel signal are represented. These spectrograms are obtained by plotting the square absolute value of the Gabor coefficients using a color scale with a level coding in dB. Both spectrograms are obtained from the Gabor coefficients using the same type of window, but using two different window lengths. We see that the signal contains two very contrasting types of components: • at the beginning of the notes, the signal presents sharp attacks which are spread in frequency, but very SAMPTA'09 • during the resonance of the notes, the signal contains quasi-sinusoidal components which are spread in time, but very localized in frequency. 20000 15000 Frequency (Hz) 1. localized in time, 10000 5000 0 0.2 0.4 0.6 0.8 1 1.2 0.8 1 1.2 Time (s) 20000 15000 Frequency (Hz) To overcome the limitation induced by the fixed timefrequency resolution over the whole time-frequency plane of Gabor frames, we propose a simple extension of the Gabor theory leading to the construction of frames with timefrequency resolution evolving over time or frequency. We describe the construction of such frames and give the explicit formulation of the canonical dual frame for some conditions. We illustrate the interest of the method on a simple example. 
10000 5000 0 0.2 0.4 0.6 Time (s) Figure 1: Two spectrograms of the same glockenspiel signal obtained using two different window lengths. On the top plot, a narrow window of 6 ms is used, on the bottom plot, a wide window of 93 ms is used. We see that the use of the narrow window is well suited for the analysis and processing of the attacks, leading to a very sparse decomposition for these components, but gives an unsatisfying representation of the resonance, as the different sinusoidal components are not resolved. On the other hand, the wide window gives a good representation of the resonance part, but a blurred representation of the attacks. For this example, it appears that if we want to build an 227 2. 2.1 Frequency optimised scheme for processing of both attacks and the resonances at the same time, it would be suitable to be able to adapt the time-frequency resolution locally for the different types of components. The purpose of this paper is to describe one way to achieve this goal. For this, we show that, while staying in the context of frame theory [2, 4], the standard Gabor theory can be easily extended to provide some freedom of evolution of the time-frequency resolution of the decomposition in either time or frequency. Furthermore, this extension is well suited for applications as it can easily be implemented using fast algorithm based on fast Fourier transform [12]. We first describe the construction of the frames in Section 2., and then illustrate in Section 3. the potential of the approach on the preceding example of Figure 1. Time Figure 2: Example of sampling grid of the time-frequency plane when building a decomposition with time-frequency resolution evolving over time. Construction of the frames Resolution evolving over time As opposed to standard Gabor analysis, we replace time translation for the construction of atoms by the use of different windows for the different sampling positions in time. For each time position we still build atoms by regular frequency modulation. So using a set of functions {gn }n∈Z of L2 (R), for m ∈ Z and n ∈ Z, we define atoms of the form: relations: K(t, s) = XX = X m gn (t)gn (s)ei2πmbn (s−t) n gn (t)gn (s) X ei2πmbn (s−t) m n  X 1 X  k = gn (t)gn (s) δ s−t− bn bn n k i2πmbn t gm,n (t) = gn (t)e . In practice we will choose each window gn centered around a time an , and it will typically be constructed by translating a well localized window centered around 0 by an , as in the standard Gabor scheme, but with the possibility to vary the window gn for each position an . Thus the sampling of the time-frequency plane is done on a grid which is irregular over time, but regular over frequency. Figure 2 shows an example of such a sampling grid. It can be noted that some results exist in Gabor theory for semiregular sampling grids, as for example in [3]. Our study here uses a more general setting, as the sampling grid is in general not separable, and more importantly, the window can evolve over time. In this case, the coefficients of the decomposition are given by: cm,n = hf, gm,n i , and the frame operator is given by: Sf = XX m hf, gm,n igm,n . Sf (s) =     XX 1 k k gn (s) f s − gn s − bn bn bn n k In general, the inversion of S is not obvious. However we can identify a special case, which is analog to the “painless” case in standard Gabor analysis [8], for which the expression of S simplifies. More precisely, we suppose from now on that for every n ∈ Z, the function gn has a limited time support supp gn = [cn , dn ] such that dn − cn < b1n . 
Due to this support condition, the terms of the summation over k in the preceding equation are 0 for k 6= 0 and the frame operator S becomes a multiplication operator: X 1 Sf (s) = |gn (s)|2 f (s). b n n In this case the invertibility of the frame operator is easy to check and the system of functions P gm,n forms a frame for L2 (R) if and only if ∀t ∈ R, n b1n |gn (t)|2 ≃ 1. When this condition is fulfilled, the canonical dual frame elements are given by: n The frame operator can be described by its kernel K given the following relation, which holds at least in a weak sense: Z Sf (s) = K(t, s)f (t)dt. Here the kernel K simplifies according to the following SAMPTA'09 thus, g̃m,n (t) = P gn (t) ei2πmbn t , 1 2 |g (t)| bk k k and the associated canonical tight frame elements can be calculated by: ġm,n (t) = qP gn (t) 1 2 k bk |gk (t)| ei2πmbn t . 228 2.2 Resolution evolving over frequency Frequency An analog construction is possible with a sampling of the time-frequency plane irregular over frequency, but regular over time. An example of the sampling grid in such a case is given on Figure 3. In this case, we introduce a family of functions {hm }m∈Z of L2 (R), and for m ∈ Z and n ∈ Z, we define atoms of the form: hm,n (t) = hm (t − nam ). 2.3 For the practical implementation, we have developed the equivalent theory in a finite discrete setting, that is to say working with complex vectors as signals. This theory won’t be described here due to lack of space, but the construction is very similar to the one described in 2.1 and 2.2 and leads to a frame matrix which simplifies to a diagonal matrix in the “painless” case, suitable for applications. The implementation is then very similar to the implementation of the standard Gabor case and can exploit fast Fourier transform algorithms for efficiency. The only differences compared to standard Gabor implementation are due to the fact that the storage of coefficients requires more advanced storage structures due to the irregularity of the time-frequency sampling grid, and that the computation of the dual window must be performed for every time position resulting in a slight increase in computational cost. 3. Time Figure 3: Example of sampling grid of the time-frequency plane when building a decomposition with time-frequency resolution evolving over frequency. In practice we will choose each function hm as a well localized pass-band function having a Fourier transform centered around some frequency bn . In this case the frame operator is given by: XX Tf = hf, hm,n ihm,n , m n and the problem is completely analog to the preceding up to a Fourier transform, as we have: XX c = [ Tf hfb, h[ m,n ihm,n , m n −i2πnam ν c and h[ . So the preceding compum,n = hm (ν)e tation can be done, working on the Fourier transforms of the involved functions instead of directly on the functions. Now the “painless” case appears when we suppose that cn has a limited frequency for every m ∈ Z, the function h cn = [en , fn ] such that fn − en < 1 . Then support supp h an the following expression holds: c (ν) = Tf X 1 2b |hc m (ν)| f (ν), a m m and the system of functions hm,n forms a frame of L2 (R) P 2 if and only if ∀ν ∈ R, n a1m |hc m (ν)| ≃ 1. The associated canonical dual and tight frame can be computed as preceding, with the addition of an inverse Fourier transform. SAMPTA'09 Implementation Example The possibility to build a decomposition with timefrequency resolution evolving over time can be exploited to solve the problem described in example of Section 1. 
illustrated by Figure 1. For the corresponding glockenspiel signal, as we have seen before, the use of narrow window is suitable for the attacks of the notes, while a wide window should be used for the resonances. Figure 4 shows a representation built with our approach using a narrow window of 6 ms for the attacks and a wide window of 93 ms for the resonance. The frame used for this figure is a tight frame. It should be noticed that the evolution of the window size between the two target window lengths is smoothed in order to ensure that the atoms used for the decomposition maintain a “nice” shape, in the sense of having a good time-frequency concentration. This ensures the easy interpretability of the decomposition, especially for processing using frame multipliers. This figure gives an idea of the type of decompositions that can be constructed with our approach and should be compared to the decomposition obtained using standard Gabor analysis on Figure 1. With our approach, it becomes possible to have a simultaneous good representation of both types of components of this signal while keeping the same processing ability than with standard Gabor. We see that our approach allows to build decompositions with better time-frequency localization of the signal energy. This can be helpful for many processing tasks, in particular to reduce artifacts in component extraction or denoising. 4. Conclusion Our approach enables the construction of frames with flexible evolution of time-frequency resolution over time or frequency. The resulting frames are well suited for applications as they can be implemented using fast algorithms, at a computational cost close to standard Gabor frames. Exploiting evolution of resolution over time, the proposed approach can be of particular interest for applica- 229 20000 Frequency (Hz) 15000 10000 5000 0 0.2 0.4 0.6 0.8 1 1.2 Time (s) Figure 4: Spectrogram of the same glockenspiel signal as in Figure 1 using a nonstationary Gabor decomposition. tions where the frequency characteristics of the signal are known to evolve significantly with time. Order analysis [11], in which the signal analyzed is produced by a rotating machine having evolving rotating speed, is an example of such application. Exploiting evolution of resolution over frequency, the presented approach could be valuable for applications requiring the use of a tailored non uniform filter bank. In particular, it can be used to build filter banks following some perceptive frequency scale. One difficulty when using our approach is to adapt the time-frequency resolution to the evolution of the signal characteristics. If prior knowledge is available, this can be done by hand, as for the example of Figure 4. But to go further, our approach could be extended to construct an adaptive decomposition of the signal by automatically adapting the resolution to the signal. To achieve this, we plan to investigate the possibility to couple our approach with the use of sparsity criterion as proposed in [10]. The general idea would then be to consider time segments of the signal, and for each time segment compare the sparsity criterion obtained for Gabor transforms computed with different possible windows. We would then use in our decomposition the window corresponding to the best criterion for each time segment, leading to a decomposition optimizing the sparsity of the decomposition over time. [4] O. Christensen. An Introduction To Frames And Riesz Bases. Birkhäuser, 2003. [5] I. Daubechies. Ten Lectures On Wavelets. 
CBMSNSF Regional Conference Series in Applied Mathematics. SIAM Philadelphia, 1992. [6] H. G. Feichtinger and K. Nowak. A first survey of Gabor multipliers. In H. G. Feichtinger and T. Strohmer, editors, Advances in Gabor analysis, chapter 5, pages 99–128. Birkhäuser Boston, 2003. [7] H. G. Feichtinger and T. Strohmer. Gabor Analysis and Algorithms - Theory and Applications. Birkhäuser Boston, 1998. [8] K. Gröchenig. Foundations of Time-Frequency Analysis. Birkhäuser Boston, 2001. [9] W. M. Hartmann. Signals, Sounds, and Sensation. Springer, 1998. [10] F. Jaillet and B. Torrésani. Time-frequency jigsaw puzzle: adaptive multiwindow and multilayered gabor representations. International Journal for Wavelets and Multiresolution Information Processing, 5(2):293–316, 2007. [11] H. Shao, W. Jin, and S. Qian. Order tracking by discrete Gabor expansion. IEEE Transactions on Instrumentation and Measurement, 52(3):754–761, 2003. [12] J. S. Walker. Fast Fourier Transforms. CRC Press, 1991. Acknowledgment This work was supported by the WWTF project MULAC (“Frame Multipliers: Theory and Application in Acoustics”, MA07-025). References: [1] P. Balazs. Basic definition and properties of Bessel multipliers. Journal of Mathematical Analysis and Applications, 325(1):571585, January 2007. [2] P. G. Casazza. The art of frame theory. Taiwanese J. Math., 4(2):129–202, 2000. [3] P. G. Casazza and O. Christensen. Gabor frames over irregular lattices. Adv. Comput. Math., 18(2-4):329– 344, 2003. SAMPTA'09 230 A Nonlinear Reconstruction Algorithm from Absolute Value of Frame Coefficients for Low Redundancy Frames Radu Balan Department of Mathematics, CSCAMM and ISR, University of Maryland, College Park, MD 20742, USA rvbalan@math.umd.edu Abstract: In this paper we present a signal reconstruction algorithm from absolute value of frame coefficients that requires a relatively low redundancy. The basic idea is to use a nonlinear embedding of the input signal Hilbert space into a higher dimensional Hilbert space of sesquilinear functionals so that absolute values of frame coefficients are associated to relevant inner products in that space. In this space the reconstruction becomes linear and can be performed in a polynomial number of steps. 1. 2. Assume e ∈ E , e = 1 is so that Kx e = 0. Then: Let us denote by E n the n-dimensional space of signals (e.g. E n = Rn or E n = Cn ), and assume we are given a frame of m vectors {f 1 , . . . , fm } ⊂ En that span E n . Thus necessarily m ≥ n. In this paper we look at the following problem: Given c l = |x, fl |, 1 ≤ l ≤ m, reconstruct the original signal x ∈ E n up to a constant phase ambiguity, that is, obtain a signal y ∈ E n such that y = eiϕ x for some ϕ ∈ [0, 2π). This problem arises in several areas of signal processing (see [BCE06] for a more detailed discussion of these issues). In particular, in X-Ray Crystallography (see [LFB87]) it is known as the phase retrieval problem. In speech processing it is related to the use of cepstral coefficients in Automatic Speech Recognition as well as direct reconstruction from denoised spectogram (see [NQL82]). By the same token the solution posed here can be viewed as a new, nonlinear signal generating model. Recently ([BBCE09]) we proposed a quasi-linear reconstruction algorithm that requires the frame to have high redundancy (m = O(n 2 )). The algorithm works as follows. First note that two vectors x, y ∈ E n that are equivalent (i.e. 
equal to one another up to a constant phase) generate the same rank-one operators K x = Ky , where (1) with u = x or u = y. Conversely, if K x = Ky then necessarily there exists a phase ϕ so that y = e iϕ x. Thus the reconstruction problem reduces to obtaining first K x , and then a representative of the class x̂. Next notice that the absolute value of frame coefficient |x, f l | is related to the Hilbert-Schmidt SAMPTA'09 inner product between K x and Kfl : Kx , Kfl  := trace(Kx Kf∗l ) = |x, fl |2 l=1 n Introduction Ku : En → En , Ku (z) = z, uu Hence, if {Kfl , 1 ≤ l ≤ m} form a frame for the set of Hilbert-Schmidt operators (this is the same as the set of quadratic forms), then K x can be reconstructed from d 2l with a linear algorithm, from where a vector y ∈ x̂ can be obtained. Explicitely, the algorithm is as follows: First l : En → En , 1 ≤ l ≤ m} the canonical denote by { K dual frame of {K fl , 1 ≤ l ≤ m}. 1. Compute: m  l Kx = c2l K (2) 1 Kx (e) y=  Kx (e), e (3) is a vector in En equivalent to x. While very appealing from a computational perspective, this algorithm requires the set {K fl , 1 ≤ l ≤ m} to be complete (spanning) in the Hilbert space of n × n quadratic forms. In the real case (E = R) this latter Hilbert space is of dimension n(n + 1)/2. In the complex case (E = C) the dimension becomes n 2 . Thus the algorithm requires the original frame set {f l , 1 ≤ l ≤ m} to have m = O(n2 ) vectors. In practice this requirement may not be feasible. Furthermore, in [BCE06] we obtained that generically m ≥ 4n − 2 should suffice in the complex case, and n ≥ 2n − 1 should suffice in the real case. In this paper we present an algorithm that applies to a generic frame set of m = 5.394n − 4.394 vectors in the complex case, and m = 2n − 1 in the real case. The main ingredient of this algorithm is the nonlinear embeding of En into a linear space Λ d,d of (d, d)-sesquilinear symmetric forms where the absolute value of frame coefficients provide the inner products with a frame set. 2. Nonlinear Embeddings Let En be the signal n-dimensional Hilbert space. Let F = {f1 , . . . , fm } be a spanning set of m vectors in E n . Its redundancy is r = m/n ≥ 1. Fix an integer d ≥ 1 which is going to measure the embedding depth. Let Λd,d (En ) denote the linear space of (d, d)-sesquilinear functionals, that is 231 Λd,d (En ) = { α : En × · · · En → C } 2d (4) 3. The Reconstruction Algorithm where α(y1 , . . . , yd , z1 , . . . , zd ) is linear in y1 , . . . , yd , and antilinear in z 1 , . . . , zd . Note Λd,d (En ) is a vector space of dimension n 2d . Let {ek , 1 ≤ k ≤ n} be an orthonormal basis of E n . For each 2d-tuple (k 1 , . . . , k2d ) of integers from 1, . . . , n (repetitions are allowed) define Under Assumption A, let us denote by { ψ j1 ,...,jd , 1 ≤ j1 ≤ · · · ≤ jd ≤ m} the canonical dual frame to P Ψ. This dual frame allows us to recover Φ(x). Recall n {e δk1 ,...,k2d (y1 , . . . , yd , z1 , . . . , zd ) = y1 , ek1  · · · yd , ekd  · 1 , . . . , en } is an orthonormal basis of E . Notice the following relations: (5) ekd+1 , z1  · · · ek2d , zd  Φ(x)(ek , . . . , ek ) = |x, ek |2d (11) Note ∆ = {δk1 ,...,k2d ; 1 ≤ kl ≤ n, 1 ≤ l ≤ 2d} n  1/d forms a basis in Λd,d (En ). We define an inner product (Φ(x)(ek , . . . , ek )) = x 2 (12) n on Λd,d (E ) so that this basis is orthonormal. Consider k=1 two sesquilinear functionals in Λ d,d (En ): Φ(x)(ej , . . . , ej , ek ) = |x, ej |2d−2 ej , xx, ek     α(y , . . . , y , z , . . . 
, z ) = y , a  · · · y , a b , z  · · · b , z  1 d 1 d 1 1 d d 1 1 d d 2d−1 β(y1 , . . . , yd , z1 , . . . , zd ) = y1 , g1  · · · yd , gd h1 , z1  · · · hd , zd  Then their inner product is defined as α, β := g1 , a1  · · · gd , ad b1 , h1  · · · bd , hd  From (11) and (13) we obtain: (6) Extend this binary operation to an inner product on Λd,d (En ). With this inner product ∆ becomes an orthonormal basis for the Hilbert space Λ d,d (En ). Now we are ready to define the nonlinear embedding of the input Hilbert space E n in Λd,d (En ). This is given by the map Φ : En → Λd,d(En ) Φ(x)(y1 , . . . , yd , z1 , . . . , zd ) = ·x, z1  · · · x, zd  (13) y1 , x · · · yd , x · (7) Let Ed = span(Φ(Λd,d (En ))) be the linear span of the embedding. Note in general E d  Λd,d (En ) unless d = 1. Let P denote the orthogonal projection onto E d , P : Λd,d (En ) → Ed . Define now the following sesquilinear functionals associated to the frame set F . Fix 1 ≤ j 1 , . . . , jd ≤ m. Φ(x)(ej , . . . , ej , ek ) x, ej  |x, ej | (Φ(x)(ej , . . . , ej , ej ))(2d−1)/2d (14) The Reconstruction Algorithm is as follows. Reconstruction Algorithm Input: Coefficients c1 = |x, f1 |, ... cm = |x, fm |. m Step 0. If k=1 c2k = 0 then y = 0 and stop. Otherwise continue. Step 1. Construct the following sesquilinear functional  c2j1 · · · c2jd ψ (15) α= j1 ,...,jd x, ek  = 1≤j1 ≤···≤jd ≤m Step 2. Find a 1 ≤ j0 ≤ n so that α(ej0 , · · · , ej0 ) > 0. This is possible due to (12). Set ν= 2d α(ej0 , . . . , ej0 ) (16) ψj1 ,...,jd (y1 , . . . , yd , z1 , . . . , zd ) = y1 , fj1  · · · yd , fjd  · Step 3. Set ·fj1 , z1  · · · fjd , zd  (8) n 1  α(ej0 , . . . , ej0 , ek )ek (17) y = d Note there are m distinct such functionals, however the ν 2d−1    k=1 number of distinct projections onto E d is much smaller. 2d−1 Notice Summarizing all results obtained so far we obtain: 2 2 Φ(x), ψj1 ,...,jd  = |x, fj1 | · · · |x, fjd | (9) Theorem 3..1 For every x ∈ E n there is z ∈ C so that Thus if (k1 , . . . , kd ) is a permutation of (j 1 , . . . , jd ) then |z| = 1 and the output of the Reconstruction Algorithm x,e  P ψk1 ,...,kd = P ψj1 ,...,jd . For converse we need to assatisfies x = zy. Specifically z = |x,ejj0 | , with j0 ob0 sume first that frame vectors belong to distinct equivatained in Step 2. lence classes (that is, for any two 1 ≤ l < j ≤ m and any a ∈ [0, 2π), f l = eia fj ). Then we get that 4. Redundancy Constraint P ψk1 ,...,kd = P ψj1 ,...,jd if and only if (k 1 , . . . , kd ) is a permutation of (j 1 , . . . , jd ). Thus we obtain that for In this section we analyse the necessary condition |Ψ| ≥ frames with frame vectors in distinct equivalence classes dim(Ed ). the set Ψ = {ψj1 ,...,jd , 1 ≤ j1 ≤ j2 ≤ · · · ≤ jd ≤ m} (10) is a maximal set of sesquilinear functionals of type (8) that have distinct projections through P . For our algorithm to work we need to assume: Assumption A. The set P Ψ := {P ψ , ψ ∈ Ψ} is spanning in Ed . In section 4. we analyze the dimensionality constraint SAMPTA'09 |P Ψ| ≥ dim(Ed ), and in section 5. we present numerical results supporting Assumption A for a generic frame. 4.1 The Cardinal of Set Ψ The set Ψ given in (10) has the same cardinal as {(k1 , . . . , kd ) , 1 ≤ k1 ≤ · · · ≤ kd ≤ m} (18) Let us denote this number by M m,d . In order to compute it, consider the following cardinal equivalent set: 232 {(n1 , . . . , nm ) , 0 ≤ n1 , . . . 
, nm ≤ d, n1 +· · ·+nm = d} (19) The bijective correspondence between d-tuples of (18) and m-tuples of (19) is given by the following interpretation: nl is the number of times l is presented in the d-tuple (k1 , . . . , kd ). Then, one can obtain the following recursion: d  Mm,d Mm+1,d = r=0 where we set Mm,0 = 1. Since M1,d = 1, one obtains by induction that: Mm,d = m+d−1 m−1 = m(m + 1) · · · (m + d − 1) d! (20) 4.2 The Dimension of Ed d-tuples l: Nn,d = (Mn,d )2 = n(n + 1) · · · (n + d − 1) d! We shall group together terms containing same t k terms. The real case will be treated separately from the complex case. To simplify the exposition, we introduce notation common to both cases. Let us denote by k = (k 1 , . . . , kr ) an ordered r-tuple of integers each from 1 to n, where the length r is equal to 2d (in the real case), or d (in the complex case). Let us denote by P r the set of rpermutations, and by P k the quotient set Pk = P/ ∼k where π ′ , π ′′ ∈ Pr are equivalent π ′ ∼k π ′′ if and only if π ′ (k) = π ′′ (k). Note |Pk | = r! m1 ! · · · mn ! where ml denotes the number of repetitions of l in k. The Complex Case In the complex case, t k and tk can be treated as indepedent (real) variables. Then terms in (21) are grouped using two independent d-tuples, j = (j 1 , . . . , jd ) and l = (l1 , . . . , ld ) as follows   tj1 · · · tjd tl1 · · · tld × 1≤j1 ≤···≤jd ≤n 1≤l1 ≤···≤ld ≤n ×   δπ(j1 ),...,π(jd ),ρ(l1 ),...,ρ(ld ) π∈Pj ρ∈Pl Then the following sesquilinear functionals are orthonormal and form a basis in E d :   1 δπ(j1 ),...,π(jd ),ρ(l1 ),...,ρ(ld )  |Pj | |Pl | π∈Pj ρ∈Pl (22) SAMPTA'09 Their number (and hence dimension of E d ) is equal to the number of ordered d-tuples j times the number of ordered dj,l = (23) where we used (20). Note N n,1 = n2 and we recover the complex case considered in [BBCE09]. The Real Case In the real case, tk and tk are the same variables. Then the independent terms in (21) are indexed by 2d-tuples k = (k1 , . . . , k2d ) as follows:   tk1 · · · tk2d δπ(k1 ),...,π(k2d ) 1≤k1 ≤k2d ≤n π∈Pk and an orthonormal basis of E d is given by the following vectors indexed by ordered 2d-tuples k: Recall Ed is the linear span of vectorx Φ(x) in Λ d,d(En ).  1 Recall also that ∆ whose n2d vectors are defined in (5) is dk =  δπ(k1 ),...,π(k2d ) n an orthonormal basis in Λ d,d (E ). Let us denote by N n,d |Pk | π∈P k the dimension of E d . We will describe an orthonormal basis in Ed . Fix t1 , . . . , tn ∈ C and expand: The dimension of E d in real case is then:  tk1 · · · tkd tkd+1 · · · tk2d · Nn,d = Mn,2d = n(n + 1) · · · (n + 2d − 1) Φ(t1 e1 + · · · tn en ) = (2d)! 1≤k1 ,...,k2d ≤n ·δk1 ,...,k2d 2 (21) Note Nn,1 = [BBCE09]. n(n+1) 2 (24) (25) and this recovers the real case in 4.3 The Optimal Depth and Redundancy Condition For given n we would like to find the minimum m = m ∗ so that Mm,d ≥ Nn,d for some d ≥ 1. The Complex Case We need to solve m(m + 1) · · · (m + d − 1) ≥ d! n(n + 1) · · · (n + d − 1) d! or, completing the factorials: (m + d − 1)! d! ((n − 1)!)2 ≥ (m − 1)! ((n + d − 1)!)2 Let us denote R(n, m, d) = (m + d − 1)!d!((n − 1)!)2 (m − 1)!((n + d − 1)!)2 (26) Ideally we would like to solve: (1) d∗ (n, m) = argmaxd R(n, m, d) (2) m∗ (n) = minR(n,m,d∗ (n,m))≥1 m Instead we make the following choices for d = d(n) and m = m(n), and then optimize using Stirling’s formula: d = n−1 (27) m = A(n − 1) + 1. (28) √ Using Stirling’s formula n! 
= 2πnnn e−n we obtain for R(n + 1, An + 1, n), R(n+1, An+1, n) = 233 n  8π(A + 1)n A + 1 1 (1 + )A A 16 A 2 F(j,l),k = ψk , dj,l . Explicitely this becomes 1.8 1.6 F(j,l),k 1.4 1.2   1 eπ(j1 ) , fk1  · · ·  |Pj | |Pl | π∈Pj ρ∈Pl u ·eπ(jd ) , fkd fk1 , eρ(l1 )  · · · fkd , eρ(ld )  q(A) 1 0.8 0.6 0.4 0.2 0 = 0 1 2 3 4 5 A 6 7 8 9 10 Figure 1: The plot of q = q(A) from (29). To obtain R ≥ 1 for large n, we need q(A) = A+1 1 (1 + )A ≥ 1 16 A (29) In Figure 1 we plot the function q = q(A). Numerically we obtain A = 5.394. The √ remaining factor in R(n + 1, An + 1, n) becomes 5.376 n ≥ 1 for all n. Thus we obtain as sufficient conditions: d = m = n−1 5.394n − 4.394 (35) Thus P Ψ is frame for E d if and only if the N n,d × Mm,d matrix F is of full rank. The frame operator is given by S = F F ∗. We considered the complex case (E = C) with the following parameters n = 5 and d = 3. For m = 21 the ratio function (26) takes the value R(5, 21, 3) = 1.4457 > 1. Note for the algorithm in [BBCE09] to work m has to be greater than or equal to n 2 , that is m ≥ 25. For a frame with 21 vectors in dimension 5 whose vectors are obtained as realizations of complex valued normal random variables of zero mean and variance 2 (each real and imaginary part is i.i.d. N (0, 1)), the distribution of eigenvalues of its frame operator is plotted in Figure 2. Note the conditioning number is cond(S) = 6267.7. While relatively large, the important thing to note is that the realization P Ψ is frame (spanning) for E d . While this result is by no (30) (31) The Real Case In the real case we need to solve m(m + 1) · · · (m + d − 1) n(n + 1) · · · (n + 2d − 1) ≥ d! (2d)! Following the same approach we obtain the following ratio function that we need to make supraunital: R(n, m, d) = (m + d − 1)!(n − 1)!(2d)! (m − 1)!(n + 2d − 1)!d! (32) Figure 2: Distribution of eigenvalues for a random frame.. means a proof, or even an exhaustive experiment, it suggests the Assumption A might be generically true whenever R(n, m, d) > 1. It follows: R(n + 1, 2n + 1, n) = 1 Hence a possible choice is d = m = n−1 2n − 1 (33) (34) It is interesting to note that in the real case we recover the critical case m ≥ 2n − 1. 5. Numerical Evidence Supporting Genericity of the Assumption A. While the previous section computed necessary conditions for Assumption A to hold true, we still need to prove (or check) that P Ψ is frame in E d . In this section we plot the distribution of eigenvalues of the frame operator associated toSAMPTA'09 P Ψ for one randomly generated example. Using (22), each vector P ψ k is represented by a N n,d vector whose components are indexed by a pair (j, l), References: [BCE06] R. Balan, P. Casazza, D. Edidin, On signal reconstruction without phase, Appl.Comput.Harmon.Anal. 20 (2006), 345–356. [BBCE09] R. Balan, B. Bodman, P. Casazza, D. Edidin, Painless reconstruction from magnitudes of frame coefficients, to appear in the Journal of Fourier Analysis and Applications, 2009. [LFB87] R. G. Lane, W. R. Freight, and R. H. T. Bates, Direct Phase Retrieval, IEEE Trans. ASSP 35, no. 4 (1987), 520–526. [NQL82] H. Nawab, T. F. Quatieri, and J. S. Lim, Signal Reconstruction from the Short-Time 234 Fourier Transform Magnitude, in Proceedings of ICASSP 1984. Matrix Representation of Bounded Linear Operators By Bessel Sequences, Frames and Riesz Sequence Peter Balazs Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, 1040 Wien, Austria. 
peter.balazs@oeaw.ac.at Abstract: In this work we will investigate how to find a matrix representation of operators on a Hilbert space H with Bessel sequences, frames and Riesz bases as an extension of the known method of matrix representation by ONBs. We will give basic definitions of the functions connecting infinite matrices defining bounded operators on l2 and operators on H. We will show some structural results and give some examples. Furthermore in the case of Riesz bases we prove that those functions are isomorphisms. We are going to apply this idea to the connection of Hilbert-Schmidt operators and Frobenius matrices. Finally we will use this concept to show that every bounded operator is a generalized frame multiplier. 1. Introduction From practical experience it became apparent that the concept of an orthonormal basis is not always useful. This led to the concept of frames, which was introduced by Duffin and Schaefer [12] and today it is one of the most important foundations of sampling theory [1]. The standard matrix description [8] of operators O using an ONB (ek ) is by constructing an matrix M with the entries Mj,k = hOek , ej i. In [6] a concept was presented, where Ean operator R is described by the maD trix Rφj , φ̃i with (φi ) being a frame and (φ̃i ) its g ∈ H2 then define the inner tensor product as an operator from H2 to H1 by (f ⊗i g) (h) = hh, gi f for h ∈ H2 . 2.1.1 Hilbert Schmidt Operators A bounded operator T ∈ B(H1 , H2 ) is called a HilbertSchmidt (HS) [18] operator s if there exists an ONB (en ) ⊆ ∞ P 2 H1 such that kT kHS := kT en kH2 < ∞. Let n=1 HS(H1 , H2 ) denote the space of Hilbert Schmidt operators from H1 to H2 . 2.2 Frames A sequence Ψ = (ψk |k ∈ K) is called a frame [5, 7] for the Hilbert space H, if constants A, B > 0 exist, such that X 2 2 2 A · kf kH ≤ |hf, ψk i| ≤ B · kf kH ∀ f ∈ H (1) k A sequence Ψ = (ψk ) is called a Bessel sequence with Bessel bound B if it fulfills the right inequality above. The index set will be omitted in the following, if no distinction is necessary. A complete sequence (ψk ) in H is called a Riesz basis if there exist constants A, B > 0 such that the inequalities 2 2 A kck2 ≤ i,j canonical dual. Such a kind of representation is used for the description of operators in [15] using Gabor frames and [19] using linear independent Gabor systems. In this work we are presenting the main ideas for Bessel sequences, frames and Riesz sequences and also look at the dual function which assigns an operator to a matrix. For proofs and details we refer to [3]. X 2 ck ψk k∈K H ≤ B kck2 hold for all finite sequences (ck ). 3. Representing Operators with Frames Let (ψk ) be a frame in H1 . An existing operator U ∈ B(H1 , H2 ) is uniquely determined by its images of the P frame elements. For f = ck ψk k 2. 2.1 Notation and Preliminaries Hilbert spaces and Operators Let B(H1 , H2 ) denote the set of all linear and bounded operators from the Hilbert space H1 to H2 . Furthermore we will denote the range of an operator A by ran(O) and its kernel by ker(A). Let X, Y, Z be sets, f : X → Z, g : Y → Z be arbitrary functions. The Kronecker product ⊗o : X × Y → Z is defined by (f ⊗o g) (x, y) = f (x) · g(y). Let f ∈ H1 , SAMPTA'09 X X U (f ) = U ( ck ψk ) = ck U (ψk ). k k On the other hand, contrary to the case for ONBs, we cannot just choose a Bessel sequence (ηk ) and define an P operator just by P choosing V (ψk ) := ηk and setting V ( ck ψk ) = ck ηk . This is in general not wellk k defined. 
Only if X X X X ck ψk = dk ψk =⇒ ck ηk = dk ηk k k k k 235 this definition is non-ambiguous, i.e. if ker (Dψk ) ⊆ ker (Dηk ). This condition is certainly fulfilled, if Dψk is injective, i.e. for Riesz bases. This problem can be avoided by using the following definition E XD (2) V (f ) := f, ψ̃k ηk . k As (ηk ) forms a Bessel sequence, the right hand side of Eq. (2) is well-defined. It is clearly linear, and it is bounded. The Bessel condition is necessary in the case of ONBs to get a bounded operator, too [8]. But contrary to the ONB case, here, in general, V (ψk ) 6= ηk . So this option does not seem very useful. Instead of changing the sequence with which the coefficients are resynthezised, an operator can also be described by changing the coefficients, as presented in the following sections. 4. 4.1 Matrix Representation Motivation: Solving Operator Equalities ⇐⇒ X k X k 1. Let O : H1 → H2 be a bounded, linear operator. Then the infinite matrix   M(Φ,Ψ) (O) = hOψn , φm i m,n defines a bounded operator from l2 to l2 with √ kMkl2 →l2 ≤ B · B ′ · kOkH1 →H2 . As an operator l2 → l2 M(Φ,Ψ) (O) = CΦ ◦ O ◦ DΨ This means the function M(Φ,Ψ) : B(H1 , H2 ) → B(l2 , l2 ) is a well-defined bounded operator. 2. On the other hand let M be an infinite matrix defin2 ing to l2 , (M c)i = P a bounded operator from l(Φ,Ψ) Mi,k ck . Then the operator O defined by     X X  Mk,j hh, ψj i φk , O(Φ,Ψ) (M ) h = O(Φ,Ψ) (M ) This gives us an algorithm for finding an approximative solution to the inverse operator problem Of = g. 1. Set M = M(Φ,Φ̃) (O). 2. Find a good finite dimensional approximation MN of M by using the finite section method [14, 16] and 3. then apply an algorithm like e.g. the QR factorization [21] to find a solution for the operator equation. 4. and synthezise with the dual frame Φ̃. H1 →H2 ≤ √ O(Φ,Ψ) (M ) = DΦ ◦M ◦CΨ = B · B ′ kM kl2 →l2 . XX k j Mk,j ·φk ⊗i ψ j This means the function O(Φ,Ψ) : B(l2 , l2 ) → B(H1 , H2 ) is a well-defined bounded operator. O(Φ,Ψ) (M ) H1 E D hf, φk i Oφ̃k , φk = hg, φk i It can be easily seen that this is equivalent to projecting c on ran(C), solving M CΦ DΦ̃ c = d, which is a common idea found in many algorithms, for example for a recent one see [20]. j k hf, φk i Oφ̃k = g ⇐⇒ ⇐⇒ M(Φ,Φ̃) (O) · CΦ f = CΦ g. SAMPTA'09 Theorem 4.2.1 Let Ψ = (ψk ) be a Bessel sequence in H1 with bound B, Φ = (φk ) in H2 with B ′ . (3) for example using the pseudoinverse [7]. Still, if using frames, we can not expect to find a true solution for the operator equality just by applying DΦ̃ on c as in general c is not in ran(CΦ ) even if d is. But we see the following: Of = g ⇐⇒ Bessel sequences k Given an operator equality O · f = g it is natural to discretize it to find a solution. Let Φ = (φk ) be a frame. Let us suppose that for a given g with coefficients d = (dk ) = (hg, φk i) and a matrix representation M of O there is an algorithm to find the least square solution of M ·c=d 4.2 ✻ DΨ CΨ ❄ l2 ✲ O H2 ✻ DΦ M(Ψ,Φ) (O) M CΦ ❄ ✲ l2 Figure 1: The operator induced by a matrix M and the matrix induced by an operator O. If we do not want to stress the dependency on the frames and there is no change of confusion, the notation M(O) and O(M ) will be used. In the above theorem we have avoided the issue, when an infinite matrix defines a bounded operator from l2 to l2 . A criterion has been proved in [9]: 236 Theorem 4.2.2 An infinite matrix M defines a bounded n operator from l2 to l2 , if and only if (M ∗ M ) is defined i1/n h n < for all n = 1, 2, 3, . . . and sup sup (M ∗ M )l,l n l ∞. 
For similar conditions see [17]. 5. Matrix Representation of HS Operators We now have the adequate tools to state that HS operators correspond exactly to the Frobenius matrices, as expected. Let A be an m by n matrix, then kAkf ro = s n−1 P m−1 P 2 |ai,j | is the Frobenius norm. Let us denote i=0 j=0 4.3 the set of all matrices with finite Frobenius norm by l(2,2) , the set of Frobenius matrices. Frames Proposition 4.3.1 Let Ψ = (ψk ) be a frame in H1 with bounds A, B, Φ = (φk ) in H2 with A′ , B ′ . Then     1. O(Φ,Ψ) ◦ M (Φ̃,Ψ̃) = Id = O(Φ̃,Ψ̃) ◦ M (Φ,Ψ) . And therefore for all O ∈ B(H1 , H2 ): O= XD k,j E Oψ̃j , φ̃k φk ⊗i ψ j 2. M(Φ,Ψ) is injective and O(Φ,Ψ) is surjective. 3. Let H1 = H2 , then O(Ψ,Ψ̃) (Idl2 ) = IdH1 4. Let Ξ = (ξk ) be any frame in H3 , and O : H3 → H2 and P : H1 → H3 . Then   M(Φ,Ψ) (O ◦ P ) = M(Φ,Ξ) (O) · M(Ξ̃,Ψ) (P ) As a direct consequence we get the following corollary: Corollary 4.3.2 For the frame Φ = (φk ) the function M(Φ,Φ̃) is a Banach-algebra monomorphism between the algebra of bounded operators (B(H1 , H1 ), ◦) and the infinite matrices of B(l2 , l2 ), · . Lemma 4.3.3 Let O : H1 → H2 be a linear and bounded operator, let Ψ = (ψk ) and Φ = (φk ) be frames in H1 resp. H2 . Then M(Φ,Ψ̃) (O) maps ran (CΨ ) into ran (CΦ ) with (hf, ψk i)k 7→ (hOf, φk i)k . Proposition 5.0.2 Let Ψ = (ψk ) be a Bessel sequence in H1 with bound B, Φ = (φk ) in H2 with B ′ . Let M be a matrix in l(2,2) . Then O(Φ,Ψ) (M ) ∈ HS(H1 , H2 ), the Hilbert Schmidt√class of operators from H1 to H2 , with kO(M )kHS ≤ BB ′ kM kf ro . (Φ,Ψ) (O) ∈ l(2,2) with Let O ∈ HS, √ then M ′ kM(O)kf ro ≤ BB kOkHS . 5.1 Matrices and the Kernel Theorems For L2 (Rd ) the HS operators are exactly those integral operators with kernels in L2 R2d [18]. This means that there exists a κO ∈ L2 (R2d ) such an operator can be described as Z (Of ) (x) = κO (x, y)f (y)dy Or in weak formulation Z Z hOf, gi = κO (x, y)f (y)g(x)dydx = hκO , f ⊗o gi . (4) From 4.2.1 we know that E XD O= Oψ̃j , φ̃k φk ⊗i ψ j j,k and so  Corollary 5.1.1 Let O ∈ HS L2 Rd . Let Ψ = (ψj ) and Φ = (φk ) be frames in L2 Rd . Then the kernel of O is given as: If O is surjective, then M(Φ,Ψ̃) (O) maps ran (CΨ ) onto ran (CΦ ). If O is injective, M(Φ,Ψ̃) (O) is also injective. κO = X j,k M(Ψ̃,Φ̃) (O)k,j · φk ⊗o ψ j The other function O is in general not so “well-behaved”. It is, if the dual frames are biorthogonal. In this case these functions are isomorphisms, see the next section. This directly leads to the next concept. 4.4 Let m be a sequence and diag(m) the matrix that has this sequence as diagonal. Then define Riesz sequences Theorem 4.4.1 Let Φ = (φk ) be a Riesz basis for H1 , Ψ = (ψk ) one for H2 . The functions M(Φ,Ψ) and O(Φ̃,Ψ̃) between B(H1 , H2 ) and the infinite matrices in B(l2 , l2 ) are bijective. M(Φ,Ψ) and O(Φ̃,Ψ̃) are inverse to each other. For H1 = H2 the identity is mapped on the identity by M(Φ,Ψ) and O(Φ̃,Ψ̃) . If furthermore Ψ = Φ then M(Φ,Φ̃) and O(Φ,Φ̃) are Banach algebra isomorphisms, respecting the identities idl2 and idH . SAMPTA'09 6. Generalized Bessel Multipliers Mm,Φ,Ψ := O(Φ,Ψ) (diag(m)) = X k mk · φk ⊗ ψk This means we have arrived quite naturally at the definition of frame multipliers as introduced in [2]. It is a very natural idea to extend this definition to include more side-diagonals: 237 Definition 6.0.2 Let H1 , H2 be Hilbert-spaces, let (ψk )k∈L ⊆ H1 and (φk )k∈K ⊆ H2 be Bessel sequences. Let M be a (K × L)-matrix that defines a bounded operator from l2 to l2 . 
Define the operator MM,(φk ),(ψk ) : H1 → H2 , the generalized Bessel multiplier for the Bessel sequences (ψk ) and (φk ), as the operator XX Mm,(φk ),(ψk ) (f ) = Ml,k hf, ψk i φl . l k The sequence m is called the symbol of M. If the sequence is a frame, we call the operator a ’generalized frame multiplier’. For Gabor frames, this is a particular case of the ’generalized Gabor multipliers’ as found in [10] or [11] in this volume. Using the results above we can write Proposition 6.0.3 For two frames (ψk ) ⊆ H1 and (φk } ⊆ H2 every operator O : H1 → H2 can be written as frame multiplier with the symbol D a generalized E Ml,k = Oψ̃k , φ̃l . Further results as the following are easy to prove: Theorem 6.0.4 Let M = Mm,φk ,ψk be a Bessel multiplier for the Bessel sequences (ψk ) ⊆ H1 and (φk } ⊆ H2 with the bounds B and B ′ . Then 1. If M, M ∗ ∈ l1,∞ with kM k1,∞ = K1 and kM ∗ k1,∞ = K2 then M is a well defined bounded √ operator with kMkOp ≤ B ′ BK1 K2 . 2. If sup M (n) = K < ∞ then M is a well de√ fined bounded operator with kMkOp ≤ B ′ BK. n Op n 3. If (M ∗ M ) is defined for n = 1, 2, . . . and h i1/n n = K < ∞ then sup sup hM ∗ M )i,i n i √ kMkOp ≤ B ′ BK. 4. If φk = ψk and M ∈ B(l2 ) is a positive matrix, M is positive. ∗ 5. Let M ∈ B(l2 ), then MM,(φk ),(ψk ) = MM ∗ ,(ψk ),(φk ) . Therefore if M is self-adjoint and φk = ψk , M is self-adjoint. 6. Let M ∈ B(l2 ) be a matrix such that (n) lim M − M Op = 0, then M is compact. n 7. If M ∈ l2,2√ , M√is a Hilbert Schmidt operator with kM kHS ≤ B ′ B kM k2,2 . Here for an operator A we denote A(n) = Pn APn , where Pn (x0 , x1 , x2 , . . . ) = (x1 , x2 , . . . , xn−1 , 0, 0, . . . ), see [14] (finite sections). 7. Perspectives In this work we have investigated the basic idea of matrix representations using frames. An interesting question, as discussed in Section 4.1, is how to find a good finite approximation matrix. For first ideas in the Gabor case see [13, 10, 11, 22, 4]. SAMPTA'09 8. Acknowledgments The author would like to thank Jean-Pierre Antoine for many helpful comments and suggestions. This work was partly supported by the WWTF project MULAC (Frame Multipliers: Theory and Application in Acoustics, MA07-025). References: [1] A. Aldroubi and K. Gröchenig. Non-uniform sampling and reconstruction in shift-invariant spaces. SIAM Review, 43:585–620, 2001. [2] P. Balazs. Basic definition and properties of Bessel multipliers. Journal of Mathematical Analysis and Applications, 325(1):571–585, January 2007. [3] P. Balazs. Matrix-representation of operators using frames. Sampling Theory in Signal and Image Processing (STSIP), 7(1):39–54, Jan. 2008. [4] J. Bendetto and G. Pfander. Frame expansions for Gabor multipliers. Applied and Computational Harmonic Analysis (ACHA)., 20(1):26–40, Jan. 2006. [5] P. G. Casazza. The art of frame theory. Taiwanese J. Math., 4(2):129–202, 2000. [6] O. Christensen. Frames and pseudo-inverses. J. Math. Anal. Appl, 195(2):401–414, 1995. [7] O. Christensen. An Introduction To Frames And Riesz Bases. Birkhäuser, 2003. [8] J. B. Conway. A Course in Functional Analysis. Graduate Texts in Mathematics. Springer New York, 2. edition, 1990. [9] Lawrence Crone. A characterization of matrix operator on l2 . Math. Z., 123:315–317, 1971. [10] M. Dörfler and B. B. Torrésani. Spreading function representation of operators and gabor multiplier approximation. In Proceedings of SAMPTA’07, 2007. [11] M. Dörfler and B. B. Torrésani. Representation of operators by sampling in the time frequency domain. 
In Proceedings of SAMPTA’09, 2009. [12] R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series. Trans. Amer. Math. Soc., 72:341–366, 1952. [13] H. G. Feichtinger, M. Hampejs, and G. Kracher. Approximation of matrices by Gabor multipliers. IEEE Signal Procesing Letters, 11(11):883–886, 2004. [14] I. Gohberg, S. Goldberg, and M. Kaashoek. Basic Classes of Linear Operators. Birkhäuser, 2003. [15] K. Gröchenig. Time-frequency analysis of Sjöstrand’s class. Rev. Mat. Iberoam., 22:(to appear), 2006. [16] O.Christensen and T.Strohmer. The finite section method and problems in frame theory. Journal of Approximation Theory, 133(2):221–237, 2005. [17] W. H. Ruckle. Sequence spaces. Research Notes in Mathematics 49. Pitman London, 1981. [18] R. Schatten. Norm Ideals of Completely Continious Operators. Springer Berlin, 1960. [19] T. Strohmer. Pseudodifferential operators and Banach algebras in mobile communications. Appl.Comp.Harm.Anal., 20(2):237–249, 2006. [20] G. Teschke. Multi-frame representations in linear inverse problems with mixed multi-constraints. Applied and Computational Harmonic Analysis, 22(1):43–60, Jan. 2007. DFG-SPP-1114 preprint 90. [21] L. N. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM Philadelphia, 1997. [22] P. Wahlberg and P Schreier. Gabor discretization of the Weyl product for modulation spaces and filtering of nonstationary stochastic processes. Appl.Comp.Harm.Anal., 26:97–120, 2009. 238 Quasi-Random Sequences for Signal Sampling and Recovery Mirosław Pawlak (1) and Ewaryst Rafajłowicz (2) (1) Dept. of Electrical & Computer Eng., University of Manitoba, Winnipeg, Manitoba, Canada, R3T 2N2 (2) Institute of Computer Eng., Control and Robotics, Wrocław University of Technology, Wroclaw, Poland pawlak@ee.umanitoba.ca, ewaryst.rafajlowicz@pwr.wroc.pl Abstract: The problem of reconstruction of band-limited signals from sampled and noisy observations is studied. It is proposed to sample a signal at quasi-random points, that form a deterministic sequence with properties resembling a random variable being uniformly distributed. Such quasi-random points can be easily and efficiently generated yielding signal reconstruction algorithms with the improved accuracy. In fact, in this paper we propose a reconstruction method based on the modified orthogonal sampling formula where the sampling rate and the reconstruction rate are treated separately. This distinction is necessary to ensure consistency of the reconstruction algorithm in the presence of noise. Asymptotical properties of the algorithm are evaluated including its convergence to the true signal and the corresponding rate. It is shown that the rate of convergence is better than that for reconstructions algorithms that utilize the traditional uniform sampling. Similar results are also obtained for the case of multivariate signals. 1. Introduction Signal sampling is an inherent part of the modern signal processing theory and as such it has attracted a great deal of research activities lately [9], [10]. In particular, the problem of signal sampling and recovery from imperfect data has been addressed in a number of recent works [1], [2], [7]. In this case, one assumes that the signal samples {f (kτ )} are observed with noise, i.e., we have yk = f (kτ ) + zk , where zk is uncorrelated noise process with E zk = 0, var(zk ) = σ 2 < ∞. Throughout the paper we assume that f (t) has a bounded spectrum and that f (t) is a finite energy type signal. 
Any signal with such a property is referred to as band-limited and will denote this class of signals as BL(Ω), where Ω is the bandwidth of f (t). The celebrated Whittaker-Shannon theorem says that SAMPTA'09 any band-limited signal f (t) can be perfectly recovered from its discrete values {f (kτ )} provided that τ ≤ π/Ω. Application of the resulting interpolation formula to noisy data would lead to the following reconstruction scheme based on 2n + 1 random samples X  fn (t) = yk sinc πτ −1 (t − kτ ) , (1) |k|≤n where sinc(t) = sin(t)/t, and τ ≤ π/Ω. The fundamental question, which arises is whether fn (t) can be a consistent estimate of f (t) for any f ∈ BL(Ω). Hence, whether ̺(fn , f ) → 0 as n → ∞, in a certain probabilistic sense, for some distance measure ̺. Since f (t) is assumed to be square integrable, then the natural measure between fn (t) and f (t) is the mean integrated square error Z ∞ (fn (t) − f (t))2 dt. (2) M ISE(fn ) = E −∞ It can be easily shown, see [6], that M ISE(fn ) → ∞ as n → ∞ for any fixed τ ≤ π/Ω. This unpleasant property of the estimate fn (t) is caused by the presence of the noise process in the observed data and the fact that fn (kτ ) = yk , i.e., fn (t) interpolates the noisy observations. It is clear that one should avoid interpolation schemes in the presence of noise since they would retain random errors. The aim of this paper is to propose a consistent estimate of f (t) being a smooth correction of the naive algorithm fn (t) . This task is carried out by sampling a signal at irregularly spaced quasi-random points and by carefully selecting the number of terms in the sampling series. The conditions for consistency of our estimate are established and the corresponding rate of convergence is evaluated. The statistical aspects of signal sampling and recovery have been examined first in [5], and next in [6], [7], [2], [1]. In [2], [1] the sampling rate τ has been assume to be a fixed constant. This assumption, however, cannot lead to consistent estimates of the true signal of the bandlimited type. On the other hand, in [5], [6], [7] τ = τn 239 such that τn → 0 as n → ∞ with a controlled rate. Such a choice of τ allows us to design a signal recovery algorithm for which the reconstruction error M ISE tends to zero with a certain rate. In this paper, we propose a nonlinear sampling scheme based on the theory of quasirandom sequences, i.e., we observe the following noisy samples yk = f (τk ) + zk , where {τk } is a sequence of quasi-random points. We show that a proper choice of {τk } leads to the reconstruction algorithm with the improved convergence rate. 2. Reconstruction Algorithms with QuasiRandom Points The notion of quasi-random sequences has been originally established in the theory of numerical integration [4]. A sequence of real numbers {xj } is said to be a quasi-random sequence in [0, 1] if for every continuous function b(x) on [0, 1] we have n 1X b(xj ) = n→∞ n j=1 lim Z 1 b(x) dx. (3) 0 Quasi-random sequences are also called equidistributed sequences, since (3) means that the sequence {xj } behaves like uniformly distributed random variables. Nevertheless, an important property of quasi-random sequences is that they are more uniform than random uniform sequences which tend to clump. A consequence of this fact is that the accuracy of approximating integrals based on quasi-random sequences is superior to the accuracy obtained by random sequences. 
In fact, the celebrated Koksma-Hlawka inequality [4] says that for any function of bounded variation on [0, 1] we have n−1 n X j=1 f (xj ) − Z 0 1 f (t)dt ≤ V(f )Dn∗ , where V(f ) is the total variation of f on [0, 1], and Dn∗ denotes the so-called discrepancy of the quasi-random sequence {xj }. The discrepancy measures the strength of the sequence to approximate the uniform distribution on [0, 1]. There are quasi-random sequences with discrepancy of order O(log(n)/n) [4]. This should be contrasted with a random sequence of uniformly distributed points√on [0, 1] that possesses the discrepancy of order O(1/ n). This basic observation plays a key role in our developments concerning the signal recovery problem from quasi-random points. Numerous quasi-random sequences have been constructed that have the aforementioned property of approximating the uniform distribution. The simplest, and sufficient for our purposes, way SAMPTA'09 of generating a quasi-random sequence is the following xj = frac(j ϑ), (4) where ϑ is an irrational number and frac(.) denotes the fractional part of √ a number in the parenthesis. A good choice of ϑ is ( 5 − 1)/2, see [8] for an extensive discussion on the choice of ϑ. Since band-limited signals are defined on the whole real line we need a rescaled version of quasi-random sequences. Thus, let us define the following sampling points on the interval [−T, T ] τj = T sgn(j) frac(|j| ϑ), j = 0, ±1, ±2, . . . , n, (5) where sgn(.) is the sign of a number. The observation horizon T must increase with n such that T (n) → ∞ as n → ∞. In order, however, to establish the consistency result of our reconstruction algorithm we must control the growth of T (n). The approximation property of quasi-random sequences applied to the sequence defined in (5) reads now as Z T 2T X f (τj ) ≈ f (t) dt. 2n + 1 −T (6) |j|≤n It has been known since the work of Hardy [3] that the cardinal expansion can be viewed as the orthogonal expansion in BL(Ω). Using this fact and then the reasoning as in [5] we can define the following estimate of f (t) f˜n (t) = X c̃k sk (t) , (7) |k|≤N c̃k = X 2T yj sk (τj ) , (2n + 1) h (8) |j|≤n where {τj } is the quasi-random sequence defined in (5). Here {sk (t) = sinc(πh−1 (t − kh)), k = 0, ±1, . . .} forms the orthogonal and complete system in BL(Ω) provided that h ≤ π/Ω R ∞. The corresponding Fourier coefficient is ck = h−1 −∞ f (t)sk (t)dt. It is also clear that for f ∈ BL(Ω) we have ck = f (kh). The parameter h is called the reconstruction rate. In (7) the parameter N defines the number of terms in the expansion which are taken into account and 2n + 1 is the sample size. The truncation parameter plays important role in our asymptotic analysis, i.e., N depends on n such that N (n) → ∞ with the controlled rate. It is also worth noting that the sampling rate is nonuniform (defined by the discrepancy of the quasi-random sequence in (5)) and different than the reconstruction rate h. We assume that h is constant and not greater than π/Ω. Throughout the paper we use the worst localized base system utilizing the sinc function. The methodology 240 presented in this paper can be extended to the windowed version of the estimate f˜n (t) of the form fˆn (t) = X wk c̃k sk (t), |k|≤n where {wk , |k| ≤ n} is a sequence of numbers such that 0 ≤ wk ≤ 1. The proper choice of this window sequence yields an estimate with better time-localized properties and consequently better convergence rates. 
The case when wk = 1 for |k| ≤ N and wk = 0 otherwise corresponds to the estimate f˜n (t). 3. The MISE Consistency and Rate This assumption can be also expressed in the frequency domain by requiring that the Fourier transform of f (t) has r derivatives on [−Ω, Ω]. A further analysis of the reconstruction error leads to the following bound  M ISE(f˜n ) ≤ (2 N + 1) C1 T −(2r+1)  C2 T 3 log2 (n) C3 T + + (9) n2 n +C4 N −(2r+1) , for some constants C1 , C2 , C3 , C4 . By optimizing the above bound we can obtain the following asymptotically optimal choice of T (n) and N (n). 1 In this section we summarize the results concerning the convergence of MISE(f˜n ) to zero as n → ∞ for any signal f ∈ BL(Ω). Also the rate of convergence is established. Due to Pareseval’s formula we can decompose the M ISE(f˜n ) as follows: M ISE(f˜n ) = h X var(c̃k ) + h |k|≤N +h X |k|≤N X (Ec̃k − ck )2 c2k . T ∗ (n) = an 2r+3 1 N ∗ (n) = bn 2r+3 , subject to the condition a > bh. Plugging these values of T (n) and N (n) back into the bound for M ISE(f˜n ) we obtain the following rate 2r+1 M ISE(f˜n ) = O(n− 2r+3 ). It is worth noting that under Assumption (F) the best possible rate obtained for the reconstructionr algorithms discussed in [6] and [7] is of order O(n− r+1 ). This is clearly a slower rate than the one obtained in this paper. |k|≥N The first term of the decomposition controls the stochastic part of the estimate, whereas the the remaining term describe the systematic error (bias) of the estimate. A careful examination of these terms lead to the following result on the consistency of our estimate. Theorem 1 Let f ∈ BL(Ω) and let the reconstruction rate h be constant such that h ≤ π/Ω. Suppose that N (n) < T (n)/h. Assume T (n) → ∞, N√(n) → ∞ such that T (n) does not grow faster than n/log(n). Let, moreover, N (n)T (n) → 0. n Then M ISE(f˜n ) → 0 as n → ∞. The conditions required on the parameters T (n) and N (n) in Theorem 1 impose some general restrictions on their growth. In order further see how to choose T (n) and N (n) let us assume the following condition on the decay of band-limited signals. (F) There exists r ≥ 0 and a constant Cf > 0 such that for |t| sufficiently large we have |f (t)| ≤ Cf /|t|r+1 . SAMPTA'09 4. Concluding Remarks In this paper we have proposed an algorithm for recovering a band-limited signal observed under noise. Assuming that the signal is a square integrable function the sufficient conditions for the convergence of the mean integrated square error have been established. The distinguishing feature of the proposed approach is its utilization of nonuniform samples taken at quasi-random points. When quasi-random sequences are applied to the problem of numerical evaluation of integrals they reveal the approximation rate O(log(n)/n) for a class of bounded variation functions. This rate is superior to √ the rate O(1/ n) that characterizes usual numerical algorithms and classical Monte Carlo methods. This advantage of quasi-random sequences seems to be carried out to the problem of signal sampling and recovery. In our consistency results we assume that the reconstruction rate h is constant and could be chosen as large as π/Ω. One could also consider the case when h = h(n) and h(n) → 0 as n → ∞. The estimates with variable h would be needed for the problem of recovering not necessarily band-limited signals. 
Finally, let us mention that the results of this paper can be extended to the ddimensional case, where the orthogonal system can be obtained in the form of the product of sinc functions, Qd i.e., sk (t) = i=1 ski (ti ), where k = (k1 , k2 , . . . , kd ), 241 t = (t1 , t2 , . . . , td ). We should mention that multidimensional quasi-random sequences can be generated in a relatively straightforward way. Moreover, they exhibit the favorite discrepancy of order O(n−1 (log(n))d ) for any d. This fact may have important consequences for sampling problems of two-dimensional objects like images. 5. Acknowledgements The work of E. Rafajłowicz was supported by the Research and Development Grant from the Ministry of Science and Higher Education of Poland. References: [1] A. Aldroubi, C. Leonetti, and Q. Sun. Error analysis of frame reconstruction from noisy samples. IEEE Trans. Signal Processing, 56:2311– 2315, 2008. [2] Y.C. Eldar and M. Unser. Non-ideal sampling and interpolation from noisy observations in shiftinvariant spaces. IEEE Trans. Signal Processing, 54:2636–2651, 2006. [3] G.H. Hardy. Notes on special systems of orthogonal functions (iv): the orthogonal functions of Whittaker’s cardinal series. Proc. Camb. Phil. Soc., 37:331–348, 1941. [4] L. Kuipers and H. Niederreiter. Uniform Distribution of Sequences. Wiley, New York, 1974. [5] M. Pawlak and E. Rafajłowicz. On restoration of band-limited signals. IEEE Trans. Information Theory, 40:1490–1503, 1994. [6] M. Pawlak, E. Rafajłowicz, and A. Krzyżak. Postfiltering versus prefiltering for signal recovery from noisy samples. IEEE Trans. Information Theory, 49:3195–3212, 2003. [7] M. Pawlak and U. Stadtmüller. Signal sampling and recovery under dependent noise. IEEE Trans. Information Theory, 53:2526–2541, 2007. [8] E. Rafajłowicz and R. Schwabe. Equidistributed designes in nonparametric regression. Statistica Sinica, 13:129–142, 2003. [9] M. Unser. Sampling – 50 years after Shannon. Proceedings of the IEEE, 88:569–587, 2000. [10] P.P. Vaidyanathan. Generalizations of the sampling theorems: seven decades after Nyquist. IEEE Trans. on Ciruits and Systems – I : Fundamental Theory and Applications, 48:1094–1109, 2001. SAMPTA'09 242 On the incoherence of noiselet and Haar bases Tomas Tuma, Paul Hurley IBM Research, Zurich Laboratory 8803 Rüschlikon, Switzerland E-mail: {uma,pah}@zurich.ibm.com Abstract: Noiselets are a family of functions completely uncompressible using Haar wavelet analysis. The resultant perfect incoherence to the Haar transform, coupled with the existence of a fast transform has resulted in their interest and use as a sampling basis in compressive sampling. We derive a recursive construction of noiselet matrices and give a short matrix-based proof of the incoherence. whenever the products AC, BD exist. This property is sometimes called the mixed product property. Definition 2. Let A be a m × n matrix. A(k, ∗) denotes the (row) vector (A(k, 1) A(k, 2) . . . A(k, n)) while, A(∗, l) similarly denotes the (column) vector (A(1, l) A(2, l) . . . A(m, l))T . 2.2 1. Introduction The noiselet basis, originally described in [2], has garnered interest recently because noiselets (1) are maximally incoherent to the Haar basis and (2) have a fast algorithm for their implementation. Thus, they have been employed in compressive sampling to sample signals that are sparse in the Haar domain [1]. 
The work presented here was motivated by the observation that it had not been previously shown in a straightforward way that the discrete Haar transform is maximally incoherent to a discretized version of the noiselet transform. Additionally, the exact form of a noiselet matrix needed to be inferred from the original work. The main contributions are the derivation of a recursive, tensor product-based, construction of noiselet matrices, the unitary matrices that result from the noiselet transform for discrete input, and an intuitive proof showing its incoherence to the corresponding Haar matrix. 2. 2.1 Noiselets Noiselets [2] are functions that are completely uncompressible under the Haar transform. The family of noiselets is constructed on the interval [0, 1) as follows: f1 (x) = χ[0,1) (x), f2n (x) = (1 − i)fn (2x) + (1 + i)fn (2x − 1) f2n+1 (x) = (1 + i)fn (2x) + (1 − i)fn (2x − 1) Here, χ[0,1) (x) = 1 on the definition interval [0, 1) and 0 otherwise. It is shown in [2] that {fj } is a basis: Theorem 1. The set {fj |j = 2N , . . . , 2N +1 − 1} is an orthogonal basis of the vector space V2N , which is the space of all possible approximations at the resolution 2N of functions in L2 [0, 1). 2.3 Haar Transform Haar wavelet transform can be described by a real square matrix. For our purposes, it is advantageous to recursively build the Haar matrix using the Kronecker product [3]: Preliminaries General definitions Definition 1. Let A be an m×n matrix, and B be a matrix of an arbitrary size. The Kronecker product of A and B is   a11 B · · · a1n B  ..  . .. A ⊗ B =  ... . .  am1 B ··· amn B The Kronecker product (see e.g. [4]) is a bilinear and associative operator which is not generally commutative. It can be combined with a standard maxtrix multiplication as follows: (A ⊗ B)(C ⊗ D) = AC ⊗ BD SAMPTA'09   1 Hn/2 ⊗ (1 1) Hn = √ . 2 In/2 ⊗ (1 −1)   The iteration starts with H1 = 1 . The normalization constant √12 ensures that HnT Hn = I. Haar wavelets are the rows of Hn . 3. Matrix construction of noiselets First we extend and discretize the noiselet functions. Definition 3. The extensions of noiselets to the interval [0, 2m − 1] sampled at points 0, . . . , 2m − 1 is the series 243 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 1 2 3 4 5 6 7 1 8 (a) Real part of 8x8 noiselet matrix 2 3 4 5 6 7 8 (b) Imaginary part of 8x8 noiselet matrix 10 10 20 20 30 30 40 40 50 50 60 60 10 20 30 40 50 60 10 (c) Real part of 64x64 noiselet matrix 20 30 40 50 60 (d) Imaginary part of 64x64 noiselet matrix Figure 1: Noiselet matrix: graphical view. In figures (a) and (b), the black and white colors denote values of −0.25 and 0.25 respectively. In figures (c) and (d), the black, gray and white colors denote values of −0.125, 0 and 0.125 respectively. . of functions fm (k, l) ( 1 l = 0, . . . , 2m − 1 fm (1, l) = 0 otherwise Lemma 1. Let m > 0. The noiselet matrices N1 , N2 , N4 , . . . , N2m are built up from a series of discretised and extended noiselets fm : m fm (2k, l) = (1 − i)fm (k, 2l) + (1 + i)fm (k, 2l − 2 ) fm (2k + 1, l) = (1 + i)fm (k, 2l) + (1 − i)fm (k, 2l − 2m ) where m denotes the range of extension, k = 1, . . . , 2m+1 is the function index and l = 0, . . . , 2m − 1 is the sample index. Starting with a 1 × 1 matrix N1 , a sequence of noiselet matrices N1 , N2 , N4 , . . . , N2m of sizes 1 × 1, 2 × 2, 4 × 4, . . . , 2m × 2m , respectively, is generated. The rows of the Nn matrix are noiselets which form an orthonormal basis for the space Cn .   Definition 4. For n = 1, N1 = 1 . 
Then the n × n noiselet matrix Nn is built up recursively according to: Nn (k, ∗) = 1 k (1 − i 1 + i) ⊗ Nn/2 ( , ∗) 2 2 when k = 0, 2, 4, . . . , n − 2 and Nn (k, ∗) = 1 k−1 (1 + i 1 − i) ⊗ Nn/2 ( , ∗) 2 2 when k=1,3,. . . ,n-1. SAMPTA'09 Nn (k, l) = fm (n + k, 2m l), k, l = 0, . . . , n − 1. n Proof. Let m > 0 be fixed. For n = 1 N1 (0, 0) = fm (1, 0) = 1. By induction, for a matrix of size n = 2p , p = 1, . . . , m, its basis vector k = 0, 2, 4, . . . , n − 2 and vector indices l = 0, . . . , n2 − 1 k Nn (k, l) = (1 − i)Nn/2 ( , l) 2 n k 2m 2m l). = (1 − i)fm ( + , n l) = fm (n + k, 2 2 2 n For the same n, k and l = n 2,...,n − 1, k n Nn (k, l) = (1 + i)Nn/2 ( , l − ) 2 2 n k 2m l 2m = (1 + i)fm ( + , 2 − 2m ) = fm (n + k, l). 2 2 n n To see this, observe that fm is zero outside of [0, 2m − 1] and therefore, the first half of samples of fm (k, l) are defined exclusively by the expression (1 ± i)fm (k, 2l) 244 whereas the second half of the samples are defined exclusively by (1 ± i)fm (k, 2l − 2m ). For k odd (k = 1, 3, . . . , n − 1) the proof is similar. Specially, the noiselet matrix Nn for n = 2m can be found as the “tail” of the function series fm . Indeed, the expression in Theorem 1 becomes N (k, l) = fm (n + k, l) for n = 2m . 4. Incoherence of noiselets and Haar In what follows, we adhere to the terminology of basis coherence which is common in the field of compressive sampling. See for example [1] for details on these definitions and related literature. Mutual coherence of two bases is defined as the maximum scalar product of any pair of their basis vectors: Definition 5. Mutual coherence between two orthonormal bases Ψ, Φ is µ(Ψ, Φ) = max|hψk , φj i|. k,j The minimal coherence is usually termed maximal or perfect incoherence, which means that µ(Ψ, Φ) = O(1). In other words, the matrix of scalar products ΨΦ∗ is “flat”. As Candès and Romberg suggest [1], we will show the perfect incoherence of Haar and noiselets in the following setting. Given an orthonormal n × n Haar matrix H, we compute the matrix of scalar products for a corresponding noiselet matrix N normalized such that N ∗ N = nI. By doing so, the product will be flat with all values having the magnitude of 1. For clarity of the main proof, it saves some technical work to define a “twisted” noiselet basis.   Definition 6. The twisted noiselet matrix N̂1 = 1 . Then the n × n twisted noiselet matrix N̂n is built up recursively by N̂n (k, ∗) = k 1 N̂n/2 ( , ∗) ⊗ (1 − i 1 + i) 2 2 when k = 0, 2, 4, . . . , n − 2 and N̂n (k, ∗) = 1 k−1 N̂n/2 ( , ∗) ⊗ (1 + i 1 − i) 2 2 when k = 1, 3, . . . , n − 1. The difference between this and the definition of the noiselet matrix N (Definition 4) is that the order of operands in the Kronecker product is changed. In fact, each one is just a permutation of the other. Lemma 2. For n = 2m , the bases Nn , N̂n consist of the same set of basis vectors. The claim holds for n = 1. For n = 2, 4, 8, . . . , 2m , Pn Nn (k, l) = Pn (k, ∗)Nn (l, ∗)T as it can easily be shown that Nn is symmetric. Using the recurrent equations for Pn and Nn and applying the mixed product rule, we get, for k = 0, 2, 4, . . . , n − 2, 1 k l (1 − i)Pn/2 ( , ∗)Nn/2 (∗, ) 2 2 2 when l = 0, 2, 4, . . . , n − 2 and Pn Nn (k, l) = 1 k−1 l (1 + i)Pn/2 ( , ∗)Nn/2 (∗, ) 2 2 2 when l = 1, 3, . . . , n − 1. By induction, Pn Nn (k, l) = 1 k N̂n/2 ( , ∗) ⊗ (1 − i 1 + i) 2 2 for even k indices. This situation for odd k is similar. Pn Nn (k, ∗) = Now the main result can be shown. Theorem 2. Let n = 2m where m is a non-negative integer. 
Let Nn be the noiselet matrix of size n × n and let Hn be the Haar matrix of size n × n. Then Hn and Nn are maximally incoherent. Proof. Without loss of generality, assume the bases are normalized such that HnT Hn = I and Nn∗ Nn = nI. For the case of n = 1,       H1 N1∗ = 1 · 1 = 1 For n = 2m , m > 1, the incoherence is shown by induction. Suppose we know maximal incoherence holds for n2 and we want to show it for n. In the induction step, we use the iterative construction of the Haar matrix by means of Kronecker product. By computing the product Hn N̂n∗ = H(Nn∗ Pn∗ ) = (Hn Nn∗ )PnT we will still be able to conclude on magnitude of the elements of (Hn Nn∗ ), since permutation matrices do not change magnitudes. The product Hn N̂n∗ can be computed per-column; we take the j-th column of N̂n∗ , j = 0, 2, 4, . . . , n − 2 and transform it by Hn , getting   1 Hn/2 ⊗ (1 1) ∗ Hn N̂n (∗, j) = √ 2 In/2 ⊗ (1 −1) j 1 ∗ (∗, ) ⊗ (1 − i 1 + i)∗ · √ N̂n/2 2 2 Note the altered normalization factor of noiselets. Now the mixed product property can be applied to get    1+i j ∗ Hn/2 N̂n/2 (∗, 2 ) ⊗ (1 1)  1  1 − i  =  1 + i 2 ∗ In/2 N̂n/2 (∗, 2j ) ⊗ (1 −1) 1−i " # ∗ (∗, 2j ) ∗ 2 1 Hn/2 N̂n/2 . ∗ 2 In/2 N̂n/2 (∗, 2j ) ∗ 2i Proof. Indeed, we can write N̂n = Pn Nn where P is the permutation matrix: ∗ (i, 2j )| = 1 and By induction, it follows that |Hn/2 N̂n/2 ( ∗ (i, 2j )| = 1 for i = 1, . . . , n2 . The Kronecker (1 0) ⊗ Pn/2 ( k2 , ∗) k = 0, 2, 4, . . . , n − 2 |In/2 N̂n/2 P (k, ∗) = multiplication is only by entries with magnitude 2, thus the (0 1) ⊗ Pn/2 ( k−1 2 , ∗) k = 1, 3, . . . , n − 1 resulting magnitudes are 12 ∗2 = 1. The proof is equivalent for j = 1, 3, . . . , n − 1. starting with P = [1]. SAMPTA'09 245 References: [1] Emmanuel Candès and Justin Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23(3):969–985, 2007. [2] R. Coifman, F. Geshwind, and Y. Meyer. Noiselets. Applied and Computational Harmonic Analysis, 10:27–44, 2001. [3] B.J. Falkowski and S. Rahadja. Walsh-like functions and their relations. In IEE Proceedings on Vision, Image and Signal Processing, volume 143, pages 279 – 284, 1996. [4] Alan J. Laub. Matrix Analysis for Scientists and Engineers. SIAM, 2005. SAMPTA'09 246 Adaptive compressed image sensing based on wavelet modeling and direct sampling Shay Deutsch (1), Amir Averbuch (1) and Shai Dekel (2) (1) Tel Aviv University. Israel (2) GE Healthcare, Israel shayseut@post.tau.ac.il, Shai.dekel@ge.com, amir@math.tau.ac.il 1.2 The “single pixel” camera Abstract: We present Adaptive Direct Sampling (ADS), an algorithm for image acquisition and compression which does not require the data to be sampled at its highest resolution. In some cases, our approach simplifies and improves upon the existing methodology of Compressed Sensing (CS), by replacing the ‘universal’ acquisition of pseudo-random measurements with a direct and fast method of adaptive wavelet coefficient acquisition. The main advantages of this direct approach are that the decoding algorithm is significantly faster and that it allows more control over the compressed image quality, in particular, the sharpness of edges. 1. Introduction Compressed Sensing (CS) [1, 3, 4, 6] is an approach to simultaneous sensing and compression which provides mathematical tools that, when coupled with specific acquisition hardware architectures, can perhaps reduce the acquired dataset sizes, without reducing the resolution or quality of the compressed signal. 
CS builds on the work of Candès, Romberg, and Tao [4] and Donoho [6] who showed that a signal having a sparse representation in one basis can be reconstructed from a small number of non-adaptive linear projections onto a second basis that is incoherent with the first. The mathematical framework of CS is as follows: Consider a signal x   N that is k -sparse in the basis  for  N . In terms of matrix representation we have x  f , in which f can be well approximated using only k  N non zero entries and  is called the sparse basis matrix. Consider also an n  N measurement matrix  , where the rows of  are incoherent with the columns of  . The CS theory states that such a good approximation of signal x can be reconstructed by taking only n  O ( k log N ) linear, non adaptive measurements as follows: [1, 3]: (1.1) y  x , where y represents an n  1 sampled vector. Working under this ‘sparsity’ assumption an approximation to x can be reconstructed from y by ‘sparsity’ minimization, such as l1 minimization  SAMPTA'09 1 f  y min f l1 (1.2) For imaging applications, the CS framework has been applied within a new experimental architecture for a ‘single pixel’ digital camera [10]. The CS camera replaces the CCD and CMOS acquisition technologies by a Digital Micro-mirror Device (DMD). The DMD consists of an array of electrostatically actuated micromirrors where each mirror of the array is suspended above an individual SRAM cell. In [10] the rows of the CS sampling matrix  are a sequence of n pseudorandom binary masks, where each mask is actually a ‘scrambled’ configuration of the DMD array (see also [2]). Thus, the measurement vector y , is composed of dot-products of the digital image x with pseudo-random masks. At the core of the decoding process, that takes place at the viewing device, there is a minimization algorithm solving (1.2). Once a solution is computed, one obtains from it an approximate ‘reconstructed’ image by applying the transform  to the coefficients. The CS architecture of [10] has few significant drawbacks: 1. Poor control over the quality of the output compressed image: the CS architecture of [10] is not adaptive and the number of measurements is determined before the acquisition process begins, with no feedback during the acquisition process on the progressive quality. 2. Computationally intensive sampling process: Dense measurement matrices such as the sampling operator of the random binary pattern are not feasible because of the huge space and multiplication time requirements. Note that in the one single pixel camera, the sampling operator is based on the random binary pattern, which requires a huge memory and a high computation cost. For example, to get 512  512 image with 64k measurements (25% sampling rate) a random binary operator requires nearly a gigabyte of storage and Giga-flop operations, which makes the recovery almost impossible [14]. The designing of an efficiently measurement basis was proposed [14, 16] by using highly sparse measurements operators, which solve the infeasibility of Gaussian measurement matrix or a random binary masks such as in the one pixel camera. Note, however, in [16], the trade-off between acquisition time and visual quality. To obtain good visual quality, when using TV minimization (which significantly increase the decoding time, compared to LP decoding time) 247 3. recovery times of a 256  256 ‘boat’ image are around 60 min. 
Computationally intensive reconstruction algorithm: It is known that all the algorithms for the minimization (1.2) are very computationally intensive. 2. Direct and adaptive image sensing Our proposed architecture aims to overcome the drawbacks of the existing CS approach and achieve the following design goals: 1. An acquisition process that captures n measurements, with n  N and n  O  k  , where N is the dimension of the full high-resolution image, assumed to be ‘ k sparse’. The acquisition process is allowed to adaptively take more measurements if needed to achieve some compressed image target quality. 2. A decoding process which is not more computationally intensive than the existing algorithm in use today such as JPEG or JPEG2000 decoding. We now present our ADS approach: Instead of acquiring the visual data using a representation that is incoherent with wavelets, we sample directly in the wavelet domain. We use the DMD array architecture in a very different way than in [10]: 1. Any wavelet coefficient is computed from two measurements of the DMD array. 2. We take advantage of the ‘feedback’ architecture of the DMD where we make decisions on future measurements based on values of existing measurements. This adaptive sampling process relies on a well-known modeling of image edges using a wavelet coefficient tree-structure and so decisions on which wavelet coefficients should be sampled next are based on the values of wavelet coefficients obtained so far [8, 9]. First we explain how the DMD architecture can be used to calculate a wavelet coefficient from two DMD measurements. Modeling an image as a we have the wavelet function f  L2   2  , representation f  x    f , ej ,l  ej ,l , where e  1, 2,3 is the subband, j   the scale and l   2 the shift. For measurements. Moreover, there exist DMD arrays with micro-mirrors that can produce a grayscale value, not just 0 or 1 (contemporary DMD can produce 1024 grayscale value). We can use these devices for computation of arbitrary wavelet transforms, where the computation of each coefficient requires only two measurements, since the result of any real-valued functional g acting on the data can be computed as a difference of two ‘positive’ g  , g  ‘functionals’, i.e. ,where the coefficients are positive: g  g  g  ,g  , g  0 . 3. Modeling of image edges by wavelet treeStructures and the ADS algorithm Most of the significant wavelet coefficients are located in the vicinity of edges. Wavelets can be regarded as multiscale local edge detectors, where the absolute value of a wavelet coefficient corresponds to the local strength of the edge. We impose the tree-structure of the wavelet coefficients. Due to the analysis properties of wavelets, coefficient values tend to persist through scale. A large wavelet coefficient in magnitude generally indicates the presence of singularity inside its support. A small wavelet coefficient generally indicates a smooth region. We use this nesting property and acquire wavelet coefficients in the higher resolutions if their parent is found to be significant. For further detection of singularities at fine scales, we estimate the Lipschitz exponent. 3.1 The Lipschitz exponent Our goal is to estimate the significance of wavelet coefficients that were not sampled yet, using values of coefficients that were already sampled. To this end we use the well known characterization of local Lipschitz smoothness by the decay of wavelet coefficients across scales [12]. 
A function f is said to be Lipschitz  in the neighborhood of ( x1 , x2 ) if there exists 1 and  2 as well as A  0 such that for any h1  1 and h2   2 f ( x1  h1 , x2  h2 )  f ( x1 , x2 )  A( h12  h22 ) / 2 (3.1) e , j ,l orthonormal wavelets  ej ,l   ej , l . If we consider the Haar basis as an example, than a bivariate Haar wavelet coefficient of type 1 can be computed as follows  2  l1 1 f , 1j ,l  2 j    2 j l 1  j  2  j  l2 1 2   2 j l2 f  x1 , x2  dx1dx2 2  j  l1 1 2  j  l2 1  2  j l1  2  j  l2 1 2   f  x1 , x2  dx1dx2  ,   (2.1) i.e., the difference of pixel sums over two neighboring dyadic rectangles multiplied by 2 j . By Similar computation we can sample the Haar wavelet coefficients of the second and third kinds with two SAMPTA'09 We actually use a subtler, ‘directional’ notion of local Lipschitz smoothness. So, for example, for the horizontal subband, e  1 , we defined local 1 Horizontal Lipschitz smoothness by the minimal A  0 satisfying for h1  1 f ( x1  h1 , x2 )  f ( x1 , x2 )  Ah11 . If the function is locally  e Lipschitz at ( x1 , x2 ) then for any wavelet  ej,l whose support contains ( x1 , x2 ) , f , ej , l  C  2 j  . By taking the e we have that logarithm we have log 2 f , ej, l   j  log 2 (C ) . (3.3) 248 Thus the Lipschitz exponents can be determined from log 2 f , the slope of the decay of e j,l across scales (see also [15]). These slopes are considered measurements of local singularities, such that when 0   e  1 a function f has a directional singularity which increases as  e  0 . Thus we estimate the existence of local directional singularities and the significance of unsampled coefficients at high scales, using estimates of local directional Lipschitz exponents from wavelet coefficients that were already sampled. 3.2 The ADS Algorithm Our adaptive CS algorithm works as follows: 1. Acquire the values of all low-resolution coefficients up to a certain low-resolution J . Each computation is done using two DMD array measurements as in (2.1). In one embodiment the initial resolution J can be selected  log N  as  2   const . In any case, J should be bigger if  2  the image is bigger. Note that the total number of 2 1 J coefficients at resolutions  J is 2   N , which is a small fraction of N . 2. Initialize a ‘sampling queue’ containing the indices of each of the four children of significant coefficients at the resolution J . Thus for a significant coefficient with index  e, J , l  , we add to the queue the coefficients with  e, J  1,  2l , 2l   ,  e, J  1,  2l , 2l  1  ,  e, J  1,  2l  1, 2l   and  e, J  1,  2l  1, 2l  1  . indices: 1 1 2 2 1 1 2 2 3. Process the sampling queue until it is exhausted as follows: a. Sample the wavelet coefficient corresponding to the index  e, j , l  at the beginning of the queue using two DMD array measurements (see Section 2). b. Add to the end of the queue the indices of the coefficient’s four children, only if one of the following holds: (i) The coefficient is at a resolution j  J  2 and the coefficient’s absolute value is greater than a given threshold tlow . wavelet coefficients, which can be substantially smaller than the number of pixels N . The number of samples is influenced by the size of the thresholds used by the algorithm in step 3.b. It is also important to understand that the number of samples is influenced by the amount of visual activity in the image. 
If there are more significant edges in the image, then their detection at lower resolutions will lead to adding higher resolution sampling to the queue. 4. Experimental results To evaluate our approach, we use the optimal k -term wavelet approximation as a benchmark. It is well known [5] that for a given image with N pixels, the optimal orthonormal wavelet approximation using only k coefficients is obtained using the k largest coefficients f , ej11, l1  f , ej22 ,l2  f , ej33,l3   , f   f , ejii ,li  ejii ,li k i 1   L2  2  min f  #  k   e , j , l  f , ej ,l  ej ,l For biorthogonal wavelets this ‘greedy’ approach gives a near-best result, i.e. within a constant factor of the optimal k -term approximation. One can apply thresholding and construct a k -term approximation using only coefficients whose absolute value is above the threshold, which still requires the order of N computations. In contrast, our ADS algorithm is output sensitive and requires only order of n computations. To simulate our algorithm in software, we first pre-compute the entire wavelet transform of a given image. However, we strictly follow the recipe of our ADS algorithm and extract a wavelet coefficient from the pre-computed coefficient matrix only if its index was added to the adaptive sampling queue. In fig 1(a) we see a ‘benchmark’ near-best 7000-term biorthogonal [9,7] wavelet approximation of the Lena image, extracted from the ‘full’ wavelet representation by thresholding. In fig 1(b) we see a 6782-term approximation extracted from an ADS adaptive sampling process with n =12796 sampled wavelet coefficient. (ii) The coefficient is at resolution 1  j  J  2 and the corresponding estimated absolute value of its children using the local Lipschitz exponent method (see Section 3.1) is greater than a given threshold thigh . c. Remove the processed index from the queue and go to step (a). In a way, our algorithm can be regarded as an adaptive edge acquisition device where the acquisition resolution increases only in the vicinity of edges! Observe that the algorithm is output sensitive. Its time complexity is of the order n where n is the total number of computed SAMPTA'09   L2 2 249 . REFERENCES (a) 7000-term (b) ADS 6782-term Fig.1. (a) Near-best 7000-term [9,7] approximation computed from the ‘full’ wavelet representation N=262,144, PSNR=31 dB (b) ADS 6782-term [9,7] approximation, extracted from n=12,796 adaptive wavelet samples, PSNR=28.7 dB. 5. Conclusion We present an architecture that acquires and compresses high resolution visual data, without fully sampling the entire data at its highest resolution. By sampling in the wavelet domain we are able to acquire low resolution coefficients within a small number of measurements. We then exploit the wavelet tree structure to build an adaptive sampling process of the detail wavelet coefficients. Experimental results show good visual and PSNR results with a small number of measurements. The coefficients acquired by the ADS algorithm can be streamed into a tree-based wavelet compression algorithm whose decoding time is significantly faster then the solution of (1.2). SAMPTA'09 1. R. Baraniuk, Compressive Sensing, Lecture Notes in IEEE Signal Processing Magazine, Vol. 24, No. 4, pp. 118-120, July 2007. 2. R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation 28 (2008), 253-263. 3. E. Candès, Compressive sampling, Proc. 
International Congress of Mathematics, 3 (2006), 1433-1452. 4. E. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52 (2006), 489–509. 5. R. DeVore, Nonlinear approximation, Acta Numerica7 (1998), 50-51. 6. D. Donoho, Compressed sensing, IEEE Trans. Information Theory, 52 (2006), 1289-1306. 7. C. La and M. Do, Signal reconstruction using sparse tree representations, Proc. SPIE Wavelets XI, San Diego, September 2005. 8. A. Said and W. Pearlman, A new fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Syst. Video Tech., 6 (1996), 243-250. 9. J. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Process. 41 (1993), 3445-3462. 10 .D. Takhar, J. Laska, M. Wakin, M. Duarte, D. Baron, S. Sarvotham, K. Kelly and R. Baraniuk, A New Compressive Imaging Camera Architecture using Optical-Domain Compression, Proc. of Computational Imaging IV , SPIE, 2006. 11. S. Dekel, Adaptive compressed image sensing based on wavelet-trees, report 2008. 12. S. Mallat, “a wavelet tour of signal processing”. 13. S. Mallat and W. L. Hwang, “singularity detection and processing with wavelets,” IEEE Trans. Inf. Theory 38,617-642(1992). 14. L. Gan, T. Do, T. Tran , Fast compressive imaging using scrambled Hadamard transform ensemble, preprint 2008. 15. Z. Chen and M. A. Karim, Forest representation of wavelet transforms and feature detection, Opt. Eng. 39 (2000), 1194–1202. 16. R. Berinde, P. Indik, sparse recovery using sparse random matrices, Tech. Report of MIT 2008. 17. F Rooms, A. Pizurica and, W. Philips, estimating image blur in the wavelet domain, IEEE Benelux Signal Processing Symposium (SPS-2002). 250 Asymmetric Multi-channel Sampling in Shift Invariant Spaces Sinuk Kang (1) and K.H. Kwon (1) (1) KAIST, 335 Gwahangro, Yuseong-gu, Daejeon 305-701, S. Korea. sukang@kaist.ac.kr, khkwon@kaist.edu Abstract: We consider a multi-channel sampling with asymmetric sampling rates in shift invariant spaces, while related previous works have supposed that each channel has a symmetric(uniform) sampling rate. Motivated by the fact that shift invariant spaces are isomorphic images of L2 [0, 2π], we obtain a sampling expansion in shift invariant spaces by using frame or Riesz basis expansion in L2 [0, 2π]. The samples in the expansion are expressed in terms of frame coefficients of an appropriate function with respect to a certain frame in L2 [0, 2π]. The involved reconstruction functions are given explicitly by using the frame operator. We also present relation between asymmetric multichannel sampling and symmetric one. 1. reconstruction functions by means of the frame operator. The theory contains both a frame and Riesz basis expansion as sampling formulas. 2. Asymmetric multi-channel sampling Assume that φ(t) is everywhere well defined complex valued square integrable function on R throughout the paper. Moreover, let φ(t) be a Riesz generator with Cφ (t) < ∞ for any t ∈ R so that V (φ) is an RKHS(see Proposition 2.4 in [4]). We now are given a LTI system {Lj [·]}N j=1 whose impulse response is {lj (t) : lj ∈ L2 (R), j = 1, 2, · · · , N }. The aim of this paper is to recover any f (t) ∈ V (φ) via discrete samples from {Lj [f ]}N j=1 as Introduction Reconstructing a band-limited signal f from samples which are taken from several channeled versions of f is called multi-channel sampling. 
The multi-channel sampling method goes back to the work of Shannon [6] and Fogel [2], where the reconstruction of a band-limited signal from samples of the signal and of its derivatives was suggested. Generalized sampling expansion for arbitrary multi-channel sampling was introduced first by Papoulis [5]. Papoulis’ result has been extended to a general shiftinvariant space [1, 7, 8]. Here, a shift invariant space V (φ) with a generator φ ∈ L2 (R) is defined by the closed subspace of L2 (R) spanned by integer translates {φ(t − n) : n ∈ Z} of φ. Recently Garcı́a and Pérez-Villalón [3] derived stable generalized sampling in a shift-invariant space by using some special dual frames in L2 [0, 1]. The previous works related to the multi-channel sampling have assumed that numbers of samples from each channel are uniform, namely, sampling rates of channels are same. In this paper we consider a multi-channel sampling with asymmetric sampling rates in shift invariant spaces. We find an expression for the samples as frame coefficients of an appropriate function in L2 [0, 2π] with respect to some particular frame in L2 [0, 2π] and present the sufficient and necessary condition under which a sequence of functions of particular form becomes a frame or a Riesz basis for L2 [0, 2π]. Using isomorphism between a shift invariant space V (φ) and L2 [0, 2π], we derive sampling theory in V (φ) with some Riesz generator φ and find a formula of SAMPTA'09 f (t) = N X X Lj [f ](σj + rj n)sj,n (t), (1) j=1 n∈Z where {sj,n (t) : j = 1, · · · , N and n ∈ Z} is a frame or a Riesz bases of V (φ) and 0 ≤ σj < rj with a positive integer rj for j ∈ {1, 2, · · · , N }. 2.1 An expression for the samples Define an isomorphism J from L2 [0, 2π] onto V (φ) by JF (t) = 1 X hF (ξ), e−inξ iφ(t−n), F (ξ) ∈ L2 [0, 2π]. 2π n∈π By the isomorphism J : L2 [0, 2π] → V (φ), the reconstruction formula (1) is equivalent to the following one: F (ξ) = N X X Lj [f ](σj +rj n)Sj,n (ξ), F (ξ) ∈ L2 [0, 2π], j=1 n∈Z (2) where f (t) = JF (t) and sj,n (t) = JSj,n (t). Notice further that Lj f (σj + rj n) is represented by an inner product of F (ξ) and some function in L2 [0, 2π]. Lemma 2.1.1 Let L[·] be a LTI system with an impulse response l(t) ∈ L2 (R) and ψ(t) = L[φ](t) = (φ ∗ l)(t). (a) L is a bounded operator from L2 (R) into L∞ (R), o kf ∗ lk∞ ≤ kf k2 klk2 and Lf (t) ∈ C∞ (R), (b) supR Cψ (t) < ∞, 251 (c) (cf. Lemma 2 in [3]) for any f (t) = (c ∗ φ)(t) with c ∈ ℓ2 in V (φ), L[f ](t) = (c∗ψ)(t) converges absolutely and uniformly on R. For any f (t) = JF (t) ∈ V (φ) with F (ξ) ∈ L2 [0, 2π], L[f ](t) = hF (ξ), 1 Zψ (t, ξ)iL2 [0,2π] . 2π In particular, L[f ](σj +rj n) = hF (ξ), Theorem 2.2.2 Let φ(t) be a Riesz generator with Cφ (t) < ∞, t ∈ R and {Lj [·]}N j=1 be LTI systems with ∈ L2 (R) . Let {ψj (t) = an impulse response {lj (t)}N j=1 N (φ ∗ lj )(t)}j=1 , rj ≥ 1 an integer and 0 ≤ σj < rj . (a) If 0 < αG ≤ βG < ∞, i.e., 0 < αG and Zψj (σj , ξ) ∈ L∞ [0, 2π], 1 ≤ j ≤ N , then there is a frame {sj,n (t) : 1 ≤ j ≤ N, n ∈ Z } of V (φ) for which 1 Zψ (σj , ξ)e−irj nξ iL2 [0,2π] . 2π (3) f (t) = N X X Lj f (σj +rj n)sj,n (t), f (t) ∈ V (φ). j=1 n∈Z (4) 2.2 The sampling theorem For a given LTI system {Lj [·]}N j=1 , let Lj φ(t) = ψj (t), 1 ≤ j ≤ N . Using equation (3), the expansion (2) is equivalent to F (ξ) = N X X hF (ξ), j=1 n∈Z 1 Zψ (σj , ξ)e−irj nξ iL2 [0,2π] 2π j ·Sj,n (ξ), F (ξ) ∈ L2 [0, 2π], where f (t) = JF (t) and sj,n (t) = JSj,n (t). For convenience, we introduce a few more notations. 
Let gj (ξ) ∈ L2 [0, 2π] for 1 ≤ j ≤ N , gj,mj (ξ) := gj (ξ)eirj (mj −1)ξ for 1 ≤ mj ≤ rrj and G(ξ) = [Dg1,1 (ξ), Dg1,2 (ξ), · · · , Dg1, rr (ξ), Dg2,1 (ξ), · · · , DgN, rr (ξ)] , Remark 2.2.3 Asymmetric multi-channel sampling series with LTI system {Lj [·]}N j=1 whose impulse response is {lj (t)}N can be considered as symmetric multi-channel j=1 N, N where D is an unitary operator from L2 [0, 2π] onto L2 (I)r defined by (DF )(ξ) = [F (ξ), F (ξ + 2π r ), · · · , F (ξ + T 2 (r − 1) 2π )] , F (ξ) ∈ L [0, 2π]. Note that G(ξ) is PN r r the j=1 rj × r matrix whose entries are in L2 [0, 2π r ]. And define λM (ξ)(resp. λm (ξ)) as the largest(resp. the smallest) eigenvalue of r × r matrix G(ξ)∗ G(ξ), βG as kλM (ξ)k∞ and αG as kλm k0 . Lemma 2.2.1 Let gj ∈ L2 [0, 2π] and rj be a positive integer for 1 ≤ j ≤ N . Define r as the least common multi−irj nξ plier of {rj }N : 1 ≤ j ≤ N, n ∈ j=1 . Then {gj (ξ)e Z } is a (a) Bessel sequence in L2 [0, 2π] if and only if kλM (ξ)k∞ < ∞ if and only if gj ∈ L∞ [0, 2π] for 1 ≤ j ≤ N . In this case, optimal bound is 2π r kλM (ξ)k∞ ; (b) frame of L2 [0, 2π] if and only if 0 < kλm (ξ)k0 ≤ PN kλM (ξ)k∞ < ∞ so that r ≤ j=1 rrj and optimal 2π bounds are 2π r kλm (ξ)k0 ≤ r kλM (ξ)k∞ ; 2 (c) Riesz basis of L [0, 2π] if and only if frame of PN 1 PN r L2 [0, 2π] and r = j=1 rj , i.e., 1 = j=1 rj if and only if gj (ξ) ∈ L∞ [0, 2π] for 1 ≤ j ≤ N , PN 1 = j=1 r1j and | det G(ξ)| ≥ ∃α > 0 a.e.. SAMPTA'09 In all cases, sampling series (4) converges in L2 (R), absolutely on R and uniformly on any subset of R on which Cφ (t) is bounded. j sampling series with LTI system {L̃j,mj [·]}j=1,m with j =1 T 1 2π Zψj (σj , ξ) (c) Assume that Zψj (σj , ξ) ∈ L∞ [0, 2π], 1 ≤ j ≤ N . Then there is a Riesz basis {sj,n (t) : 1 ≤ j ≤ N, n ∈ Z } of V (φ) for which (4) holds if and only PN if 0 < αG and 1 = j=1 r1j . N, rr 1 Appealing to the setting gj (ξ) = j ≤ N , we have (b) Assume that Zψj (σj , ξ) ∈ L∞ [0, 2π], 1 ≤ j ≤ N . Then there is a frame {sj,n (t) : 1 ≤ j ≤ N, n ∈ Z } of V (φ) for which (4) holds if and only if 0 < αG . for 1 ≤ r rj , where ˜lj,mj (t) = impulse response {˜lj,mj (t)}j=1,m j =1 lj (rj (mj − 1) + t). 2.3 Reconstruction functions Let S be a frame operator with frame {gj (ξ)e−irj nξ }j,n . For any F (ξ) ∈ L2 [0, 2π], r SF (ξ) = rj N X X gj (ξ)e−irj (mj −1)ξ j=1 mj =1 · 2π gj,m (ξ)T DF (ξ) r so that DSF (ξ) = 2π ∗ G G(ξ)DF (ξ). r Then, from Lemma 2.2.1 (b), there exists (G∗ G)−1 (ξ) a.e. such that r D(S −1 (gj (ξ)e−irj nξ )) = (G∗ G)−1 (ξ)D(gj (ξ)e−irj nξ ) 2π for 1 ≤ j ≤ N and n ∈ Z . Hence, r {sj,n }j,n = { JD−1 [(G∗ G)−1 (ξ)D(gj (ξ)e−irj nξ )]}j,n . 2π Remark 2.3.1 One sufficient condition under which {sj,n }j,n is translates of a single function in L2 [0, 2π] is that r divides rj for all 1 ≤ j ≤ N . Since r is the least common multiplier of {rj }N j=1 , the condition holds if and only if r = rj for all 1 ≤ j ≤ N . 252 References: [1] I. Djokovic, P. P. Vaidyanathan, Generalized sampling theorems in multiresolution subspaces, IEEE Trans. Signal Process., 45:583-599, 1997. [2] L. J. Fogel, A note on the sampling theorem, IRE Tran. Infor. Theory IT-1:47-48, 1995. [3] A. G. Garcı́a and G. Pérez-Villarón, Dual frames in L2 (0, 1) connected with generalized sampling in shift-invariant spaces, Appl. Comput. Harmon. Anal., 20:422-433, 2006. [4] J. M. Kim, K. H. Kwon, Sampling expansion in shift invariant spaces, Intern. J Wavelets, Multiresolution and Inform. Processing, 6(2):223-248, 2008. [5] A. Papoulis, Generalized sampling expansion, IEEE Trans. 
Circuits Systems, 24(11), 652-654, 1977. [6] C. E. Shannon, Communication in the presence of noise, Proc. IRE, 37:10-21, 1949. [7] M. Unser, J. Zerubia, Generalized sampling: Stability and performance analysis, IEEE trans. Signal Process., 45(12):2941-2950, 1997. [8] M. Unser, J. Zerubia, A generalized sampling theory without band-limiting constraints, IEEE Trans. Circuits Syst. 2, 45(8):959-969, 1998. SAMPTA'09 253 SAMPTA'09 254 Sparse Data Representation on the Sphere using the Easy Path Wavelet Transform Gerlind Plonka (1) and Daniela Roşca (2) (1) Department of Mathematics, University of Duisburg-Essen, Campus Duisburg, 47048 Duisburg, Germany. (2) Department of Mathematics, Technical University of Cluj-Napoca, 400020 Cluj-Napoca, Romania. gerlind.plonka@uni-due.de, Daniela.Rosca@math.utcluj.ro Abstract: In this paper we consider the Easy Path Wavelet Transform (EPWT) on spherical triangulations. The EPWT has been introduced in [7] in order to obtain sparse image representations. It is a locally adaptive transform that works along pathways through the array of function values and exploits the local correlations of the data in a simple appropriate manner. In our approach the usual one-dimensional discrete wavelet transform (DWT), orthogonal or biorthogonal, can be applied. 1. Introduction One important problem in data analysis is to construct efficient low-level representations using only a very small part of the original data. However, these sparse approximations should provide a precise characterization of relevant features of the data like discontinuities (edges) and texture components. It is well-known that wavelets can represent piecewise smooth signals efficiently. However, higher-dimensional structures may not be represented suitably by sparse wavelet decompositions based on tensor product wavelets, because directional geometrical properties of the data cannot be adapted. The last years have seen many attempts to construct locally adaptive wavelet-based schemes that take into account the special geometry of the data. In particular, for sparse representation of images, different ideas, that try to exploit the local correlations of the data, have been developed (see e.g. [1, 2, 3, 4, 5, 6, 7, 10]). We will focus on the EPWT recently introduced in [7] for sparse image representation. In this paper, we want to adapt the EPWT to triangulations of the sphere. For this purpose, we apply the idea used by Roşca [8, 9] to obtain a suitable spherical triangulation. We employ a polyhedral subdivision domain. The triangular faces of the polyhedron are successively subdivided into four smaller triangles. Each triangle can be transported radially to the sphere. This approach has been used in [8, 9] for the construction of Haar wavelets and of locally supported rational spline wavelets on the sphere. The idea of the EPWT on spherical triangulations is very simple. First we fix a certain neighborhood of a triangle, e.g. the three triangles that have common edges with the SAMPTA'09 reference triangle. Next, we use a one-dimensional indexing of all triangles of the fixed triangulation and assume that each function value of a given data vector is associated to one triangle, or rather to its corresponding (onedimensional) index. In the first step we select a path through the complete index set in such a way that data points associated to neighbor indices in the path are strongly correlated. 
For this purpose, for each index we choose “the best” neighbor index that has not been used in the path yet, such that the absolute difference between neighboring data values is the smallest. The complete path vector can be seen as a permutation of the original index vector. Then we apply a suitable (one-dimensional) discrete wavelet transform to the data vector along the path, and the choice of the path will ensure that most wavelet coefficients remain small. The same procedure can be successively applied to the down-sampled data. After a suitable number of iterations, we apply a shrinkage procedure to all wavelet coefficients in order to find a sparse digital representation of the function. For reconstruction one needs the path vector at each level in order to apply the inverse wavelet transform. 2. Spatial and spherical triangulations Consider the sphere S 2 = {x ∈ R3 , x2 = 1} and let Π be a convex polyhedron with triangular faces, containing O inside. For example we can take an icosahedron, a cube with triangulated faces, an octahedron, etc. The boundary of the polyhedron will be denoted by Ω. We denote by T 0 = {T1 , . . . , TM } the set of faces of Π. For each triangle T ∈ T 0 we take the mid-points of its edges and construct four triangles of equal area, as in Figure 1. All these small triangles will form a refined triangulation of T 0 , denoted T 1 . Continuing the refinement process in the same manner, we obtain a triangulation T j of Ω, for j ∈ N. For application of the EPWT we will stop the refinement process at a suitable sufficiently high (fixed) level j depending on the data set in the application. For application of the EPWT we will need a one-dimensional index set J = J j for the triangles in T j . Using the octahedron, this one-dimensional index set J can be as in Figure 1 (right). Observe that for the octahedron the number of triangles at the jth level is given by #J = #T j = 22j+3 . In order to obtain a spherical triangulation, for the given 255 A B F 16 28 27 32 B A 4 15 14 26 E 17 5 6 1 2 3 12 13 25 D C E 29 18 19 B F 7 C 8 9 10 11 23 24 31 20 21 22 D F 30 F F Figure 1: Illustration of the octahedron with triangulation T 1 (left) and a fold apart version of the octahedron on the plane, with a one-dimensional indexing of all triangles. polyhedron Π we define the radial projection p : Ω → S 2 , p(x, y, z) = (x2 +y 2 +z 2 )−1/2 ·(x, y, z), (x, y, z) ∈ Ω. The set U j = {U = p(T ), T ∈ T j } will be a triangulation of the sphere S 2 . For indexing the spherical triangles in U j , we use the same index set J as for the triangulation T j of the polyhedron. Decomposition First level We first determine a complete path vector p L through the index set J = {1, 2, . . . , N } and then apply a suitable discrete one-dimensional (periodic) wavelet transform to the function values f L = (f L (j))j∈J along the path p L . We start with pL (1) := 1. Next, for pL (2) we take pL (2) := argmin {|f L (1) − f L (k)|, k ∈ N (1)}. k We proceed in this manner, thereby determining a path vector through the index set J, that is locally adapted to the function f (easy path). With the procedure described above, we obtain a pathway such that the absolute differences between neighboring function values f L (l) along the path are as small as possible. In general, for a given the index pL (l), 1 ≤ l ≤ N − 1, the next value p L (l + 1) is defined by pL (l + 1) := argmin {|f L (pL (l)) − f L (k)|, k 3. 
Definitions and Notations for the EPWT In order to explain the idea of the EPWT, where we want to use the discrete one-dimensional wavelet transform along path vectors through the data, we need some definitions and notations. Let us assume that a fixed refined spherical triangulation U j is given.Let J be a one-dimensional index set for the spherical triangles in U j . We define a neighborhood of an index ν ∈ J as N (ν) = {µ ∈ J\{ν} : Tµ and Tν have a common edge}. Hence, each index ν ∈ J has exactly three neighbors. One may also use a bigger neighborhood, e.g. N (ν) = {µ ∈ J \ {ν} : Tµ and Tν have a common edge or a common vertex }, in which case each index has 12 neighbors. We also need a definition of neighborhood of subsets of an index set. We shall consider disjoint partitions of J of the form  {J1 , J2 , . . . , Jr }, where Jµ ∩ Jν = ∅ for µ = ν and rν=1 Jν = J. We then say that two different subsets Jν and Jµ from the partition are neighbors, and we write Jν ∈ N (Jµ ), if there exist the indices l ∈ J ν and l1 ∈ Jµ such that l ∈ N (l1 ). We consider a function f being piecewise constant on the triangles of U j , i.e., we identify each spherical triangle in U j with a value of f . Hence, f is uniquely determined by the data vector (f ν )ν∈J . We will look for path vectors through index subsets of J and we apply a one-dimensional wavelet transform along these path vectors. Any orthogonal or biorthogonal onedimensional wavelet transform can be used here. 4. Description of the EPWT In this section we give a summary of the idea of the EPWT, described in more details in [7]. We start with the decomposition of the real data (f ν )ν∈J , and we assume that N = #J is a multiple of 2L with L ∈ N. Then we will be able to apply L levels of the EPWT. For the considered octahedron we have N = 2 2j+3 . SAMPTA'09 k ∈ N (pL (l)) \ {pL (ν), ν = 1, . . . , l}}. It can happen that the choice of the next index value pL (l + 1) is not unique, if the above minimum is attained for more than one index. In this case, one may fix favorite directions in order to determine a unique pathway. Another situation which can occur during the procedure is that all indices in the neighborhood of an index p L (l) have already been used in the path p L . In this case we have an interruption in the path vector. We need to choose one index pL (l+1) from the remaining indices in J, which have not been taken yet in p L . There are different possibilities for finding a suitable next index. One simple choice is to take the smallest index from J that has not been used so far. Another choice is to look for a next index, such that again the absolute difference |f L (pL (l)) − f L (pL (l + 1))| is minimal, i.e., we take in this case pL (l + 1) = argmin {|f L (pL (l)) − f L (k)|, k k ∈ J \ {pL (ν), ν = 1, . . . , l}}. By proceeding in this manner, we finally obtain a path vector pL ∈ ZN , which is a permutation of (1, 2, . . . , N ). After having constructed the path p L , we apply one level of the 1-D Haar DWT (or any other orthogonal or biorthogonal periodic DWT) to the vector of function valL ues (f L (pL (l)))N l=1 along the path p . We obtain the vecN/2 L−1 tor f ∈ R , containing the low-pass part, and the vector of wavelet coefficients g L−1 ∈ RN/2 . While the wavelet coefficients will be stored in g L−1 , we further proceed with the low-pass vector f L−1 at the second level. 
Further levels If N = 2L r with r ∈ N being greater than or equal to the lengths of low-pass and high-pass filters in the chosen DWT, then we may apply the procedure L times. For a given vector f L−j , 0 < j < L, at the (j + 1)-th level we consider the index sets L−j+1 j JlL−j := JpL−j+1 L−j+1 (2l−1) ∪ JpL−j+1 (2l) , l = 1, . . . , N/2 , 256 N/2j with the corresponding function values (f L−j (l))l=1 . In particular, the index sets at the second level are J lL−1 := {pL (2l−1), pL(2l)}, l = 1, . . . , N/2, determining a partition of J. We repeat the procedure described in the first step, but replacing the single indices with the new index sets J lL−j , and the corresponding function values with the smoothed function values f L−j (l). j The new path vector p L−j ∈ ZN/2 should now be a permutation of (1, 2, . . . , N/2 j ). We start again with the first index set J1L−j , i.e., pL−j (1) = 1. Having already found pL−j (l), 1 ≤ l ≤ N/2j − 1, we determine the next value pL−j (l + 1) as B A 5 17 18 29 F 1 6 7 19 2 9 8 C 4 3 10 20 22 14 13 12 11 16 14 6 B E A 1 2 11 16 31 D 13 F F 21 30 F 10 3 E 5 14 25 24 23 F 15 32 27 26 16 15 B B F 28 6 7 4 12 9 C 12 9 D F 8 F Figure 2. Illustration of first path through the triangulation T 1 of the octahedron (left) and of the low-pass part after the first level of EPWT with Haar DWT (right). Index sets at the second level are illustrated by different gray values, and path vectors are represented by arrows. pL−j (l + 1) = argmin {|f L−j (pL−j (l)) − f L−j (k)|, k L−j JkL−j ∈ N (JpL−j (ν), ν = 1, . . . , l}}. L−j (l) ) \ {p If the new value p L−j (l+1) is not uniquely determined by the minimizing procedure, we can fix favorite directions in order to obtain a unique path. If for the set J pL−j L−j (l) there is no neighboring index set that has not been used yet in the path vector p L−j , then we have to interrupt the path and to find a new good index set (that has been not used so far) to continue the path. As at the first level, we try to keep the differences of function values along the path as small as possible. Finally, we apply the (periodic) wavelet transform to the N/2j vector (f L−j (pL−j (l)))l=1 along the path p L−j , thereby j+1 obtaining the low-pass vector f L−j−1 ∈ RN/2 and the N/2j+1 L−j−1 vector of wavelet coefficients g ∈R . Output As output of the complete procedure after L iterations we obtain the coefficient vector g = (f 0 , g0 , g1 , . . . , gL−1 ) ∈ RN and the vector determining the paths at each iteration step L p = (p1 , p2 , . . . , pL ) ∈ R2N (1−1/2 ) . These two vectors contain the entire information about the original function f . In order to find a sparse representation of f , we apply a shrinkage procedure to the wavelet coefficients in the vecj . tors gj , j = 0, . . . , L − 1 and obtain the vectors g Reconstruction  = (f 0 , g 0 , g 1 , . . . , g L−1 ) The reconstruction of f L from g and p is given as follows.  f 0 = f 0; For j = 0 to L − 1 j j ) ∈ Rr2 - Apply the inverse DWT to the vector ( fj, g j+1 in order to obtain  fpj+1 ∈ Rr2 . - Apply the permutation  f j+1 (pj+1 (k)) =  fpj+1 (k), for j+1 k = 1, . . . , r2 . 5. Example sphere, where each function value corresponds to a spherical triangle that has been obtained by radial projection of the triangulated octahedron in Figure 1 (left). 
The values are given as a vector f = f 5 of length 32, corresponding to the one-dimensional indexing of the triangles in Figure 1 (right), f = (0.4492, 0.4219, 0.4258, 0.4375, 0.4141, 0.4531, 0.4180, 0.4258, 0.4375, 0.4292, 0.4219, 0.4219, 0.4219, 0.4258, 0.4023, 0.4141, 0.4219, 0.4219, 0.4297, 0.4375, 0.4141, 0.4023, 0.4258, 0.4219, 0.4258, 0.4180, 0.4531, 0.4141, 0.4375, 0.4258, 0.4219, 0.4492). Starting with the index 1, with the function value 0.4492, we determine the first path vector. This index has the three neighbors 2, 4, and 6, with the corresponding values 0.4219, 0.4375 and 0.4531, respectively (see Figure 2). Hence, the second index in the path is 6. Proceeding further according to Section 4 we obtain p5 =(1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 26, 25, 24, 31, 30, 21, 22, 23; 3, 2, 17, 18, 19, 20; 4, 15, 16, 5; 28, 27, 32, 29), where the interruptions in the path are indicated by semicolons. This path has four interruptions and is illustrated by arrows in Figure 2 (left). An application of the Haar DWT (with unnormalized filter coefficients h 0 = h1 = 1/2, g0 = 1/2, g1 = −1/2) along this path gives (with truncation after four digits) the low-pass coefficients f 4 = (0.4512, 0.4219, 0.4334, 0.4219, 0.4238, 0.4219, 0.4219, 0.4200, 0.4140, 0.4238, 0.4219, 0.4336, 0.4199, 0.4141, 0.4336, 0.4434), and the wavelet coefficients g4 = (−0.0020, −0.0039, −0.0042, 0., −0.0020, −0.0039, 0., 0.0058, −0.0118, 0.0020, 0., −0.0039, 0.0176, 0., −0.0195, 0.0058). We illustrate the simple idea of function decomposition with the EPWT on the sphere in the following small example. Let a set of 32 function values be given on the SAMPTA'09 We now proceed to the second level. For the smoothed vector of function values f 4 corresponding to the 16 index 257 B B F 3 B A 7 6 B E 7 1 5 2 C F 4 D E 1 3 2 4 6 A 4 3 8 F 2 8 F F 4 C 2 D F 4 4 Figure 3. Illustration of the third and fourth paths. Figure 4. Approximation f 6 at level 6 of the original dataset topo and the compressed version e f 6 with threshold 2500. sets that are illustrated by gray values in Figure 2 (right), we obtain the next path The results are contained in Table 1, where the mean of f 6 is −2329. F F p4 = (1, 10, 4, 5, 6, 7, 8, 9, 3, 2, 12, 11, 14, 13; 15, 16), illustrated by arrows in Figure 2 (right). An application of the Haar DWT along p 4 gives f 3 = (0.4375, 0.4229, 0.4219, 0.4170, 0.4276, 0.4278, 0.4170, 0.4385), g = (0.0136, −0.0010, 0., 0.0030, 0.0057, 0.0058, 3 0.0029, −0.0049). At the third level we start with the smoothed vector f 3 corresponding to the 8 index sets that are illustrated by gray values in Figure 3 (left). We find now the path p 3 = (1, 5, 6, 8, 3, 2, 4; 7), see Figure 3 (left). This leads to f2 g2 = (0.4326, 0.4331, 0.4224, 0.4170), = (0.0049, −0.0054, 0.0005, 0.). At the fourth level we have only 4 index sets that correspond to the values in f 2 , see Figure 3 (right). Hence we find p2 = (1, 2, 3, 4) and f 1 = (0.4328, 0.4197), g1 = (−0.0003, 0.0027). Finally, with p1 = (1, 2), the last transform yields f 0 = (0.4263) and g0 = (0.0066). 6. Numerical experiments To illustrate the efficiency of our method, we took the dataset topo and we considered the regular octahedron with triangulation T 6 , containing 32768 triangles. The approximation f 6 at level 6 is represented in Figure 4. We applied the EPWT with different thresholds, obtaining the compressed vector  f 6 , and we measured the SNR given as SN R = 20 · log10 threshold 1 100 500 1000 1500 2000 2500 f 6 − mean(f 6 )2 . 
f 6 −  f 6 2 number of remaining wavelet coeff. 27732 14185 5230 3313 2699 2402 2265 l 2 -norm of error 26.4031 5.34e+03 2.47e+04 3.97e+04 5.00e+04 5.79e+04 6.35e+04 SNR 84.72 38.59 25.30 21.17 19.18 17.89 17.10 Table 1: Compression results for the dataset topo. SAMPTA'09 Acknowledgments This research in this paper is supported by the project 436 RUM 113/31/0-1 of the German Research Foundation (DFG). This is gratefully acknowledged. References: [1] R.L. Claypoole, G.M. Davis, W. Sweldens, and R.G. Baraniuk. Nonlinear wavelet transforms for image coding via lifting. IEEE Trans. Image Process. 12:1449–1459, 2003. [2] A. Cohen and B Matei. Compact representation of images by edge adapted multiscale transforms. In Proc. IEEE Int. Conf. on Image Process. (ICIP), Thessaloniki, pages 8–11, 2001. [3] S. Dekel and D. Leviatan. Adaptive multivariate approximation using binary space partitions and geometric wavelets. SIAM J. Numer. Anal. 43:707–732, 2006. [4] W. Ding, F. Wu, X, Wu, S. Li, and H. Li. Adaptive directional lifting-based wavelet transform for image coding. IEEE Trans. Image Process. 16:416–427, 2007. [5] D.L. Donoho. Wedgelets: Nearly minimax estimation of edges. Ann. Stat. 27:859–897, 1999. [6] S. Mallat. Geometrical grouplets. Appl. Comput. Harmon. Anal., 26 (2): 143–290, 2009. [7] G. Plonka. The easy path wavelet transform: A new adaptive wavelet transform for sparse representation of two-dimensional data. Multiscale Model. Simul. 7:1474–1496, 2009. [8] D. Roşca. Haar wavelets on spherical triangulations. In Dodgson, N.A., Floater, M.S., Sabin, M.A., editors, Advances in Multiresolution for Geometric Modelling, Springer, pages 405–417, 2005. [9] D. Roşca. Locally supported rational spline wavelets on a sphere. Math. Comput. 74:1803–1829, 2005. [10] R. Shukla, P.L. Dragotti, M.N. Do, and M. Vetterli. Rate-distortion optimized tree structured compression algorithms for piecewise smooth images. IEEE Trans. Image Process. 14:343–359, 2005. 258 A fully non-uniform approach to FIR filtering Brigitte Bidégaray-Fesquet (1) and Laurent Fesquet (2) (1) LJK, CNRS / Grenoble University, B.P. 53, 38042 Grenoble Cedex 9, France. (2) TIMA, 46 avenue Félix Viallet, 38031 Grenoble Cedex, France. Brigitte.Bidegaray@imag.fr, Laurent.Fesquet@imag.fr Abstract: We propose a FIR filtering technique which takes advantage of the possibility of using a very low number of samples for both the signal and the filter transfer function thanks to non-uniform sampling. This approach leads to a summation formula which plays the role of the discrete convolution for usual FIR filters. Here the formula is much more complicated but it can be implemented and the evaluation of more elaborate expressions is compensated by the very low number of samples to process. 1. Introduction Reducing the power consumption of mobile systems – such as cell phones, sensor networks and many others electronic devices – by one to two orders of magnitude is extremely challenging but will be very useful to increase the system autonomy and reduce the equipment size and weight. In order to reach such a goal, this paper proposes a solution applicable to FIR filtering which completely rethinks the signal processing theory and the associated system architectures. Today the signal processing systems uniformly sample analog signals (at Nyquist rate) without taking advantage of their intrinsic properties. For instance, temperature, pressure, electro-cardiograms, speech signals significantly vary only during short moments. 
Thus the digitizing system part is highly constrained due to the Shannon theory, which fixes the sampling frequency at least twice the input signal frequency bandwidth. It has been proved in [4] and [6] that Analog-to-digital Converters (ADCs) using a non equi-repartition in time of samples leads to interesting power savings compared to Nyquist ADCs. A new class of ADCs called A-ADCs (for Asynchronous ADCs) based on level-crossing sampling (which produces non-uniform samples in time) [2, 3] and related signal processing techniques [1, 5] have been developed. This work suggests an important change in the FIR filter design. As sampling analog signals is usually performed uniformly in time, sampling the filter transfer function is also done in a regular way with a constant frequency step. Non-uniform sampling leads to an important reduction of the weight-function coefficients. Combined with a nonuniform level-crossing sampling technique performed by an A-ADC, this approach drastically reduces the compu- SAMPTA'09 tation load by minimizing the number of samples and operations, even if they are more complex. 2. Principle and notations For a large class of signal, non-uniform sampling leads to a reduced number of samples, compared to a Nyquist sampling. This feature has already been used in [1] to design non-uniform filtering techniques based on interpolation. In this work the authors however used a classical (uniform) filter, that is a usual discretization in time of the impulse response. Here we want to go further and take advantage of the fact that the filter transfer function (the Fourier transform of the impulse response) is a very smooth function with respect to frequency. It can therefore be well approximated by the linear interpolation of quite few samples. 2.1 Level crossing sampling The initial signals are supposed to be analog ones. The signal which we want to filter is given in the time domain and is denoted by s(t). The filter transfer function is given in the frequency domain and is denoted by H(ω). The result of the filtering process x(t) is then theoretically the convolution of s(t) with the impulse response h(t) which is the inverse Fourier transform of H(ω): Z +∞ x(t) = h(t − τ )s(τ )dτ, −∞ h(t) = 1 2π Z +∞ H(ω)e−iωt dω. −∞ These signal are sampled in their initial domain using a level crossing scheme. This technique has to be adapted for the filter transfer function. Indeed level crossing has a sense if an order can be defined, for example for a real valued function. The filter transfer function is complex valued, therefore we can choose to sample either when the amplitude crosses some predefined values, or the phase, or both. The samples read (sn , δtn ) for the signal and (Hk , δωk ) for the filter transfer function. These samples are formed of a value and the (time or frequency) interval length ”elapsed” since the last sample. To give results or describe algorithms we will use Pnthe sample times or frequencies defined as tn = t0 + 1 δtn′ and ωk = Pk ω0 + 1 δωk′ but computations will be performed using 259 only the time and frequency intervals δtn and δωk . We will also denote by In = [tn−1 , tn ] and Jk = [ωk−1 , ωk ] the time and frequency intervals. which has a compact support. The convolution reads x̄(t) Z = +∞ h̄(t − τ )s̄(τ )dτ −∞ 2.2 Linear interpolation n h(t − τ )sn (τ )dτ tn−1 n X = s̄(t) = [an + bn t]χIn , an H̄(ω) = i(γk +δk ω) (αk + βk ω)e h0nk (t) χJk , where χ denotes the indicator function of the set given in index. 
The coefficients an and bn can be expressed in terms of sn , sn−1 , tn and δtn . The coefficients αk , βk , γk and δk can be expressed in terms of Hk , Hk−1 , ωk and δωk . In fact these formulae cover the piecewise constant case (only take bn = βk = δk = 0) in three possible forms: constant on intervals In or nearest neighbor interpolation, with a possible need to modify the definition of tn and δtn in the algorithms. They also cover two ways to linearly interpolate the complex valued filter transfer function: either interpolate separately the amplitude and the phase (αk and βk are real) or interpolate in the complex plane (αk and βk are complex, γk and δk are zero). The digital filter then consists in computing (possibly) for all time x̄(t) = +∞ h̄(t − τ )s̄(τ )dτ, h̄(t) 3. 3.1 = Z +∞ H̄(ω)e −iωt dω. = h1nk (t) = A summation formula The impulse response h̄(t) can be P split in contributions for each frequency sample h̄(t) = k hk (t) with Z ωk (αk + βk ω)ei(γk +δk ω) e−iωt dω ωk−1 for which we will give an explicit expression in Section 3.2. Although the piecewise linear function H̄(ω) has a compact support (we only have a finite number of samples), the functions hk (t) have an infinite support. This is not a problem since the convolution will involve s̄(t) SAMPTA'09 ! h1nk (t) k tn Z hk (t − τ )dτ, tn Z hk (t − τ )τ dτ. tn−1 We obtain a summation formula as in the classical FIR filtering case where it takes the form of a discrete convolution. To be closer to this classical case, we should write this as X X sn hnk (t), x̄(t) = n k which is possible but the effective expression depends on the type of interpolation used (piecewise constant or linear). There remains to make explicit these two types of elementary contributions. 3.2 Elementary impulse responses A straightforward computation of the integral formulation for hk (t) yields hk (t) = αk eiγk 2π −∞ Deriving a filtering formula in the general context 1 hk (t) = 2π X tn−1 −∞ 1 2π h0nk (t) + bn k k Z X where n X hk (t − τ )(an + bn τ )dτ tn−1 k n X tn XXZ = To derive the FIR algorithm and approximate the theoretical integral formula, we form new analog functions from the previously described samples. To this aim we choose linear interpolation and we have tn XZ = Z ωk ei(δk −t)ω dω ωk−1 Z ωk + βk eiγk 2π = αk eiγk ei(δk −t)ωk − ei(δk −t)ωk−1 2πi(δk − t) + + ei(δk −t)ω ωdω ωk−1  βk eiγk ωk ei(δk −t)ωk − ωk−1 ei(δk −t)ωk−1 2πi(δk − t)  iγk i(δk −t)ωk βk e e − ei(δk −t)ωk−1 . 2π(δk − t)2  These formulae seem singular when t = δk . This is not the case and has no reason to be since the function we integrate is smooth with respect to all parameters and variables. The limiting value for t = δk is clearly hk (δk ) = = αk eiγk 2π ωk βk eiγk dω + 2π ωk−1 Z Z ωk ωdω ωk−1 1 eiγk δωk (αk + βk (ωk−1 + ωk )). 2π 2 260 3.3 Elementary summation coefficients A quick glance at the explicit expression of hk (t) clearly provides the impression that the explicit formulae for h0nk (t) and h0nk (t) will not fit in the columns here. We will give only their flavor. Indeed we want to compute the time integrals of of hk (t − τ ) and hk (t − τ )τ for τ ∈ In . This leads to integrate the product of a rational function with a complex exponential function. The results cannot be given in terms of simple functions but only in terms of the exponential integral function Z ∞ dy π Ei(ix) = − eiy +i . y 2 x We give in the next section a simple example of elementary summation coefficient calculation in the piecewise linear context. 4. 
4.1 A simple and ideal example Computation of the coefficients Our sampling for the filter transfer function yields a particularly simple formulation for the ideal low-pass filter which is 1 on the frequency interval [−ωc , ωc ] and zero elsewhere. This yields a single sample (1, 2ωc ) and linearly interpolated coefficients α1 = 1, β1 = 0, γ1 = 0 and δ1 = 0. The expression for the elementary impulse response is  e−iωc t − eiωc t ωc = sinc(ωc t). h1 (t) = −2πit π Then we have to compute Z tn Z 0 h1 (t − τ )dτ = − hn1 (t) = tn−1 t−tn h1 (τ )dτ t−tn−1 1 = − (Si(ωc (t − tn )) − Si(ωc (t − tn−1 )), π where Si is the special function known as sine integral and defined by Z x dy 1 π sin(y) = (Ei(ix) − Ei(−ix)) + , Si(x) = y 2i 2 0 4.2 Numerical results To illustrate this simple example we filter the signal s(t) = 0.45 sin(2πt) + 0.45 sin(10πt) + 0.9 with the ideal low pass filter with the cutoff frequency ωc = 4π. The theoretical result is therefore supposed to be x(t) = 0.45 sin(2πt) + 0.9. This is not the typical sort of signal which is supposed to be addressed by our technique since it is not a sporadic one and a relatively large number of samples are taken. We perform the computations within the M ATLAB SPASS (Signal Processing for ASynchronous Systems) framework (http://ljk.imag.fr/membres/Brigitte.Bidegaray/SPASS/). This signal is sampled with a M -bit Asynchronous A/D Converter (AADC) which leads to a level crossing sampling over the amplitude range [0, 1.8]. We can choose as we want the times at which the filtered signal is computed. To display the results we choose the sequence of times tm = .17m (m integer) to have sampling points dispatched irregularly over the obtained solution. On Figure 1, you can see the result for a linear interpolation of the signal non-uniform samples and a 3-bit AADC. We plot continuous functions with lines: the initial signal s(t) (dashed line) and the theoretical filtered signal x(t) (solid line). We plot the sampled results with markers: the non-uniformly sampled initial signal sn (asterisk markers) and the computed filtered samples xm (circle markers) at times tm . 1.8 s(t) x(t) sn 1.6 xm 1.4 and h1n1 (t) of a numerical implementation of these algorithms. Moreover these functions are however very smooth: the Si function for example is almost linear in the neighborhood of 0 and tends to ±π/2 at ±∞ with very gentle oscillations. This feature makes possible the construction of efficient lookup tables in view of a hardware implementation. = tn Z 1.2 h1 (t − τ )τ dτ tn−1 Z t−tn = − 1 h1 (τ )(t − τ )dτ 0.8 t−tn−1 Z 1 t−tn−1 + = t sin(ωc τ )dτ π t−tn 1 = t h0n1 (t) − (cos(ωc (t − tn )) πωc − cos(ωc (t − tn−1 ))). h0n1 (t) This case is simple due to its minimal number of samples in the frequency domain, but it displays all the difficulties of the general case, i.e. the need to evaluate special functions. These functions are built in many libraries in view SAMPTA'09 0.6 0.4 0.2 0 0 0.5 1 1.5 2 2.5 3 3.5 4 Figure 1: Filtering result. Initial signal (dashed line), theoretical filtered signal (solid line), non-uniformly sampled initial signal (asterisk markers) and computed filtered samples (circle markers). 261 This very simple test case has quite a low number of parameters compared to the full problem for which we can finely tune the filter transfer function sampling for example. We compare here the results obtained for a zeroth and a first order interpolation of the signal and for different values (2, 3, 4 and 5) of the AADC resolution. 
On Table 1 we give the relative l1 error between computed filtered samples xm at times tm = .01m (m integer) and the theoretical values x(tm ). M M M M =2 =3 =4 =5 0th order 0.0608 0.0076 0.0052 0.0046 1st order 0.0584 0.0046 0.0045 0.0045 [3] [4] [5] Table 1: l1 error of the filtering method for 0th and first order interpolation of the signal and and M bit resolution of the AADC (M = 2, 3, 4, 5). In the case of the 2-bit AADC, there are 2.8 points per wavelength for the highest frequency part of the signal. This is a very low rate, and we are however able to have only 6% error on the filtered result which is quite sufficient for a large range of applications. The other results all show less than 1% error. The values displayed on Table 1 are very dependent on the choice of the function to filter. Finer results (allowing less than .45% error) should certainly be obtained by using a higher order interpolation for the signal. 5. [6] converter. In 12th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’06), pages 11–22, Grenoble, France, March 2006. Emmanuel Allier, Gilles Sicard, Laurent Fesquet, and Marc Renaudin. A new class of asynchronous A/D converters based on time quantization. In 9th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’03), pages 197–205, Vancouver, Canada, May 2003. Jon W. Mark and Terence D. Todd. A nonuniform sampling approach to data compression. IEEE Trans. on Communications, COM-29(1):24–32, January 1981. Saeed Mian Qaisar, Laurent Fesquet, and Marc Renaudin. Adaptive rate filtering for a signal driven sampling scheme. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP07), pages III–1465–III–1468, Honolulu, Hawaii, USA, April 2007. N. Sayiner, H.V. Sorensen, and T.R. Viswanathan. A level-crossing sampling scheme for A/D conversion. IEEE Trans. on Circuits and Systems II, 43(4):335– 339, April 1996. Conclusions We have presented a novel approach to FIR filtering based on the non-uniform sampling of the signal but also the non-uniform sampling in frequency of the filter transfer function. The final result is complex but is nonetheless possible to implement in hardware devices and of course in numerical codes. This complexity is balanced by the very low number of samples and the relatively low number of operations needed for each evaluation. This approach is very promising to achieve a lower power consumption in mobile systems. 6. Acknowledgments This work has been supported by a funding from the Joseph Fourier-Grenoble 1 University: MSTIC project TATIE. References: [1] Fabien Aeschlimann, Emmanuel Allier, Laurent Fesquet, and Marc Renaudin. Asynchronus fir filters, towards a new digital processing chain. In 10th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’04), pages 198–206, Crete, Greece, April 2004. [2] Filipp Akopyan, Rajit Manohar, and Alyssa B. Apsel. A level-crossing flash asynchronous analog-to-digital SAMPTA'09 262 ADAPTIVE TRANSMISSION FOR LOSSLESS IMAGE RECONSTRUCTION Elisabeth Lahalle, Gilles Fleury, Rawad Zgheib Department of Signal Processing and Electronic Systems, Supélec, Gif-sur-Yvette, France E-mail : firstname.lastname@supelec.fr tel: +33 (0)1 69 85 14 27, fax: +33 (0)1 69 85 14 29 ABSTRACT This paper deals with the problem of adaptive digital transmission systems for lossless reconstruction. A new system, based on the principle of non-uniform transmission, is proposed. 
It uses a recently proposed algorithm for adaptive stable identification and robust reconstruction of AR processes subject to missing data. This algorithm offers at the same time an unbiased estimation of the model’s parameters and an optimal reconstruction in the least mean square sense. It is an extension of the RLSL algorithm to the case of missing observations combined with a Kalman filter for the prediction. This algorithm has been extended to 2D signals. The proposed method has been applied for lossless image compression. It has shown an improvement in bit rate transmission compared to the JPEG2000 as well as the JPEG-LS standards. Index Terms— adaptive, lossless, compression 1. INTRODUCTION Lossless compression methods are important in many medical applications where large data set need to be transmitted without any loss of information. Actually, some lesions risk becoming undetectable due to the effects of lossy compression. General lossless compression coders are considered to be composed of two main blocks: a data decorrelation block and an entropy coder for the decorrelated data. Two main tendencies may be noticed for the methods used for the decorrelation step: methods based on wavelet transforms and methods based on predictive coding. They have led to the main compression standards : the JPEG2000 for the former group of methods [1], the JPEG-LS for the latter [2]. Intensive attention is paid to transform based compression methods with many algorithms which perform well regarding the bit rate such as SPIHT [3], QT [4], etc. All these coders use a uniform transmission of the binary elements to transmit. In a previous paper [5], the design of digital systems based upon non-uniform transmission of signal samples was introduced. The idea behind is to avoid sending a sample if it can be efficiently predicted, e.g. with a prediction error smaller than the quantization one, thus reducing the average transmission bit rate and increasing the signal SAMPTA'09 to noise ratio (SNR). A speech coder based on the Adaptive Pulse Code Modulation (ADPCM) principle and non-uniform transmission of signals have already been proposed in [6]. It uses the Least Mean Square (LMS)-like algorithm [7] for the prediction of the samples that were not sent. However, this algorithm converges toward biased estimations of the model’s parameters and does not use an optimal predictor in the least mean square sense. Recently, we proposed a Recursive Least Square Lattice (RLSL) algorithm for adaptive stable identification of non stationary Autoregressive (AR) processes subject to missing data, using a Kalman filter as a predictor [8]. This algorithm is fast, guarantees the stability of the model identified and offers at the same time an optimal reconstruction error in the least mean square sense and an unbiased estimation of the model’s parameters in addition to the fast adaptivity to the variations of the parameters in the case of non stationary processes. Non stationnary AR processes can model a large number of signals in practical situations, such as images in the bi-dimensional case [9]. A new lossless image coder based on a non-uniform transmission principle is proposed: it is based on an adaptation of the algorithm proposed in [8] for optimal prediction and identification of 2D AR processes subject to missing observations. In the following, begin by presenting the non-uniform transmission idea for lossless compression. 
In a second part, the adaptive algorithm for reconstruction of AR processes with missing observations [8] is described and extended to 2D AR processes. Its integration into a non-uniform transmission system is studied in the third section. Finally, an example illustrates the performances of the proposed system. It is compared to a uniform digital transmission system : the JPEG2000. 2. NON-UNIFORM TRANSMISSION SYSTEM FOR LOSSLESS RECONSTRUCTION The proposed system uses predictive coding and non-uniform transmission to reduce the bit rate transmission. An AR signal modeling is considered for the prediction. Let xn be the amplitude of the signal at time n. The prediction of a sample will be noted x̂n,P and the prediction error en,P = xn − x̂n,P . In the receiver, a sample xn is predicted using the estimated model parameters at time n − 1, ân−1 , and the available sam- 263 ples. The key ideas of the proposed system are the following. If en,P ≈ 0, xn is replaced by x̂n,P in the receiver without any loss, requiring only one bit flag to be transmitted for the first and the last sample where en,P ≈ 0. If an efficient prediction method for non-uniformly sampled data is used, the above situation occurs many times during the transmission. This is the case for example outside the region of interest of the image where the sample value is constant or null. The whole number of transmitted samples is thus considerably reduced. As some of the samples are not transmitted, the receiver has to deal with the problem of online identification and reconstruction of signals subject to missing samples. The probability law of the prediction error of the image to transmit is then used to adapt the number of bit coding the prediction error in the case where it is non zero. 3. PREDICTION/RECONSTRUCTION FOR NON-UNIFORMLY SAMPLED DATA Let {xn } be an AR process of order L with parameters {ak }, and {ǫn } the corresponding innovation process of variance σǫ2 . The loss process is modeled by an i.i.d binary random variable {cn }, where cn = 1 if xn is available, otherwise cn = 0. Let {zn } be the reconstruction of the process {xn }. If xn is available zn = xn , otherwise, zn = x̂n , the prediction of xn . In order to identify, in real time, the AR process subject to missing data, the algorithm proposed in [8] can be summarised as follows. The reflection coefficients of the lattice structure are determined by minimizing the weighted sum (l) (l) of the quadratic forward, ft , and backward, bt , prediction errors : n ³ ´ X wn−i fn(l)2 + bn(l)2 . En(l) = (1) i=1 A Kalman filter provide an optimal prediction of the signal using the AR estimated parameters. These parameters are deduced from the estimated reflection coefficients using the Durbin Levinson recursions. At time n + 1, the first line of the matrix A of the state space representation of an AR process is built with â(L)⊤ , the vector of the parameters estimated n at time n. The matrix is then named An+1 . (L) (L) . . . . . . âL,n 0 0 An+1 .. .. . . 0 1 0 Pn+1|n = An+1 Pn|n A⊤ n+1 + Rǫ , x̂n+1|n = An+1 x̂n|n ŷn+1|n = cn+1 x̂n+1|n â1,n  1  =  SAMPTA'09 −1 Kn+1 = Pn+1|n cn+1 (c⊤ , n+1 Pn+1|n cn+1 ) Pn+1|n+1 = (Id − Kn+1 c⊤ n+1 )Pn+1|n , (3a) (3b) x̂n+1|n+1 = x̂n+1|n + Kn+1 (yn+1 − ŷn+1|n ) (3c) The predictions of the previous missing data up to time n − L + 1 are updated thanks to the filtering of the state in equation (3c). 
It is convenient now to calculate all the variables of the lattice filter since the last available observation at time n − h, where h ≥ 0 depends on the observation pattern. At each time t, for n − h + 1 ≤ t ≤ n + 1, the recursive equations of the RLSL algorithm given by (5) are applied to es(l) timate the different reflection coefficients k̂t and prediction (l) (l) errors fˆt , b̂t for 1 ≤ l ≤ L. The values of the forward and backward prediction errors are initialized using the updated estimates of the missing samples (those contained within the (0) (0) filtered state x̂n+1|n+1 ), i.e. fˆt = b̂t = x̂t|n+1 . Hence, • For t = n − h + 1 to n + 1 3.1. Kalman RLSL algorithm  If xn+1 is available, i.e. cn+1 = 1,    ,  (2) – Initialize for l = 0 (0) (0) (0) fˆt = b̂t = x̂t|n+1 , k̂t = 1, (4) – For l = 1 to min(L, n) (l) (l−1) (l) Ct = λCt−1 + 2fˆ(l−1)t b̂t−1 , (5a) (l) (l) (l−1)2 (l−1)2 Dt = λDt−1 + fˆt + b̂t−1 , (5b) (l) k̂t = (l) fˆt = (l) b̂t = (l) C − t(l) , Dt (l−1) (l) (l−1) fˆt − k̂t b̂t−1 , (l−1) (l) (l−1) b̂t−1 − k̂t fˆt , (5c) (5d) (5e) – end • end. (L) The AR parameters at time n+1, (âi,n+1 )1≤i≤L , are deduced (l) from the reflection coefficients (k̂n+1 )1≤i≤L using the Durbin Levinson recursions. owever if xn+1 is absent, cn+1 = 0, the predicted state, x̂n+1|n , is not filtered by the Kalman filter, and the parameters are not updated since the reflection coeffi(l) cients (k̂n+1 )1≤l≤L are not yet calculated, Kn+1 = 0, (6a) Pn+1|n+1 = Pn+1|n , (6b) x̂n+1|n+1 = x̂n+1|n , (6c) (L) ân+1 = â(L) n . (6d) The cost function minimized by this algorithm is the weighted mean of all quadratic prediction errors. When a sample is 264 missing, the prediction error can not be calculated, it is replaced by its estimation. Indeed, recall that in order to update the reflection coefficients at a time n, the lattice filter variables must have been calculated at all previous times. Therefore, using this algorithm, the lattice filter variables are estimated at all times even when a sample is missing. Consequently, this algorithm presents an excellent convergence behavior and have fast parameter tracking capability even for a large probability of missing a sample. The computational complexity of this algorithm is found to be O((1 − q)N L2 ), where q is the bernoulli’s probability of losing a sample, N is the size of the signal and L the order of the AR model. 3.2. Adaptation to 2D signals A first solution to use the previous algorithm for 2D signals is to use the classical video scanning of the image in order to get a 1D signal. However, only a 1D decorrelation is achieved using this method. In order to get a 2D decorrelation of the image, a 2D AR predictor x̂i,j of the sample xi,j (7) must be used in addition to the video scanning of the image. m S n Fig. 1. AR 2D: prediction support X ân,m xi−n,j−m (7) n,m∈S In order to integrate this 2D AD predictor into the previous algorithm, the first line of the A matrix is built with the ân,m ⊤ parameters, and the regressor vector [xn−1 . . . xn−L ] is re⊤ placed by [xi−1,j . . . xi−n,j−m . . . xi−p,j−q ] . The renumbering task excepted, to built the A matrix, the computational time of these 2D algorithm is similar to the 1D one. 4. PROPOSED ADAPTATIVE TRANSMISSION ALGORITHM In this section, we propose to use the algorithms discussed in section 3 as efficient predictors in the non uniform transmission system proposed in section 2 in order to minimize the number of bit to transmit. 
At each time n, knowing all transmitted samples and using the same identification and reconstruction method as the one used in the receiver, the transmitter evaluates the signal reconstruction performance in the SAMPTA'09 • In the transmitter: . en,P = xn − x̂n,P . if (|en,P | = 1e−5 and |en−1,P | > 1e−5 or |en,P | > 1e−5 and |en−1,P | = 1e−5 ), one bit flag is transmitted, . else if |en,P | < S2 , . if |en,P | < S3 , B3 bits are transmitted, . else B2 bits are transmitted, . else B1 bits are transmitted. • In the receiver, the method described in 3 is used for adaptive identification and reconstruction of a signal subject to missing data: if a new sample is received, the AR parameters are updated. Otherwise, the missing sample is predicted in terms of the past available samples and the current estimation of the parameters. 5. SIMULATIONS x(i, j ) x̂i,j = receiver. This can be done by comparing the receiver prediction error, |en,P |, with different thresholds, S1 ≈ 0, S2 , ..., Si . Thus, if the receiver is able to reconstruct the sample without error (error greatly smaller than the quantification error (1e−5 )), only a one bit flag is transmitted to indicate the first and the last missing sample. The number of thresholds Si and their values are chosen according to the probability law of the prediction error to transmit only the Bi bits required to code the prediction error for each threshold. The proposed coding decoding algorithm can be summarized, at a time n, as: The performances of both proposed methods are compared to the JPEG2000. The first method uses a 1D AR model of order 3 of the signal. In the second method, the image is modeled by a 2D AR process of order (2, 2). The performances of the different methods are evaluated in term of bit rate (in bpp) on CT images. The PSNR is computed for the proposed methods to show the lossless reconstruction of the image. The PSNR which have been reached for all the simulations corresponds to the infinity value. Table 1 shows the results for CT images of (512x512x12) bits presented in figures 2, 3 and 4 (Images courtesy of Dr Kopans, MGH Boston, USA. Tomosynthesis investigational device from GE Healthcare (Chalfont St Giles, UK)). In these images the prediction error is in most of the case small (lower than 32), but for the pixels of the edge of the ROI the prediction error requires 12 bits to be coded. Consequently, the following values are chosen for the number of bit to code the prediction error : B1 = 13, B2 = 8, B3 = 6. 6. CONCLUSION A new digital transmission system for lossless image reconstruction has been proposed. It is based on a non-uniform transmission principle and on extensions to 2D of the algorithm proposed in [8] for real time identification and reconstruction of AR processes subject to missing data. The pro- 265 50 50 100 100 150 150 200 200 250 250 300 300 350 350 400 400 450 450 500 500 50 100 150 200 250 300 350 400 450 500 Fig. 2. CT1 image 50 100 150 200 250 300 350 400 450 500 Fig. 4. CT3 image coding system, 2000. 50 100 [2] ISO/IEC 14495-1, “Information technology - lossless and near-lossless compression of continuous-tone still images,” JPEG-LS standard, Baseline, 2000. 150 200 250 [3] A. Said and W. A. Pearlman, “A new fast and efficient image codec based on set partitionning in hierarchical trees,” IEEE Trans. on Circuits and systems for Video Technology, vol. 6, pp. 243–250, June 1996. 300 350 400 450 500 50 100 150 200 250 300 350 400 450 500 Fig. 3. 
CT2 image posed methods, applied on CT images, has shown in their two forms (2D as well as 1D) an improvement in bit rate comparing to the JPEG2000 and JGPEG-LS standards. Comparing to the JPEG2000, significant gains for lossless compression are reached: 3.4% for CT3 image up to 4.6% for CT1 image. Comparing to the JPEG-LS, the most significant gains (2.7% up to 3.6%) are reached for CT2 and CT1 images where the RLE coding of the JPEG-LS is not used. 7. REFERENCES [1] ISO/IEC 15444-1, “Information technology - jpeg2000 image coding system,” JPEG2000 standard, Part 1-Core Table 1. Comparison of the three methods in bit rate (in bpp) for CT images of (512x512x12) bits: Method CT 1 CT 2 CT 3 1 6.53 6.67 5.10 2 6.45 6.61 5.10 JPEG2000 6.76 6.89 5.28 JPEG-LS 6.69 6.79 5.15 SAMPTA'09 [4] A. Munteanu and J. Cornelis, “Wavelet based lossless compression scheme with progressive transmission capability,” International Journal of Imaging Systems and Tecnology, vol. 10, pp. 76–85, January 1999. [5] S. Mirsaidi, G. Fleury, and J. Oksman, “Reducing quantization error using prediction/non uniform transmission,” in Proc. International Workshop on Sampling Theory and Applications. IEEE, 1997, pp. 139–143. [6] E. Lahalle and J. Oksman, “ADPCM speech coder with adaptive transmission and ARMA modelling of non-uniformly sampled signals,” in 5th Nordic Signal Processing Symposium, CD-ROM proceedings, Norway. IEEE, 2002. [7] S. Mirsaidi, G. Fleury, and J. Oksman, “LMS like AR modeling in the case of missing observations,” IEEE Transactions on Signal Processing, vol. 45, pp. 1574– 1583, June 1997. [8] R. Zgheib, G. Fleury, and E. Lahalle, “Lattice algorithm for adaptive stable identification and robust reconstruction of non stationary ar processes with missing observations,” IEEE Transactions on Signal Processing, vol. 56, pp. 2746–2754, July 2008. [9] N. S. Jayant and P. Noll, “Digital coding of waveform, principles and applications to speech and video,” Prentice Hall, 1984. 266 Geometric Sampling of Images, Vector Quantization and Zador’s Theorem Emil Saucan (1) , Eli Appleboim (2) and Yehoshua Y. Zeevi (2) (1) Department of Mathematics, Technion - Israel Institute of Technology, Haifa 32000, Israel. (3) Electrical Engineering Department, Technion - Israel Institute of Technology, Haifa 32000, Israel. semil@tx.technion.ac.il, eliap@ee.technion.ac.il, zeevi@ee.technion.ac.il Abstract: We present several consequences of the geometric approach to image sampling and reconstruction we have previously introduced. We single out the relevance of the geometric method to the vector quantization of images and, more important, we give a concrete and candidate for the optimal embedding dimension in Zador’s Thorem. An additional advantage of our approach is that that this provides a constructive proof of the aforementioned theorem, at least in the case of images. Further applications are also briefly discussed. 1. Introduction In recent years it became common amongst the signal processing community, to consider images and other signals as well, as Riemannian manifolds embedded in higher dimensional spaces. Usually, the embedding manifold is taken to be Rn , but other options can, and had been considered. Along with that, sampling is an essential preliminary step in processing of any continuous signal by a digital computer. This step lies at heart of any digital processing of any (presumably continuous) data/signal. 
It is therefore natural to strive to achieve a sampling method for images, viewed as such, that is as higher dimensional dimensional objects (i.e. manifolds), rather than their representation as 1-dimensional signals. In consequence, our sampling and reconstruction techniques stem from the the fields of differential geometry and topology, rather than being motivated by the traditional framework of harmonic analysis. More precisely, our approach to Shannon’s Sampling Theorem is based on sampling the graph of the signal, considered as a manifold, rather than a sampling of the domain of the signal, as is customary in both theoretical and applied signal and image processing. In this context it is important to note that Shannon’s original intuition was deeply rooted in the geometric approach, as exposed in his seminal work [14]. Our approach is based upon the following sampling theorem for differentiable manifolds that was recently presented and applied in the context image processing [12]: Theorem 1 Let Σn ⊂ RN , n ≥ 2 be a connected, not necessarily compact, smooth manifold, with finitely many compact boundary components. Then, there exists a sampling scheme of Σn , with a metric density D = D(p) = SAMPTA'09 ³ ´ 1 D k(p) , where k(p) = max{|k1 |, ..., |kn |}, and where k1 , ..., kn are the principal curvatures of Σn , at the point p ∈ Σn . In particular, if Σn is compact, then there exists a sampling of Σn having uniformly bounded density. Note, however, that this is not necessarily the optimal scheme (see [12]). The constructive proof of this theorem is based on the existence of the so-called fat (or thick) triangulations (see [11]). The density of the vertices of the triangulation (i.e. of the sampling) is given by the inverse of the maximal principal curvature. An essential step in the construction of the said triangulations consists of isometrically embedding of Σn in some RN , for large enough N (see [10]), where the existence of such an embedding is guaranteed by Nash’s Theorem ([9]). Resorting to such a powerful tool as Nash’s Embedding Theorem appears to be an impediment of our method, since the provided embedding dimension N is excessively high (even after further refinements due to Gromov [4] and Günther [5]). Furthermore, even finding the precise embedding dimension (lower than the canonical N ) is very difficult even for simple manifolds. However, as we shall indicate in the next section, this high embedding dimension actually becomes an advantage, at least from the viewpoint of information theory. The resultant sampling scheme is in accord with the classical Shannon theorem, at least for the large class of (bandlimited) signals that also satisfy the condition of being C 2 curves. In our proposed geometric approach, the radius of curvature substitutes for the condition of the Nyquist rate. To be more precise, our approach parallels, in a geometric setting, the local bandwidth of [7] and [16]. In other words, manifolds with bounded curvature represent a generalization of the locally band limited signals considered in those papers. We concentrate here only on some of the consequences of Theorem 1. More precisely, we present, in Sections 2 and 3, two applications of our geometric sampling method and of the embedding technique employed in the proof, namely to the vector quantization of images and to determining the embedding dimension in Zador’s Theorem, respectively. Further directions of study are briefly discussed in the concluding section. 267 2. 
Vector Quantization for Images A complementary byproduct of the constructive proof of Theorem 1 is a precise method of vector quantization (or block coding). Indeed, the proof of Theorem 1 consists in the construction of a Voronoi (Dirichlet) cell complex {γ̄kn } (whose vertices will provide the sampling points). The centers ak of the cells (satisfying a certain geometric density condition) represent, as usual, the decision vectors. An advantage of this approach, besides its simplicity, is entailed by the possibility to estimate the error in terms of length and angle distortion when passing from the cell complex {γ̄kn } to the Euclidean cell complex {c̄nk } having the same set of vertices as {γ̄kn } (see [10]). Indeed, in contrast to other related studies, our method not only produces a piecewise-flat simplicial approximation of the given manifold, it also actually renders a simplicial complex on the manifold. Moreover, one can actually compute the local distortion resulting by passing from the Euclidean geometry of the piecewise-flat approximation to the intrinsic geometry of its projection on the manifold. If M = M n is a manifold without boundary, then locally, for any triangulation patch the following inequality holds [10]: 3 5 dM (x, y) ≤ deucl (x̄, ȳ) ≤ dM (x, y) ; 4 3 where deucl , dM denote the Euclidean and intrinsic metric (on M ) respectively, and where x, y ∈ M and x̄, ȳ are their preimages on the piecewise-flat complex. For manifolds with boundary, the same estimate holds (for the intM and ∂M ), except for a (small) zone of “mashing” triangulations (see [11]), where the following weaker distortion formula is easily obtained: 5 3 dM (x, y)−f (θ)η∂ ≤ deucl (x̄, ȳ) ≤ dM (x, y)+f (θ)η∂ ; 4 3 where f (θ) is a constant depending on the θ = min {θ∂ , θint M } – the fatness of the triangulation of ∂M and int M, respectively, and η∂ denotes the mesh of the triangulation of a certain neighbourhood of ∂M (see [11]). In other words, the (local) projection mapping π between the triangulated manifold M and its piecewise-flat approximation Σ is (locally) bi-lipschitz if M is open, but only a quasi-isometry (or coarsely bi-lipschitz) if the boundary of M is not empty. But the main advantage of a geometric sampling of images resides in the fact that the sampling is done according to the geometric, hence intrinsic, features of the image, rather in the arbitrary (as far as features are concerned) manner of classical approach that transforms the image into a 1-dimensional array (signal). Therefore, the resulting sampling is adaptive, hence sparse in regions of low curvature, and, as shown in [1], it is even compressive in some special cases. 3. Zador’s Theorem A more important application stems, however, from Zador’s Theorem [15], implying that we can turn into an SAMPTA'09 advantage the inherent “curse of dimensionality”. Indeed, by of Zador’s Theorem, the average mean squared error per dimension: Z 1 E= deucl (x, pi )p(x)dx , N RN pi being the code point closest to x and p(x) denoting the probability density function of x, can be reduced by making avail of higher dimensional quantizers (see [2]). Since for embedded manifolds it obviously holds that p(x) = p1 (x)χM , we obtain: Z 1 E= deucl (x, pi )p1 (x)dx , N Mn It follows that, if the main issue is accuracy, not simplicity, then 1-dimensional coding algorithms (such as the classical Ziv-Lempel algorithm) perform far worse than higher dimensional ones. 
Of course, there exists an upper limit for the coding dimension, since otherwise one could just code the whole data as one N -dimensional vector (albeit of unpractically high dimension). The geometric coding method proposed here provides a natural high dimension for the quantization of M n – the embedding dimension N . Moreover, it closes (at least for images and any other data that can be represented as Riemannian manifolds) an open problem related to Zador’s Theorem: finding a constructive method to determine the dimension of the quantizers (Zador’s proof is nonconstructive). In fact, for a uniformly distributed input (as manifolds, hence noiseless images, can assumed to be, at least in first approximation) a better estimate of the average mean squared error per dimension can be obtained, namely: E= 1 N R d (x, pi )dx M n R eucl Mn dx = 1 N R deucl (x, pi )dx , Vn (M n )dx Mn where Vn denotes the n-dimensional volume (area) of M . Whence, for compact manifolds one obtains the following expression for E: E= 1 N R d (x, pi )dx = dx eucl n MP mR i Vi 1 N R d (x, pi )dx M n eucl P , m V (V n i )dx i where Vi represent the Voronoi cells of the partition. Moreover, we have the following estimate for the quantizer problem, that is: Chose centers of cells such that the quantity R 1 d (x, pi )dx 1 m M n eucl . Q= ¢1+ N2 ¡ P m N 1 V m i n is minimized. Here, again, the high embedding dimension N furnishes us with yet an additional advantage. Indeed, manifolds N increases dramatically, even for compact manifolds and even taking into consideration Gromov’s and Günther’s improvement of Nash’s original method (see [4], resp. [5]). For instance, n = 2 requires embedding dimension N = 10 and n = 3 the necessitates N = 14. Hence, for large enough n one can write the following rough estimate: 268 1 Q≈ N R Mn deucl (x, pi )dx Pm . i Vn 4. Conclusions and Future work As we have stressed above, our geometrical approach to sampling lends itself to consideration of a much broader range of topics in communications, for such problems as Coding, Channel Capacity, amongst others (see [13]). In particular, and almost as an afterthought of the ideas presented in Section 2, it offers a new method for PCM (pulse code modulation – see [2] for a brief yet lucid presentation) of images, considered as such and not as 1dimensional signals. This approach is endowed with an inherent advantage in that the sampling points are associated with relevant geometric features (via curvature) of the image, viewed as a manifold of dimension ≥ 2, and are not chosen via the Nyquist rate of some rather arbitrarily computed 1-dimensional signal. Moreover, the sampling is in this case adaptive and, indeed, compressive, lending itself to interesting technological benefits. The implementation of the PCM method described above, as well as experimenting with the geometric quantization method, represent the applicative directions of study that are natural and interesting to pursue further. A better understanding of the geometry of images, included color, texture and other relevant features, in terms of curvature, represent the theoretical directions to be pursued in future. In particular, determining the lowest embedding dimension and finding global curvature constraints are, as we have seen, important for a highly compressive sampling. 5. 
The role of curvature We briefly discuss here the crucial role of curvature in determining the embedding dimension (and hence the Zador dimension) by illustrating it on a “toy” example, namely that of the torus. For a “round” torus of revolution Tr2 in R3 , the embedding dimension is N = 3, since the metric of Tr2 is the intrinsic one induced by the Euclidian one of the ambient space R3 , thus in this case our method does not depart too much from standard ones. However, if one considers the flat torus Tf2 , i.e. of Gaussian curvature K ≡ 0, then the minimal dimension needed for isometric embedding is N = 4 (see, e.g. [3]). (Before we proceed further, let us note that such tori arise naturally when considering planar rectangles with opposite sides identified – that is, “glued” – via translations. In a practical context, these would model 2-dimensional repetitive patterns on a computer screen, e.g. screen savers. Flat tori also appear in another context relevant to Computer Graphics and Image Processing, namely as solutions for discrete curvature flows (on triangular meshes), see e.g. [8].) In general, given a 2-dimensional torus, equipped with generic Riemannian metric, the whole range of dimensions, up to, and including, the one prescribed by the Nash-GromovGünther Theorem, is possible. There are huge differences arising not only from the sign of the curvature, but from SAMPTA'09 its “speed of change” as well – for a exhaustive treatment of this subject see [6]. 6. Acknowledgments The authors would like to thank Professor Peter Maass, for his constructive critique and encouragement. The first author would also like to thank Professor Shahar Mendelson – his warm support is gratefully acknowledged. References: [1] Eli Appleboim, Emil Saucan and Yehoshua Y. Zeevi. Geometric Sampling For Signals With Applications to Images. Proceedings of Sampta 2007, 2008. [2] John H. Conway and Neil J. A. Sloane Sphere Packings, Lattices and Groups. Springer, New York, 1999. [3] Manfredo P. do Carmo Differential Geometry of Curves and Surfaces. Prentice-Hall, Englewood Cliffs, N.J., 1976. [4] Mikhail Gromov. Partial differential relations, Springer-Verlag, Ergeb. der Math. 3 Folge, Bd. 9, Berlin-Heidelberg-New-York, 1986. [5] Matthias Günther. Isometric embeddings of Riemannian manifolds. Proc. ICM Kyoto, pages 1137–1143, 1990. [6] Qing Han and Jia-Xing Hong Isometric embeddings of Riemannian manifolds in Euclidean Spaces. AMS MSM 130, Providnce, RI, 2006. [7] K. Horiuchi. Sampling principle for continuous signals with time-varying bands. Information and Control, 13(1): 53-61, 1968. [8] Miao Jin, J. Kim and David Gu. Discrete Surface Ricci Flow: Theory and Applications. In Mathematics of Surfaces, LNCS 4647, pages 209–232, 2007. [9] John Nash. The embedding problem for Riemannian manifolds. Ann. of Math. 63:20–63, 1956. [10] Kirsi Peltonen. On the existence of quasiregular mappings. Ann. Acad. Sci. Fenn., Series I Math., Dissertationes 1992. [11] Emil Saucan. Note on a theorem of Munkres. Mediterr. j. math. 2(2):215–229, 2005. [12] Emil Saucan, Eli Appleboim, and Yehoshua Y Zeevi. Sampling and Reconstruction of Surfaces and Higher Dimensional Manifolds. Journal of Mathematical Imaging and Vision 30(1):105–123, 2008. [13] Emil Saucan, Eli Appleboim, and Yehoshua Y Zeevi. Geometric Approach to Sampling and Communication. Technion CCIT Report #707, November 2008. [14] Claude E. Shannon. Communication in the presence of noise. Proceedings of the IRE 37(1):10–21, 1949. [15] Paul G. Zador. 
Asymptotic Quantization Error of Continuous Signals and the Quantization Dimension. IEEE Trans. on Info. Theory, 12(1):23–86, 1982. [16] Yehoshua Y. Zeevi and E. Shlomot. Nonuniform sampling and antialiasing in image representation. IEEE Trans. Signal Process., 41(3):1223–1236, 1993. 269 SAMPTA'09 270 On average sampling restoration of Piranashvili–type harmonizable processes Andriy Ya. Olenko† and Tibor K. Pogány‡ † Department of Mathematics and Statistics, La Trobe University, Victoria 3086, Australia. ‡ Faculty of Maritime Studies, University of Rijeka, Studentska 2, HR-51000 Rijeka, Croatia. a.olenko@latrobe.edu.au, poganj@pfri.hr Abstract: Such a process will be called Piranashvili process in the sequel [11], [12]. The harmonizable Piranashvili – type stochastic processes are approximated by a finite time shifted average sampling sum. Truncation error upper bound is established; various consequences and special cases are discussed. MSC(2000): 42C15, 60G12, 94A20. Keywords: WKS sampling theorem; time shifted sampling; Piranashvili–, Loève–, Karhunen– harmonizable stochastic process; weakly stationary stochastic process; local averages; average sampling reconstruction. 1. Introduction and preparation  Given a probability space Ω, F, P and the related Hilbert–space L2 (Ω) := {X : E|X|2 < ∞}. Let us consider a non–stationary, centered stochastic L2 (Ω)–process ξ : R × Ω 7→ R having covariance function (associated to some domain Λ ⊆ R with some sigma–algebra σ(Λ)) in the form: Z Z B(t, s) = f (t, λ)f ∗ (s, µ)Fξ (dλ, dµ), (1) Λ Λ with analytical exponentially bounded kernel function f (t, λ), while Fξ is a positive definite measure on R2 provided the total variation kFξ k(Λ, Λ) of the spectral distribution function Fξ such that satisfies Z Z Fξ (dλ, dµ) < ∞. kFξ k(Λ, Λ) = Λ Λ (We mention that the sample function ξ(t) ≡ ξ(t, ω0 ) and f (t, λ) possess the same exponential types [1, Theorem 4], [11, Theorem 3]). Then, by the Karhunen–Cramér theorem the process ξ(t) has the spectral representation as a Lebesgue integral Z ξ(t) = f (t, λ)Zξ (dλ); (2) Λ in (1) and (2) Fξ (S1 , S2 ) = EZξ (S1 )Zξ∗ (S2 ) SAMPTA'09 S1 , S2 ⊆ σ(Λ). Being f (t,P λ) entire, it possesses the Maclaurin expansion ∞ f (t, λ) = n=0 f (n) (0, λ)tn /n!. Put q γ := sup c(λ) = sup lim n |f (n) (0, λ)| < ∞ . (3) Λ n Λ As the exponential type of f (t, λ) is equal to γ, for all w > γ there holds X  nπ  sin(wt − nπ) ξ(t) = ξ , (4) w wt − nπ n∈Z uniformly in the mean square and in the almost sure sense [11, Theorem 1]. This result we call Whittaker–Kotel’nikov–Shannon (WKS) stochastic sampling theorem [12]. Specifying Fξ (x, y) = δxy Fξ (x) in (1) we conclude the Karhunen–representation of the covariance function Z f (t, λ)f ∗ (s, λ)Fξ (dλ). B(t, s) = Λ Also, putting f (t, λ) = eitλ in (1) one gets the Loèverepresentation: Z Z ei(tλ−sµ) Fξ (dλ, dµ). B(t, s) = Λ Λ Here is c(λ) = |λ|. Therefore, WKS–formula (4) holds for all w > γ = sup |Λ|. Then, the Karhunen process with the Fourier kernel f (t, λ) = eitλ we recognize as the weakly stationary stochastic process having covariance Z eiτ λ Fξ (dλ), τ = t − s. B(τ ) = Λ Deeper insight into different kind harmonizabilities present [5, 13, 14] and the related references therein. Finally, using Λ = [−w, w] for some finite w in this considerations, we get the band–limited variants of the same kind processes. 
By physical and applications reasons the measured samples in practice may not be the exact values of the measured process ξ(t), or its covariance B(t, s) itself, near to the sample time tn , but only the local average of the signal ξ near to tn . So, the measured sample values will be Z hξ, un iU = ξ(x)un (x)dx, U = supp(un ) (5) U 271  for a sequence u := un (t) n∈Z of non–negative, normalized, that is h1, un i ≡ 1, averaging functions such that   supp(un ) ⊆ tn − σn′ , tn + σn′′ . (6) The local averaging method was introduced by Gröchenig [2] and developed by Butzer and Lei. Recently Sun and Zhou gave some results in this direction, while the stochastic counterpart of this average sampling was intensively studied in the last three–four years by He, Song, Sun, Yang and Zhu in a set of articles [15], [16] and their references therein; see for example the exhaustive references list in [4]. The listed, recently considered stochastic average sampling results are restricted to weakly stationary stochastic processes, while the approximation average sampling sums are used around the origin. Our intentions are to extend these results to time shifted average sampling, considered for the very wide class of Piranashvili processes. Theorem 1 Let f (z) be entire, bounded on the real axis and exponentially bounded having type γ < w. Denote Lf := sup f (x) , L0 (z) := R 2wLf | sin(wz)| . π(w − γ) 1 − e−π  Then for all z ∈ int ΓN (x) and N ∈ N enough large it holds X  π  sin(wz − nπ) f n w wz − nπ Z\IN (x) ≤ L0 (z)e−(N +1/2)π(w−γ)/w (N + 1/2) 1 − |z−Nx |w (N +1/2)π < L0 (z) . N (7) The proving method is contour integration, following Piranashvili’s traces [11]. Denote here and in what follows X  nπ  sin(wt − nπ) YN (ξ; t) := ξ w wt − nπ IN (t) 2. the time shifted truncated WKS restoration sum. Time shifted average sampling Now, instead to follow the approach used in [16] we take time shifted [7], [8] finite average sampling sum in approximating the initial stochastic signal ξ. First, we consider weighted average over Jn (t) := nπ/w − σn′ (t), nπ/w + σn′′ (t) for the measured value of ξ(t) at nπ/w, n ∈ IN (t) where IN (t) := {n ∈ Z : |tw/π − n| ≤ N }, N ∈ N. Let Nt be the integer nearest to tw/π. By obvious reasons we restrict the study to  π . σ := max sup max σn′ (t), σn′′ (t) ≤ 2w IN (t) R Let us define the time shifted average sampling approximation sum in the form X sin(wt − nπ) Au (ξ; t) = hξ, un iJn (t) · , wt − nπ Z and its truncated variant X sin(wt − nπ) hξ, un iJn (t) · . Au,N (ξ; t) = wt − nπ IN (t) One defines mean–square, time shifted, average sampling 2 truncation error Tu,N (ξ; t) := E ξ(t) − Au,N (ξ; t) . Now, we are interested in some reasonably simple efficient mean square truncation error upper bound appearing in the approximation ξ(t) ≈ Au,N (ξ; t). Let us introduce some auxiliary results. As Nx stands for the integer nearest to xw/π, x ∈ R, let n πo ΓN (x) := z ∈ C : |z − Nx | ≤ N + 12 w , N ∈ N. In what follows denote int(R) the interior of some R, while the series ∞ X 1 λ(q) := (2n − 1)q n=1 stands for the Dirichlet lambda function. SAMPTA'09 By simple use of (1), (2) and the Theorem 1 one deduces the following modest generalization of [11, Theorem 2] to time shifted case of sampling restoration procedure. Theorem 2 Let ξ(t) be a Piranashvili process with exponentially bounded kernel function f (t, λ) and let e f := sup sup |f (t, λ)|, L Λ R e e 0 (t) := 2Lf w | sin(wt)|  . L π(w − γ) 1 − e−π  Then for all t ∈ int ΓN (t) , we have E ξ(t) − YN (ξ; t) 2 < e 2 (t) L 0 kFξ k(Λ, Λ) . 
N2 (8) (9) Remark 1 Let us point out that the straightforward consequence of (9) is not only the exact L2 –restoration of the initial Piranashvili–type harmonizable process ξ by a sequence of approximants YN (ξ; t) when N → ∞, but since  2 E ξ(t) − YN (ξ; t) = O N −2 , the perfect reconstruction is possible in the a.s. sense as well (by the celebrated Borel–Cantelli Lemma). Second, the first order difference ∆x,y B [3] of B(t, s) on the plane satisfies  ∆x,y B (t, s) = B(t + x, s + y) − B(t + x, s) − B(t, s + y) + B(t, s) Z xZ y  ∂2 = B t + u, s + v dvdu . (10) 0 0 ∂u∂v Theorem 3 Let ξ(t) be a Piranashvili process with the covariance B(t, t) ∈ C 2 (R). Let (p, q) be a conjugated Hölder pair of exponents: 1 1 + = 1, p q p > 1. 272 Then we have 2 E YN (ξ; t) − Au,N (ξ; t) N n X ≤1+C n=1 ∞ X 2 ≤ Cq π sup B ′′ (t, t) · (2N + 1)2/p , 4w2 R < 1 + 2C (11) o 1 1 + (n − ∆)q (n + ∆)q 1 (n − 1/2)q n=1 < 1 + 2q+1 C λ(q) , where  2/q 2q+1 | sin(wt)|q Cq = 1 + λ(q) . q π (12) where | sin(wt)|q . πq Collecting all these estimates, we deduce (11). C= P ROOF. Having on mind (1), the properties of averaging functions sequence u and (10), we clearly derive  2 E YN (ξ; t) − Au,N (ξ; t) X  sin(wt − nπ) =E hξ nπ w − ξ(x), un iJn (t) · wt − nπ 2 IN (t) XZ = I2N (t) Z x ′′ σn (t) ′ (t) −σn Z Z ′′ σm (t) ′ (t) −σm π π un (x + n w )um (y + m w ) y  ∂2 π , v + m πv dvdu B u + nw ∂u∂v 0 0 sin(wt − nπ) sin(wt − mπ) · wt − nπ wt − mπ X sin(wt − nπ) sin(wt − mπ) ≤ wt − nπ wt − mπ 2 · 3. Main result We are ready to formulate our upper bound result for the mean square, time shifted average sampling truncation error Tu,N (ξ; t). The almost sure sense restoration procedure has been treated too. As we use average sampling sum Au,N (ξ; t) instead of YN (ξ; t) to obtain asymptotically vanishing Tu,N (ξ; t), it is not enough letting N → ∞ as in Remark 1. For average sampling we need additional conditions upon w or σ to guarantee smaller average intervals for larger/denser sampling grids. IN (t) · sup x,y≤σ Z 0 x Z 0 y  ∂2 π , v + m πv dvdu B u + nw ∂u∂v being u normalized. For the sake of brevity let us denote Hσ (n, m) the sup–term in the last display. Then, by the Hölder inequality with conjugate exponents p, q; p > 1, we get 2 E YN (ξ; t) − Au,N (ξ; t) )2/q ( )1/p ( X sin(wt − nπ) q X p ≤ Hσ (n, m) . wt − nπ 2 IN (t) IN (t) It is not hard to see that for all n, m ∈ IN (t) there holds ∂ 2 B(t, s) Hσ (n, m) ≤ σ sup ∂t∂s R2 Theorem 4 Assume the conditions of Theorems 2 and 3 have been fulfilled. Then, we have Tu,N (ξ; t) ≤ + e 2 (t) 2L 0 kFξ k(Λ, Λ) N2 Cq π 2 sup B ′′ (t, t) · (2N + 1)2/p , 2w2 R (13) e 0 , Cq are described by (8), (11) respectively. where L  Moreover, when w = O N 1/2+1/p+ε , ε > 0, we have  (14) P lim Au,N (ξ; t) = ξ(t) = 1 N →∞ for all t ∈ R. 2 ∂ 2 B(t, s) π2 ≤ sup . 4w2 R2 ∂t∂s PROOF. By direct calculation we deduce Tu,N (ξ; t) = E ξ(t) − Au,N (ξ; t) = E ξ(t) − YN (ξ; t) + YN (ξ; t) − Au,N (ξ; t) Applying now the Cauchy–Bunyakovsky–Schwarz inequality to the covariance ∂ 2 B, we deduce sup R2 R It remains to evaluate the sum of qth power of the sinc– functions. As sin(wt − Nt π) ≤1 wt − Nt π we conclude X sin(wt − nπ) q IN (t) SAMPTA'09 ≤ 2E ξ(t) − YN (ξ; t) wt − nπ 2 2 2 ∂ 2 B(t, t) ∂ 2 B(t, s) ≤ sup ∂t∂s ∂t2 R = sup |B ′′ (t, t)| . 2 + 2E YN (ξ; t) − Au,N (ξ; t) . Now, we get the asserted upper bound by (9) and (11). To derive (14), we apply the Chebyshev inequality to evaluate the probability  PN := P ξ(t) − Au,N (ξ; t) ≥ η ≤ η −2 Tu,N (ξ; t) . 
e 0 (t) = O(1) as N → ∞, we have Accordingly, since L X N PN ≤ K X 1 (2N + 1)2/p  + < ∞, N2 w2 N 273 K being a suitable absolute constant. Therefore, by the Borel–Cantelli Lemma, the the a.s. convergence result (14) holds true.  Remark 2 Theorem 4 ensures the perfect time shifted average sampling restoration in the mean square sense when  w = O N 1/p+ε , ε > 0: lim Tu,N (ξ; t) = 0 . N →∞ The a.s. sense restoration (14) requires stronger assumption, it holds when w = O N 1/2+1/p+ε . Remark 3 In both cases we use the so called approximate sampling procedure, that is, when in the restoration procedure w → ∞ in some fashion. The consequence of these results is that we have to restrict ourselves to the case Λ = R, such that we recognize as the non–bandlimited Piranashvili type harmonizable process case. The importance of approximate sampling procedures for investigations of aliasing errors in sampling restorations and different conditions on joint asymptotic behaviour of N and w have been discussed in detail in [7]. 4. Conclusions We have analyzed upper bounds on truncation error for time shifted average sampling restorations in the stochastic initial signal case. The convergence of the truncation error to zero was discussed. However, certain new questions immediately arise: • to derive sharp upper bounds in Theorems 3 and 4; • to obtain new results for Lp –processes using recent deterministic findings [9], [10]; • to obtain similar results for irregular/nonuniform sampling restoration using methods exposed in [6] and [10]. Acknowledgements The recent investigation was supported in part by Research Project No. 112-2352818-2814 of Ministry of Sciences, Education and Sports of Croatia and in part by La Trobe University Research Grant–501821 ”Sampling, wavelets and optimal stochastic modelling”. References: [3] Muhammed K. Habib and Stamatis Cambanis. Sampling approximation for non–band–limited harmonizable random signals. Inform. Sci 23:143–152, 1981. [4] Gaiyun He, Zhanjie Song, Deyun Yang and Jianhua Zhu. Truncation error estimate on random signals by local average. In Y. Shi et al., editors. ICCS 2007, Part II, Lecture Notes in Computer Sciences 4488, pages 1075–1082, 2007. [5] Yûichirô Kakihara. Multidimensional Second Order Stochastic Processes. World Scientific, Singapore, 1997. [6] Andriy Ya. Olenko and Tibor K. Pogány. Direct Lagrange–Yen type interpolation of random fields. Theor. Stoch. Proc. 9(25)(3–4): 242–254, 2003. [7] Andriy Ya. Olenko and Tibor K. Pogány. Time shifted aliasing error upper bounds for truncated sampling cardinal series. J. Math. Anal. Appl. 324(1): 262–280, 2006. [8] Andriy Ya. Olenko and Tibor K. Pogány. On sharp bounds for remainders in multidimensional sampling theorem. Sampl. Theory Signal Image Process. 6(3): 249–272, 2007. [9] Andriy Ya. Olenko and Tibor K. Pogány. Universal truncation error upper bounds in sampling restoration. (to appear) [10] Andriy Ya. Olenko and Tibor K. Pogány. Universal truncation error upper bounds in irregular sampling restoration. (to appear) [11] Zurab A. Piranashvili. On the problem of interpolation of random processes. Teor. Verojat. Primenen. XII(4): 708–717, 1967. (in Russian) [12] Tibor K. Pogány. Almost sure sampling restoration of bandlimited stochastic signals. In John R. Higgins and Rudolf L. Stens, editors. Sampling Theory in Fourier and Signal Analysis: Advanced Topics, Oxford University Press, pages 203–232, 284–286, 1999. [13] Maurice B. Priestley. Non–linear and Non– stationary Time Series. 
Academic Press, London, New York, 1988. [14] Malempati M. Rao. Harmonizable processes: structure theory. Einseign. Math. (2) 28(3–4): 295–351, 1982. [15] Zhanjie Song, Zingwei Zhu and Gaizun He. Error estimate on non–bandlimited random signals by local averages. In V.N. Aleksandrov it et al., editors. ICCS 2006, Part I, Lecture Notes in Computer Sciences 3991, pages 822–825, 2006. [16] Zhan–jie Song, Wen–chang Sun, Shou–yuan Yang and Guang–wen Zhu. Approximation of weak sense stationary stochastic processes from local averages. Sci. China Ser. A 50(4): 457–463, 2007. [1] Yuri K. Belyaev. Analytical random processes. Teor. Verojat. Primenen. IV(4): 437–444, 1959. (in Russian) [2] Karlheinz Gröchenig. Reconstruction algorithms in irregular sampling. Math. Comput. 59: 181–194, 1992. SAMPTA'09 274 Sampling of Homogeneous Polynomials Somantika Datta (1) , Stephen D. Howard (2) , and Douglas Cochran (1) (1) Arizona State University, Tempe, Arizona 85287, USA. (2) Defence Science & Technology Organisation, Edinburgh, South Australia. somantika.datta@asu.edu, stephen.howard@dsto.defence.au, cochran@asu.edu Abstract: Conditions for reconstruction of multivariate homogeneous polynomials from sets of sample values are introduced, together with a frame-based method for explicitly obtaining the polynomial coefficients from the sample data. 1. Introduction Several authors have noted the importance of interpolation and reconstruction of multivariate polynomials from sample data in applications. Zakhor [10], for example, considered the problem of interpolation of bivariate polynomials from irregularly spaced sample values in connection with two-dimensional filter design and image processing. The case of multivariate polynomials presents significant difficulties not encountered with polynomials of one variable, in particular due to the zeros of these entire functions of several variables not being isolated as occurs in the univariate setting. Consequently, it is not surprising that, in her work, Zakhor develops conditions in which suitable sampling sets are constrained to lie on certain algebraic curves. Very recent work by Varjú [9] and Benko and Króo [1] develops Weierstraß types of results for approximation of smooth multivariate functions by homogeneous polynomials. This suggests the potential utility of interpolation and reconstruction of homogeneous polynomials from sample values. It is well known that the linear space Hk (Cn ) of homogeneous polynomials of degree k in n complex variables is isomorphic to the space Symk (Cn ) of symmetric k-tensors over Cn . This fact was used by the authors in [3] to develop results concerning frames and grammians on Symk (Cn ). In this paper, a similar perspective is used to derive conditions under which coefficients of a multivariate homogeneous polynomial of known degree can be reconstructed explicitly from sets of sample values. It is shown that a sampling set that suffices for n-variate homogenous polynomials of degree k is also suitable for reconstructing the coefficients of any homogeneous polynomial in n variables of degree 1 6 ℓ < k. Further, it is noted that, modulo general position issues, the number of samples is the crucial issue in determining suitability of a sampling set. Nevertheless, some sampling sets are “better” than others in that they provide snugger frames and hence the numerical advantages they en- SAMPTA'09 tail. 
The relative merits of sampling sets in this respect do not depend on the particular polynomial to be reconstructed, thus allowing generically good sampling sets to be designed before any sampling is actually carried out. Before beginning the mathematical sections of the paper, a few comments on notation and terminology are in order. For x = [x(1) · · · x(n) ]T and y = [y (1) · · · y (n) ]T in Cn , their inner product will be denoted by hx, yi = n X x̄(j) y (j) j=1 where the bar denotes complex conjugate; i.e., the inner product is conjugate linear in its first argument and linear in its second argument. The corresponding convention will be used for inner products in other complex Hilbert spaces. Given a finite frame X = {x1 , ..., xm } for an n-dimensional complex vector space V , the function F : V → ℓ2 ({1, . . . , m}) = Cm given by F (w) = [hx1 , wi . . . hxm , wi]T will be called the frame operator associated with X, while F = F ∗ F : V → V (i.e., the composition of the adjoint of F with F ) will be called the metric operator associated with X. The k-fold tensor product V ⊗k of an n-dimensional vector space V is a vector space spanned by elements of the form v1 ⊗ · · · ⊗ vk where each vi ∈ V [8]. The vector (ℓ) v1 ⊗ · · · ⊗ vk has nk coordinates {vi |i = 1, . . . , k; ℓ = (ℓ) 1, . . . , n} where vi denotes the ℓth coordinate of the vector vi . The space of symmetric k-tensors associated with V , denoted Symk (V ), is the subspace of V ⊗k consisting of those tensors which remain fixed under permutation (see Chapter 10 of [8]). Symk (V ) is spanned by the tensor powers v ⊗k where ¡ v ∈¢ V . If kV has dimension n then dim Symk (V ) = n+k−1 . Sym (V ) has a natural inner k product with the property ­ ⊗k ⊗k ® k = hv, wiV . (1) v ,w Symk (V ) 2. Sampling of Homogeneous Polynomials It is well known (see, e.g., [8]) that Hk (Cn ), the linear space of homogeneous polynomials of total degree k in variables z̄ (1) , . . . , z̄ (n) is isomorphic to Symk (V ). This section points out a connection between the condition that k ⊗k X (k) = {x⊗k 1 , . . . , xm } is a frame for Sym (V ) and the n reconstructability of polynomials in Hk (C ) from the values they take at sets of m points in Cn . 275 Beginning with k = 1, let w ∈ V = Sym1 (V ) and denote by [w(1) · · · w(n) ]T ∈ Cn the coordinates of w in some orthonormal basis for V . There is an obvious isomorphism that takes w ∈ V to the polynomial pw ∈ H1 (Cn ) defined by pw (z (1) , . . . , z (n) ) = w(1) z̄ (1) + · · · w(n) z̄ (n) . If X = {x1 , . . . , xm } is a frame for V , the associated frame operator F : V → Cm is given by     (1) (n) pw (x1 , . . . , x1 ) hx1 , wi     .. ..  . (2) F (w) =  = . .   hxm , wi (1) (n) pw (xm , . . . , xm ) In other words, F (w) is a vector of values obtained by evaluating (i.e., “sampling”) pw at the points x1 , . . . , xm . One may ask whether this set of m sample values is sufficient to uniquely determine pw . To address this question, define a sampling function PX : H1 → Cm by   (n) (1) p(x1 , . . . , x1 )   ..  PX (p) =  .   (1) (n) p(xm , . . . , xm ) and note that (2) shows the frame operator is given by F (w) = PX (pw ). Because the frame operator is invertible, w is uniquely determined by F (w). Hence any pw ∈ H1 is uniquely determined by its samples PX (pw ). Conversely, if X fails to frame V , the mapping F defined by (2) is still well-defined, but has non-trivial kernel K. In this case, PX (pw ) = PX (pw+u ) for all u ∈ K. 
So, in particular, pw is not uniquely determined from its samples at x1 , ..., xm . A similar situation occurs for k > 1, where the space of interest is Symk (V ) and the frame is X (k) = ⊗k {x⊗k 1 , . . . , xm }. As in the k = 1 case, mapping a polynomial to its coefficient sequence defines an isomorphism between Hk (Cn ) and Symk (V ) for k > 1. If v = w⊗k ∈ Symk (V ) is a pure tensor power of w ∈ V , then  ­ ⊗k ⊗k ®  x1 , w   .. (k) F (v) =   . ­ ⊗k ⊗k ® xm , w    k  pv (x1 ) hx1 , wi     .. .. =  =  . . k hxm , wi pv (xm ) k where pv ∈ Hk is defined by pv (z) = hz, wi . Symk (V ) is spanned by pure tensor powers of elements in V [8]. Thus, for arbitrary v ∈ Symk (V ), F (k) (v) is a vector of m samples of a polynomial in Hk taken at points x1 , ..., xm . Thus, as in the k = 1 case, polynomials in (k) Hk are uniquely determined by the samples PX (p) = [p(x1 ), . . . , p(xm )]T if and only if X (k) frames Symk (V ). Theorem 1 given below implies that if one can reconstruct a polynomial in Hk (Cn ) from a certain sampling set then the same set can be used to reconstruct polynomials in Hℓ (Cn ) for all 1 6 ℓ < k. Conversely, almost every SAMPTA'09 sampling set in Cn for H1 gives rise to a sampling set for Hk where k > 1, provided there are enough vectors in the set. Theorem 1. (i) Given n and m with m > n, if X (k) = k ⊗k ⊗k (ℓ) {x⊗k 1 , x2 , . . . , xm } is a frame for Sym (V ), then X ℓ is a frame for Sym (V ) for all 1 6 ℓ < k. (ii) Almost Cn such that m > ¡n+k−1 ¢ every set of m vectors in k results in a frame for Sym (Cn ) for k > 1. k Proof. (i) Suppose that X (ℓ) is not a frame for Symℓ (V ). Then X (ℓ) does not span Symℓ (V ) and there exists g ∈ (span(X (ℓ) ))⊥ ⊂ Symℓ (V ). Take some h ∈ ⊗(k−ℓ) Symk−ℓ (V ). Let h = x1 . Then E D ­ ® ⊗(k−ℓ) ⊗ℓ g ⊗ h, x⊗k = g ⊗ h, x ⊗ x i i i E D ® ­ ⊗(k−ℓ) h, xi = g, x⊗ℓ i E D ⊗(k−ℓ) ⊗(k−ℓ) =0 = 0 · x1 , xi (k) which is a contradiction since g ⊗h ∈ Symk (V ­ ) and X⊗k ® k is a frame for Sym (V ) so that for any i, g ⊗ h, xi cannot be zero. (ii) It has been shown in [7] that for almost every set of vectors X = {x1 , . . . , xm } in Cn , the rank¢of the ¡n+k−1 ⊗k when grammian of X (k) = {x⊗k , . . . , x } is m 1 k ¡n+k−1¢ . This means that the maximum number of m> k ¡ ¢ linearly independent vectors in X (k) is n+k−1 which is k k n the same as the dimension of Sym (C ) and hence X (k) is a frame for Symk (Cn ). 3. Illustrative Examples Example 1. Consider the space V = C2 over the field C. Let x1 = [1, 0]T , x2 = [0, 1]T and x3 = [1, 1]T . The set X = {x1 , x2 , x3 } is a frame for V with corresponding frame operator   1 0 F =  0 1 . 1 1 The metric operator is F = F ∗F = · 2 1 1 2 ¸ . The eigenvalues of F are 1 and 3, which are the optimal lower and upper frame bounds respectively. · ¸ 1 2 −1 F −1 = 2 3 −1 which is the metric operator of the dual frame The dual frame is denoted by X̃ = {x̃1 , x̃2 , x̃3 } where ¸T · 2 1 −1 , ,− x̃1 = F x1 = 3 3 · ¸T 1 2 −1 x̃2 = F x2 = − , , and 3 3 · ¸T 1 1 −1 x̃3 = F x3 = , . 3 3 276 The minimum and maximum eigenvalues of F are .2679 and 3.7321, which are the optimal lower and upper frame bounds respectively. The metric operator for the dual frame is   −1 −1 0 3 −1  F −1 =  −1 0 −1 1 Plane z = u+v 2 1 0 −1 −2 −1 −0.5 0 0.5 u 1 0.5 0 −0.5 −1 1 v Figure 1: The plane z = u+v, a homogeneous polynomial of degree one. Consider reconstruction of the homogeneous polynomial p of degree one in two variables defined by p(u, v) = c(1) u + c(2) v from the three frame elements. 
Here k = 1, n = 2 and m = 3. Any c = [c(1) , c(2) ]T ∈ C2 can be reconstructed via the frame reconstruction formula c= 3 X hxi , ci x̃i . (3) i=1 Since p(xi ) = hxi , ci and for this example p(x1 ) = c(1) , p(x2 ) = c(2) , and p(x3 ) = c(1) + c(2) , the right side of (3) is c(1) x̃1 + c(2) x̃2 + (c(1) + c(2) )x̃3 = [c(1) , c(2) ]T . This shows that the coefficients of p(u, v) can be reconstructed from its samples at the frame elements. The polynomial p(u, v) = u + v together with the sampling set is shown in Figure 1. Example 2. If the homogeneous polynomial to be reconstructed is of degree two as given by p(u, v) = c(1) u2 + c(2) uv + c(3) v 2 then one considers the space Sym2 (C2 ) ⊂ C⊗2 . The dimension of Sym2 (C2 ) is three, which is the same as the dimension of H2 (C2 ). Hence at least three sampling points are needed. Consider the same set of sampling points as in Example 1; i.e., x1 = [1, 0]T , x2 = [0, 1]T and x3 = [1, 1]T . One can extend this set to C⊗2 by taking Kronecker products and restricting to ⊗2 T T Sym2 (C2 ) yields x⊗2 1 = [1, 0, 0] , x2 = [0, 0, 1] , and ⊗2 ⊗2 ⊗2 ⊗2 T (2) x3 = [1, 1, 1] . Let X = {x1 , x2 , x3 }. The polynomial p can be uniquely determined from its sample values at x1 , x2 and x3 because c(1) = p(x1 ), c(3) = p(x2 ), and c(2) = p(x3 ) − p(x2 ) − p(x1 ). This means that X (2) is a frame for Sym2 (C2 ). The frame operator is   1 0 0 F = 0 0 1  1 1 1 making the metric operator  2 F = 1 1 SAMPTA'09 1 1 1  1 1 . 2 ⊗2 = [1, −1, 0]T , = F −1 x⊗2 making the dual frame xg 1 1 ⊗2 ⊗2 xg = F −1 x⊗2 = [0, −1, 1]T , and xg = F −1 x⊗2 = 2 2 3 3 T [0, 1, 0] . The polynomial p(u, v) = c(1) u2 + c(2) uv + c(3) v 2 satisfies p(x1 ) = c(1) , p(x2 ) = c(3) , and p(x3 ) = c(1) + c(2) + c(3) . The coefficients of p can be obtained from its samples at x1 , x2 , and x3 by the frame reconstruction formula for Sym2 (C2 ); i.e., g g ⊗2 ⊗2 ⊗2 [c(1) , c(2) , c(3) ]T = p(x1 )xg 1 + p(x2 )x2 + p(x3 )x3 . Example 3. Consider now the frame for C2 formed by x1 = [1, 0]T , x2 = [2, 0]T , and x3 = [0, 1]T . In this case, reconstruction of p(u, v) = c(1) u2 + c(2) uv + c(3) v 2 from samples p(x1 ), p(x2 ), and p(x3 ) is not generally possible, even though the number of samples is the same as the dimension of H2 (C2 ). This is because x1 and x2 are scalar multiples of each other and the corresponding vectors in Sym2 (C2 ), {[1, 0, 0]T , [2, 0, 0]T , [0, 0, 1]T } do not constitute a frame for Sym2 (C2 ). This is an example where the tensor powers of a frame for V do not frame Symk (V ), even though the number of vectors is adequate. Example 4. Reconstruction of homogeneous polynomials in H3 (C2 ) requires at least four points, since the dimension of Sym3 (C2 ) and hence that of H3 (C2 ) is four. Taking the frame X = {x1 , x2 , x3 , x4 } = {[1, 0]T , [0, 1]T , [1, 1]T , [1, −1]T } for C2 , computing Kronecker products and restricting to Sym3 (C2 ) yields X (3) ⊗3 ⊗3 = {x⊗3 , x⊗3 2 , x3 , x4 } 1     1 0 1        0 0  , , 1 =  0   0   1    1 1 0  1     −1   . ,   1    −1   A homogeneous polynomial of the form p(u, v) = c(1) u3 + c(2) u2 v + c(3) uv 2 + c(4) v 3 can be reconstructed from its samples at these points as c(1) = p(1, 0), c(2) = 1 (3) = 12 (p(1, 1) + 2 (p(1, 1) + p(1, −1) − 2p(1, 0)), c (4) p(1, −1) − 2p(1, 0)), and c = p(0, 1) so that X (3) constitutes a frame for Sym3 (C)2 . The frame operator is   1 0 0 0  0 0 0 1   F =  1 1 1 1  1 −1 1 −1 making the metric operator  3  0 F =  2 0 0 2 0 2 2 0 2 0  0 2  . 
0  3 277 4. z = u3 + u2v + uv2 + v3 4 3 2 1 0 −1 −2 −3 −4 −1 −0.5 0 0.5 1 0 −0.5 −1 0.5 1 v u Figure 2: A homogeneous polynomial of degree three. The optimal lower and upper frame bounds are A = 0.4384 and B = 4.5616. The metric operator of the dual frame is   1 0 −1 0  0 1.5 0 −1  . F −1 =   −1 0 1.5 0  0 −1 0 −1 ⊗3 = [1, 0, −1, 0]T , = F −1 x⊗3 The dual frame is xg 1 1 ⊗3 ⊗3 = = F −1 x⊗3 = [0, −1, 0, 1]T , xg = F −1 x⊗3 xg 3 3 2 2 g ⊗3 ⊗3 T −1 T [0, .5, .5, 0] , and x = F x = [0, −.5, .5, 0] . 4 4 The coefficients of a degree-three polynomial p(u, v) = c(1) u3 + c(2) u2 v + c(3) uv 2 + c(4) v 3 are given by g ⊗3 ⊗3 [c(1) , c(2) , c(3) , c(4) ]T = p(x1 )xg 1 + p(x2 )x2 g ⊗3 ⊗3 + p(x3 )xg 3 + p(x4 )x4 . Such a polynomial and the sampling points are shown in Figure 2. Example 5. As the degree k or the dimension n gets larger numerical issues arise in calculating the inverse of the metric operator in order to get the dual frame that is needed for the reconstruction [4]. Ideally, one would like to construct tight frames for Symk (Cn ). Since the upper and lower frame bounds determine the numerical merits of a particular frame, it is interesting to observe how starting with a fixed frame for C2 the frame bounds change as this frame is extended to frames for Symk (C2 ) as k increases. Taking the frame for C2 to be {[1, 0]T , [0, 1]T , [1, 1]T , [1, −1]T }, the frame bounds for C2 , Sym2 (C2 ), and Sym3 (C2 ) are tabulated below. Space C2 Sym2 (C2 ) Sym3 (C2 ) Optimal lower frame bound A 3 1 .4384 Optimal upper frame bound B 3 5 4.5616 B/A 1 5 10.4 In this particular case it appears that the ratio B/A increases as k, the degree of the polynomial increases. SAMPTA'09 Conclusions and Future Work It has been shown that homogeneous polynomials of degree k in n variables can be reconstructed from their samples at elements of a frame for Symk (Cn ). Such a set can also be used to reconstruct n-variate homogeneous polynomials of all degrees ℓ where 1 6 ℓ < k. In recent work [1], [9] conditions under which a smooth function can be approximated by homogeneous polynomials have been established. Combining these results to approximately reconstruct smooth functions from sampled data and a possible construction of tight frames for Symk (Cn ) will be given in a detailed version of this work. The metric operator and the grammian of a frame have the same non-zero eigenvalues. Also G ◦k , the grammian of ⊗k ⊗k X (k) = {x⊗k 1 , x2 , . . . , xm }, is the k-fold Hadamard product of G, the grammian of X = {x1 , x2 , . . . , xm }. Relationship between the eigenvalues of G and G ◦k ([2], [5], [6]) may be used to obtain information about the frame bounds for a frame for Symk (Cn ) which comes from a frame for Cn , see Example 5. 5. Acknowledgments The authors would like to thank John McDonald for useful discussions on the topic of this paper. References: [1] D. Benko and A. Kroó. A Weierstrass-type theorem for homogeneous polynomials. Transactions of the American Mathematical Society, 361(3):1645 – 1665, 2009. [2] G. Cheng, X. Cheng, T. Huang, and T. Tam. Some bounds for the spectral radius of the Hadamard product of matrices. Applied Mathematics E-Notes, 5:202–209, 2005. [3] S. Datta, S. D. Howard, and D. Cochran. Geometry of the Welch bounds. IEEE Transactions on Information Theory. In review. [4] I. Daubechies. Ten Lectures on Wavelets. SIAM, 1992. [5] M. Fang. Bounds on eigenvalues of the Hadamard product and the Fan product of matrices. Linear Algebra and its Applications, 425:7–15, 2007. [6] E. I. Im. 
Narrower eigenbounds for Hadamard products. Linear Algebra and its Applications, 264:141– 144, 1997. [7] I. Peng and S. Waldron. Signed frames and Hadamard products of Gram matrices. Linear Algebra and its Applications, 347:131–157, 2002. [8] R. Shaw. Linear algebra and group representations, volume 2. Academic Press, 1983. [9] P. Varjú. Approximation by homogeneous polynomials. Constructive Approximation, 26:317 – 337, 2007. [10] A. Zakhor and G. Alvstad. Two-dimensional polynomial interpolation from nonuniform samples. IEEE Transactions on Signal Processing, 40(1):169 – 180, 1992. 278 On sampling lattices with similarity scaling relationships Steven Bergner (1) , Dimitri Van De Ville(2) , Thierry Blu(3) , and Torsten Möller(1) (1) GrUVi-Lab, Simon Fraser University, Burnaby, Canada. (2) BIG, Ecole Polytechnique F édérale de Lausanne, Switzerland. (3) The Chinese University of Hong Kong, Hong Kong, China. sbergner@cs.sfu.ca, thierry.blu@m4x.org, Dimitri.VanDeVille@epfl.ch, torsten@cs.sfu.ca Abstract: R= We provide a method for constructing regular sampling lattices in arbitrary dimensions together with an integer dilation matrix. Subsampling using this dilation matrix leads to a similarity-transformed version of the lattice with a chosen density reduction. These lattices are interesting candidates for multidimensional wavelet constructions with a limited number of subbands.  0 1 −0.3307 −0.375  ,K=  2 4 −1 −1  , θ = 69.3◦ 1.5 1 0.5 0 1. Primer on sampling lattices and related work A sampling lattice is a set of points {Rk : k ∈ Z n } ⊂ Rn that is closed under addition and inversion. The nonsingular generating matrix R ∈ R n×n contains basis vectors in its columns. Lattice points are uniquely indexed by k ∈ Zn and the neighbourhood around each sampling point is identical. This makes them suitable sampling patterns for the reconstruction in shift-invariant spaces. Subsampling schemes for lattices are expressed in terms of a dilation matrix K ∈ Z n×n forming a new lattice with generating matrix RK. The reduction rate in sampling density corresponds to |det K| = αn = δ ∈ Z+ . (1) Dyadic subsampling discards every second sample along each of the n dimensions resulting in a δ = 2 n reduction rate. To allow for fine-grained scale progression we are particularly interested in low subsampling rates, such as δ = 2 or 3. As discussed by van de Ville et al. [8], the 2D quincunx subsampling is an interesting case permitting a twochannel relation. With the implicit assumption of only considering subsets of the Cartesian lattice it is shown that a similarity two-channel dilation may not extend for n > 2. Here, we show that by permitting more general basis vectors in Rn the desired fixed-rate dilation becomes possible for any n. Our construction produces a variety of lattices making it possible to include additional quality criteria into the search as they may be computed from the Voronoi cell of the lattice [9] including packing density and expected quadratic quantization error (second order moment). Agrell et al. [1] improve efficiency for the computation by extracting Voronoi relevant neighbours. Another possible sampling quality criterion appears in the SAMPTA'09 −0.5 −1 −1.5 −1.5 −1 −0.5 0 0.5 1 1.5 Figure 1: 2D lattice with basis vectors and subsampling as given by R and K in the diagram title. The spiral shaped points correspond to a sequence of fractional subsamplings RKs for s = 0..1 with the notable feature that for s = 1 one obtains a subset of the original lattice sites shown as thick dots. 
This repeats for any further integer power of K, each time reducing the sample density by |det K| = 2. work of Lu et al. [4] in form of an analytic alias-free sampling condition that is employed in a lattice search. 2. Lattice construction We are looking for a non-singular lattice generating matrix R that, when sub-sampled by a dilation matrix K with reduction rate δ = αn , results in a similarity-transformed version of the same lattice, that is, it can be scaled and rotated by a matrix Q with Q T Q = α2 I. An illustration of a 1 subsampling resulting in a rotation by θ = arccos 2√ in 2 2D is given in Figure 1. Formally, this kind of relationship can be expressed as QR = RK (2) leading to the observation that subsampling K and scaled rotation Q are related by a similarity transform R−1 QR = K. (3) 279  1 j it is possible to diago1 −j nalize a 2D rotation matrix by the following similarity transform    jθ  0 cos θ − sin θ e −1 = J2 J2 = J−1 2 ∆J2 . 0 e−jθ sin θ cos θ (4) Using this observation to replace the scaled rotation matrix Q in Equation 3 leads to Using a matrix J2 =  K = R−1 QR −1 K = αR−1 J−1 Jn R n S∆S −1 K = αP∆P with R Q −1 = J−1 n SP −1 = αJn ∆Jn . (5) (6) Thus, given a matrix K that has an eigen-decomposition corresponding to that of a uniformly scaled rotation matrix, we can compute the lattice generating matrix R as in Equation 6. The elements of the diagonal matrix S inserted in the construction of R scale the otherwise unit eigenvectors in the columns of P. Below, we will refer to this construction as function formRQ(K, S) using S = I by default. 2.1 Constructing suitable dilation matrices K The eigenvalues of K, ∆ and Q impose restrictions on their shared polynomial d(λ) = det(K − n characteristic k c λ as discussed in the appendix. For λI) = k=0 k the case n = even with the only non-zero integer coefficients c0 = δ, c2n/2 < 4δ, cn = 1 this leaves a finite number of different options for c n/2 . The case n = odd permits a single possible polynomial with non-zero coefficients c0 = −δ, cn = 1. For these monic polynomials it is possible to directly construct a candidate K via the companion matrix ([6], p. 192) ⎤ ⎡ 0 −c0 ⎢ 1 0 −c1 ⎥ ⎥ ⎢ ⎥ ⎢ .. ⎥. ⎢ 1 0 . K=⎢ (7) ⎥ ⎥ ⎢ . . . . . . −c ⎦ ⎣ n−2 1 −cn−1 This allows to construct a lattice fulfilling the self-similar subsampling condition for any dimensionality n, one for every possible characteristic polynomial. With this starting point it is possible to construct additional suitable dilation matrices via a similarity transform with a unimodular matrix T KT = TKT−1 = PT ∆P−1 T . (8) Using a unimodular rather than any non-singular T guarantees that T−1 is also unimodular following from the fact that T−1 can be constructed from the adjugate (the transposed co-factor matrix) of T. Thus, K T remains an integer matrix by this transform. Possible generators for this unimodular group are discussed in ([5], pp. 23). Our implementation, referred to as function genUnimodular(n), SAMPTA'09 uses a construction of T = LU from several random integer lower and upper triangular matrices having ones on their diagonal. It is not guaranteed that all possible K for a given characteristic polynomial can be generated through a similarity transform with some T. However, formRQ(K T ) provides numerous non-equivalent R T lattice generators. Among them it is possible to apply further criteria to select the “best” lattice. An alternative to transforming K is the eigenvector scaling by diagonal matrix S in Equation 6. 
Using non-unit scaling allows to produce further lattices for any given K resulting in an n-dimensional continuous search space. 2.2 Construction Algorithm The steps for constructing lattices with the desired subsampling matrices are summarized in algorithm 1. The function compoly(n, α, C) is defined in the Algorithm 1 genLattices(n, δ) 1: Llist ← {} 2: Ks ← genKompans(n, δ) 3: Ts ← genUnimodular(n) ∪{I} 4: for all K ∈ Ks do 5: for all T ∈ Ts do 6: KT = TKT−1 7: (RT , QT ) ← formRQ(KT ) 8: Llist← Llist∪{(KT , RT , QT )} 9: end for 10: end for 11: return Llist Algorithm 2 genKompans(n, δ) 1: Ks = {} 2: if n is even then 3: for all C ∈ Z : C 2 < 4δ do 1 4: Ks ← Ks ∪ compoly(n, δ n , C) 5: end for 6: else {n is odd} 1 7: Ks ← {compoly(n, δ n )} 8: end if 9: return Ks appendix. A possible implementation for the function genUnimodular(n) is described in Section 2.1 and formRQ(K) is defined below Equation 6. It should be noted that the list of lattices returned by genLattices may contain several equivalent copies of the same lattice. A Gram matrix implicitly represents angles between basis vectors as A = R T R. Two lattices R1 and R2 , scaled to have the same determinant, are equivalent if their Gram matrices are related via A 1 = TT A2 T with a unimodular matrix T ∈ Z n×n and |det T| = 1. Determining this unimodular matrix is known to be a difficult problem, as it for instance also occurs when relating the adjacency matrices of two supposedly isomorphic graphs. Hence, our current method employs a simpler necessary test for equivalence by comparing the first few elements 280 1: R = [0.71 −0;−0.71 1.4] K = [2 −2;1 0] θ=45 2: R = [0 0.58;−1.7 0.65] K = [2 −1;4 −1] θ=69.3 3: R = [0 0.84;−1.2 0] K = [0 −1;2 0] θ=90 5 5 5 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 2 4 0 2 4 0 0 2 4 Figure 2: Three non-equivalent 2D lattices obtained for a design with dilation matrices having |det K| = 2. The lattice on the left is the well known quincunx sampling with a θ = 45 ◦ rotation. The other two are new schemes with different rotation angles. The black markers show the sample positions that are retained after subsampling by K. 1: R = [−0 0.93;1.1 −0.54] 2: R = [0 0.84;−1.2 0] K = [1 −1;2 1] θ=54.74 K = [−1 2;−2 1] θ=90 5 5 3: R = [0 0.74;−1.3 0.22] 4: R = [0.66 0;1.1 −1.5] K = [1 −1;3 0] θ=73.22 K = [−3 4;−3 3] θ=90 5 5 4 4 4 4 3 3 3 3 2 2 2 2 1 1 1 1 0 0 0 0 5 0 5 0 5 0 0 5 Figure 3: Three non-equivalent 2D lattices obtained for a design with dilation matrices having |det K| = 3. The lattice on the left is the well known hexagonal lattice with a θ = 30 ◦ rotation. The other three are new schemes with different rotation angles. of the set q(A) = {kT Ak : k ∈ Zn } using the Gram matrices of the respective lattices. If the sorted lists q(A 1 ) and q(A2 ) disagree in any element, R 1 and R2 are not equivalent ([5], p. 60). It is possible to restrict the set of indices k ∈ Zn to the Voronoi relevant neighbours [1]. Further, since these neighbours determine the hyperplanes bounding the Voronoi polytope of the lattice, they can also be used for a sufficient test for equivalence. kissing # = 2, # f = 14, # v = 24, G(P) = 0.081904, # zones = 6 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 0.5 −0.8 3. Constructions for different dimensions and subsampling ratios For the 2D case we have created lattices permitting a reduction rate 2 in Figure 2 and rate 3 in Figure 3. In both cases, familiar examples arise in the quincunx and the hex lattice for the respective ratios. 
A search of 3D lattices enjoying the self-similar subsampling property with rate 2 dilations resulted in 53 nonequivalent cases. These lattices were compared in terms of their dimensionless second order moments, corresponding to the expected squared vector quantization error ([2], p. 451). When performing the continuous optimization mentioned at the end of Section 2.1, all of these cases converged to the same optimum lattice shown in Figure 4. The dimensionless second order moment for the Voronoi Cell of this lattice is G = 0.081904. For comparison, the Cartesian cube has Gcc = 0.0833 and the truncated octahedron of the BCC lattice has G bcc = 0.0785. 4. Discussion and potential applications The current formation of candidate matrices K based on similarity transforms of one valid example is not guaran- SAMPTA'09 0 −0.5 0 0.5 −0.5 Figure 4: The best 3D lattice obtained for a design with dilation matrices having |det K| = 2. The letters f and v in the title line indicate faces and vertices, respectively. teed to produce all possible solutions. For 2D and 3D we also employed an exhaustive search over a range of integer matrices with values in [−3, 3] resulting in the same number of non-equivalent 2D cases as the construction via K T . However, for dimensionality n > 3 the exhaustive search had to be replaced by a random sampling of integer matrices ultimately rendering the method infeasible for n > 5. In that light the current construction via scaled eigenvectors of the companion matrix is a significant improvement as it allows to produce a large number of non-equivalent lattices for any dimensionality. Our subsampling schemes may have applications for multidimensional wavelet transforms [7]. Another direction for possible investigation is the construction of sparse grids that are employed in the context of high-dimensional integration and approximation adapting to smoothness conditions of the underlying function space [3]. 281 Appendix: Characteristic polynomial of a scaled rotation matrix in Rn The similarity relationship between K and Q in Equation 2 implies that they share the same characteristic polynomial d(λ) = det(K − λI) = det(Q − λI) leading to an agreement in eigenvalues d(λ k ) = 0 and determinant d(0) ([6], p. 184). Further, since K is an integer matrix the polynomial d(λ) ∈ Z[λ] has integer coefficients c k . In order to find integer matrices K with the eigenvalues of a scaled rotation matrix, it will be important to distinguish the two different forms of the diagonal matrix ∆ in Equation 4 and 5 for the case n = even ∆ = diag[e jθ1 e −jθ1 ...e jθn/2 e −jθn/2 ] and the case n = odd Thus, if n ck λk d(λ) = k=0 n =− ck k=0 n α2 λ k λ α n (13) cn−k αn−2k λk =− k=0 2k ⇔ ck = −αn−2k cn−k = −δ 1− n cn−k . By the same reasoning as for the even case, c k = 0 for all k = 1, 2, . . . n−1 2 resulting in only one possible characteristic polynomial ∆ = diag[1 ejθ1 e−jθ1 . . . ejθ(n−1)/2 e−jθ(n−1)/2 ] d(λ) = λn − αn . with analogue block-wise constructions for J n . For dimensionality n = even the characteristic polynomial of K and Q fulfills To refer to the above procedure we will invoke a function compoly(n, α, C) that returns a companion matrix (Equation 7) with a characteristic polynomial as in Equation 11 or 14. n/2 (αejθk − λ)(αe−jθk − λ) d(λ) = (14) References: k=1 n/2 (α2 − 2λα cos θk + λ2 ) = (9) k=1 n/2 = k=1 =d  4 ( 3 2 α λ α − 2 cos θk + α2 ) 2 2 λ λ α α2 λ  n λ α Thus, if n ck λk d(λ) = k=0 n = ck k=0 n k α2 λ n λ α (10) cn−k αn−2k λk = k=0 2k ⇔ ck = αn−2k cn−k = δ 1− n cn−k . 
2k If ck = 0 and ck , δ ∈ Z then δ 1− n ∈ Q. This is impossible for 0 < 2k < n, assuming small values of δ, such as 2, 3 or any simple product of primes. This implies that ck = cn−k = 0 for k = 1, 2, . . . n2 − 1. For k = n2 the ck can be non-zero leading to n d(λ) = λn + Cλ 2 + αn (11) with the requirement that C 2 < 4αn so that the complex eigenvalues d(λk ) = 0 are evenly distributed on the complex circle of radius |λ k | = α. For dimensionality n = odd the polynomial fulfills (n−1)/2 (αejθk − λ)(αe−jθk − λ) d(λ) = (α − λ) (12) k=1 ⇒ d(λ) = − SAMPTA'09 λ α [1] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger. Closest point search in lattices. Information Theory, IEEE Transactions on, 48(8):2201–2214, August 2002. [2] J.H. Conway and N.J.A. Sloane. Sphere Packings, Lattices and Groups. – 3rd ed. Springer, 1999. [3] M. Griebel. Sparse grids and related approximation schemes for higher dimensional problems. In L. Pardo, A. Pinkus, E. Suli, and M.J. Todd, editors, Foundations of Computational Mathematics (FoCM05), Santander, pages 106–161. Cambridge University Press, 2006. [4] Y.M. Lu, M.N. Do, and R.S. Laugesen. A Computable Fourier Condition Generating Alias-Free Sampling Lattices. IEEE Transactions on Signal Processing, 57(5):(15 pages), May 2009. [5] M. Newman. Integral Matrices. Academic Press, 1972. See http://www.dleex.com/read/ ?3907 for a digital copy. [6] L.N. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM, 1997. [7] D. Van De Ville, T. Blu, and M. Unser. Isotropic polyharmonic B-Splines: Scaling functions and wavelets. IEEE Transactions on Image Processing, 14(11):1798–1813, November 2005. [8] D. Van De Ville, T. Blu, and M. Unser. On the multidimensional extension of the quincunx subsampling matrix. IEEE Signal Processing Letters, 12(2):112– 115, February 2005. [9] E. Viterbo and E. Biglieri. Computing the Voronoi cell of a lattice: The diamond-cutting algorithm. Information Theory, IEEE Trans. on, 42(1):161–171, 1996. n d α2 λ 282 General Perturbations of Sparse Signals in Compressed Sensing Matthew A. Herman and Thomas Strohmer Department of Mathematics, University of California, Davis, CA 95616-8633, USA. {mattyh,strohmer}@math.ucdavis.edu Abstract: We analyze the Basis Pursuit recovery of signals when observing sparse data with general perturbations. Previous studies have only considered partially perturbed observations Ax + e. Here, x is a K-sparse signal which we wish to recover, A is a measurement matrix with more columns than rows, and e is simple additive noise. Our model also incorporates perturbations E (which result in multiplicative noise) to the matrix A in the form of (A + E)x + e. This completely perturbed framework extends the previous work of Candès, Romberg and Tao on stable signal recovery from incomplete and inaccurate measurements. Our results show that, under suitable conditions, the stability of the recovered signal is limited by the noise level in the observation. Moreover, this accuracy is within a constant multiple of the best-case reconstruction using the technique of least squares. 1. Introduction Employing the techniques of compressed sensing (CS) to recover signals with a sparse representation has enjoyed a great deal of attention over the last 5–10 years. The initial studies considered an ideal unperturbed scenario: b = Ax. 
(1) Here b ∈ Cm is the observation vector, A ∈ Cm×n (m ≤ n) is a full-rank measurement matrix or system model, and x ∈ Cn is the signal of interest which has a Ksparse representation (i.e., it has no more than K nonzero coefficients) under some fixed basis. More recently researchers have included an additive noise term e into the received signal [1, 2, 4, 8], creating a partially perturbed model: b̂ = Ax + e (2) This type of noise generally models simple, uncorrelated errors in the data or at the receiver/sensor. As far as we can tell, practically no research has been done yet on perturbations E to the matrix A. Our completely perturbed model extends (2) by incorporating a perturbed sensing matrix in the form of cally implementing the matrix A in a sensor. When A represents a system model, such as in the context of radar [7] or telecommunications, then E can absorb errors in assumptions made about the transmission channel, as well as quantization errors arising from the discretization of analog signals. In general, these perturbations can be characterized as multiplicative noise, and are more difficult to analyze than simple additive noise since they are correlated with the signal of interest. To see this, simply substitute A =  − E in (2); there will be an extra noise term Ex. (Note that it makes no difference whether we account for the perturbation E on the “encoding side” (2), or on the “decoding side” (7). The model used here was chosen so as to agree with the conventions of classical perturbation theory which we use in Section 4.) 1.1 Assumptions and Notation Without loss of generality, assume the original data x to (K) be a K-sparse vector for some fixed K. Denote σmax (Y ), (K) kY k2 , and rank(K) (Y ) respectively as the maximum singular value, spectral norm, and rank over all K-column (K) submatrices of a matrix Y . Similarly, σmin (Y ) is the minimum singular value over all K-column submatrices of Y . Let the perturbations in (2) be relatively bounded by (K) kEk2 (K) kAk2 It is important to consider this kind of noise since it can account for precision errors when applications call for physi- SAMPTA'09 kek2 ≤ εb kbk2 (3) (K) with kAk2 , kbk2 6= 0. In the real world we are only (K) interested in the case where both εA , εb < 1. 2. 2.1 CS ℓ1 Perturbation Analysis Previous Work In the partially perturbed scenario (i.e., E = 0 in (2)) we are concerned with solving the Basis Pursuit (BP) problem [3]: z ⋆ = argmin kẑk1 s.t. kAẑ − b̂k2 ≤ ε′ ẑ  = A + E. (K) ≤ εA , (4) for some ε′ ≥ 0. The restricted isometry property (RIP) [2] for any matrix A ∈ Cm×n defines, for each integer K = 1, 2, . . . , 283 the restricted isometry constant (RIC) δK , which is the smallest nonnegative number such that where (1 − δK )kxk22 ≤ kAxk22 ≤ (1 + δK )kxk22 CBP (5) holds for any K-sparse vector x. In the context of the √ (K) (K) RIC, we observe that kAk2 = σmax (A) = 1 + δK , √ (K) and σmin (A) = 1 − δK . √ Assuming K-sparse x, δ2K < 2 − 1 and kek2 ≤ ε′ , Candès has shown in Theorem 1.2 of [1] that the solution to (4) obeys kz ⋆ − xk2 ≤ CBP ε′ (6) for some constant CBP . 2.2 Incorporating nontrivial perturbation E Now assume the completely perturbed situation with E, e 6= 0 in (2). In this case the BP problem of (4) can be generalized to include a different decoding matrix Â: z ⋆ = argmin kẑk1 s.t. kÂẑ − b̂k2 ≤ ε′A,K,b ẑ (7) for some ε′A,K,b ≥ 0. The following two theorems summarize our results. Theorem 1 (RIP for Â). For any K = 1, 2, . . . 
, assume and fix the RIC δK associated with A, and the relative (K) perturbation εA associated with E in (3). Then the RIC ´2 ¡ ¢³ (K) −1 δ̂K := 1 + δK 1 + εA (8) for matrix  = A + E is the smallest nonnegative constant such that (1 − δ̂K )kxk22 ≤ kÂxk22 ≤ (1 + δ̂K )kxk22 (9) holds for any K-sparse vector x. Remark 1. The flavor of the RIP is defined with respect to the square of the operator norm. That is, (1 − δK ) and (1 + δK ) are measures of the square of minimum and maximum singular values of A, and similarly for Â. In keeping with the convention of classical perturbation the(K) ory however, we defined εA in (3) just in terms of the operator norm (not its square). Therefore, the quadratic (K) dependence of δ̂K on εA in (8) makes sense. Moreover, in discussing the spectrum of Â, we see that it is really a (K) linear function of εA . Theorem 2 (Completely perturbed observation). Fix the (K) (2K) relative perturbations εA , εA and εb in (3). Assume that the RIC for matrix A satisfies δ2K < √ ¡ (2K) ¢−2 2 1 + εA − 1. Set ³ (K) ε′A,K,b := c εA √ ´ + εb kbk2 , (10) 1+δK where c = √1−δ . If x is K-sparse, then the solution to K the BP problem (7) obeys kz ⋆ − xk2 ≤ CBP ε′A,K,b , SAMPTA'09 (11) ´ ³ √ (2K) 4 1 + δ2K 1 + εA µ ¶. := ´2 ³ √ (2K) 1 − ( 2 + 1) (1 + δ2K ) 1 + εA −1 (12) Remark 2. Theorem 2 generalizes of Candès’ results in [1] for K-sparse x. Indeed, if matrix A is unperturbed, then (K) E = 0 and εA = 0. It follows that δ̂K = δK in (8), and the RIPs for A and  coincide. √ Moreover, the condition in Theorem 2 reduces to δK < 2 − 1, and the total perturbation (see (17)) collapses to kek2 ≤ ε′b := εb kbk2 ; both of these are identical to Candès’ assumptions in (6). Finally, the constant CBP in (12) reduces to the same as outlined in the proof of [1]. It is also interesting to examine the spectral effects due to the assumptions of Theorem 2. Namely, we want to be assured that the rank of submatrices of A are unaltered by the perturbation E. Lemma 1. If the hypothesis of Theorem 2 is satisfied, then for any k ≤ 2K (k) (k) σmax (E) < σmin (A), (13) and therefore rank(k) (Â) = rank(k) (A). This fact is necessary (although, not explicitly stated) in the least squares analysis Section 4. The utility of Theorems 1 and 2 can be understood with two simple numerical examples. Suppose that measurement matrix A in (2) is designed to have an RIC of δ2K = 0.100. Assume, however, that its physical implementation will experience a worst-case relative error (2K) of εA = 5%. Then from (8) we can design a matrix  with RIC δ̂2K = 0.213 to be used in (7) which will yield a solution whose accuracy is guaranteed by (11) with CBP = 9.057. Note from (12), we see that if there had been no perturbation, then CBP = 5.530. Consider now a different example. Suppose instead (2K) that δ2K = 0.200 and εA = 1%. Then δ̂2K = 0.224 and CBP = 9.643. Here, if A was unperturbed, then we would have had CBP = 8.473. These numerical examples show how the stability constant CBP of the BP solution gets worse with perturbations to A. It must be stressed however, that they represent worst-case instances. It is well-known in the CS community that better performance is normally achieved in practice. 2.3 Numerical Simulations Numerical simulations were conducted as follows. Gaussian matrices of size 128 × 512 were randomly generated in M ATLAB. The entries of matrix A were normally dis2 2 tributed N (0, σA ) where σA = 1/128, while those of ma2 2 trix E were N (0, σE ) with σE = ε2A /128. 
The parameter εA is a measure of the relative perturbation of matrix A and took on values {0, 0.01, 0.05, 0.10}. Next, a random 284 ||z* − x||2/||x||2 0.6 εA = 0.10 0.5 εA = 0.05 0.4 εA = 0.01 • Equality occurs in (15) whenever x is in the direction of the vector associated with the value (1 + δK ) in the RIP for A. • Equality occurs in (16) since, in this hypothetical case, we assume that E = βA for some 0 < β < 1. (K) Therefore, the relative perturbation εA in (3) no longer represents a worst-case deviation (i.e., the ra- εA = 0 0.3 (K) tio 0.2 kEk2 (K) kAk2 (K) = β =: εA ). The full details of this proof can be found in [6] 0.1 3.2 10 20 30 40 Sparsity K 50 60 Figure 1: Average (100 trials) relative error of BP solution z ⋆ with respect to K-sparse x vs. Sparsity K for different relative perturbations εA of A ∈ C128×512 (and εb = 0) . vector x of sparsity K = 1, . . . , 64 was randomly generated (nonzero entries uniformly distributed with N (0, 1)) and b̂ = Ax in (2) was created (note, we set e = 0 so as to focus on the effect of perturbation E). Given b̂ and  = A + E, the BP program (7) was implemented with cvx software [5]. For each value of εA and K, 100 trials were performed. Fig. 1 shows the average relative error kz ⋆ −xk2 /kxk2 as a function of K for each εA . As a reference, the ideal, noise-free case can be seen for εA = 0. It is interesting to notice that all perturbations, including εA = 0, experience significant jumps simultaneously at several places, such as K = 31, 42, 43, 44, etc. Now fix a particular value of K ≤ 30 and compare the relative error for the three nonzero values of εA . It is clear that the error scales roughly linearly with εA . This empirical study essentially confirms the conclusion of Theorem 2, that the stability of (K) the BP solution scales linearly with εA (i.e., the singular values of E). Note that better performance in theory and in simulation can be achieved if BP is used solely to determine the support of the solution. Then we can use least squares to find a better result. This is similar to the the best-case, oracle least squares solution discussed in Section 4. 3. Proofs 3.1 Proof Sketch of Theorem 1 From the triangle inequality, (5) and (3) we have ¡ ¢2 kÂxk22 ≤ kAxk2 + kExk2 (14) ´2 ³p (K) ≤ 1 + δK + kEk2 kxk22 (15) ³ ´2 (K) ≤ (1 + δK ) 1 + εA (16) kxk22 . Moreover, this inequality is sharp for the following reasons: • Equality occurs in (14) if E is a multiple of A. SAMPTA'09 ¤ Bounding the perturbed observation Before proceeding, we need some sense of the size of the total perturbation incurred by E and e. We don’t know a priori the exact values of E, x, or e. But we can find an upper bound in terms of the relative perturbations in (3). The main goal in the following lemma is to remove the total perturbation’s dependence on the input x. Lemma 2 (Total perturbation bound). Set ε′A,K,b := ´ ³ √ (K) (K) 1+δK , and εA and cεA + εb kbk2 , where c = √1−δ K εb are defined in (3). Then the total perturbation obeys kExk2 + kek2 ≤ ε′A,K,b (17) for all K-sparse x. Proof. From (1), (5) and (3) we have µ ¶ kExk2 kek2 kExk2 + kek2 = kbk2 + kAxk2 kbk2 ! à (K) kEk2 kxk2 kek2 √ kbk2 ≤ + kbk2 1 − δK kxk2 ´ ³ (K) ≤ c εA + εb kbk2 for all x which are K-sparse. Note that the results in this paper can easily be expressed in terms of the perturbed observation by replacing kbk2 ≤ kb̂k2 . 1 − εb This can be useful in practice since one normally only has access to b̂. 
3.3 Proof Sketch of Theorem 2 We duplicate the techniques used in Candès’ proof of Theorem 1.2 in [1], but with decoding matrix A replaced by Â. Set the BP minimizer in (7) as z ⋆ = x+h. Here, h is the perturbation from the true solution x induced by E and e. Instead of Candès’ (9), we determine that the image of h under  is bounded by kÂhk2 ≤ kÂz ⋆ − b̂k2 + kÂx − b̂k2 ≤ 2 ε′A,K,b which follows from the BP constraint in (7) as well as x being a feasible solution (i.e., it satisfies Lemma 2). The rest of this proof can be found in [6] ¤ 285 5. 3.4 Proof of Lemma 1 Assume the hypothesis of Theorem 2. It is easy to show that this implies p √ 4 (2K) kEk2 < 2 − 1 + δ2K . Simple algebraic manipulation then confirms that p p √ 4 (2K) 2 − 1 + δ2K < 1 − δ2K = σmin (A). Therefore, (13) holds with k = 2K. Further, for (k) (2K) any k ≤ 2K we have σmax (E) ≤ σmax (E) and (2K) (k) σmin (A) ≤ σmin (A), which proves the lemma. ¤ 4. Classical ℓ2 Perturbation Analysis Let the subset T ⊆ {1, . . . , n} have cardinality |T | = K, and note the following T -restrictions: AT ∈ Cm×K denotes the submatrix consisting of the columns of A indexed by the elements of T , and similarly for xT ∈ CK . Suppose the “oracle” case where we already know the support T of K-sparse x. By assumption, we are only interested in the case where K ≤ m in which AT has full rank. Given the completely perturbed observation of (2), the least squares problem consists of solving: z# T = argmin kÂT ẑ T − b̂k2 . ẑ T Since we know the support T , it is trivial to extend z # T to z # ∈ Cn by zero-padding on the complement of T . Our goal is to see how the perturbations E and e affect z # . More discussion on the oracle least squares analysis can be found in [6]. In the end, we find using the same ε′A,K,b in (10) that its stability is where CLS kz # − xk2 ≤ CLS ε′A,K,b √ := 1/ 1 − δK . (18) 4.1 Comparison of LS with BP Now, we can compare the accuracy of the least squares solution in (18) with the accuracy of the BP solution found in (11). In both cases the error bound is of the form C ε′A,K,b . A detailed numerical comparison of CLS with CBP is not entirely valid, nor illuminating. This is due to the fact that we assumed the oracle setup in the the least squares analysis, which is the best that one could hope for. In this sense, the least squares solution we examined here can be considered a “best, worst-case” scenario. In contrast, the BP solution really should be thought of as a “worst, of the worst-case” scenarios. The important thing to glean is that the accuracy of the BP solution, like the least squares solution, is on the order of the noise level ε′A,K,b in the perturbed observation. This is an important finding since, in general, no other recovery algorithm can do better than the oracle least squares solution. These results are analogous to the comparison by Candès, Romberg and Tao in [2], although they only consider the case of additive noise e. SAMPTA'09 Conclusion We introduced a general perturbed model for CS, and found the conditions under which BP could stably recover the original data. This completely perturbed model extends previous work by including a multiplicative noise term in addition to the usual additive noise term. We only considered K-sparse signals, however these results can be extended to also include compressible signals (see [6]). Simple numerical examples were given which demonstrated how the multiplicative noise reduced the accuracy of the recovered BP solution. 
In terms of the spectrum of the perturbed matrix Â, we showed that the penalty on δ̂K was a graceful, linear function of the relative per(K) turbation εA . Numerical simulations were performed with εb = 0 and appear to confirm the conclusion of The(K) orem 2, that the BP solution scales linearly with εA . We also found that the rank of  did not exceed the rank of A under the assumed conditions. This permitted an analysis of the oracle least squares solution which showed that its accuracy, like the BP solution, was limited by the total noise in the observation. Acknowledgment This work was partially supported by NSF Grant No. DMS-0811169 and NSF VIGRE Grant No. DMS0636297. References: [1] E. J. Candès. The restricted isometry property and its implications for compressed sensing. Académie des Sciences, I(346):589–592, 2008. [2] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59:1207–1223, 2006. [3] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal Sci. Comput., 20(1):33–61, 1999. [4] D. L. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory, 52(1):6– 18, Jan. 2006. [5] M. Grant, S. Boyd, and Y. Ye. cvx: Matlab software for disciplined convex programming. http://www.stanford.edu/∼boyd/cvx/. [6] M. A. Herman and T. Strohmer. General Deviants: An analysis of perturbations in compressed sensing. http://www.math.ucdavis.edu/∼mattyh/ publications.html. [7] M. A. Herman and T. Strohmer. High-resolution radar via compressed sensing. To appear in IEEE Trans. Signal Processing, Jun. 2009. [8] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory, 51(3):1030–1051, Mar. 2006. 286 Analysis of High-Dimensional Signal Data by Manifold Learning and Convolutions Mijail Guillemard (1) and Armin Iske (1) (1) Department of Mathematics, University of Hamburg, D-20146 Hamburg, Germany. guillemard@math.uni-hamburg.de, iske@math.uni-hamburg.de Abstract: A novel concept for the analysis of high-dimensional signal data is proposed. To this end, customized techniques from manifold learning are combined with convolution transforms, being based on wavelets. The utility of the resulting method is supported by numerical examples concerning low-dimensional parameterizations of scale modulated signals and solutions to the wave equation at varying initial conditions. 1. Introduction Recent advances in nonlinear dimensionality reduction and manifold learning have provided new methods for the analysis of high-dimensional signals. In this problem, a very large data set U ⊂ Rn of scattered points is given, where the data points are assumed to lie on a compact submanifold M of Rn , i.e. U ⊂ M ⊂ Rn . Moreover, the dimension k = dim(M) of M is assumed to be much smaller than the dimension of the ambient space Rn , k ≪ n. Now, the primary goal in the dimensionality reduction is the construction of a low-dimensional representation of the data U . In this paper, a novel concept for signal data analysis through dimensionality reduction is proposed. To this end, suitable techniques from manifold learning are combined with convolution transforms. Moreover, another important ingredient is a (suitable) projection map P : Rn → Rk that finally outputs the desired low-dimensional representation for U . 
Note that for the sake of approximation quality, we need to preserve intrinsic geometrical and topological properties of the manifold M, and so the construction of the composite dimensionality reduction method requires particular care. In the proposed data analysis, the geometric distortion of the manifold, being incurred by the chosen convolution transform, plays a key role. We remark that similar concepts from differential geometry are enjoying increasing interest in related applications of sampling theory, including surface reconstruction in reverse engineering and image analysis [5]. Further related concepts can be found in classical dimensionality reduction schemes, such as in principal component analysis and multidimensional scaling, while more recent techniques are including Isomap and LLE methods [4, 7] Local Tangent Space Alignment (LTSA) [6], SAMPTA'09 Sample Logmaps [1], and, most recently, Riemannian Normal Coordinates [2, 3]. The outline of the paper is as follows. In the following Section 2, the main ingredients of the proposed nonlinear dimensionality reduction scheme, especially the construction of the convolution and projection map, are explained. Then, in Section 3 relevant aspects concerning distortion analysis are addressed. Finally, Section 4 shows the good performance of the resulting nonlinear dimensionality reduction method. To this end, numerical examples concerning low-dimensional parameterization of scale modulated signals and solutions to the wave equation at varying initial conditions are illustrated. 2. Construction of the Data Analysis Given a set of signals U = {ui }m i=1 ⊂ M, that we assume to lie in (or near) a low-dimensional Riemannian compact submanifold M, of Rn , we wish to analyse the given data for the purpose of dimensionality reduction. Therefore, we assume that there is an embedding A : Ω → M, giving a parameterization of M, where the domain Ω ⊂ Rd lies in a low-dimensional Euclidean space Rd , i.e., d ≪ n. But the parameter domain Ω is unknown. Therefore, the goal of dimensionality reduction is to find a sufficiently accurate approximation Ω′ of Ω, through which the desired low-dimensional representation for U is obtained. We remark that the construction of the data analysis is required to depend on intrinsic geometrical and topological properties of the manifold M. To this end, we apply a particular convolution transform T : M → MT , MT = {T (p) : p ∈ M}, to each of the data sites ui , followed by a suitable projection P : MT → Ω′ , yielding a nonlinear data transformation for dimensionality reduction. The following diagram reflects our concept. Ω ⊂ Rd A / U ⊂ M ⊂ Rn (1) T Ω′ ⊂ Rd o P  UT ⊂ MT ⊂ Rn Note that both the construction of the transformation T and the projection need particular care. Indeed, in order to maintain the intrinsic geometrical properties of the manifold M, it is required to investigate the curvature distortion of M under the transform T . For this purpose, convolution filters are powerful tools for the construction of 287 suitable signal transforms T . This is supported by our numerical results in Section 4., where wavelet transforms are used for a customized construction of T . Finally, let us remark that standard methods in signal processing rely on on special characteristics of a discrete-time signal uk ∈ Rn , such as frequency content, time duration, phase and amplitude information, etc. 
In typical application scenarios, signal data are not just isolated items of information, but they are rather incorporating correlations reflecting characteristic properties of the sampled object. Therefore, when designing customized signal transforms, one should exploit available context information on characteristic properties of the target object in order to improve the quality of the data analysis. In our particular application scenario, special emphasis needs to be placed on intrinsic geometrical properties of the manifold M, where a preprocessing distortion analysis of the curvature is of vital importance. 3. When considering the linear transformation T representing the convolution filter, an important case is when T is represented by a Toeplitz matrix, with filter coefficients H = (h1 , . . . , hm ), i.e.,  Curvature Distortion Analysis Our main objective is to estimate the curvature distortion in the geometry of the manifold M incurred by the application of the linear transformation T : M → MT , where T may, for instance, representing a wavelet or a convolution filter. To this end, we first need to evaluate relevant effects on the geometrical deformation of M under various specific transformations T . This then amounts to constructing suitable transformations T which are welladapted to the characteristic properties of the specific data. Preferable choices for T : M → MT are diffeomorphisms, in which case dim(M) = dim(MT ). 3.1 If T is invertible, then the Gaussian curvature KMT in MT can be computed as a function of the metric g in M by using a pullback of the curvature tensor R in M with respect to the inverse map T −1 : MT → M, or, equivalently, by using a pushforward of the curvature tensor R in M with respect to T : M → MT . An alternative strategy is to consider the composition of T with a particular system of local coordinates (x1 , . . . , xn ) of M, along with the metric tensor   ∂ ∂ . , gij (p) = gij (x1 , . . . , xm ) = ∂xi ∂xj Sectional Curvature Distortions In general, a fundamental invariant of a manifold with respect to its isometries are the sectional curvatures. This concept is derived from the idea of the Gaussian curvature in the setting of 2-manifolds, and is defined as KM = < R(X, Y )Y, X > , kXk2 kY k2 − < X, Y >2 for the curvature tensor R, defined for a triple of smooth vector fields X, Y, Z as R(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z. We recall that the affine connection (a Levi-Cevita connection for our situation) is a bilinear map h1 h2 .. .      T =  hm  0   .  .. 0 3.2 T DK (p) = KM (p) − KMT (T (p)) SAMPTA'09 for p ∈ M. 0 ... ... ... ... ... 0 0 .. . h1 h2 .. . hm       .      Curvature Distortions for Curves As for the special case of a curve r : I = [t0 , t1 ] → Rm , Rt with arc-length parameterization s(a, t) = a kr′ (x)k dx, ′′ recall that the curvature of r is k(s) = kr (s)k. For an arbitrary parameterizations of r, its curvature is given by K2 = kr̈k2 kṙk2 − < r̈, ṙ >2 . (kṙk2 )3 In the remainder of this section, we briefly discuss the curvature distortion under linear maps (e.g. convolution transform) and under smooth maps. To compute the curvature distortion of a curve r : I = [t0 , t1 ] → Rm under a linear map T , we consider the curvature of rT = {T r(t), t ∈ I}, computed as follows. ℓ=1 In order to estimate the distortion caused by the linear map T : M → MT , we compare the Gaussian curvatures between M and MT , denoted respectively KM , and KMT , hm−1 hm .. . ... ... 
Note that the curvature distortion caused by the map T will be controlled by the singular values of T , which due to the Toeplitz matrix structure, are obtained from the Fourier coefficients of H. Now, our primary objective is to investigate the influence of the filter coefficients in H on the curvature distortion T . Moreover, we study filters being required to obtain a DK given curvature distortion. The latter is particularly useful for the adaptive construction of a low dimensional representation of U . ∇ : C ∞ (M, T M) × C ∞ (M, T M) → C ∞ (M, T M) that can be expressed with the Christoffel symbols defined, for a particular Pnsystem of local coordinates (x1 , . . . , xn ), as ∇∂i ∂j = k=1 Γkij ∂k . The Christoffel symbols can be described with respect to the metric tensor via  m  ∂giℓ ∂gij 1 X ∂gjℓ k + + Γij = g ℓk . 2 ∂xi ∂xj ∂xℓ 0 h1 .. . KT2 ≡ KT2 (t) = kT r̈k2 kT ṙk2 − < T r̈, T ṙ >2 . (kT ṙk2 )3 (2) As for the general case of smooth maps F : Rm → Rr , the curvature distortion can be approximated by using the 288 Jacobian matrix JF and its singular value decomposition, JF (p)   =  ∂f1 ∂x1 (p) .. . ∂fr ∂x1 (p) ... .. . ... ∂f1 ∂xm (p) .. . ∂fr ∂xm (p) = UF (p)DF (p)VFT (p)    for p ∈ M. The curvature distortion of a curve r : [t0 , t1 ] → Rm under F can in this case be analyzed through the expression kJF r̈k2 kJF ṙk2 − < JF r̈, JF ṙ >2 KF2 ≡ KF2 (p) = , (kJF ṙk2 )3 where, unlike in the linear case (2), the Jacobian matrices JF depend on p ∈ M. 4. Numerical Examples This section presents three different numerical examples to illustrate basic properties of the proposed analysis of high-dimensional signal data. Further details shall be discussed during the conference. 4.1 Low-dimensional parameterization of scale modulated signals In this example, we illustrate the geometrical effect of a convolution transform for a set of functions lying on a curve embedded in a high dimensional space. More precisely, we analyze a scale modulated family of functions U ⊂ R64 , parameterized by three values in Ω ⊂ R3 , ( ) 3 X 2 U = fα(t) = e−αi (t)(· −bi ) : α(t) ∈ Ω . presents a curvature correction that recovers the original geometry of Ω fairly well. To explain the resulting curvature correction, we need to analyze the singular values and singular vectors of the convolution map T . In fact, the singular values of T can be viewed as scaling factors (stretching or shrinking) along corresponding axis in the (local) embedding of U . Moreover, the spectrum of T depends on the particular filter design. 4.2 Low dimensional parameterization of wave equation solutions In this second example, we regard the one-dimensional wave equation ∂u ∂u = c2 , ∂t ∂x Figure 1 (left) shows the parameter domain Ω, a star shaped curve in R3 . A PCA projection in R3 , applied to the set U ⊂ R64 , is also displayed in Figure 1 (middle). The projection illustrates the curvature distortion caused by the nonlinear map A : Ω ⊂ R3 → U ⊂ R64 , A(α(t)) = fα(t) . (3) with initial conditions u(0, x) = f (x), ∂u (0, x) = g(x), ∂t 0 ≤ x ≤ 1. (4) We make use of the previous example to construct a set of initial values (i.e. functions) parameterized by a star shaped curve U0 = U . Our objective is to investigate the distortion caused by the evolution Ut of the solutions on given initial values U0 . Recall that the evolution of the wave equation is constituted by the set of solutions Ut = {uα ≡ uα (t, x) : uα satisfying (3) with initial condition f ≡ fα in (4) for α ∈ Ω}. 
Now, the solution of the wave equation can numerically be computed by using finite differences, yielding the iteration u(j+1) = Au(j) + b(j) , i=1 The parameter set for the scale modulation is given by the curve  Ω = α(t) = (α1 (t), α2 (t), α3 (t))T ∈ R3 , : t ∈ [t0 , t1 ] . 0 < x < 1, t ≥ 0, where for µ = γ∆t/(∆x)2 , the iteration matrix is given by   1 − 2µ µ   µ 1 − 2µ µ     µ 1 − 2µ µ A= .   .. .. ..   . . . 0 µ 1 − 2µ Recall that in the convergence analysis of the iteration, which can be rewritten as, u(j+2) = Au(j+1) + b(j+1) = A(Au(j) + b(j) ) + b(j+1) = A(2) u(j) + Ab(j) + b(j+1) , Figure 1: Parameter set Ω ⊂ R3 , data U ⊂ R64 , and wavelet correction T (U ) ⊂ R64 . Finally, Figure 1 (right), shows the resulting data transformation T (U ) using a Daubechies wavelet w.r.t. a specific band of the multiresolution analysis, resulting in a filtering process for each element in U . The resulting T (U ), SAMPTA'09 the spectrum of the matrices Ak play a key role. In fact, due to the decomposition Ak = U Dk U T , the geometrical distortion in the evolution of Ut depends on the evolution of the eigenvalues of A. 4.3 Topological Distortion via Filtering In this final example, we illustrate one relevant phenomenon concerning the topological distortion caused by 289 utilized convolution involves a selection of suitable bands from the corresponding wavelet multiresolution decomposition. Further details on this shall be explained during the conference. ftorus12 0.01 0.005 0 Figure 2: One solution of the wave equation u(t, x) and one measurement u(tk , x), tk = 20. −0.005 −0.01 −0.02 0 −0.015 −0.01 −0.005 0 0.005 0.01 0.015 0.02 Figure 5: PCA projection of U ⊂ R64 onto R3 . band4 band3 0.015 0.01 0.01 0.005 0.005 0 0 −0.005 −0.005 −0.01 −0.01 0.01 −0.02 0 Figure 3: Curvature distortion of the initial manifold under the evolution of the wave equation. The outer curve represents the initial conditions U0 while the inner curve reflects the corresponding solutions Ut for some time t. the utilized convolution transformation. In this couple of two test cases, we take one 1-torus Ω1 ⊂ R3 and one 2torus Ω2 ⊂ R3 as parameter space, respectively. As in the previous examples, we generate a corresponding set of scale modulation functions U1 and U2 (see Figure 4), using Ω1 and Ω2 as parameter domains. This gives, for j = 1, 2, two different data sets ( ) 3 X j j 2 e−αi (t)(· −bi ) : αj (t) ∈ Ωj . Uj = fαj (t) = i=1 ftorus2 ftorus1 0.015 0.01 0.01 0.005 0.005 0 0 −0.005 −0.005 −0.02 −0.01 −0.02 −0.01 −0.01 0 0 0.01 0.01 0.02 0.02 −0.01 0.01 0.02 0.01 0.005 0 −0.005 0 −0.01 −0.015 −0.01 Figure 4: PCA projections of U1 , U2 ⊂ R64 onto R3 , generated by Ω1 , Ω2 ⊂ R3 , two tori of genus 1 and 2. Now we combine the set U1 and U2 by  U = ft = fα1 (t) + fα2 (t) : α1 (t) ∈ Ω1 , α2 (t) ∈ Ω2 . The resulting projection of the data U is shown in Figure 5. For the purpose of illustration, we recover the sets U1 and U2 from U . Note that this is a rather challenging task, especially since the genus of surfaces U1 and U2 are different. Figure 6 shows the reconstructions of the two surfaces U1 and U2 . Note that the both the geometrical and topological properties of U1 and U2 are recovered fairly well, which supports the good performance of our convolution transform yet once more. The reconstruction of the SAMPTA'09 −0.01 −0.02 −0.015 −0.01 −0.005 0 0.005 0.01 0.015 −0.015 0.02 −0.01 0.01 0 0 −0.01 −0.02 0.01 Figure 6: Reconstruction of U1 (left), U2 (right) from U . 5. 
Acknowledgments The authors were supported by the priority program DFGSPP 1324 of the Deutsche Forschungsgemeinschaft. References: [1] A. Brun, C. Westin, M. Herberthsson, and H. Knutsson. Sample logmaps: Intrinsic processing of empirical manifold data. Proceedings of the (SSBA) Symposium on Image Analysis, 1, 2006. [2] A. Brun, C.-F. Westin, M. Herberthson, and H. Knutsson. Fast manifold learning based on riemannian normal coordinates. In Proceedings of the SCIA;05, pages 920–929, Joensuu, Finland, June 2005. [3] T. Lin, H. Zha, and S.U. Lee. Riemannian Manifold Learning for Nonlinear Dimensionality Reduction. Lecture Notes in Computer Science, 3951:44, 2006. [4] S.T. Roweis and L.K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding, 2000. [5] E. Saucan, E. Appleboim, and Y.Y. Zeevi. Sampling and Reconstruction of Surfaces and Higher Dimensional Manifolds. Journal of Mathematical Imaging and Vision, 30(1):105–123, 2008. [6] H. Zha and Z. Zhang. Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment. SIAM Journal of Scientific Computing, 26(1):313–338, 2004. [7] H. Zha and Z. Zhang. Continuum Isomap for manifold learnings. Computational Statistics and Data Analysis, 52(1):184–200, 2007. 290 Geometric Reproducing Kernels for Signal Reconstruction Eli Appleboim (1) , Emil Saucan (2) and Yehoshua Y. Zeevi (1) (1) Technion, Dept. of Electrical Engineering (2) Technion, Dept. of Mathematics eliap@ee.technion.ac.il, semil@ee.technion.ac.il, zeevi@ee.technion.ac.il Abstract: In this paper we propose a smoothing method for non smooth signals, which control the geometry of a sampled signal. The signal is considered as a geometric object and the smoothing is done using a smoothing kernel function that controls the curvature of the obtained smooth signal in a close neighborhood of a metric curvature measure of the original signal. 1. Introduction In [11], [12], a sampling scheme for signals that posses Riemannian geometric structure was introduced.It turns out that a variety of signals fall in this setting while gray scale images is just one such example. Rather then some Nyquist rate, the sampling scheme presented in [11], [12], is based on geometric characteristics of the sampled signals. Being precise, the following sampling theorem was proved. Theorem 1 Let Σn , n ≥ 2 be a connected, not necessarily compact, smooth manifold, with finitely many compact boundary components. Then there exists a sampling of Σn , with a proper density D = D(p) = ´ ³ scheme 1 , where k(p) = max{|k1 |, ..., |k2n |}, and where D k(p) k1 , ..., k2n are the principal (normal) curvatures of Σn , at the point p ∈ Σn . While the assumed Riemannian structure relies on the assumption that the signal satisfies C 2 smoothness criteria, the authors presented in [11], an extended version of Theorem 1 also for non smooth geometric signals, where the proposed strategy uses smoothing of the original signal. The following theorem was proved. Theorem 2 Let Σ be a connected, non-necessarily compact surface of class C 0 . Then, for any δ > 0, there exists a δ-sampling of Σ, such that if Σδ → Σ, then Dδ → D, where Dδ and D denote the densities of Σδ and Σ, respectively. In the above Theorem 2 Σδ is a smoothing of Σ obtained by a convolution of Σ with a partition of unity kernel. 
Such a kernel being very common for manifolds smoothing indeed guarantees that the resultant manifold is as smooth as we wish however, in this process we do not have any control on the curvature of the obtained manifold. Some natural question raise in this context, SAMPTA'09 1. To what extent can we smooth the original signal, using such a reproducing kernel while assuming a predefined bounds on the curvature of the resultant manifold? 2. Can the reproducing kernel be made local, namely, can we have different kernel characteristics for different areas along the sampled signals, while being able to glue the smoothed signal along common boundaries? 3. In what way if at any, we can give affirmative answers to 1 and 2 that are adaptive to the signal? Meaning, how can we have good prior estimates for the desired curvature bounds? This paper aims at answering the above questions. Note that answering question 1 is analogous to smoothen a signal to have a predefined frequency band-pass, using a band-pass filter as commonly done in signal processing for decades. Answering 1, 2, 3 is equivalent to the use of filter banks with different band-pass characteristics. In all, giving affirmative answers to all above questions give rise to an adaptive non uniform sampling scheme for a variety of signals. We will focus along the paper on signals that are do not admit a Riemannian structure but rather have a more general geometric structure of the so called Alexandrov spaces. We will term such signals as geometric-signals. 2. Preliminaries In this section we will give some basic preliminary definitions and notations. 2.1 Alexandrov spaces Definition 3 (Alexandrov - Toponogov) [ [9]] A complete metric space X, satisfies the triangle comparison condition w.r.t κ ∈ R if for every geodesic triangle ∆pqr ∈ X, there exists a comparison triangle, i.e. a triangle, ∆p′ q′ r′ ∈ M2κ , such that pq = p′ q ′ ; qr = q ′ r′ ; rp = r′ p′ so that, for every point s ∈ pr we have that dX (s, q) > dM2κ (s′ , q ′ ) 291 where s′ ∈ p′ r′ such that ps = p′ s′ ; sr = s′ r′ Where M2κ is a complete simply connected surface of constant curvature κ. If X is an Alexandrov space then there exists a self-adjoint operator ∆, called the Laplacian defined on L2 (X) so that, Z Z v∇udHn < ∇u, ∇v > dHn = X X n th where H is the n Hausdorff measure of X, u ∈ D(∆), v ∈ W 1,2 (X). Theorem 7 ( [6]) 1. If X is compact then the spectrum of ∆ is discrete. 2. There exists a continuous heat kernel ht (x, y) on X so that, Z e−t∆ u(x) = ht (x, y)u(y)dHn (y) X 2.2 Figure 1: Comparison triangle. Definition 4 A complete metric space X, is an Alexandrov space of curvature > κ iff 1. For all x, y, ∈ X there exists a length minimizing curve γ joining x and y such that, L(γ) = dX (x, y); where L denotes the arc length of curves in X and dX stands for the metric given on X. γ is called a minimal geodesic. 2. X satisfies the triangle comparison condition for κ. 3. dimH X < ∞; dimH = Hausdorff dimension. Remark 5 In a similar way, while reversing the direction of inequalities, one can define Alexandrov space of curvature < κ. For instance, in the comparison triangle condition, we will demand, dX (s, q) < dM2κ (s′ , q ′ ) Definition 6 (Gromov) If X is an Alexandrov space of curvature < κ and κ ≤ 0 then X is called CAT (κ)space. CAT = Cartan-Alexandrov-Toponogov. 2.1.1 Examples: 1. Every complete Riemannian manifold of bounded sectional curvature. 2. The boundary of convex set in Rn is an Alexandrov space of curvature ≥ 0. 3. 
If Xi is a sequence of n-dimensional Alexandrov spaces of curv. ≥ κ then their Gromov-Hausdorff limit, if exists, is an Alexandrov space of curv. ≥ κ and dimension ≤ n. If the limit of the above sequence is of dimension < n we say the sequence collapses. SAMPTA'09 Approximations of manifolds Let M be a complete Riemannian manifold of bounded sectional curvature. Let p ∈ M be some point and let φi be some C ∞ kernel function supported on some ǫi neighborhood of p. For example one can take φ to be partition of unity, heat kernel and others. Let Mi be the manifold obtained by convolution, Z φi ∗ M dµ; Mi = M Note that Mi is smooth in a δi neighborhood of p even if M fails to be smooth at p. Well known results (see for instance, [7]) in differential topology assert that, ǫi → 0 ⇒ Mj → M ; where convergence of manifolds is considered in the Gromov-Hausdorff topology. While the above result concerns the convergence on a topological level, in order to have curvature control we have to account for geometric convergence as well. This is guaranteed from the studies in [3], [4] and [10]. In [3], [4] it is proved that similar convergence to the above also exist for Betti numbers which are generalizations of Euler characteristic to all dimensions and are related to curvature through higher dimensional of Gauss-Bonnet type theorems [2]. In [10] the question of proper gluing of approximations in adjacent neighborhoods is addressed. It is shown that one can obtain geometric convergence in different neighborhoods V, U of the points p, q resp. so that, on the common boundary ∂V ∩ ∂U the approximations coincide. In addition, if we write the heat operator on a manifold, N , as e−t∆N f (x), where f ∈ L2 (N ) and t > 0, x ∈ N , and ∆N , denotes the Laplace-Beltrami operator associated with N , then there is a smooth kernel function KN , such that, Z KN (t, x, y)f (y)dy; e−t∆N f (x) = N In [3] convergence of the heat kernel is also achieved, e−t∆Mi → e−t∆M 292 3. Smoothing geometric signals with curvature control In this section we present the results concerning questions 1, 2 and 3 posed in the introduction. These results give us the ability to smoothen a geometric signal while having an adaptive control on obtained curvatures. Definition 8 We say that a signal is a geometric signal iff it admits a structure of an Alexandrov space for some κ ∈ R. Let Σ be a geometric signal of sectional curvature bounded from below (above). Let p ∈ Σ be a point, and U (p) ⊂ Σ some compact neighborhood of p. Let κ = lim sup K 3. Smooth the signal while controlling the curvature of the smoothed signal to suitably approximate the estimated curvature. 4. Sample the smoothed signal according to Theorem 1 4.1 Special case - images It is common to regard images as surfaces embedded in some Rn . For gray scale images R3 is considered while for color images it is usual to take R5 . Figure 2 shows image re-sampled according to the geometric sampling proposed in Theorem1. In this example no smoothing was applied prior to sampling and artifacts of this can be seen in the reconstructed image. “Flat areas” of the image have 20 times reduced sampling resolution with respect to the original resolution. such that U (p) is an Alexandrov space of curvature > K. 3.1 Approximations of geometric signals Theorem 9 ( [1]) Given a point p on Σ, there exists smooth local kernel φi as above, yielding a sequence of manifolds Mi , smooth inside an ǫi neighborhoods of p, such that 1. Mi = Z φi ∗ Σdµ → Σ, Σ as ǫ → 0. 2. 
If we further assume that while the Riemannian manifolds Mi converge to Σ, no collapse occurs i.e. the Hausdorff dimension of Σ is the same as of Mi , then, the sectional curvature Ki (p) of Mi at p satisfies, lim Ki (p) = κ; ǫ→0 The theorem above answers both questions 1 and 2. We can control the curvature of the obtained smooth signals in an adaptive way by making it converge to the lim sup of Alexandrov curvature of the signal Σ. 3.2 Gluing By arguments similar to those in [10] we have, Theorem 10 ( [1]) Let the above smooth approximations of Σ be given in neighborhoods of two points p, q. Then they coincide as well as their sectional curvatures Ki,Vi , Ki,Ui on the common boundary, if non empty. 4. Sampling of geometric signals We propose the following scheme for sampling of a geometric signals. 1. Consider the signal as an Alexandrov space. This requires the representation of the signal as a tame metric space in a meaningful manner. 2. Assess the appropriate Alexandrov curvature bound. This can be done by the use of discrete metric curvature measures. SAMPTA'09 Figure 2: Geometric sampling of a gray scale image. Top to bottom - original Lena; Lena resampled. The white dots are the new sampling points. One can see the sparseness w.r.t the original; Lena reconstructed. Reconstruction using linear interpolation over the sampling points. No smoothing was done. In order to estimate the curvature of an image as an Alexandrov space we can take the set of discrete curvature measures proposed in [5] where such measures are suggested for very general cell-complexes. It is shown in [5] 293 that the one-dimensional curvature measure resembles the Ricci curvature of a cell-complex which, in the case of images (since they are 2-dimensional manifolds) coincides with the Gaussian curvature. Figure 3 shows the combinatorial Ricci (= Gauss) curvature of the image in Figure 2, see [13] for details about the adoption of the curvature measures introduced in [5] to images. Figure 3: Discrete Ricci curvature of Lena. Apart from giving an assessment for the curvature of the image as an Alexandrov space, it also serves as an excellent edge detector as itself. [4] Cheeger, J. and Gromov, M., Bounds on the Von Neumann dimension of L2 -cohomology and the GaussBonnet theorem for open manifolds, J. Diff. Geom. 21, 1985. [5] Forman, R., Bochner‘s method for cell-complexes and combinatorial Ricci curvature, Disc. Comp. Geom., 29, 2003. [6] Kuwae, K. Machigashira, Y. and Shioya, T., Sobolev spaces, Laplacian and heat kernel on Alexandrov spaces, Math. Z. 238, 2001. [7] Munkres, J. Elementary Differential Topology, Ann. Math. Stud. 54, 1966. [8] Nash, J., The Imbedding problem for Riemannian manifolds, Ann. Math. 63, 1956. [9] Otsu, Y. and Shioya, T., The Riemannain stracture of Alexandrov Spaces, J. Diff. Geom., 39, 1994. [10] Petersen, P., Wei, G. and Ye, R., Controlled geometry via smoothing, Comm. Math. Helv., 74, 1999. [11] Saucan, E., Appleboim, E. and Zeevi, Y. Y. Sampling and Reconstruction of Surfaces and Higher Dimensional Manifolds, J. Math. Imaging. Vis., 30, 2008. [12] Saucan, E., Appleboim, E. and Zeevi, Y. Y. Geometric Sampling of Manifolds for Image Representation and Processing LNCS, 4485, 2007. [13] Saucan, E. Appleboim, E., Wolansky G. and Zeevi, Y. Y., Combinatorial Ricci curvature for image processing, Midas Jour. Proc. MICCAI 2008 5. Further study Current and future studies of geometric sampling of images and signals, focus on two aspects. 
First we wish to modify the smoothing process introduced herein so it will be done in the Fourier domain rather than the spatial domain. Namely, we wish to smooth the Fourier transform of the signal while considering curvature in the Fourier plane. This is inspired by the Nash embedding Theorem [8] while the Fourier transform of a manifold is smoothen prior to its embedding thus achieving a higher degree of smoothness with respect to smoothing in the spatial domain. Another direction of study is devoted to the development of a geometric theory of sparse representations and geometric compress sensing. References: [1] Appleboim, E., Saucan, E. and Zeevi, Y. Y. Geometric reproducing kernels for signals, preprint. [2] Bochner, S. and Yano, K., Curvature and Betti numbers, Ann. Math. Stud. 32, 1953. [3] Cheeger, J. and Gromov, M., On the characteristic numbers of complete manifolds of bounded curvature and finite volume, Diff. Geom. and Com. Anal. Chavel Farkas Ed., Springer, 1985. SAMPTA'09 294 Multivariate Complex B-Splines, Dirichlet Averages and Difference Operators Brigitte Forster (1,2) and Peter Massopust (2,1) (1) Zentrum Mathematik, M6, Technische Universität München, Germany (2) Institut für Biomathematik und Biometrie, Helmholtz Zentrum München, Germany forster@ma.tum.de, massopust@ma.tum.de Abstract: For the Schoenberg B-splines, interesting relations between their functional representation, Dirichlet averages and difference operators are known. We use these relations to extend the B-splines to an arbitrary (infinite) sequence of knots and to higher dimensions. A new Fourier domain representation of the multidimensional complex B-spline is given. 1. Complex B-Splines Complex B-splines are a natural extension of the classical Curry-Schoenberg B-splines [2] and the fractional splines first investigated in [16]. The complex B-splines Bz : R → C are defined in Fourier domain as  z Z 1 − e−iω −iωt F(Bz )(ω) = Bz (t)e dt = iω R for Re z > 1. They are well-defined, because of { 1−eiω | ω ∈ R} ∩ {y ∈ R | y < 0} = ∅ they live on the main branch of the complex logarithm. Complex B-splines are elements of L1 (R) ∩ L2 (R). They have several interesting basic properties, which are discussed in [5]. Let Re z, Re z1 , Re z2 > 1. −iω • Complex B-splines Bz are piecewise polynomials of complex degree. • Smoothness and decay: – Bz ∈ W2r (R) for r < Re z − 12 . Here W2r (R) denotes the Sobolev space with respect to the L2 -Norm and with weight (1 + |x|2 )r . – Bz (x) = O(x−m ) for m < Re z +1, |x| → ∞. • Recursion formula: Bz1 ∗ Bz2 = Bz1 +z2 . • Complex B-splines are scaling functions and generate multiresolution analyses and wavelets. B-splines Dirichlet averages Difference operators Figure 1: Relations between classical B-splines, difference operators and Dirichlet averages. 2. Representation in time-domain We defined complex B-splines in Fourier domain, and Fourier inversion shows that these functions are piecewise polynomials of complex degree: Proposition 1. [5] Complex B-splines have a timedomain representation of the form z  1 X z−1 (t − k)+ , (−1)k Bz (t) = k Γ(z) k≥0 pointwise for all t ∈ R and in L2 (R)-norm. Here,  z t = ez ln t , if t > 0, z t+ = 0, if t ≤ 0, is the truncated power function, and Γ : C \ Z− 0 → C denotes the Euler Gamma function. Compare: The cardinal B-spline Bn , n ∈ N, has the similar representation n X 1 n−1 (−1)k (t − k)+ k (n − 1)! n Bn (t) = k=0 n 1 X (−1)k (t − k)n−1 + . Γ(n) k ∞ = k=0 • But in general, they don’t have compact support. 
• Last but not least: They relate difference and differential operators. In this paper, we take closer look at this last relation and the respective multivariate setting. To this end, we will consider the known relations between classical B-splines, difference operators and Dirichlet averages. SAMPTA'09 3. Relations to Difference Operators It is well-known that in the construction of the CurrySchoenberg B-splines difference operators are deeply involved. The same is true for complex B-splines. To establish the corresponding relation, let us first recall the definition of the backward difference operator ∇. 295 Let g : R → C be a function. Then the backward difference operator ∇ = ∇1 is recursively defined as follows: ∇g(t) ∇ g(t) = g(t) − g(t − 1), = ∇(∇n g(t)) for n ∈ N. n+1 This definition yields the explicit representation n   X n (−1)k g(t − k). ∇n g(t) = k k=0 For the cardinal B-splines Bn we can write:   n X 1 n−1 k n (−1) (t − k)+ Bn (t) = (n − 1)! k k=0 = 1 ∇n tn−1 + . (n − 1)! In comparison: For the complex B-splines, we have an analog representation:   ∞ 1 X z−1 k z , Re z ≥ 1. (t − k)+ Bz (t) = (−1) k Γ(z) k=0 This invites to define a complex difference operator: Definition 2. [5, 6] The difference operator ∇z of complex order z is defined as   ∞ X z ∇z g(t) := (−1)k g(t − k), z ∈ C, Re z ≥ 1. k k=0 Hence a second time domain representation of the complex B-spline is Bz (t) = 1 z−1 . ∇ z t+ Γ(z) In a similar way, we can establish a relation to divided differences. Recall that for a knot sequences {t0 , . . . , tn } ⊂ R, n ≥ 1, divided differences are recursively defined as follows. Let g : R → C be some function. [t0 ]g [t0 , . . . , tn ]g = g(t0 ), [t0 , . . . , tn−1 ]g − [t1 , . . . , tn ]g = t0 − t n n X g(tj ) Q = . l6=j (tj − tl ) j=0 For the cardinal B-spline, n X 1 (−1)k (t − k)n−1 + (n − 1)! k Definition 3. Let g : R → C be some function. We define the complex divided differences for the knot sequence N0 via [z; N0 ]g := X k≥0 (−1)k g(k) . Γ(z − k + 1)Γ(k + 1) Then the complex B-spline can be written as z−1 . Bz (t) = z[z, N0 ](t − •)+ Comparing “old” and “new” divided difference operator for z = n ∈ N, yields (−1)n [0, 1, . . . n] = [n, N0 ]. Proposition 4. [6, 7] Let Re z > 0 and g ∈ S(R+ ). Then Z 1 Bz (t)g (z) (t) dt, [z; N0 ]g = Γ(z + 1) R where g (z) = W z g is the complex Weyl derivative: For n = ⌈Re z⌉, ν = n − z,   Z ∞ n 1 z n d ν−1 W g(t) = (−1) (x − t) g(x) dx . dtn Γ(ν) t Sketch of proof: Z 1 Bz (t)g (z) (t) dt Γ(z + 1) R Z 1 z−1 W z g(t) dt z[z, N0 ](t − •)+ = Γ(z + 1) R Z ∞ 1 z−1 = [z, N0 ] W z g(t) dt (t − •)+ Γ(z) • = [z, N0 ]W −z W z g = [z, N0 ]g. R∞ z−1 1 f (t) dt is the complex (t − •)+ Here, W −z f = Γ(z) • Weyl integral of the function f , i.e., the inverse operator of W z .  Now we are able to establish a first relation between divided difference operators and Dirichlet averages. Proposition 5. (Generalized Hermite-Genocchi-Formula: Divided Differences and Dirichlet Averages) [6, 7] Let ∆∞ be the infinite-dimensional simplex N0 ∆∞ := {u := (uj ) ∈ (R+ 0) | n Bn (t) = k=0 = n n X (−1)k k=0 1 (t − k)n−1 + k!(n − k)! n X (t − k)n−1 + Q (k − l) l6=k = (−1)n n = (−1)n n[0, 1, . . . , n](t − •)n−1 + . k=0 (The factor (−1)n is due to our representation of the cardinal B-spline via backward difference operators.) The same ideas give rise to the definition of complex divided differences. 
SAMPTA'09 ∞ X j=0 uj = 1} = lim ∆n , ←− defined as the projective limit of the finite dimensional simplices ∆n , and let µ∞ e be the generalized Dirichlet measure defined by the projective limit µ∞ lim Γ(n + 1)λn , e =← − where λn the Lebesgue measure on ∆n . Then Z 1 g (z) (N0 · u)dµ∞ [z, N0 ]g = e (u) Γ(z + 1) ∆∞ Z 1 Bz (t)g (z) (t) dt = Γ(z + 1) R for all real-analytic g ∈ S(R+ ). 296 Up to now we have considered complex B-splines with knot sequence N0 and derived from there new difference operators and finally the relation to Dirichlet averages, just as indicated in the diagram in Fig. 1: B-splines → Difference operators → Dirichlet averages. Our next step will consist of generalizing the setting with appropriate weights in travelling through the diagram another way round: Dirichlet averages for other knot sequences τ and with weights → Generalized B-splines with knot sequence τ → Difference operators. 4. Splines and Dirichlet Averages 0 ∈ RN Let b ∈ R∞ + be a weight vector and τ = {tk }k∈N0 √ + an increasing sequence of knots with lim supk→∞ k tk ≤ ρ < e. Definition 6. A complex B-spline Bz (• | b; τ ) with weight vector b and knot sequence τ is a function satisfying Z Z g (z) (τ · u)dµ∞ Bz (t | b; τ )g (z) (t) dt = b (u) (1) ∆∞ R for all real-analytic g ∈ S(R+ ). Here, µ∞ lim µnb is b = ← − the projective limit of Dirichlet measures with densities Γ(b0 ) . . . Γ(bn ) b0 −1 b1 −1 u u1 . . . unbn −1 . Γ(b0 + . . . + bn ) 0 Since both W z and W −z are linear operators mapping S(R+ ) into itself [11, 15] and since the real-analytic functions in S(R+ ) are dense in S(R+ ) [13], (1) holds for all g ∈ S(R+ ). Moreover, since S(R+ ) is dense in L2 (R+ ), we deduce that Bz (• | b, τ ) ∈ L2 (R+ ). Equation (1) means, we define the weighted version of the complex B-spline in a weak sense via Dirichlet averages. Referring again to the diagram in Fig. 1, we now move from the generalized B-splines to generalized divided differences. 0 RN + and weight Definition 7. For knot sequences τ ∈ vectors b ∈ R∞ + as above, we define the generalized complex divided differences [z; τ ]b as follows. Let g : R → C be some function. Z 1 Bz (t|b; τ )g (z) (t) dt [z; τ ]b g := Γ(z) R for all g ∈ S(R). This definition is compatible with the usual Dirichlet and splines. In fact, for all finite τ = τ (n) ∈ Rn+1 + n+1 b = b(n) ∈ R+ , and for z = n ∈ N0 the Dirichlet spline Dn (•|b; τ ) of order n is defined by Z Z (n) g (n) (τ · u) dµnb (u) g (t)Dn (t|b; τ ) dt = ∆n R = G(n) (b; τ ) for all g ∈ C n (R). Here, G is the Dirichlet average of g: Z G(b; τ ) = g(τ · u) dµnb (u). ∆n SAMPTA'09 5. Multivariate Complex B-Splines To define complex B-splines in a multivariate setting, we consider ridge functions and define multivariate B-splines on their basis. Then, we walk again through the diagram in Fig. 1: Multivariate B-splines → Multivariate difference operators. Results on Dirichlet averages yield new recurrence relations for multivariate B-splines: Dirichlet averages → B-splines. Note that the approach via ridge functions had already let to an extension of the Curry-Schoenberg-splines to a multivariate setting, e.g. [3, 4, 10, 12]. However, some of these approaches have certain restrictions on the knots and none of them considers complex splines. Given λ ∈ Rs \{0}, a direction, and g : R → C a function. The ridge function gλ corresponding to g is defined via gλ : Rs → C, gλ (x) = g(hλ, xi) for all x ∈ Rs . ∈ (Rs )N0 a sequence Definition 8. [9] Let τ = {τ n }n∈Np 0 n s of knots in R with lim supn→∞ kτ n k ≤ ρ < e. 
The multivariate complex B-spline B z (•|b; τ ) with weights 0 b ∈ CN + and knots τ is defined on ridge functions via Z Z g(hλ, xi)B z (x | b; τ ) dx = g(t)Bz (t | b; λτ ) dt, Rs R (2) where g ∈ S(R+ ) and λ ∈ Rs \ {0}, such that λτ = {hλ, τ n i}n∈N0 is separated. Since ridge functions are dense in L2 (Rs ) [14], we deduce that B z (• | b; τ ) ∈ L2 ((R+ )s ). Example 9. (Divided differences in the multivariate case) Given b = e := (1, 1, 1, . . .). Then for all g ∈ S(R∞ ): [z; τ ]e gλ = = = [z; τ ]gλ = [z; τ ]g(hλ, •i) Z 1 g (z) (hλ, xi)B z (x | e; τ ) dx Γ(z) Rs Z 1 g (z) (t)Bz (t | e; λτ ) dt = [z; λτ ]g. Γ(z) R for all λ ∈ Rs such that λτ is separated. Example 10. (Multivariate cardinal B-splines) For n ∈ N and a finite sequence of knots τ = {τ 0 , τ 1 , . . . , τ n }: [τ 0 , . . . , τ n ]gλ := [n; τ ]g(hλ, •i) Z 1 = g (n) (hλ, xi)B n (x | e; τ ) dx n! Rs Z 1 g (n) (t)Bn (t | e; λτ ) dt = n! R n X g(hλ, τ j i) Q = [n; λτ ]g = . j l l6=j hλ, τ − τ i j=0 Given a sequence of knots τ ⊂ Rs and a weight vector b as above. In addition, let b ∈ l1 (N0 ) such that kbk1 =: c. (z+1) Then the Dirichlet averages of g (z) ∈ D(R) and gj := j (z+1) (hλ, τ i − •)g , j ∈ N0 , satisfy: (1+z) (c−1)G(z) (b; λτ ) = (c−1)G(z) (b−ej ; λτ )+Gj (b; λτ ). 297 For the finite dimensional case see [1, 12]. These and other relations of similar type on Dirichlet averages yield new results for multivariate complex B-splines. As a example, we state: Proposition 11. [9] Under the above conditions, for all j ∈ N0 : (c − 1) Z Rs Z (z) gλ (x)B z (x | b; τ )dx = (z) gλ (x)B z (x | b − ej ; τ )dx (c − 1) Rs Z (1+z) + (x)B z (x | b; τ )dx. hλ, τ j − xi gλ = Rs More relations of this type are given in [8]. 6. Fourier representation of multivariate complex B-splines We saw above that both the univariate and the multivariate complex B-splines are L2 -functions: Bz (• | b; τ ) ∈ L2 (R+ ) and B z (• | b; τ ) ∈ L2 ((R+ )s ) Therefore, we can apply the Plancherel transform to both functions and consider their frequency spectrum. Let ω = (ω1 , . . . , ωs ) ∈ Rs and let λ ∈ Rs , kλk = 1, be the direction of ω, i.e., ω = ωλ for some ω ≥ 0. For the Fourier transform of the generalized complex B-spline we have for x = (x1 , . . . , xs ) ∈ Rs : bz (ω | b; λτ ) = B Z e−iωt Bz (t | b; λτ ) dt = ZR = e−iωhλ,xi B z (x | b; τ ) dx Rs Z = e−iω(λ1 x1 +...+λs xs ) B z (x | b; τ ) dx Rs Z = e−i(ω1 x1 +...+ωs xs ) B z (x | b; τ ) dx s ZR = e−ihω,xi B z (x | b; τ ) dx Rs b z (ω | b; τ ) = B = b z (ωλ | b; τ ). B This shows that the frequency spectrum of the multivariate complex B-spline along directions λ is given by the spectrum of the univariate spline with knots projected onto these λ. 7. Summary Complex B-splines allow to define difference and divided difference operators of complex order for arbitrary knots and weights. Via their relation to Dirichlet averages and Dirichlet splines, they can be extended to higher dimensions via ridge functions. The Fourier transform of the univariate and multivariate complex B-spline are also related on ridges. SAMPTA'09 8. Acknowledgments This work was partially supported by the grant MEXTCT-2004-013477, Acronym MAMEBIA, of the European Commission. References: [1] B. C. Carlson. B-Splines, hypergeometric functions and Dirichlet averages. J. Approx. Th., 67:311–325, 1991. [2] H. B. Curry and I. J. Schoenberg. On spline distributions and their limits: The Pólya distribution functions. Bulletin of the AMS, 53(7–12):1114, 1947. Abstract. [3] W. Dahmen and C. A. Micchelli. Statistical Encounters with B-Splines. 
Contemporary Mathematics, 59:17–48, 1986. [4] C. de Boor. Splines as linear combinations of Bsplines. In G. G. Lorentz et al., editor, Approximation Theory II, pages 1–47. Academic Press, 1976. [5] B. Forster, T. Blu, and M. Unser. Complex B-splines. Appl. Comp. Harmon. Anal., 20:281–282, 2006. [6] B. Forster and P. Massopust. Statistical encounters with complex B-Splines. to appear in Constructive Approximation. [7] B. Forster and P. Massopust. Some remarks about the connection between fractional divided differences, fractional B-Splines, and the Hermite-Genocchi formula. International Journal of Wavelets, Multiresolution and Information Processing, 6(2):279–290, 2008. [8] P. Massopust. Double Dirichlet averages and complex B-splines. Submitted to SAMPTA 2009. [9] P. Massopust and B. Forster. Multivariate complex B-splines and Dirichlet averages. Submitted to Journal of Approximation Theory. [10] C. A. Micchelli. A constructive approach to Kergin interpolation in Rk : Multivariate B-splines and Lagrange interpolation. Rocky Mt. J. Math., 10(3):485– 497, 1980. [11] K. S. Miller and B. Ross. An introduction to the fractional calculus and fractional differential equations. Wiley, 1993. [12] E. Neuman and P. J. Van Fleet. Moments of Dirichlet splines and their applications to hypergeometric functions. Journal of Computational and Applied Mathematics, 53:225–241, 1994. [13] O. V. Odinokov. Spectral analysis in certain spaces of entire functions of exponential type and its applications. Izv. Math., 64(4):777–786, 2000. [14] A. Pinkus. Approximating by ridge functions. In A. Le Méhauté, C. Rabut, and L. L. Schumaker, editors, Surface Fitting and Multiresolution Methods, pages 1–14. Vanderbilt University Press, 1997. [15] S. G. Samko, A. A. Kilbas, and O. I. Marichev. Fractional Integrals and Derivatives. Gordon and Breach Science Publishers, Minsk, Belarus, 1987. [16] M. Unser and T. Blu. Fractional splines and wavelets. SIAM Review, 42(1):43–67, March 2000. 298 Concrete and discrete operator reproducing formulae for abstract Paley–Wiener space J.R. Higgins I.H.P., 4 rue du Bary, 11250 Montclar, France. rowlandhiggins@yahoo.com Abstract: The classical Paley–Wiener space possesses two reproducing formulae; a ‘concrete’ reproducing equation and a ‘discrete’ analogue, or sampling series, and there is a striking comparison between them. It is shown that such analogies persist in the setting of Paley–Wiener spaces that are more general than the classical case. In fact, there are ‘operator’ versions of the reproducing equation and of the sampling series that are also comparable, not ‘exactly’ but nearly so. Reproducing kernel theory and abstract harmonic analysis are brought together to achieve this, then the special case of multiplier operators with respect to the Fourier transform is considered. The Riesz transforms provide a two-dimensional example, with possibilities of extension to higher dimensions and to further classes of operators. 1. Introduction It has often been remarked that the classical Paley–Wiener space possesses two reproducing formulae; a ‘concrete’ reproducing equation Z f (t) sinc(s − t)dt, (s ∈ R), (1) f (s) = 2.1 The basic setting of this paper is that of the reproducing kernel theory of Saitoh [8, Ch. 2, §1]. Very briefly the background is as follows. Let E be an abstract set. For each t belonging to E let Kt belong to H (a separable Hilbert space with inner product denoted by h, iH ). 
Then k(s, t) := hKt , Ks iH is defined on E ×E and is called the kernel function of the map Kt . This kernel function is a positive matrix [8, Ch. 2, §2] and as such it determines one and only one Hilbert space for which it is the reproducing kernel. This Hilbert space is denoted by R(K) since it turns out to be the set of images of H under the transformation (Kg)(t) := hg, Kt iH , (g ∈ H). Theorem 1 (Saitoh) With the notations established above, R(K) (which is now abbreviated to just R) is a Hilbert space which has the reproducing kernel k(·, ·), and is uniquely determined by this kernel k. For f ∈ R there exists α ∈ H such that kf kR = kKαkR ≤ kαkH , (3) and there exists a unique member, g say, of the class of all α’s satisfying (3) such that f (t) = hg, Kt iH , R and a ‘discrete’ reproducing equation, or sampling series, X f (s) = f (n) sinc(s − n), (s ∈ R), (2) The reproducing kernel theory (t ∈ E), and kf kR = kgkH . n∈Z and that there is a striking analogy between the two (see, e.g., [3, p. 58]). Here, sinc denotes the standard function sinc x := (sin πx)/πx. The purpose of the present lecture is to point out that concrete and discrete reproducing formulae and analogies between them persist in the setting of Paley–Wiener spaces that are more general than the classical case. It will be shown that for suitably chosen operators there are ‘operator’ versions of the reproducing equation and of the sampling series that are also comparable, in the same way as in the classical case described above. 2. The setting Abstract theories that lead to reproducing formulae are outlined in §2.1 and §2.2, and are brought together in §2.3. SAMPTA'09 The reproducing equation for f ∈ R is f (t) = hf, k(·, t)iR (4) The following theorem is simple but very useful. Theorem 2 The convergence of a sequence in the norm of R implies that it converges pointwise over E, and the convergence is uniform over any subset of E on which k(t, t) = kk(·, t)k2 is bounded. The following Theorem is to be found in [8]. Theorem 3 With notations as above, let {sn }, (n ∈ X), be points of E such that {Ksn } is an orthonormal basis for H. Then the sampling series representation X f (t) = f (sn )k(sn , t), (5) n∈X 299 holds, convergence being in the norm of R; and then of course Theorem 2 applies. 2.2 Abstract harmonic analysis A very brief introduction (mostly just notations) to the abstract harmonic analysis that will be needed is now given. All necessary background, and much more, is to be found in [1], [2]. Let G be a locally compact abelian (LCA) group (written additively). Let (t, γ) be a character of G, that is, a continuous homomorphism of G into the circle group. Let G∧ = Γ denote the group of continuous characters on G, usually called the dual group of G. We assume that Γ has a countable discrete subgroup Λ. Haar measures on the various groups are normalised in the standard way [1], and this means in particular that there is a measurable transversal (i.e., a complete set of coset representatives) Ω ⊂ Γ of Γ/Λ, and it has finite Haar measure. Now H = Λ⊥ := {t ∈ G : (t, λ) = 1, (λ ∈ Λ)}. is a subgroup of G and is called the ‘annihilator’ of Λ. We assume that H is discrete; it follows that the quotient group Γ/Λ is compact. The Fourier transform on L2 (G) is defined in the usual way: Z f ∧ (γ) = (Ff )(γ) := f (t)(t, γ) dt, G in the L2 sense, where dt denotes the element of Haar measure on G (likewise, dγ denotes the element of Haar measure on Γ). The inverse Fourier transform will be denoted by ∨ or by F −1 . 
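Before continuing with the abstract setting, here is a concrete numerical illustration of the sampling series (5): in the classical case the kernel $k(s_n, t)$ reduces to $\mathrm{sinc}(t - s_n)$ with nodes $s_n = n \in \mathbb{Z}$. The Python sketch below is ours, with the band-limit normalized so that the kernel is exactly np.sinc (Haar-measure constants absorbed); the test signal is a finite combination of shifted sincs, hence band-limited by construction.

import numpy as np

# Classical sampling series (5): f(t) = sum_n f(n) sinc(t - n),
# where np.sinc(x) = sin(pi x)/(pi x).
rng = np.random.default_rng(0)
nodes = np.arange(-40, 41)
c = rng.standard_normal(nodes.size)

def f(t):
    # a band-limited test signal: finite combination of integer-shifted sincs
    return c @ np.sinc(np.subtract.outer(nodes, t))

t = np.linspace(-3.3, 3.3, 9)
samples = np.array([f(float(n)) for n in nodes])  # equals c, since sinc(m-n) = delta
recon = samples @ np.sinc(np.subtract.outer(nodes, t))
print(np.max(np.abs(recon - f(t))))               # ~ machine precision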
We shall need the ‘shift’ property of the Fourier transform: f (· − x)∧ (γ) = (−x, γ)f ∧ (γ). Abstract Paley Wiener space P WΩ (G) is defined as follows: 2 P WΩ (G) := {f : f ∈ L (G) ∩ C(G), f ∧ (γ) = 0 (Haar) a.a. γ 6∈ Ω} 2.3 close association between sampling in the harmonic analysis setting and Saitoh’s theory. The space R of §2.1 is now seen to be the Paley–Wiener space defined in (6), and its reproducing equation is (6) Combining harmonic analysis with the reproducing kernel theory The abstract set E of §2.1 is often taken to be R or C. Here, however, we take it to be an LCA group G thus combining two abstract theories, harmonic analysis and the reproducing kernel theory. In the notations of §2.1 and §2.2 we also take Kt = (t, ·), H = L2 (Ω) and Kg = F −1 g, g ∈ L2 (Ω). Then we have Z Z k(s, t) = (t, γ)(s, γ) dγ = (t − s, γ) dγ Ω Ω  ∨ χ = Ω (t − s) =: ϕΩ (t − s), (7) where χS denotes the characteristic function of a set S. It does not seem to have been recognised that ϕΩ (t − s) is the reproducing kernel for P WΩ (G), and that this allows a SAMPTA'09 f (t) = hf, ϕΩ (t − ·)iL2 (G) (8) Kluvánek’s sampling theorem [4, p. 45] is a consequence: Theorem 4 Let f ∈ P WΩ . With the assumptions of §2.2, X f (h)ϕΩ (t − h) (9) f (t) = h∈H in norm, etc., (see Theorem 2). Our concrete – discrete comparison is beween (8) and (9). 3. Operator kernels and operator reproducing formulae The presence of kernels and reproducing equations associated with operators on a reproducing kernel Hilbert space add greatly to the richness of its structure, as we shall see in this section. 3.1 Operator kernels and operator reproducing equations Let R be the separable Hilbert space of functions defined on E with reproducing kernel k(s, t), as we have discussed it in §2.1. Let B be a bijection on R, and let B ∗ denote the adjoint operator. The action of B on R is governed by the action of B ∗ on the reproducing kernel k, because for f ∈ R,  Bf (t) = hBf, k(·, t)iR = hf, B ∗ k(·, t)iR . (10) See, e.g., [5]. Definition 1 The kernel  h(s, t) := B ∗ k(·, t) (s), s, t ∈ E will be called the operator kernel of B. In this notation (10) is  Bf (t) = hf, h(·, t)i. (11) ∗ −1 Now from Definition 1 above, ((B ) h(·, t))(s) = k(s, t), so that, using the ordinary reproducing formula (4), we have f (t) = hf, k(·, t)i = hf, (B ∗ )−1 h(·, t)i ∗ = h (B ∗ )−1 f, h(·, t)i. Now using standard properties of operators and their adjoints (e.g., [6, p. 202]) we can summarise these calculations as: f (t) = h(B −1 f )(·), h(·, t)i. (12) This formula tells us that f can be reproduced, not from its own values as in the ordinary reproducing kernel theory, but from the result of acting on it with an operator. We can call this an operator reproducing equation in analogy with the ordinary reproducing equation (4). Similar formulae for B ∗ can be obtained in the same way. First, we make the following 300 Definition 2 ∗ h (s, t) := h(t, s) ((t, s) ∈ E × E) will be called the adjoint operator kernel of B. Kernels and their adjoints occur in important areas of study such as the theory of integral equations (see, e.g., [6, p. 170] for basic information). We shall find series expansions for such kernels and identify the action of h∗ explicitly in Theorem 5 below. First, let {ϕn }, n ∈ X, be an orthonormal basis for R. for a constant B which is consequent upon the fact that, since B is bounded, B ∗ is bounded and by Banach’s ‘bounded inverse’ theorem (B ∗ )−1 is bounded. Now N can be made to approach ∞. 
Since Fn (t) converges to 0 both in norm and pointwise on E (see Theorem 2), the expression in (21) approaches 0 for each fixed t ∈ E. Finally, from (20) we obtain the following Theorem 5 Let R, B and E be as above. Then we have the adjoint operator reproducing formula f (t) = h(B ∗ )−1 f, h∗ (·, t)i. Lemma 1 h(s, t) = X n∈X  Bϕn (t) ϕn (s), (s, t ∈ E). (13) Convergence is in the norm of R for each t ∈ E, and the pointwise convergence is governed by Theorem 2. Proof The coefficients for the expansion of h(·, t), t fixed, in the basis {ϕn } are  hh(·, t), ϕn i = hϕn , h(·, t)i = Bϕn (t) by (11), thus (13) is obtained. It will be recalled that if we put ( Bϕn = ψn (B ∗ )−1 ϕn = ψn∗ ,  (14) then {ψn } is a Riesz basis for R with dual basis {ψn∗ }. In this notation (13) can be written X ψn (t)ϕn (s). (15) h(s, t) = n∈X Hence by Definition 2 we have X ϕn (t)ψn (s). h∗ (s, t) = (16) n∈X in the norm of R for each t ∈ E. By uniqueness the coefficients {ϕn (t)} are such that ϕn (t) = hh∗ (·, t), ψn∗ i ∗ −1 = h(B ) ∗ ϕn , h (·, t)i, (17) N Consider P f (t) − h(B ∗ )−1 f, h∗ (·, t)i. f (t) − h(B ∗ )−1 f, h∗ (·, t)i (20) X ∗ −1 ∗ cn ϕn ), h (·, t)i = Fn (t) − h(B ) (f − N ≤ Fn (t) + h(B ) ∗ −1 ≤ Fn (t) + k(B ) (f − ≤ Fn (t) + Bkf − SAMPTA'09 (f − X N X cn ϕn ), h∗ (·, t)i N X cn ϕn )kkh∗ (·, t)k N cn ϕn kkh∗ (·, t)k Operator sampling series There are connections here to the theory of single channel sampling (see, e.g., [3, Ch. 12]), but the present approach is much more general. In order to match the operator reproducing equation (12) with a discrete analogue, some further assumption will have to be made. In fact we shall assume the existence of a sequence (sn ) ⊂ E, n ∈ X such that {h(sn , t)} is an orthogonal basis for R with normalising factors νn , so that {νn h(sn , t)} is orthonormal. This can sometimes be traced back to the condition that {Ksn } be an orthogonal basis for H. Again, we could assume that {h(sn , t)} is just a basis for R, or just a frame. However, weaker assumptions demand more technicalities and we will not pursue this kind of generality here. Let f ∈ R. Its expansion in our assumed orthonormal basis is X f (t) = cn νn h(sn , t) (22) where cn = hf, νn h(sn , ·)i = νn hf, h∗ (·, sn i = νn (B ∗ f )(sn ) by Theorem 5. So (22) is X f (t) = |νn |2 (B ∗ f )(sn ) h(sn , t). (23) n∈X Then (12) and (23) are concrete – discrete analogues of each other. (19) Put f (t) − N cn ϕn (t) = Fn (t). Now inserting the right and left hand sides of (18) we find from (19) that ∗ −1 3.2 n∈X Since this relationship is true for every member ϕn of a basis for R, it holds for every f ∈ R by the usual density argument. This argument runs as follows: P Let N cn ϕn be the Nth partial sum of the expansion for f in the basis ϕn . Then taking linear combinations in (17), X X cn ϕn (t) = h(B ∗ )−1 cn ϕn (t), h∗ (·, t)i. (18) N This shows the basic property of h∗ ; it reproduces f from (B ∗ )−1 f . (21) 4. Multiplier operators with respect to the Fourier transform Take E to be an LCA group G with dual Γ (for notations and references, see §2.2), and let R be a Paley–Wiener space P WΩ . Let µ(γ) be a non-nul complex valued function on Γ such that ( 0 < α ≤ |µ(γ)| ≤ β < ∞, (Haar) a.a. γ ∈ Ω; (24) µ(γ) = 0, γ 6∈ Ω. Let M denote the operation of multiplication by χΩ (γ)µ(γ). 301 Definition 3 Let f ∈ P WΩ . The operator T is defined by (T f )(s) := (F −1 MFf )(s) Lemma 2 The operator T of Definition 3 is a bijection on P WΩ Proof Clearly T is linear. 
Furthermore it is one-to-one, since the null space of T is {f : T f = θ} = {f : µ(γ)f ∧ (γ) = θ} which implies that f = θ. Again, T is “onto”. Let g ∈ P WΩ . Then if M−1 denotes multiplication by [µ(γ)]−1 , f = F −1 M−1 Fg ∈ P WΩ . Then from Definition 3, T f = g. The boundedness of T follows from two applications of Plancherel’s Theorem. Indeed, let f ∈ P WΩ . Then kT f kL2 (G) = kF −1 MFf kL2 (G) = kMFf kL2 (Γ)  First we need to know the adjoint T ∗ . Let f1 , f2 ∈ P WΩ . The defining equation is hT f1 , f2 i = hf1 , T ∗ f2 i. Suppose that T ∗ is of the same form as T of Definition 3, that is, T ∗ f = F −1 M∗ Ff, (25) where M∗ denotes multiplication by the multiplier µ∗ which is to be determined. In the integral notation, and using the ‘hat’ notation for the Fourier transform, the criterion is: Z Z   µ(·)f1 ∧ (·) ∨ (t)f2 (t) dt = f1 (t) µ∗ (·)f2 ∧ (·) ∨ (t) dt. G By Plancherel’s theorem this is: Z Z µ(γ)f1 ∧ (γ)f2 ∧ (γ) dγ = f1 ∧ (γ)µ∗ (γ)f2 ∧ (γ) dγ, Γ Γ from which we may choose µ∗ (γ) = µ(γ). It may be noted that T is self-adjoint if µ is real-valued. It is now evident that the assumption (25) leads to Z ∗ (T f )(s) = µ(γ)f ∧ (γ)(s, γ) dγ (26) Γ The operator kernel for T can now be calculated. From Definition 1 and (7) we have  h(s, t) = T ∗ ϕΩ (· − t) (s), SAMPTA'09 = µ(·)∨ (s − t). Hence h(s, t) = µ(·)∨ (s − t) = µ∧ (s − t). Now (12) becomes f (t) = h(T −1 f )(·), µ ∨ (· − t)i, and (23) becomes X f (t) = |νn |2 (T ∗ f )(sn ) µ∧ (sn − t). (27) (28) 5. Examples Example 1 The classical case The operator kernel for T G Ω n∈X ≤ |µ|kFf kL2 (Γ) = |µ|kf kL2 (G) . 4.1 Therefore from (26), and using the ‘shift’ property of the Fourier transform, Z  µ(γ) χΩ ∨ (· − t) ∧ (γ)(s, γ) dγ h(s, t) = Z Γ µ(γ)(−t, γ)χΩ (γ)(s, γ) dγ = Γ Z µ(γ)(s − t, γ) dγ = s, t ∈ G. Naturally, we expect to recover the case of the classical reproducing equation and sampling formula as special cases of the theory. To do this we pick G = R, Ω = [−π, π], T = I = T ∗ = T −1 and µ = χ[−π,π] (y). Therefore we have Z π √ 1 ∨ ei(s−t)y dy = 2π sinc(s − t). µ (s − t) = √ 2π −π Here and in subsequent Examples the choice of Haar measure on G, Γ, etc., accounts for apparent anomalies in the normalising constants in the formulae (e.g., Haar measure on R is taken to be (2π)−1/2 times Lebesgue measure. See [2, p. 257]). With these choices, (27) becomes (1). The classical sampling series √ (2) now follows the textbook proof. Since {e−iny / 2π : n ∈ Z} is an orthonormal (ON) basis of L2 (−π, π), Plancherel’s theorem shows that the inverse Fourier transforms {sinc(n−t)} : n ∈ Z} form an orthonormal basis of P W[−π,π] . Coefficients in the expansion of f in this basis are obtained from (1) and so, with sn = n, our choices for T and µ show that (28) becomes (2). Example 2 The Hilbert transform Another well-known example illustrates the present theory; a member of P W[−π,π] can be sampled and reconstructed from samples of its Hilbert transform (see, e.g., [3, p. 126] and references there). This idea can be fitted it into the theme of the present lecture by taking G = R, Ω = [−π, π], T = H := F −1 MF where M denotes multiplication by −i sgn(y). H is the Hilbert transform on P W[−π,π] . 302 For (27) we need π i 1 µ ∨ (s − t) = √ √ sgn(y)ei(s−t)y dy 2π 2π −π (29) = − sinc 12 (s − t) sin π2 (s − t) Z after a simple calculation. Also we have H−1 = −H = H∗ , therefore (27) is Z  f (t) = − Hf (τ ) sinc 12 (τ −t) sin π2 (τ −t) dτ. (30) R For (28) we need to find {sn } such that {µ∧ (sn − t)}, n ∈ Z, is an√ON basis of P Wπ . 
We can start with the ON basis {einy / 2π}, (n ∈ Z), of L2 (−π, π), then multiply each member by −i sgn(y). The result is again an ON basis, as a consequence of | − i sgn(y)| = 1 a.e. on [−π, π]. The inverse Fourier transform of a typical one of these basis elements is  −i −1 sgn(·)e−in· (t) = − sinc 12 (n − t) sin π2 (n − t) F 2π by the same calculation as in (29). But, taking account of Haar measure, this also gives µ∧ (n − t). Hence (28) becomes X  f (t) = Hf (n) sinc 12 (n − t) sin π2 (n − t) (31) n∈Z Our concrete – discrete comparison is between (30) and (31). Since the multiplier is of unit modulus, a two dimensional version of the construction that we used in the previous example shows that     −i y1 y2 −ihk,yi , (k ∈ Z2 ), +i e 2π |y| |y| is an ON basis of L2 ([−π, π]2 ). Then (28) becomes X  R∗ f (k)m∧ (k − t). f (t) = (35) k∈Z2 The comparison for this example lies between the concrete (34) and the discrete (35). Other combinations of the Riesz transforms are possible, in two and higher dimensions, whose multipliers satisfy (24) but are not always of unit modulus. 6. Conclusions The multiplier transforms treated in this study form a rather restricted class of operators; nevertheless, the methods can be used in connection with the very important Riesz transforms. It remains to investigate extensions to other types of operator. Likely candidates are, for example, multiplier transforms with less restrictive conditions on the multiplier, the singular integral operators of Calderón–Zygmund type (a class containing the Riesz transforms, see, e.g., [9, Ch. VI]), and operators of the Hankel and Toeplitz type (e.g., [7]). Example 3 The Riesz transforms References: For background on the Riesz transforms see [9, p. 223] Take G to be Rd , (d ∈ N). Let t = (t1 , . . . , td ) and let y = (y1 , . . . , yd ) etc. Let the scalar product in Rd be denoted by h , i. Definition 4 Let f ∈ L2 (Rd ), and define Rj f := F −1 Mj Ff, j = 1, . . . , d, (32) Mj denoting multiplication by −iyj /|y| χ[−π,π]d (y). We note that this multiplier is not bounded away from zero when d ≥ 2 and y ∈ [−π, π]d and therefore does not always satisfy the criterion (24). However, it is possible to define operators involving the Riesz transforms which do satisfy the criterion (24). First we consider the case d = 2, and define the operator R := R1 + iR2 (33) acting on P W[−π,π]2 . Its multiplier is   y2 y1 +i m(y) := (−i) |y| |y| and clearly we have |m(y)| = 1 a.e. Hence m satisfies the criterion (24) with respect to two-dimensional Lebesgue measure. Now (27) becomes Z  1 R−1 f (s)m∨ (s − t) ds. (34) f (t) = 2π R2 SAMPTA'09 [1] M.M. Dodson. Groups and the sampling theorem. Sampl. Theory Signal Image Process., 6(1):1–27, 2007. [2] M.M. Dodson and M.G. Beaty. Abstract harmonic analysis and the sampling theorem. In J.R. Higgins and R.L. Stens, editors, Sampling theory in Fourier and signal analysis: advanced topics, pages 233–265. Clarendon Press, Oxford, 1999. [3] J.R. Higgins. Sampling theory in Fourier and signal analysis: foundations. Clarendon Press, Oxford, 1996. [4] I. Kluvánek. Sampling theorem in abstract harmonic analysis. Mat.-Fyz. Casopis Sloven. Akad. Vied., 15:43–48, 1965. [5] H. Meschkowski. Hilbertsche Räume mit Kernfunktion. Springer–Verlag, Berlin, 1962. [6] F. Riesz and B. Sz.-Nagy. Functional analysis. Dover Publications, New York, 1990. [7] R. Rochberg. Toeplitz and Hankel operators on the Paley–Wiener space. Integral Equations Operator Theory, 10(2), 1987. [8] S. Saitoh. 
Integral transforms, reproducing kernels and their applications. Longman, Harlow, 1997.
[9] E.M. Stein and G. Weiss. Introduction to Fourier analysis on Euclidean spaces. Princeton University Press, Princeton, 1971.

Explicit localization estimates for spline-type spaces

José Luis Romero
Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón I, 1428 Capital Federal, Argentina, and CONICET, Argentina.
jlromero@dm.uba.ar

Abstract:
We give some explicit decay estimates for the dual system of a basis of functions that are polynomially localized in space.

1. Introduction

A spline-type space $S$ is a closed subspace of $L^2(\mathbb{R}^d)$ possessing a Riesz basis of functions well localized in space. That is, there exists a family of functions $\{f_k\}_k \subseteq S$ and constants $0 < A \le B < +\infty$ such that
$$A \|c\|_{\ell^2} \le \Big\|\sum_k c_k f_k\Big\|_{L^2} \le B \|c\|_{\ell^2}, \qquad (1)$$
holds for every $c \in \ell^2$, and the functions $\{f_k\}_k$ satisfy a spatial localization condition. In a spline-type space any function $f \in S$ has a unique expansion $f = \sum_k c_k f_k$. Moreover, the coefficients are given by $c_k = \langle f, g_k \rangle$, where $\{g_k\}_k \subseteq S$ is the dual basis, a set of functions characterized by the relation $\langle g_k, f_j \rangle = \delta_{k,j}$. These spaces provide a very natural framework for the sampling problem.

The general theory of localized frames (see [6], [5] and [2]) asserts that the functions forming the dual basis satisfy a similar spatial localization. This can be used to extend the expansion in (1) to other spaces, so that the family $\{f_k\}_k$ becomes a Banach frame for an associated family of Banach spaces (see [4] and [6]). In the case of a spline-type space $S$, this means that the decay of a function in $S$ can be characterized by the decay of its coefficients and, in particular, that the functions $\{f_k\}_k$ form a so-called $p$-Riesz basis for their $L^p$-closed linear span, for the whole range $1 \le p \le \infty$.

We derive, in a concrete case, explicit bounds for the localization of the dual basis. We will work with a set of functions satisfying a polynomial decay condition around a set of nodes forming a lattice. By a change of variables, we can assume that the lattice is $\mathbb{Z}^d$. So, we will consider a set of functions $\{f_k\}_k \subseteq L^2(\mathbb{R}^d)$ satisfying the condition
$$|f_k(x)| \le C (1 + |x - k|)^{-s}, \qquad x \in \mathbb{R}^d \text{ and } k \in \mathbb{Z}^d,$$
for some constant $C$. This type of spatial localization is specifically covered by the results in [5], but the constants given there are not explicit. We will derive a polynomial decay condition for the dual basis $\{g_k\}_k$, giving explicit information on the resulting constants. This yields some qualitative information, such as the dependence of these constants on $A$, $C$ and $s$, and the corresponding $p$-Riesz basis bounds for the original basis.

2. Main result

Theorem 1 Let $C \ge 1$ and let $t > d$ be an integer. Let $s > d + t$ be a real number. For $k \in \mathbb{Z}^d$ let $f_k : \mathbb{R}^d \to \mathbb{C}$ be a measurable function such that
$$|f_k(x)| \le C (1 + |x - k|)^{-s}, \qquad (x \in \mathbb{R}^d).$$
Suppose that $\{f_k\}_k$ is a Riesz basis for its $L^2$-closed linear span $S$, with bounds $0 < A \le B < \infty$. Let $\{g_k\}_k \subseteq S$ be its dual basis. Then the dual functions satisfy
$$|g_k(x)| \le D (1 + |x - k|)^{-t}, \qquad (x \in \mathbb{R}^d),$$
where $D$ is given by
$$D = E \, \frac{s^t C^{2t+1} (1 + A^{t-1})}{(s - t - d)^t A^{t+1}},$$
for some constant $E > 0$ that only depends on the dimension $d$.

Remark 1 The constant $E$ can be explicitly determined from the proof. The results in [6] prescribe polynomial decay estimates for the dual basis similar to those possessed by the original basis.
As a trade-off for the explicit constants, we will not obtain the full preservation of these decay conditions. Nevertheless, any degree of polynomial decay of the dual system can be granted, provided that the original basis has sufficiently good decay. Finally, observe that although the basis $\{f_k\}_k$ is assumed to be concentrated around a lattice of nodes, the functions $f_k$ are not assumed to be shifts of a single function. In particular, Theorem 1 allows for a basis of functions whose 'optimal' concentration nodes do not form a lattice but are comparable to one. The 'eccentricity' of the configuration of concentration nodes is, however, penalized by the constants modelling the decay.

3. Sketch of a proof and comments

We now sketch the proof of the main result; for a complete proof see [11]. Consider the Gram matrix of the basis $\{f_k\}_k$ given by
$$M \equiv (m_{k,j})_{k,j \in \mathbb{Z}^d}, \qquad m_{k,j} := \langle f_k, f_j \rangle.$$
Since $\{f_k\}_k$ is a Riesz sequence, $M$, as an operator on $\ell^2$, has an inverse $N \equiv (n_{k,j})_{k,j \in \mathbb{Z}^d}$. Moreover, $\|N\|_{\ell^2 \to \ell^2} \le A^{-1}$ and $n_{k,j} = \langle g_k, g_j \rangle$, where $\{g_k\}_k \subseteq S$ is the dual basis of $\{f_k\}_k$.

The localization assumptions on the basis $\{f_k\}_k$ yield a polynomial decay estimate on the entries of $M$,
$$|m_{k,j}| \lesssim (1 + |k - j|)^{-s}.$$
If we can establish a similar estimate for the entries of $N$,
$$|n_{k,j}| \lesssim (1 + |k - j|)^{-t},$$
with all the constants given explicitly, then, using calculations similar to those in [5], we obtain the desired polynomial concentration conditions for the dual functions.

Let us first consider the case where the basis $\{f_k\}_k$ consists of integer shifts of a single generator $f$ (that is, $f_k = f(\cdot - k)$, $k \in \mathbb{Z}^d$). In this case the matrix $M$ is constant on its diagonals; that is, $m_{k,j} = a_{k-j}$ for some sequence $a$. Similarly, $N$ is given by $n_{k,j} = b_{k-j}$, where the sequence $b$ satisfies $a * b = \delta$. Therefore, in this special case, $M$ and $N$ are convolution operators. The off-diagonal decay of their entries is equivalent to the decay of their kernels $a$ and $b$. Since the decay of a sequence $x$ can be characterized by the smoothness of its Fourier transform $\hat{x}$, the problem can be reformulated as the preservation of the smoothness of the function $\hat{a}$ under pointwise inversion. This reasoning is present, for example, in [1]. We can measure the smoothness of $\hat{a}$ by considering weak derivatives and repeatedly use a chain-rule argument for Sobolev spaces to obtain similar smoothness conditions for $\hat{b}$.

In the general case, where $M$ and $N$ need not be convolution operators, we try to imitate this reasoning, but we avoid using the Fourier transform. Given a matrix $L \equiv (l_{k,j})_{k,j \in \mathbb{Z}^d}$ and $1 \le h \le d$, we consider the new matrix
$$D_h(L)_{k,j} := (k_h - j_h) \, l_{k,j}.$$
Observe that, up to some multiplicative constant, the map $D_h$ acts on a convolution operator by taking a partial derivative of its symbol (that is, the Fourier transform of its kernel). The domain of $D_h$ consists of those matrices $L$ such that $D_h(L)$ defines a bounded operator on $\ell^2$. We call $D_h(L)$ the derivative of $L$ (with respect to $x_h$). $D_h$ is a derivation in the sense that it satisfies the equation $D_h(AB) = D_h(A)B + A D_h(B)$, provided that $D_h(A)$ and $D_h(B)$ are both defined. Derivations are a well-known tool in operator-algebra theory (see [3], [9] and [10]). Since $MN = I$ and $D_h(I) = 0$, we can formally express the high-order derivatives of $N$ in terms of its lower-order ones and all the derivatives of $M$,
$$D_h^u(N) = -\sum_{l=0}^{u-1} \binom{u}{l} D_h^l(N) \, D_h^{u-l}(M) \, N. \qquad (2)$$
Using the polynomial off-diagonal decay bounds on $M$ and the bound $\|N\|_{\ell^2 \to \ell^2} \le A^{-1}$, we can obtain bounds for the $\ell^2 \to \ell^2$ norms of some derivatives of $N$. These imply polynomial off-diagonal decay estimates for $N$, and hence yield the desired spatial localization bounds for the dual basis.

In the argument above we related the off-diagonal decay of a matrix to the $\ell^2 \to \ell^2$ norm of its derivatives. The $\ell^2 \to \ell^2$ norm of a matrix is not determined by the size of its entries. However, there are some necessary and (other) sufficient conditions on the size of the entries of a matrix for it to be bounded on $\ell^2$. This "gap" in the conditions accounts for the loss of some decay information in Theorem 1 when passing from the original basis to its dual system.

Finally, we point out that the formal computations in the above argument are not sufficient to prove the theorem. Consider again the simple case of a basis of integer shifts. With the notation of the discussion above, we have the relation
$$a * b = \delta, \qquad (3)$$
we have some decay estimate on $a$ (which can be reformulated as a smoothness condition on $\hat{a}$), and we want to prove a similar decay condition for $b$. There may be various sequences $x$ satisfying the relation $a * x = \delta$; $b$ can be singled out as the only one of them having a bounded Fourier transform. For example, when $a$ is finitely supported, equation (3) is a linear difference equation which has other solutions besides $b$ (solutions that grow exponentially). The decay of the sequence $b$ can be rigorously proved by resorting to a Sobolev-space smoothing argument.

In the general case, to derive equation (2) one needs to use the associativity of the product of matrices. This is justified only if all the matrices involved represent bounded operators. In other words, we need to know a priori that the derivatives of $N$ involved in equation (2) define bounded operators. This can be proved using the general results on derivations on Banach algebras (see [3], [9]) or Jaffard's Theorem [8]. The use of derivations is somewhat implicit in Jaffard's paper [8]. Recently, Gröchenig and Klotz [7] have systematically studied the use of derivations in connection with various problems, including the preservation under inversion of various kinds of off-diagonal decay conditions.

4. Application

From Theorem 1 we can derive the following qualitative statement.

Theorem 2 Let $\{F^i\}_{i \in I}$ be a family of Riesz sequences,
$$F^i \equiv \{f^i_k\}_{k \in \mathbb{Z}^d} \subseteq L^2(\mathbb{R}^d), \qquad (i \in I),$$
sharing a uniform lower basis bound. Suppose that the family $\{F^i\}_i$ satisfies a uniform concentration condition,
$$|f^i_k(x)| \le C (1 + |x - k|)^{-s}, \qquad (x \in \mathbb{R}^d,\ k \in \mathbb{Z}^d,\ i \in I),$$
for some constants $C \ge 1$, $s > d + t$ and $t > d$, with $t$ an integer. Then the following holds.
(a) The respective family of dual systems $\{G^i\}_i$ - where $G^i \equiv \{g^i_k\}_{k \in \mathbb{Z}^d}$ - satisfies a uniform concentration condition,
$$|g^i_k(x)| \le D (1 + |x - k|)^{-t}, \qquad (x \in \mathbb{R}^d,\ k \in \mathbb{Z}^d,\ i \in I),$$
for some constant $D \ge 1$.
(b) A uniform $p$-Riesz basis condition holds, for all $1 \le p \le \infty$. More precisely, there exist constants $q, Q > 0$ such that for any $p \in [1, \infty]$ and any $i \in I$, the relation
$$q \|c\|_{\ell^p} \le \Big\|\sum_k c_k f^i_k\Big\|_{L^p} \le Q \|c\|_{\ell^p}$$
holds for all finitely supported sequences $(c_k)_{k \in \mathbb{Z}^d}$.

Statement (a) follows directly from Theorem 1. Examining the proofs in [5], we see that the uniformity of the constants given in (a) yields statement (b). This qualitative conclusion of Theorem 2 was the original motivation for Theorem 1. Finally, observe that the arguments given above are applicable to a general intrinsically localized basis in the sense of [5].
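The Gram-matrix mechanism behind the proof is easy to probe numerically. The Python sketch below is a toy illustration of ours, not the Gram matrix of any particular basis: it builds a symmetric, diagonally dominant matrix with the assumed off-diagonal profile $(1+|k-j|)^{-s}$ in dimension $d = 1$, inverts it, and compares the observed off-diagonal decay of the inverse with the original profile.

import numpy as np

# Toy model for Section 3: a diagonally dominant "Gram matrix" with
# |m_{k,j}| <= 0.4 (1 + |k-j|)^{-s}; Theorem 1 predicts polynomial
# (possibly slower) off-diagonal decay for the inverse N = M^{-1}.
s = 6.0
K = np.arange(-60, 61)
dist = np.abs(np.subtract.outer(K, K))
M = np.eye(K.size) + 0.4 * (dist > 0) * (1.0 + dist) ** (-s)
N = np.linalg.inv(M)

for d in (1, 2, 4, 8, 16, 32):
    off = np.abs(np.diagonal(N, offset=d)).max()
    print(d, off, (1.0 + d) ** (-s))   # compare decay profiles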
5. Acknowledgements

The author wishes to thank Karlheinz Gröchenig and Andreas Klotz for their comments and for sharing an early draft of [7], and is indebted to Hans Feichtinger and Ursula Molter for some insightful discussions. The author holds a fellowship from CONICET and thanks this institution for its support. His research is also partially supported by the grants PICT06-00177, CONICET PIP N 5650 and UBACyT X149. This note was partially written during a long-term visit to NuHAG, during which the author was supported by the EUCETIFA Marie Curie Excellence Grant (FP6-517154, 2005-2009).

References:
[1] Akram Aldroubi and Karlheinz Gröchenig. Nonuniform sampling and reconstruction in shift-invariant spaces. SIAM Rev., 43(4):585–620, 2001.
[2] Radu M. Balan, Peter G. Casazza, Christopher Heil, and Z. Landau. Density, overcompleteness, and localization of frames I: Theory. J. Fourier Anal. Appl., 12(2):105–143, 2006.
[3] Ola Bratteli and Derek W. Robinson. Unbounded derivations of C*-algebras. Commun. Math. Phys., 42:253–268, 1975.
[4] Hans G. Feichtinger and Karlheinz Gröchenig. Banach spaces related to integrable group representations and their atomic decompositions, I. J. Funct. Anal., 86:307–340, 1989. Reprinted in 'Fundamental Papers in Wavelet Theory', Heil, Christopher and Walnut, David F. (2006).
[5] Massimo Fornasier and Karlheinz Gröchenig. Intrinsic localization of frames. Constr. Approx., 22(3):395–415, 2005.
[6] Karlheinz Gröchenig. Localization of frames, Banach frames, and the invertibility of the frame operator. J. Fourier Anal. Appl., 10(2):105–132, 2004.
[7] Karlheinz Gröchenig and Andreas Klotz. Noncommutative approximation: inverse-closed subalgebras and off-diagonal decay of matrices. Preprint, available at http://arxiv.org/abs/0904.0386, 2009.
[8] Stéphane Jaffard. Propriétés des matrices "bien localisées" près de leur diagonale et quelques applications. Ann. Inst. H. Poincaré Anal. Non Linéaire, 7(5):461–476, 1990.
[9] Edward Kissin and Victor Shulman. Dense Q-subalgebras of Banach and C*-algebras and unbounded derivations of Banach and C*-algebras. Proc. Edinburgh Math. Soc., 36:261–276, 1993.
[10] Edward Kissin and Victor Shulman. Differential properties of some dense subalgebras of C*-algebras. Proc. Edinburgh Math. Soc., 37:399–422, 1994.
[11] José Luis Romero. Explicit localization estimates for spline-type spaces. Submitted, available at http://arxiv.org/abs/0902.0557, 2008.

A Fast Fourier Transform with Rectangular Output on the BCC and FCC Lattices

Usman R. Alim (1) and Torsten Möller (1)
(1) School of Computing Science, Simon Fraser University, Burnaby BC V5A 1S6, Canada.
ualim@cs.sfu.ca, torsten@cs.sfu.ca

Abstract:
This paper discusses the efficient, non-redundant evaluation of a Discrete Fourier Transform on the three-dimensional Body-Centered and Face-Centered Cubic lattices. The key idea is to use an axis-aligned window to truncate and periodize the sampled function, which leads to separable transforms. We exploit the geometry of these lattices and show that, by choosing a suitable non-redundant rectangular region in the frequency domain, the transforms can be efficiently evaluated using the Fast Fourier Transform.

1. Introduction

The Discrete Fourier Transform (DFT) is an important tool used to analyze and process data in an arbitrary number of dimensions.
Most applications of the DFT in higher dimensions, however, rely on a tensor product extension of a one-dimensional DFT, with the assumption that the underlying data is sampled on a Cartesian lattice. This extension has the advantage that it allows for a straightforward application of the Fast Fourier Transform (FFT). The Cartesian lattice is known to be sub-optimal when it comes to sampling a band-limited function in two or higher dimensions [6]. In 3D, for instance, the BodyCentered Cubic (BCC) lattice is the optimal sampling lattice and yields a 30% savings in samples as compared to the Cartesian lattice [8]. The Face-Centered Cubic (FCC) lattice, although not optimal, is still better than the Cartesian lattice and is also the lattice that yields the minimum amount of Fourier-domain aliasing when sampling a general trivariate function [2]. From the perspective of continuous signal reconstruction, both the BCC and FCC lattices have received considerable attention because of their many applications in Visualization and Computer Graphics. Entezari et al. have devised a set of Box-Splines that can be used for signal approximation on the BCC [3] as well as the FCC [5] lattices. However, very little effort has gone into the development of discrete processing tools that are suitable for these nonCartesian lattices. The idea of a multidimensional DFT (MDFT) on nonCartesian lattices is not new. Mersereau provided a derivation of a DFT for a hexagonally periodic sequence and designed other digital filters suitable for a 2D hexagonal lattice [6]. Later, the idea was extended to higher SAMPTA'09 dimensions and a MDFT for arbitrary sampling lattices was proposed [7]. Guessoum et al. proposed an algorithm for evaluating the MDFT that has the same computational complexity as the Cartesian DFT [4]. Recently, Csébfalvi et al. [1] applied the MDFT to the BCC and FCC lattices by choosing a Cartesian periodicity in the spatial domain which leads to a Cartesian sampling of the Fourier transform. This allows the MDFT to be written in a separable form that can be evaluated via the FFT. However, their representation is redundant and leads to inefficient transforms. The aim of this paper is to revisit these transforms and show that they can be computed much more efficiently by exploiting the geometric properties of the BCC and FCC lattices to eliminate the redundancy. The paper is organized as follows. We provide a basic review of multidimensional sampling in Section 2. which is later used in the derivation of a fast DFT for BCC and FCC lattices in Section 3. Some properties of these transforms are discussed in Section 4. and a summary is presented in Section 5. 2. Optimal Trivariate Sampling Let fc (x) be a continuous trivariate function and Fc (ξ) be its Fourier transform defined as Z fc (x) exp[−2πjξ T x]dx (1) Fc (ξ) = R3 where T denotes the transpose operation. Let f (n) be the sampled sequence obtained by sampling the function through f (n) = fc (Ln) (2) where L is a 3 × 3 sampling matrix and n is an integer vector. Sampling on the lattice defined by the matrix L amounts to a periodization of the Fourier spectrum on a reciprocal lattice generated by the matrix L−T . In particular, the spectrum of the sampled sequence is given by [9] X 1 Fc (ξ − L−T r) (3) F̂ (ξ) = | det L| r where r is any integer vector. 
If we assume that fc (x) is isotropically band-limited (i.e Fc (ξ) = 0 for kξk > ξ0 for some band-limit ξ0 ), then one of the lattices that achieves the tightest possible packing of the spectrum replicas (spheres) in the Fourier domain is 309 the FCC lattice. Thus, in order to sample a trivariate bandlimited function optimally, the function should be sampled on the reciprocal of the FCC lattice, i.e. the BCC lattice. 3. Discrete Fourier Transform If the sequence f (n) is non-zero within a finite region, it can be periodically extended spatially and represented as a Fourier series which is a sampled version of the transform (3) [7]. The pattern with which the continuous transform (3) is sampled in the Fourier domain depends on the periodicity pattern in the spatial domain. Merserau et al. [7] used a periodicity matrix to define the periodic extension of the finite sequence. Here, we use a somewhat different approach by splitting the sampled sequence into constituent Cartesian sequences [1]. The BCC and FCC lattices LB and LF are generated by the integer sampling matrices h 1 −1 1 i h1 0 1i LB = −1 1 1 and LF = 0 1 1 1 1 −1 choose a cuboid shaped fundamental region generated by limiting n to the set N := {n ∈ Z3 : 0 ≤ n1 < N1 , 0 ≤ n2 < N2 , 0 ≤ n3 < N3 } for some positive integers N1 , N2 and N3 . This region consists of 2N1 N2 N3 data points (i.e. Voronoi cells) and has a total volume of 8N1 N2 N3 h3 . If we define N to be the diagonal matrix diag(N1 , N2 , N3 ), then the two subsequences f0 (n) and f1 (n) contained within the fundamental region can be periodically extended on a Cartesian pattern such that they satisfy f0 (n + N r) = f0 (n) and for all n and r in Z . This Cartesian periodic extension in the spatial domain amounts to a Cartesian sampling in the Fourier domain. In particular, the continuous transform (3) is sampled at 1 N −1 k yielding the sequence the frequencies ξ = 2h F (k) =F̂ (ξ) 1 ξ= 2h N −1 k  −2πj T −1  k N 2hIn + 2h n∈N   −2πj T −1 f1 (n) exp k N (2hIn + ht) 2h X   f0 (n) + f1 (n) exp −πjkT N −1 t · = n∈N   (4) exp −2πjkT N −1 n , 110 respectively. Both these lattices are based on a cubic sampling pattern whereby, in addition to samples at the eight corners of a cube, LB has an additional sample in the center of the cube and LF has six additional samples on the faces. Both these lattices can also be built from shifts of a Cartesian sublattice as shown in Fig. 1. In particular, samples that lie on the corners of cubes form the sublattice 2Z3 . The quotient group LB /2Z3 is isomorphic to Z2 and the quotient group LF /2Z3 is isomorphic to Z4 . Therefore, LB can be partitioned into two Cartesian cosets while LF has four Cartesian cosets (Fig. 1). f1 (n + N r) = f1 (n), 3 = X f0 (n) exp where k = (k1 , k2 , k3 )T ∈ Z3 is the frequency index vector. The above equation defines a forward BCC DFT. Since it is a sampled version of a continuous transform that is periodic on an FCC lattice, it should be invariant under translations that lie on the reciprocal lattice generated by 1 LF . This property is easily the matrix (hLB )−T = 2h demonstrated as follows. If r ∈ Z3 , then after substituting 1 ξ = 2h (N −1 k + LF r) in (4) and simplifying, we get  1 (N −1 k + LF r) 2h X   = f0 (n) + f1 (n) exp −πj(kT N −1 + r T LF )t · n∈N   exp −2πj(kT N −1 + r T LF )n 1 =F̂ ( N −1 k), 2h F̂ Figure 1: Left, the BCC lattice, a 16 point view. Right, the FCC lattice, a 32 point view. Lattice sites that are Voronoi neighbors are linked to each other. Cosets are indicated by different colors. 
3.1 BCC DFT The BCC lattice with arbitrary scaling is obtained via the sampling matrix hLB where h is a positive scaling parameter. The Voronoi cell is a truncated octahedron having a volume of | det hLB | = 4h3 . The Voronoi cell of the reciprocal FCC lattice is a rhombic dodecahedron having a volume of 4h1 3 . Since LB has two Cartesian cosets, a sampled sequence can be split up into two subsequences given by f0 (n) = fc (2hIn) and f1 (n) = fc (2hIn + ht), where I is the 3 × 3 identity matrix, t is the translation vector (1, 1, 1)T and n = (n1 , n2 , n3 )T is any integer vector. f0 (n) is the sequence associated with the first coset while f1 (n) is associated with the second. Since these sequences are sampled on a Cartesian pattern, a straightforward truncation of the original sequence is to SAMPTA'09 since r T LF n is always an integer and r T LF t is always even. One fundamental period of the BCC DFT is contained within a rhombic dodecahedron of volume 4h1 3 . The sampling density in the frequency domain is given by 1 | det 2h N −1 | = (8N1 N2 N3 h3 )−1 . Thus, the fundamental period consists of a total of 2N1 N2 N3 distinct frequency samples which is the same as the number of distinct spatial samples. The inverse BCC DFT is obtained by summing over all the distinct sinusoids and evaluating them at the spatial sample locations. This gives   1 X f0 (n) = F (k) exp 2πjkT N −1 n (5a) N k∈K  1  1 X F (k) exp 2πjkT N −1 (n + t) (5b) f1 (n) = N 2 k∈K 310 where N = 2N1 N2 N3 is the number of samples and K ⊂ Z3 is any set that indexes all the distinct frequency samples. It is easily verified that both the sequences (5a) and (5b) are periodic with periodicity matrix N . 3.1.1 Efficient Evaluation Since N is diagonal, the kernel in both equations (4) and (5) is separable. This suggests that the transform can be efficiently computed via the rectangular multidimensional FFT, provided that a suitable rectangular index set K can be found. Observe that the Cartesian sequence F (k) is periodic with periodicity matrix 2N , i.e F (k + 2N r) = F (k) for all r ∈ Z3 . Therefore, one way to obtain a rectangular index set is to choose K such that it contains all the frequency indices within one period generated by the matrix 2N . This consists of a total of | det 2N | = 4N indices and hence contains four replicas of the fundamental rhombic dodecahedron. A non-redundant rectangular index set can be found by exploiting the geometric properties of the FCC lattice. If we consider the first octant only, 4N samples are contained within a cube formed by the FCC lattice sites that have even parity. This cube also contains six face-centered sites. By joining any two axially opposite face-centered sites, we can split the cube into four rectangular regions such that each region consists of non-redundant samples only. Six rhombic dodecahedra contribute to such a region as illustrated in Fig. 2. The non-redundant region shown in Fig. 2b is obtained by limiting k to the index set given by K = {k ∈ Z3 : 0 ≤ k1 < N1 , 0 ≤ k2 < N2 , 0 ≤ k3 < 2N3 }. (a) (b) Figure 2: (a) Six rhombic dodecahedra contribute to a non-redundant rectangular region. (b) Zoomed in view of the non-redundant rectangular region that contains the full spectrum split into six pieces. ξ1 , ξ2 and ξ3 indicate the principal directions in the frequency domain. This region can further be subdivided into two cubes stacked on top of each other, each containing N1 ×N2 ×N3 samples. 
The forward transform (4) can then be evaluated in the two cubes separately by appropriately applying the Cartesian FFT to the two sequences f0 (n) and f1 (n) and combining the results together. After rearranging terms in (4), the forward transform in the bottom cube becomes X   f0 (n) exp −2πjkT N −1 n + F0 (k) = F (k) = n∈N X  (6) exp[−πjkT N −1 t] f1 (n) exp −2πjkT N −1 n , n∈N where k is now restricted to the set N . Since this equation is valid for all k ∈ Z3 , the forward transform in SAMPTA'09 the top cube can be computed from (6) by F1 (k) = F0 (k + (0, 0, N3 )T ) which simplifies to X   F1 (k) = f0 (n) exp −2πjkT N −1 n − n∈N X  (7) f1 (n) exp −2πjkT N −1 n , exp[−πjkT N −1 t] n∈N for k ∈ N . Equations (6) and (7) are now in a form that permits a straightforward application of the Cartesian FFT. Since the two equations are structurally similar, only two N1 × N2 × N3 FFT computations are needed, one for the sequence f1 (n) and one for f2 (n). In a similar fashion, the inverse transform (5) can be computed using two inverse FFT computations. Splitting the summations in (5) into the two constituent cubes gives   1 X (F0 (k) + F1 (k)) exp 2πjkT N −1 n , f0 (n) = N k∈N  1 X f1 (n) = (F0 (k) − F1 (k)) exp[πjkT N −1 t] · N k∈N   (8) exp 2πjkT N −1 n . 3.2 FCC DFT The FCC lattice with arbitrary scaling is generated by the sampling matrix hLF . The rhombic dodecahedral Voronoi cell has a volume of | det hLF | = 2h3 . The frequency spectrum is replicated according to (3) on a reciprocal BCC lattice that has a truncated octahedral Voronoi cell having a volume of 2h1 3 . A sequence sampled on the FCC lattice can be split up into four Cartesian subsequences corresponding to the four Cartesian cosets. Each subsequence is given by fi (n) = fc (2hIn + hti ), where i ∈ {0, 1, 2, 3} and ti are the integer shift vectors (0, 0, 0)T , (1, 0, 1)T , (0, 1, 1)T and (1, 1, 0)T respectively. Analogous to the BCC case, let us choose a rectangular truncation of the original sequence by limiting n to the set N and extend the sequences periodically so that they satisfy fi (n+N r) = fi (n). This truncation yields a rectangular fundamental region in the spatial domain consisting of a total of N = 4N1 N2 N3 distinct samples. Therefore, each truncated octahedron in the frequency domain tessellation will consist of N distinct points that are sampled 1 in a Cartesian fashion at the frequencies ξ = 2h N −1 k 3 where k ∈ Z . The sampled sequence in the frequency domain is thus given by 1 F (k) = F̂ ( N −1 k) 2h 3 (9) XX  1  fi (n) exp −2πjkT N −1 (n + ti ) . = 2 i=0 n∈N This defines a forward FCC DFT. Like the BCC case, it is 1 (N −1 k + LB r) invariant under shifts of the type ξ = 2h making it periodic on a BCC lattice with one fundamental period contained in a truncated octahedron. The inverse FCC DFT is obtained by summing over all the distinct sinusoids evaluated at the spatial sample locations   1 X F (k) exp 2πjkT N −1 (n+ 21 ti ) , (10) fi (n) = N k∈K where K ⊂ Z3 is any set that indexes all the N distinct sinusoids. 311 3.2.1 Efficient Evaluation Since N is diagonal, the key to efficiently evaluating the FCC DFT pair (9) and (10) is to choose a suitable rectangular region in the frequency domain that contains N distinct samples. Similar to the BCC DFT, the sequence (9) is 2N periodic with one complete rectangular period containing | det 2N | = 2N samples and hence two complete spectrum replicas. These 2N samples are contained within a cube, the corners of which lie at the even parity points of the BCC lattice. 
This cubic region can be split into two by halving along any of the three principal directions yielding a rectangular region that contains only non-redundant samples as illustrated in Fig. 3. The index set that spans the region depicted in Fig. 3b is given by K = {k ∈ Z3 : 0 ≤ k1 < 2N1 , 0 ≤ k2 < 2N2 , 0 ≤ k3 < N3 }. computes only two N1 × N2 × N3 FFTs for the BCC case and four N1 × N2 × N3 FFTs for the FCC case. Any operation in the frequency domain must respect the arrangement of the different portions of the spectrum. The BCC DFT splits the spectrum into six parts as illustrated by the six pieces (two lunes and four spherical triangles) of the sphere in Fig. 2b. The FCC transform splits the frequency spectrum into five parts as indicated by the hemisphere and the four spherical triangles in Fig. 3b. 5. Summary In this paper, we have shown that a MDFT of a Cartesian periodic sequence sampled on the BCC or FCC lattices can be efficiently evaluated using the FFT. The BCC lattice can be represented as two shifted Cartesian lattices. This representation leads to a separable transform that is efficiently computed via two non-redundant FFT evaluations of the Cartesian subsequences. Similarly, the FCC lattice consists of four shifted Cartesian lattices and the MDFT requires four non-redundant FFT evaluations. References: (a) (b) Figure 3: (a) Five truncated octahedra contribute to a non-redundant rectangular region. (b) Zoomed in view of the rectangular region that contains the full spectrum. The non-redundant region can be split into four N1 ×N2 × N3 cubic subregions and the forward transform (9) can be evaluated in each of the subregions separately by appropriately applying the FFT to each of the subsequences fi (n) and combining the output. The derivation is very similar to the BCC case and we leave the details to the reader. The forward transform in each subregion can be written as Fm (k) = 3 X Him exp[−πjkT N −1 ti ]· i=0 X   (11) fi (n) exp −2πjkT N −1 n , n∈N where m ∈ {0, 1, 2, 3}, k ∈ N and element of  1 H1im 1is an 1 1 −1 1 −1 the 4 × 4 Hadamard matrix H = 1 1 −1 −1 . The four 1 −1 −1 1 subregions Fm (k) have their bottom left corners at the frequency index vectors (0, 0, 0)T , (N1 , 0, 0)T , (0, N2 , 0)T and (N1 , N2 , 0) respectively. Likewise, the inverse transform (10) can be evaluated using four inverse FFT evaluations, one for each of the subsequences. This yields fi (n) = 4. 3 X  1 X Him Fm (k) · exp[πjkT N −1 ti ] N k∈N  m=0 (12) exp 2πjkT N −1 n . Discussion The decomposition of the non-redundant region in the frequency domain into cubes leads to transforms that are much more efficient. Both the BCC and FCC DFTs proposed by Csébfalvi et al. [1] are redundant and require the FFT of a 2N1 ×2N2 ×2N3 sequence. In contrast, our proposed evaluation strategy eliminates the redundancy and SAMPTA'09 [1] B. Csébfalvi and B. Domonkos. Pass-Band Optimal Reconstruction on the Body-Centered Cubic Lattice. In Vision, Modeling, and Visualization 2008: Proceedings, October 8-10, 2008, Konstanz, Germany, page 71. IOS Press, 2008. [2] A. Entezari. Optimal Sampling Lattices and Trivariate Box Splines. PhD thesis, Simon Fraser University, Vancouver, Canada, July 2007. [3] A. Entezari, D. Van De Ville, and T. Möller. Practical box splines for volume rendering on the body centered cubic lattice. IEEE Transactions on Visualization and Computer Graphics, 14(2):313 – 328, 2008. [4] A. Guessoum and R. Mersereau. Fast algorithms for the multidimensional discrete Fourier transform. 
IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34(4):937–943, 1986. [5] M. Kim, A. Entezari, and J. Peters. Box Spline Reconstruction on the Face Centered Cubic Lattice. IEEE Transactions on Visualization and Computer Graphics (Proceedings Visualization/Information Visualization 2008), 14(6):1523–1530, 2008. [6] R. Mersereau. The Processing of Hexagonally Sampled Two-dimensional Signals. Proceedings of the IEEE, 67(6):930–949, June 1979. [7] R. Mersereau and T. Speake. The processing of periodically sampled multidimensional signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, (1):188–194, 1983. [8] T. Theußl, T. Möller, and M. Gröller. Optimal regular volume sampling. In Proceedings of the conference on Visualization’01, pages 91–98. IEEE Computer Society Washington, DC, USA, 2001. [9] P. Vaidyanathan. Fundamentals of multidimensional multirate digital signal processing. Sadhana, 15(3):157–176, 1990. 312 Daubechies Localization Operator in Bargmann - Fock Space and Generating Function of Eigenvalues of Localization Operator Kunio Yoshino, Tamazutsumi, 1-28-1, Setagaya-ku, Tokyo, Japan,158-8557. yoshinok@tcu.ac.jp. Abstract: We will express Daubechies localization operators in Bargmann - Fock space. We will prove that the Hermite functions are eigenfunctions of Daubechies localization operator. By making use of generating function of eigenvalues of Daubechies localization operator, we will show some reconstruction formulas for symbol function of Daubechies localization operator with rotational invariant symbol. 1. Introduction Daubechies localization operator was introduced in I. Daubechies : A Time Frequency Localization Operator: A Geometric Phase Space Approach, IEEE. Trans. Inform. theory. vol.34, pp.605-612(1988) She obtained following results. Theorem(Daubechies)([2]) Suppose that symbol function of Daubechies localization operator is rotational invariant. Then (i) Eigenfunctions of Daubechies localization operator are Hermite functions. (ii) Eigenvalues are given by Mellin transform of symbol function. In this paper we realize Daubechies localization opeartor in Bargamann - Fock space. We will consider the eigenvalue problem of Daubechies localization opeartor in Bargmann - Fock space. By making use of Bargamann - Fock space we will give a new proof of above theorem. We will establish reconstruction formula of symbol function of Daubechies localization operator with rotational invariant symbol by generating function of eigenvalues of Daubechies localization operator. For the simplicity, we will confine ourselves to 1-dimensional case. 2. Bargmann Transform Put ½ ¾ √ 1 2 2 A(z, x) = π exp − (z + x ) + 2z · x , 2 where z ∈ C and x ∈ R. −1/4 SAMPTA'09 Bargmann transform B(ψ) is defined as follows : def B(ψ)(z) = Z R ψ(x)A(z, x)dx, (ψ ∈ L2 (R)). Put BF = {g ∈ H(C) : Z C 2 |g(z)|2 e−|z| dz ∧ dz̄ < ∞} where H(C) denotes the space of entire functions in the complex plane. BF is called Bargmann-Fock space. Theorem 1([1]) Bargmann transform is a unitary mapping from L2 (R) to Bargmann-Fock space BF . For the details of Bargmann transform and Bargmann Fock space, we will refer the reader to [1] and [3]. 3. Hermite Functions Definition 1([1],[3]) Hermite functions hm (x) is defined by : √ dm hm (x) = (−1)m (2m m! π)−1/2 exp(x2 /2) m exp(−x2 ), dx (m ∈ N). Hermite functions has following generating function expansion : ¾ √ 1 2 2 π exp − (z + x ) + 2z · x 2 ∞ X zm √ hm (x), = m! m=0 (z ∈ C, x ∈ R). 
−1/4 ½ We recall some well known facts about Hermite functions. Proposition 1([1],[3]) 313 (i) {hm (x)}∞ m=0 is complete orthonormal basis in L2 (R). ∂2 + x2 − 1)hm (x) = mhm (x), ∂x2 zm B(hm )(z) = √ , (z ∈ C) m! (ii) (− 5. A Realization of Daubechies Localization Operator in Bargmann Fock space where F is Fourier transform. In this section we will express Daubechies Localization Operator in Bargmann - Fock space. First we need following lemmas. Lemma 1 2 p + iq B(φp,q )(z) = ezw−1/2|w| +1/2ipq , (w = √ ) 2 Proposition 2([1],[3]) Lemma 2([1]) (iii) (iv) F(hm )(x) = (−i)m hm (x), (i) (B ◦ L ◦ B −1 )g(z) = z ∂2 where L = − 2 + x2 − 1. ∂x ∂ g(z), ∂z (ii) (B ◦ F ◦ B −1 )g(z) = g(−iz), where F is Fourier transform and g(z) ∈ BF . 4. Daubechies Localization Operator Put 2 φp,q (x) = π −1/4 eipx e−(x−q) /2 . < φp,q , f >= Z φp,q (x)f (x)dx. R This is so called Short time Fourier transform (or Windowed Fourier transform, or Gabor transform). Definition 2([2]) Suppose that F (p, q) ∈ L1 (R2 ) and f (x) ∈ L2 (R). We put PF (f )(x) = 1 2π Z Z F (p, q)φp,q (x) < φp,q , f > dpdq, R2 We call PF (Daubechies) localization operator F (p, q) is called symbol function. Daubechies obtained following results. Theorem([2]). Suppose that F (p, q) ∈ L1 (R2 ) and g(z) = 1 2πi Z Z C 2 ewt g(t)e−|t| dt ∧ dt, (g ∈ BF ) Theorem 2 Under the same assumptions in Prop. 3, we have (B ◦ PFZ ◦ZB −1 )(g)(z) 2 1 = F (w, w)ezw g(w)e−|w| dw ∧ dw, 2πi C (∀g ∈ BF ) (Proof) Since Bargmann transform is unitary operator, we have Z Z 1 F (p, q)φp,q (x) < φp,q , f > dpdq, PF (f )(x) = Z Z 2π 1 = F (p, q)φp,q (x) < Bφp,q , Bf > dpdq, 2π So by lemma 1, B ◦ PFZ(fZ)(x) 1 F (p, q)Bφp,q (z) < Bφp,q , Bf > dpdq, = 2π Z Z 2 1 F (p, q)ezw−1/2|w| +1/2ipq < Bφp,q , Bf > = 2π dpdq, Hence we have (B ◦ PFZ ◦ZB −1 )(g)(z) 2 1 = F (p, q)ezw−1/2|w| +1/2ipq < Bφp,q , g > 2π dpdq, On the other hand < Bφp,q Z ,Zg > 2 2 1 = et̄w̄−1/2|w| −1/2ipq g(t)e−|t| dtdt̄, 2π By Lemma 2, 2 = e−1/2|w| −1/2ipq g(w̄) Thus we obtained our desired result. Proposition 3([2]). Suppose that F (p, q) ∈ L1 (R2 ) and F (p, q) is rotational invariant function, i.e. F (p, q) = F̃ (r2 ), (r2 = p2 + q 2 ). Then (i) Hermite functions hm (x) are eigenfunctions of Daubechies operator PF . F (p, q) is rotational invariant function, i.e. F (p, q) = F̃ (r2 ), (r2 = p2 + q 2 ). Then (i) Functions z m are eigenfunctions of operator B ◦ PF ◦ B −1 . PF (hm )(x) = λm hm (x), (m ∈ N), Z ∞ 1 (ii) λm = e−s sm F̃ (2s)ds, (m ∈ N). m! 0 (B ◦ PF ◦ B −1 )(z m ) = λm z m , (m ∈ N), Z 1 ∞ −s n e s F̃ (2s)ds, (n ∈ N). (ii) λn = n! 0 (Proof) SAMPTA'09 314 By Theorem 2, we have (B ◦ PFZ ◦ZB −1 )(z m ) 2 1 = F (2|w|2 )ezw wm e−|w| dw ∧ dw, 2πi C PFa = As a corollary of Proposition 3, we obtained following Daubechies’s results in section 4. Proposition 4([8]) Let {λm } be eigenvalues of PF . Then there exists a positive constant C such that C , (m ∈ N). |λm | ≤ p |m| Λ(w) = ∞ X λm w m . m=0 We call Λ(w) generating function of eigenvalues of Daubechies Localization Operator. Theorem 3 Under the same assumptions in Prop. 3, we have (B ◦ PF ◦ B −1 )(g)(z) = (2πi)−n I z dt g(t)Λ( ) , t t (∀g ∈ BF ) (Proof) Suppose that g(z) ∈ BF . We consider Taylor expansion of g(z) at the origin. Put ∞ X am z m g(z) = m=0 By Proposition 3, we have (B ◦ PF ◦ B −1 )(z m ) = λm z m . So (B ◦ PF ◦ B −1 )(g)(z) = (B ◦ PF ◦ B −1 )( = ∞ X am λm z m = (2πi)−n m=0 I ∞ X am z m ) m=0 z dt g(t)Λ( ) t t Hence we have I z dt −1 −1 (B ◦ PF ◦ B )(g)(z) = (2πi) g(t)Λ( ) . t t 6. 
An Example of Daubechies Localization Operator In this section we will consider following special Daubechies localization operators. Put a−1 2 a−1 2 2 Fa (p, q) = e 2a (p +q ) = e 2a r , Then a λm = am+1 , Λ(w) = . 1 − aw m+1 hm (x). PFa (hm )(x) = a SAMPTA'09 am+1 hm (x)hm (y). m=0 Employing polar coordinte transform w = reiθ and s = r2 , we have Z ∞ 1 = zm e−s sm F̃ (2s)ds. m! 0 Put ∞ X (0 < a < 1). valids in operator sense. ∞ X am+1 |m >< m|, (PFa = in Dirac’s Notation.) m=0 −1 If a = 2 , this is Schatten decomposition of PFa and PFa is called density operator in quantum statistical mechanics. Proposition 5 (Mehler’s formula [3],[5]) ∞ X am+1 hm (x)hm (y) m=0 −1 1−a 1+a 2 2 a e 4 ( 1+a (x+y) + 1−a (x−y) ) , =p 2 π(1 − a ) (|a| < 1). Corollary 3 (i) Z PFa (f ) 1+a −1 1−a 2 2 a p e 4 ( 1+a (x+y) + 1−a (x−y) ) f (y)dy, = 2) π(1 − a R (f ∈ L2 ). (ii) If a ∈ C, |a| < 1, then PFa : L2 −→ L2 is bounded linear operator. (Proof) 1−a 1+a If a ∈ {a ∈ C : |a| < 1}, then real part of + 1+a 1−a is positive. So PFa is bounded linear operator from L2 to L2 . Namely, we obtained analytic continuation of PFa under the condition (a ∈ C, |a| < 1). 7. Realization of PFa in Bargmann - Fock space In this section we will consider PFa in Bargmann - Fock space. Proposition 6 ∞ X z m w̄m am+1 √ √ . m! m! m=0 valids in operator sense. (i) B ◦ PFa ◦ B −1 = −1 (ii) (B Z Z◦ PFa ◦ B )(g)(z) 2 ia = eazw̄ g(w)e−|w| dw ∧ dw̄, 2 C (g ∈ BF ) (Proof) zm Since √ . are eigenfunctions of B ◦ PFa ◦ B −1 , m! we have ∞ X z m w̄m B ◦ PFa ◦ B −1 = am+1 √ √ . m! m! m=0 Proposition 7 Suppose that |a| < 1, (a ∈ C). Then we have (B ◦ PFa ◦ B −1 )(g)(z) = ag (az) , (g ∈ BF ). 315 (Proof) −1 −1 I z dt g(t)Λ( ) t t (B ◦ PFa ◦ B )(g)(z) = (2πi) I a dt = ag(az). = (2πi)−1 g(t) t − az Proposition 8 (i) lim PFa (f ) = (−i)Ff, a→−i lim PFa (f ) = iF−1 f, m=0 a→i is an asymptotic expansion of G(t). where F is Fourier transform. Remark Λ(w) is the Borel transform of formal power ∞ X series m!λm t−m−1 . (Proof) By Prop.7, we have m=0 (B ◦ PFa ◦ B −1 )(g)(z) = ag (az) , (i) (g ∈ BF ). lim (B ◦ PFa ◦ B −1 )(g) = lim ag(az) = g(z). a→1 a→0 This means that lim PFa = Identity operator. a→1 (ii) m!λm t−m−1 In general this series is divergent series. We put Z ∞ F̃ (2s)e−s G(t) = ds, (t ∈ C\[0, ∞]). t−s 0 We have Proposition 10([8]) Formal power series ∞ X m!λm t−m−1 For f ∈ L2 , we have lim PFa (f ) = f, (iii) ∞ X m=0 a→1 (ii) Λ(w) is called generating function for eigenvalues of PF Now we consider following formal power series : lim (B ◦ PFa ◦ B −1 )(g) = lim ag(az) a→−i a→−i = (−i)g(−iz). By (ii) in Proposition 2, this means that lim PFa = (−i)F . Since G(t) is Hilbert transform of F̃ (2s)e−s , we have Theorem 5 −1 F̃ (2s) = es lim (G(s + it) − G(s − it)) t→0 2πi We also have Theorem 6([8]) F̃ (2s) = (2π)−n es F(Λ(iv))(s), valids in distribution sense. where F is Fourier transform. a→−i Proof of (iii) is same as that of (ii). Proposition 9 (i) G = {P Fa : a ∈ C, |a| < 1} ∪ {Id } is semigroup. (ii) PFa ◦ PFa = PFab . (Proof) By Proposition 7, (B ◦ PFa ◦ B −1 )(g)(z) = ag(az), g(z) ∈ BF So, we have (B ◦ PFa ◦ PFb ◦ B −1 )(g)(z) = bag (baz) Hence we have PFb ◦ PFa = PFab . In these cases, Fa (p, q) ∈ / L1 . But these operators still define bounded operators from L2 to L2 . As seen in Proposition 8, these operators are obtained as limit of P Fa , (Fa ∈ L1 ). 8. Reconstruction formulas We assume that F (p, q)p is rotational invariant L1 function. Namely, F (p, q) = F̃ ( p2 + q 2 ). References: [1] V. 
Bargmann : On a Hilbert Space of Analytic Functions and an Associated Integral Transform Part I, Comm.Pure.Appl.Math, pp. 187-214(1961) [2] I. Daubechies : A time frequency localization operator; A geometric phase space approach, IEEE. Trans. Inform. theory. vol.34, pp.605-612(1988) [3] G. B. Folland : Harmonic Analysis in Phase Space, Princeton Univ. Press (1989) [4] K. Gröhenig: Foundations of Time-Frequency Analysis, Birkhäuser-Verlag, Basel, Berlin, Boston(2000) [5] M.W. Wong : Weyl Transforms, Springer-Verlag. New York. (1998) [6] M.W. Wong : Localization Operators on the WeylHeisenberg Group, Geometry, Analysis and Applications, Proceedings of the International Conference (editor:P.S.Pathak) 303-314(2001) [7] M.W. Wong : Wavelet Transforms and Localization Operator, Birkhäuser-Verlag. Basel, Berlin, Boston. (2002) [8] K. Yoshino : Localization operators in Bargmann Fock space and reconstruction formula for symbol functions, preprint (2009) In section 5, we introduced following generating function: ∞ X Λ(w) = λm w m m=0 SAMPTA'09 316 Signal-dependent sampling and reconstruction method of signals with time-varying bandwidth Modris Greitans and Rolands Shavelis Institute of Electronics and Computer Science, 14 Dzerbenes str., Riga LV-1006, Latvia. greitans@edi.lv, shavelis@edi.lv Abstract: The paper describes the sampling method of nonstationary signals with time-varying spectral bandwidth. The reconstruction procedure exploiting the low-pass filter with time-varying cut-off frequency is derived. The filter application in signal reconstruction from its level-crossing samples is shown. The results of computer simulations are presented. 1. Introduction The spectral characteristics of signals of practical interest often change with time. Generally, a signal with time-varying spectral bandwidth can be approximated with fewer samples per interval using appropriate nonequidistantly spaced samples than using uniform sampling procedure, where the sampling rate is chosen taking into account the highest signal frequency. For example, let us inspect a signal with wide bandwidth regions and narrow spectral bandwidth in the rest of signal observation. It is more efficient to sample the narrow bandwidth regions at a lower rate than the regions, where spectral bandwidth is wide. Solving this problem correctly requires the knowledge of the function of the instantaneous maximum frequency of signal. The paper will show two typical situations. First, information about the time-varying bandwidth is known a priori. In this case the deliberately non-uniform sampling instants can be calculated in advance, and reconstruction is based on application of filter with appropriate time-varying impulse response function. Second, the signal-dependent sampling scheme - level crossing sampling (LCS) is used for analog-to-digital (A/D) conversion. The idea of level-crossing sampling is based on the principle that samples are captured when the input signal crosses predefined levels. Such a sampling strategy has quite long history and is exploited for various applications [1, 2]. It has been shown that LCS has several interesting properties and is more efficient than traditional sampling in many respects [3]. In particular, it can be related to the processing of non-stationary signals, because if a waveform is changing rapidly, the samples are spaced more closely, and conversely – if a signal is varying slowly, the samples are spaced sparsely [4]. 
This property allows to calculate the estimate of the function of the instantaneous maximum frequency of signal from the positions of samples. In this case to reconstruct the waveform of signal, SAMPTA'09 an additional resampling procedure is needed before the use of time-varying reconstruction filter, which will be described in next section. Note that in both cases the local sampling density reflects the local bandwidth of the signal, therefore samples are spaced non-uniformly and advanced algorithms are required for digital signal processing. 2. Reconstruction of signal with timevarying bandwidth There are several methods used for reconstruction of nonuniformly sampled band-limited signals. For correct recovery, they typically require that the maximal length of the gaps between the sampling instants does not exceed the Nyquist rate [5]. If the signal is non-stationary with time-varying spectral bandwidth, satisfying globally this requirement is not an appropriate decision, because this provides redundant data. The use of level-crossing sampling scheme can reduce the amount of samples, because the intervals between samples are determined by signal local properties and by the number of quantization levels. The quality of processing can be improved if the recovery procedure takes into account the local bandwidth of the signal [6]. In the following subsections the proposed idea and methods for reconstruction using filters with timevarying bandwidth and for the estimation of local maximum frequency of signal from its level-crossing samples will be discussed. 2.1 Idea of signal-dependent reconstruction functions The sampling theorem states that every bandlimited signal s(t) can be reconstructed from its equidistantly spaced samples if the sampling rate equals or exceeds the Nyquist rate 2Fmax , where Fmax is the maximum frequency in the signal spectrum. The reconstruction in time domain can be expressed as ŝ(t) = N −1 X s(tn )h(t − tn ), (1) n=0 where sb(t) denotes reconstructed signal, N is the number of the original signal samples s(tn ) and h(t) is an appropriate impulse response of the reconstruction filter, classi- 317 cally, sinc-function h1 (t) = sinc(2πFmax t) As the sampling instants tn = response n 2Fmax , (2) then the impulse h1 (t − tn ) = h1 (t, tn ) = sinc(2πFmax t − nπ), (3) where h1 (t − tn ) = h(t, tn ) is written as the function of two arguments. The reconstructed signal becomes ŝ(t) = N −1 X s(tn )h1 (t, tn ) (4) n=0 If the signal with time-varying frequency bandwidth fmax (t) is considered, then the sampling rate of the signal according to Nyquist must be at least 2Fmax , where Fmax = max(fmax (t)). In this case any information about the local spectral bandwidth is ignored during the sampling process. To take it into account, it is proposed instead of h1 (t, tn ) to use more general function h2 (t, tn ) = sinc(Φ(t) − Φ(tn )) = sinc(Φ(t) − nπ), (5) Rt where Φ(t) = 2π 0 fmax (t)dt is the phase of the sinusoid, whose frequency changes in time as fmax (t), t ≥ 0 and sampling instants tn are chosen such that Φ(tn ) = nπ. If the signal is stationary and band-limited fmax (t) = const = Fmax , Eq. (3) and (5) become equivalent. In case of non-constant fmax (t) waveform of the reconstruction function h2 (t, tn ) and the desired sampling instants tn are determined by fmax (t). Samples are spaced non-equidistantly and the mean sampling frequency can be less than it is required by Nyquist criterion, which, in this case, should be satisfied rather in local than in global sense. 
2.2 Reconstruction algorithm To reconstruct the non-uniformly sampled signal according to equation (1), the reconstruction procedure involves signal resampling to the equidistantly spaced sampling set 1 . {tn } with sampling period ∆t = tn − tn−1 = 2Fmax The estimation of ŝ(tn ) is possible according to the simple iterative algorithm [5] the idea of which is to interpolate the sampled band-limited signal s(t) by the sum P šs(tm ) (t) = m s(tm )ψm and filter it in order to remove high frequencies. Piecewise linear interpolation, which is well suited to level-crossing samples, uses ψm consisting of the triangular functions  t−tm−1   tm −tm−1 for tm−1 ≤ t < tm , tm+1 −t ψm (t) = tm+1 for tm ≤ t < tm+1 , (6) −tm   0 elsewhere. It is proved [5] that if the maximum length of the gaps 1 , then evbetween the sampling instants τmax ≤ 2Fmax ery s(t) can be reconstructed from the values s(tm ) of an arbitrary τmax -dense sampling set {tm } iteratively. The recovery algorithm can be written as: ŝ0 (tn ) = šs(tm ) (tn ); ŝ0 (t) = C [ŝ0 (tn )] ; ŝi (tn ) = ŝi−1 (tn ) + š(s−si−1 )(tm ) (tn ); ŝi (t) = C [ŝi (tn )] , SAMPTA'09 (7) Figure 1: Piecewise polynomial p1k (t) approximation. where i indicates the number of iteration. The linear operator C denotes filtering as the convolution of samples s(tn ) with impulse response h1 (t, tn ) of the filter according to Eq. (4) C [s(tn )] = N −1 X s(tn )h1 (t, tn ) (8) n=0 The sampling of non-stationary signal using levelcrossing scheme does not ensure the satisfaction of the 1 . Direct application of the requirement τmax ≤ 2Fmax above described algorithm leads to a considerable reconstruction error, therefore two substantial enhancements are introduced to the algorithm - performing resampling to the non-equidistantly spaced values and the use of filter with impulse response h2 (t, tn ) instead of classical h1 (t, tn ). The resampling instants tn are determined by Φ(t), which depends on fmax (t), that in general case is not known in advance. To solve this problem, an algorithm is developed, which estimates the time-varying instantaneous maximum frequency using information about locations of level-crossings. 2.3 Estimation of instantaneous maximum frequency The obvious ways to estimate the local bandwidth of the signal is by finding its time-frequency representation (TFR) using, for example, short-time Fourier transform, wavelet transform or Wigner-Ville distribution. These methods are developed for uniformly sampled signals, however, there are some modifications in order to find the TFR of non-uniformly sampled signals [7]. The use of such approach is time consuming, therefore a simpler method is considered that is based on empirical evaluations. To estimate the function fbmax (t) from samples s(tm ), starting with the initial index value m = 0 two pairs of successive level-crossing samples s(tm′j ) = s(tm′j +1 ) and s(tm′′j ) = s(tm′′j +1 ) are found such that m′′j > m′j and the difference m′′j − m′j is minimal. Thereafter the next two pairs are found considering that m′j+1 = m′′j . For each j = 1, 2, . . . the value f (tj ) is calculated as  −1 f (tj ) = tm′′j + tm′′j +1 − tm′j − tm′j +1 , (9) where tj =  1 tm′′j + tm′′j +1 − tm′j − tm′j +1 4 (10) 318 a) b) 1,2 f, [Hz] s(t), [V] 1.25 0 0,8 0,4 −1.25 0 50 100 t, [s] 150 0 0 200 50 100 t, [s] 150 200 Figure 2: (a) Test signal sampled by Φ(tn ) = nπ and (b) frequency traces of its components. 
a) b) 1.2 f, [Hz] s(t), [V] 1.25 0 0.8 0.4 −1.25 0 50 100 t, [s] 150 200 0 0 50 100 t, [s] 150 200 Figure 3: (a) Test signal sampled by level-crossings and (b) estimated instantaneous maximum frequency fbmax (t) as solid line, true instantaneous maximum frequency as dashed line and f (tj ) as black points. If a single sinusoid is sampled, then f (tj ) = f (tj+1 ) for all j and it equals the frequency of the sinusoid. If the signal consists of more harmonics, then f (tj ) for different PJ j vary around the average value of f¯ = J1 j=1 f (tj ), where J is the total number of detected pairs within the observation time of the signal. Experiments show that f¯ is close to the frequency of the highest component. Thus, the estimate of function of instantaneous maximum frequency fbmax (t) can be obtained by {f (tj )} approximation with piecewise polynomials prk (t) of order r. By choosing the number L > 1 the observation interval of signal is divided into subintervals ∆Tk : t ∈ [tk,1 ; tk,2 ] , (11) where k = 0, 1, . . . is the number of subinterval and tj=kL + tj=kL+1 , 2 tj=(k+1)L + tj=(k+1)L+1 = 2 tk,1 = tk,2 ensure prk−1 (tk,1 )(0) = prk (tk,1 )(0) , prk (tk,2 )(0) = prk+1 (tk,2 )(0) prk−1 (tk,1 )(1) = prk (tk,1 )(1) , prk (tk,2 )(1) = prk+1 (tk,2 )(1) .. . prk−1 (tk,1 )(r) = prk (tk,1 )(r) , prk (tk,2 )(r) = prk+1 (tk,2 )(r) and the value of expression K−1 X (k+1)L X 2 [f (tj ) − prk (tj )] = min is minimal. The denotation (. . .)(r) means the derivative of order r and K is the total number of subintervals. After solving the minimization task using the method of least squares, the coefficients of polynomials prk (t) are obtained and the estimate of instantaneous maximum frequency fbmax (t) = prk (t), if tk,1 ≤ t ≤ tk,2 (12) (13) k=0 j=kL+1 (14) depends on the number L of samples f (tj ) per subinterval. To reduce the dependency the final frequency estimate is obtained by averaging fbmax (t) calculated for different L values. The example of piecewise polynomial of order r = 1 approximation when L = 7 is shown in Fig. 1 3. Simulation results For each subinterval ∆Tk the coefficients ak,r , ak,r−1 , . . . , ak,1 , ak,0 of polynomial prk (t) = ak,r tr + ak,r−1 tr−1 + · · · + ak,1 t + ak,0 are found to SAMPTA'09 The methods described in previous section are applied to reconstruct nonstationary signal from its nonuniform sam- 319 a) b) reconstruction error, [V] e(t), [V] 1.25 0 −1.25 0 50 100 t, [s] 150 200 0.2 0.15 0.1 0.05 0 0 10 1 10 number of iteration i 2 10 Figure 4: (a) The difference between original and recovered signal from its 349 level-crossing samples after 10 iterations and (b) reconstruction error (solid lines - reconstruction from level-crossings using h2 (t, tn ), dashed lines - reconstruction from level-crossings using h1 (t, tn ), dotted line - reconstruction from samples obtained by Φ(tn ) = nπ). ples s(tn ) obtained in two different ways. The first one is when fmax (t) is given and sampling instants tn satisfy Φ(tn ) = nπ (Fig. 2). The second way is by level-crossing sampling and fmax (t) is not known in advance (Fig. 3). In the first case 239 nonequidistantly spaced samples were obtained during 200 seconds of the test signal, which consists of three sinusoids with constant amplitudes and timevarying frequencies as shown in Fig. 2b. As the reconstructed signal according to Eq. (4) using h2 (t, tn ) differs insignificantly from the original one, it is not illustrated here. 
In order to obtain similar result in uniform sampling case, at least 360 samples would be required since the maximum frequency of the signal is Fmax = 0.9 Hz. In the level-crossing sampling case 349 samples were captured using 6 quantization levels (Fig. 3a). To recover the signal the first task was to find the values f (tj ) according to Eq. (9) in order to estimate the instantaneous maximum frequency (14). In Fig. 3b f (tj ) are shown as black points, true fmax (t) as dashed line and calculated fbmax (t) as solid line. The similarity between frequency traces is obvious. The second step was to recover the original signal according to Eq. (7) using level-crossing samples and estimated fbmax (t). The difference signal ei (t) = s(t) − si (t) after 10 iterations q i = 10 is illustrated in Fig. 4a. The reRT construction error T1 0 ei (t)2 dt reduces as the number of iterations i increases. It is shown in Fig. 4b as a grey solid line. The grey dashed line corresponds to reconstruction error, when instead of time-varying bandwidth filter h2 (t, tn ) the filter with constant cut-off frequency of Fmax = 0.9 Hz and impulse response h1 (t, tn ) is used. In this case the achieved result is not so good as the reconstruction quality remains only in intervals, where the sampling density is sufficient. The reconstruction error can be reduced by decreasing the distance between quantization levels giving 437 level-crossing samples. It is shown in Fig. 4b as black solid and dashed lines. The dotted line corresponds to the first case when fmax (t) is given and sampling instants tn satisfy Φ(tn ) = nπ. 4. Conclusions The proposed approach for non-stationary signal processing uses signal dependent techniques: level crossing sam- SAMPTA'09 pling for data acquisition and filtering with time-varying bandwidth for signal reconstruction. The information carried by level-crossing samples is employed in two ways – time instants of samples are used to estimate the instantaneous maximum frequency of the signal, while the amplitude values of samples are used in reconstruction algorithm. The reconstruction procedure is based on the use of iterative filtering with time-varying bandwidth filter. The enhancement of classical signal reconstruction approach is made by introducing signal-dependent, ”nonstationary” impulse response and resampling to the corresponding, nonuniform sampling set. Speech signal processing can be quoted as one of the potential application areas of the proposed algorithm. The level-crossing sampling technique reduces the number of samples and leads to effective signal coding approaches. References: [1] P. Ellis. Extension of phase plane analysis to quantized systems. IRE Transactions on Automatic Control, 4(2):43–54, 1959. [2] M. Miskowicz. Send-on-delta concept: An eventbased data reporting strategy. Sensors, 6:49–63, 2006. [3] E. Allier and G. Sicard. A new class of asynchronous a/d converters based on time quantization. In Proc. of International Symposium on Asynchronous Circuits and Systems ASYNC’03, pages 196–205, 2003. [4] M. Greitans. Processing of non-stationary signal using level-crossing sampling. In Proc. of the International Conference on Signal Processing and Multimedia Applications SIGMAP’06, pages 170–177, 2006. [5] H. G. Feichtinger and K. Grochening. Theory and practice of irregular sampling. 1994. [6] M. Greitans and R. Shavelis. Speech sampling by level-crossing and its reconstruction using splinebased filtering. 
In Proceedings of the 14th International Conference IWSSIP 2007, pages 305–308, 2007. [7] M. Greitans. Time-frequency representation based chirp-like signal analysis using multiple level crossings. In Proceedings of the 15th European Signal Processing Conference EUSIPCO 2007, 2007. 320 Optimal Characteristic of Optical Filter for White-Light Interferometry based on Sampling Theory Hidemitsu Ogawa (1) and Akira Hirabayashi (2) (1) Toray Engineering Co., Ltd., 1-45, Oe 1-chome, Otsu, Shiga, 520-2141, Japan. (2) Yamaguchi University, 2-16-1, Tokiwadai, Ube City, Yamaguchi 755-8611, Japan. hidemitsu-ogawa@kuramae.ne.jp, a-hira@yamaguchi-u.ac.jp Abstract: White-light interferometry is a technique of profiling surface topography of objects such as semiconductors, liquid crystal displays (LCDs), and so on. The world fastest surface profiling algorithm utilizes a generalized sampling theorem that reconstructs the squared-envelope function r(z) directly from an infinite number of samples of the interferogram f (z). In practical measurements, however, only a finite number of samples of the interferogram g(z) = f (z) + C with a constant C are acquired by an interferometer. We have to estimate the constant C and to truncate the infinite series in the sampling theorem. In order to reduce both the truncation error and the estimation error for C, we devise an optimal characteristic of the optical filter installed in the interferometer in the sense that the second moment of the square of the interferogram is minimized. Simulation results confirm the effectiveness of the optimal characteristic of the optical filter. CCD camera Optical Filter Beam splitter A White-light source Beam splitter B O L (fixed) Reference mirror L E Object z Stage 1. Introduction White-light interferometry is a technique of profiling surface topography of objects such as semiconductors, liquid crystal displays (LCDs), and so on. It is attractive because of its advantages including non-contact measurement and unlimited measurement range in principle [1, 2, 3, 5, 6, 8, 9]. From the viewpoint of sampling theory, white-light interferometry has the following two interesting features. First, a signal to be processed, a white-light interferogram, f (z), is a bandpass signal. Second, a signal to be reconstructed from sampled values of f (z) is not the interferogram itself, but its squared-envelope function r(z). This type of sampling theorem is called a generalized sampling theorem [4, 10, 11]. The present authors also derived such a sampling theorem [9]. Based on the theorem, the world fastest surface profiling algorithm were proposed and installed in commercial systems [5]. The sampling theorem is expressed in a form of infinite series and uses samples of the interferogram f (z). In practical measurements, however, only a finite number of samples of the interferogram g(z) = f (z) + C with a constant C are acquired by an interferometer. Hence, in the algorithm, the constant C is estimated from the samples, and the infinite series is truncated with the number of samples. If both the truncation error and the estimation error for C were reduced, we can SAMPTA'09 Figure 1: Basic setup of an optical system used for surface profiling by white-light interferometry. further improve the preciseness of the algorithm. For both error reductions, it is very effective for interferograms to have small side lobes. The waveform of interferograms can be controlled by an optical filter installed in the interferometer. 
Hence, in this paper, we devise an optimal characteristic of the optical filter in the sense that the second moment of the square of the interferogram is minimized with a fixed band-width. We show that the optical characteristic is given by a sine curve which has a half of the period as the fixed band-width. We also show that we have a socalled uncertainty principle between the band-width and the second moment. Simulation results confirm the effectiveness of the optimal characteristic of the optical filter. 2. Surface Profiling by White-Light Interferometry Figure 1 shows a basic setup of an optical system used for surface profiling by white-light interferometry. It uses the Michelson interferometer. A beam from a white-light source passes through an optical filter. The beam is re- 321 orem for bandpass signals [7]. It is interesting that, since the squared-envelope function r(z) is the sum of squares of f (z) and its Hilbert transform, the squared-envelope function is also reconstructed from samples of f (z), not those of r(z). Indeed, the following result was established [9, 5]. The center wavelength and the bandwidth of the optical filter in Fig. 1 are denoted by λc and 2λb , respectively. Let kl and ku be angular wavenumbers defined by 2 1.5 1 kl = 0.5 5 10 15 20 25 30 z[µm] Figure 2: An example of a white-light interferogram g(z) and its sampled values. flected by the beam splitter A, and divided into two portions by the beam splitter B at the point O. One of the portions indicated by the dotted line is transmitted to a reference mirror, whose distance from the point O is L. The other portion indicated by the dashed line is transmitted to a surface of an object being observed. The height of the surface from the stage at the point P is denoted by zp . E is a virtual plane whose distance from the point O is L. z is the distance of the plane E from the stage. The two beams reflected by the object surface and the reference mirror are recombined and interfere. As the interferometer scans along the z-axis, the resultant beam intensity varies as is shown in Fig. 2 by the dotted line. It is called a white-light interferogram or simply an interferogram and denoted by g(z) = f (z) + C, where C is a constant. Its peak appears in the right side in Fig. 2 if the height zp is high, while it appears in the left side if zp is low. Hence, the maximum position of the interferogram provides the height zp . The intensity is observed by a charge-coupled device (CCD) video camera with a shutter speed of 1/1000 second. It has, for example, 512×480 detectors. Each of them corresponds to a point on the surface to be measured. Since the CCD camera outputs the intensity, for example, every 1/60 second, we can utilize only discrete sampled values of the interferogram shown by ‘•’ in Fig. 2. We have to estimate the maximum position of the interferogram from these sampled values. It is known that the envelope function m(z) shown by the solid line in Fig. 2, or its square r(z), has the same peak as the interferogram and they are much smoother than the interferogram. Hence, usually these functions are used for detection of the peak instead of the interferogram. In this paper, we use the latter r(z), which we call the squaredenvelope function. 3. (1) Two parameters ωl = 2kl and ωu = 2ku are also used. 0 0 2π 2π , ku = . 
λc + λb λc − λb Sampling theorem for squared-envelope functions Since the interferogram f (z) is a bandpass signal, it can be reconstructed from its samples by using the sampling the- SAMPTA'09 Proposition 1 [5] (Sampling theorem for squaredenvelope functions) Let I be an integer such that ωl , (2) 0≤I≤ ωu − ωl and ωb be any real number that satisfies  ωu  ≤ ωb (I = 0),  2 ωl ωu   ≤ ωb ≤ (I 6= 0). 2(I + 1) 2I (3) Let ωc be a real number defined by ωc = (2I + 1)ωb . Let ∆ be a sampling interval given by π ∆= , 2ωb (4) (5) and {zn }∞ n=−∞ be sample points defined by zn = n∆. (6) Then, it holds that 1. When z is a sample point zj , { ∞ }2 ∑ f (zj+2n+1 ) 4 . r(zj ) = {f (zj )}2 + 2 π 2n + 1 n=−∞ (7) 2. When z is not any sample point,  }2 { ∞ 2 ( ∑ f (z2n ) 2∆  πz ) r(z) = 2 1 − cos π ∆ z − z2n n=−∞ ( + 1 + cos πz ) ∆ }2  f (z2n+1 ) . z − z2n+1 n=−∞ { ∞ ∑ (8) To apply Proposition 1 for surface profiling, we have the following difficulties. In the proposition, an infinite number of sampled values {f (zn )}∞ n=−∞ of the interferogram f (z) are used. In practical applications, however, only a N −1 finite number of sampled values {g(zn )}n=0 of the interferogram g(z) = f (z) + C are available. Hence, we have to truncate the infinite series in Proposition 1 and approximate the sampled values f (zn ) by g(zn ) − Ĉ, where Ĉ is an estimate of C. For example, the average of g(zn ) is used as Ĉ. Now, we are suffered from the truncation error as well as the estimation error for Ĉ. Both of these errors severely affect our final goal of precise estimation of zp . 322 Theorem 1 Among second continuously differentiable functions ψ(k) ∈ C 2 [kl , ku ] satisfying 2 ψ(k) = 0 (k ≤ kl , k ≥ ku ), ψ(k) ≥ 0 (kl < k < ku ), ∫ ku {ψ(k)}2 dk = 1, 1.5 (13) (14) (15) kl 1 ψ(k) that minimizes the criterion J[ψ] is given by π(k − kl ) 1 . ψ(k) = √ sin 2ka ka 0.5 0 0 5 10 15 20 25 30 z[µm] Figure 3: A white-light interferogram g(z) when ψ(k) is rectangular. 4. Optimal characteristics of optical filter To reduce both of the errors, the following observation is crucial. As you can see in Fig. 2, only a few number of samples are located in the main lobe of g(z) while the rest of them are in side lobes. The latter mostly vanishes once the constant C is estimated precisely. This implies that, the smaller the side lobes are, the smaller the truncation error is. Smaller side lobes also lead us to better estimations of C as shown experimentally in Section 5. Fortunately, we can control the waveform of the interferogram by the optical filter in the interferometer. Let a(k) be its characteristic in terms of an angular wavenumber k. The support of a(k) is the interval kl < k < ku . Averaged attenuation rates of two beams along the dashed and the dotted lines in Figure 1 are denoted by qo (k) and qr (k), respectively. Let ψ(k) be { 2{a(k)}2 qo (k)qr (k) (k > 0), ψ(k) = (9) 0 (k ≤ 0). It is also supported on the same interval as a(k): ψ(k) = 0 (k < kl , k > ku ). (10) The function ψ(k) is related to the interferogram f (z) as f (z) = ∫ ku kl ψ(k) cos 2k(z − zp ) dk. (11) Equation (11) clearly shows that we can control f (z) by a(k) through ψ(k). To have smaller side lobes, we can easily arrive at the following idea: we design ψ(k) so that it minimizes the second moment of the square of the interferogram f (z): ∫ ∞ J[ψ] = (z − zp )2 {f (z)}2 dz. (12) −∞ Now, we are at the point to show our main result in this paper. Let ka be (ku − kl )/2. SAMPTA'09 The minimum value J0 is given by ( π )3 1 π∆2 J0 = = . 
2 2 (2ka ) 2 (16) (17) The following two results are direct consequence of Theorem 1. Corollary 1 The optimal characteristic a(k) under the criterion J[ψ] is given by ( )1/2 sin π(k − kl )/2ka √ a(k) = . (18) 2 ka qo (k)qr (k) Corollary 2 The optimal waveform of the interferogram f (z) is given by where f (z) = m(z) cos(ku + kl )(z − zp ), (19) √ 4π ka cos 2ka (z − zp ) . m(z) = π 2 − 16ka2 (z − zp )2 (20) The interferogram shown in Fig. 2 was the optimal one given by Eqs. (19) and (20) while that shown in Fig. 3 is generated from a rectangular ψ(k) given by { √ 1/ ku − kl (kl < k < ku ), ψ(k) = 0 (otherwise). Though this ψ(k) is not continuously second differentiable, the conditions (13) ∼ (15) are satisfied. In both figures, λc = 600[nm] and λb = 30[nm] were used. We can see that the side lobes in Fig. 2 are much smaller than those in Fig. 3. The sampling interval used in both figures is ∆ = 1.425[µm], which is the maximum among those satisfying Eqs. (2) ∼ (5). We have six samples in the main lobe in Fig. 2 while only four samples are located there in Fig. 3 (these samples are displayed by relatively large dots compared to samples in side lobes). In a nutshell, the optimal characteristic results in fewer samples in the small side lobes. This results in small errors on the truncation and the estimation of C, which we demonstrate in the next section through computer simulations. Before proceeding simulations, let us make a final remark in this section. Corollary 3 Let σ 2 be the value of J[ψ]. Then, the following uncertainty principle holds: ( π )3 , σ 2 (2ka )2 ≥ 2 π σ2 ≥ . 2 ∆ 2 323 6. Conclusion 1.1 1.05 1 1 0.95 Rectangular 0.9 ✲ 0.85 0.8 0.75 0.8 0.7 0.65 0.6 10 10.5 11 11.5 12 12.5 13 13.5 0.6 0.4 Optimal ✲ 0.2 0 0 5 10 15 20 25 30 z[µm] Figure 4: Squared-envelope functions (the dashed lines) and reconstructed functions (the solid lines) from samples of g(z) for both of the optimal and the rectangular ψ(k). 5. Simulations We compare the optimal and the rectangular characteristics ψ(k) by computer simulations. We first sample the interferograms g(z) generated from both ψ(k) with the sampling interval ∆ = 1.425µm. Then, the averages for each sample values are computed for the estimation of C. Finally, we reconstruct the squared-envelope functions r(z) by using a finite number of g(zn ) − Ĉ instead of f (zn ) in Proposition 1. The reconstructed functions are shown in Fig. 4 by the solid lines as well as the original squaredenvelope functions by the dashed lines for both of the optimal and the rectangular ψ(k). The small window in the top-right side shows the magnified image around the peak. We can see that the reconstructed function for the optimal ψ(k) provides a much better result than that for the rectangular ψ(k). We also notice that the latter oscillates severely. The normalized truncation errors for the optimal and the rectangular ψ(k) are 0.45% and 4.68%, respectively. The former is less than 10% of the latter. When C = 1.10, its estimation results are 1.10 and 1.06 for the optimal and the rectangular ψ(k), respectively. Finally, errors for the estimation of zp are 0.05µm and 0.06µm for the optimal and the rectangular ψ(k), respectively. Even though the difference is not so significant, the oscillation of the reconstructed squared-envelope function for the rectangular ψ(k) may cause difficulties for fast search of the maximum position. We repeated the same simulations for thirty two values of zp from 10µm to 20µm. 
Then, averages of estimation errors were 0.0496µm and 0.0541µm for the optimal and the rectangular, respectively. They are almost the same value. However, the averages of truncation errors were 0.35% and 4.67% for the optimal and the rectangular ψ(k), respectively. The former is less than 7% of the latter. These results show the effectiveness of the optimal characteristics of the optical filter. SAMPTA'09 In this paper, we devised an optimal characteristic of the optical filter that minimizes the second moment of the square of the interferogram so that both of the truncation error and the estimation error for the constant in the interferogram are reduced. We showed that the optimal characteristic is given by a sine curve which has a half of the period as the band-width of the optical filter. Simulation results showed that the truncation error for the optimal characteristic is less than 7% of that for the rectangular one. The estimation error of the constant for the optimal characteristic was also smaller than the rectangular one. Even though the difference on the estimation error of the maximum position was not so significant, reconstructed functions for the optimal characteristic was much smoother than those for the rectangular one. These results showed the effectiveness of the optimal characteristic. Our future tasks include to produce a prototype of the optical filter with the optimal characteristic. References: [1] P.J. Caber. Interferometric profiler for rough surfaces. Applied Optics, 32(19):3438–3441, 1993. [2] S.S.C. Chim and G.S. Kino. Three-dimensional image realization in interference microscopy. Applied Optics, 31(14):2550–2553, 1992. [3] P. de Groot and L. Deck. Surface profiling by analysis of white-light interferograms in the spatial frequency domain. Journal of Modern Optics, 42(2):389–401, 1995. [4] O.D. Grace and S.P. Pitt. Sampling and interpolation of bandlimited signals by quadrature methods. The Journal of the Acoustical Society of America, 48(6):1311–1318, 1969. [5] A. Hirabayashi, H. Ogawa, and K. Kitagawa. Fast surface profiler by white-light interferometry by use of a new algorithm based on sampling theory. Applied Optics, 41(23):4876–4883, 2002. [6] G.S. Kino and S.S.C. Chim. Mirau correlation microscope. Applied Optics, 29(26):3775–3783, 1990. [7] A. Kohlenberg. Exact interpolation of band-limited functions. Journal of Applied Physics, 24:1432– 1436, 1953. [8] K.G. Larkin. Efficient nonlinear algorithm for envelope detection in white light interferometry. Journal of Optical Society of America, 13(4):832–843, 1996. [9] H. Ogawa and A. Hirabayashi. Sampling theory in white-light interferomtery. Sampling Theory in Signal and Image Processing, 1(2):87–116, 2002. [10] D.W. Rice and K.H. Wu. Quadrature sampling with high dynamic range. IEEE Transactions on Aerospace and Electronic Systems, AES-18(4):736– 739, 1982. [11] W.M. Waters and B.R. Jarrett. Bandpass signal sampling and coherent detection. IEEE Transactions on Aerospace and Electronic Systems, AES-18(4):731– 736, 1982. 324 SAMPTA'09 Poster Sessions SAMPTA'09 325 SAMPTA'09 326 Continuous Fast Fourier Sampling Praveen K. Yenduri(1) and Anna C. Gilbert(2) (1) University of Michigan, 4438 EECS building, Ann Arbor, MI 48109, USA. (2) University of Michigan, 2074 East Hall, Ann Arbor, MI 48109, USA. ypkumar@umich.edu, annacg@umich.edu Abstract: Fourier sampling algorithms exploit the spectral sparsity of a signal to reconstruct it quickly from a small number of samples. 
In these algorithms, the sampling rate is subNyquist and the time to reconstruct the dominate frequencies depends on the type of algorithm—some scale with the number of tones found and others with the length of the signal. The Ann Arbor Fast Fourier Transform (AAFFT) scales with the number of desired tones. It approximates the DFT of a spectrally sparse digital signal on a fixed block by taking a small number of structured random samples. Unfortunately, to acquire spectral information on a particular block of interest, the samples acquired must be appropriately correlated for that block. In other words, the sampling pattern, though random, depends on the block of interest. When blocks of interest overlap significantly, the union of the sampling patterns may not be an optimal one (it might not be sub-Nyquist anymore). Unlike the much slower algorithms, the sampling pattern does not accommodate an arbitrary block position. We propose a new sampling procedure called Continuous Fast Fourier Sampling which allows us to continuously sample the signal at a sub-Nyquist rate and then apply AAFFT on any arbitrary block. Thus, we have a highly resource-efficient continuous Fourier sampling algorithm. 1. Introduction Let x be a discrete time signal of length n which is sparse or compressible in the frequency domain but the exact frequency content depends on time. We consider the problem of computing the frequency content present in different blocks of the signal in a resource efficient manner. This problem arises in many applications such as cognitive radio [2] where a wireless node alters its transmission or reception parameters based on active monitoring of radio frequency spectrum at various times. Another application is incoherent demodulation of communication signals [3] such as FSK, MSK, OOK, etc., where the computed frequency spectrum at different times represents the message being transmitted itself. There are several Fourier sampling algorithms [1, 8, 9] with low sampling costs that reconstruct the entire spectrum of a sampled signal. These algorithms make use of a uniformly random (not structured) sample set for computations thus allowing us to compute frequencies in any SAMPTA'09 arbitrary block of interest from the signal. However, the time to reconstruct the spectrum is superlinear in signal’s size and hence are slow and inappropriate for the applications involving large signal sizes or bandwidths where just a few frequencies are of interest. Instead, we consider a sub-linear time computational method called the AAFFT (Ann Arbor Fast Fourier Transform) described in [4]. Figure 1: Figure showing the samples acquired in S1 and the samples required to apply AAFFT on B = [16, 47]. Let y be a fixed block of interest of length N in the discrete time signal x. Since x is sparse in frequency domain, it can be assumed that y has only m dominant digital frequencies, where m ≪ N . The AAFFT algorithm takes a small number of (correlated) random samples from the block of interest and produces an approximation of its DFT (identifies dominant tones), using time and storage mpoly(log(N )). If we are interested in a windowed Fourier analysis of x over windows of length N , a straightforward approach towards solving our problem using AAFFT is to divide the signal x into consecutive non-overlapping blocks of length N , generate appropriately correlated sampling patterns for each block, acquire the samples and then apply AAFFT on each block. Let us call this sample set S1. 
Unfortunately, S1 does not accommodate arbitrary block positions. For example, consider samples acquired in S1 from two consecutive blocks B1 and B2. Lets say we are now interested in block B which consists of second half of B1 and first half of B2 (see Figure (1)). However AAFFT cannot be applied on B since the samples acquired from B will not be appropriately structured for its application. This is illustrated in Figure (1) for a simple case of N = 32 with a dummy y-axis and a few samples plotted for clarity. We propose a new sampling procedure called the Continuous Fast Fourier Sampling that allows us to continu- 327 ously sample the signal (as opposed to division into discrete blocks) at a sub-nyquist rate and then apply AAFFT on any arbitrary block of interest. The article describes the algorithm in detail in Section (3.2), proves its correctness in Section (3.3), followed by a few numerical experiments and results in Section (3). 2. The Fourier (AAFFT) Sampling Algorithm The Fourier Sampling algorithm is predicated upon nonevenly spaced samples unlike many traditional spectral estimation techniques [6, 7] and uses a highly nonlinear reconstruction method that is divided into two stages, frequency identification and coefficient estimation, each of which includes multiple repetitions of basic subroutines. A detailed description of the implementation of AAFFT is available in [5]. Frequency Identification consists of two steps, dominant frequency isolation and identification. Isolation is carried out by a two-stage process: (i) pseudo random permutations of the spectrum, followed by (ii) the application of a filter bank with K = O(m) bands, where m = number of tones (dominant spikes) in the signal. With high probability, a significant fraction of the dominant tones fall into individual bands, isolating each tone from the others and this probability can be increased with additional repetitions. Note that all the above is carried out conceptually in the frequency domain but instantiated in the time domain. That is, we sample the permuted and filtered signal in the time domain. To carry out the computations, the algorithm uses signal samples at time points indexed by P (t, σ) = {(t + qσ) mod N, q = 0, 1, .., K − 1}, where (t, σ) is randomly chosen for each repetition. The identification stage performs group testing to determine the dominant frequency value in each of the K outputs of the filterbank. This stage uses the samples indexed at arithmetic progressions P (tb , σ) formed from each element of the geN , b = 0, 1, .., log2 (N/2). ometric progression tb = t+ 2b+1 The estimation stage uses the random sampling similar to the isolation stage for coefficient estimation of each of the dominant frequencies identified. Note that although the (t, σ) pair is chosen randomly in each repetition, the samples that result from each pair are highly structured. Let A1 = {(t, σ)} used in the frequency identification stage and similarly let A2 be defined for the estimation stage. These two sets define a sampling pattern. the theorems in Section (3.3) hold. For j = 1, .., J, denote by Ij the arithmetic progression formed by (t(j), σ), Ij = {t(j) + qσ, ∀q ≥ 0 : t(j) + qσ ≤ n} (1) N Now, consider the geometric progression tb = t + 2b+1  N for all b = 0, 1, .., α − 1. For each b, t + 2b+1 , σ is treated as another (t, σ) pair and the sequence tb (j) and the corresponding progressions Ijb can be defined. 
Do all the above, for each pair (tℓ , σℓ ) in A1 and A2 and denote the arithmetic progressions produced, by Iℓ,j , for j = 1, .., Jℓ . DefineSthe union of all such arithmetic Jℓ Iℓ,j . Similarly define Iℓb = progressions as Iℓ = j=0 Sα−1 b S Jℓ b B b=0 Iℓ . j=0 Iℓ,j for b = 0, .., α − 1. Now define Iℓ = Finally define ! ! [ [ B I(A1 , A2 ) = (Iℓ ∪ Iℓ ) ∪ Iℓ (2) A1 A2 Given a set of indices I, we denote by S x (I) the set of samples from signal x indexed by I. Figure 2: Calculation of N -Wraparound t(1) from t. 3.2 The CFFS Algorithm Preprocessing: INPUT: N // Block length (1) Sample-set generation : Choose A1 and A2 as defined and compute I(A1 , A2 ) (as in Equation (2)). OUTPUT: I(A1 , A2 ) // Index set Sample Acquisition INPUT: I(A1 , A2 ), x (2) sample signal x at I and obtain samples S x (I). OUTPUT: S x (I) Reconstruction S x (I), (n1 , n2 ) // boundary indices of an arbitrary block y of length N from signal x (3) calculate A′1 , A′2 (depend on (n1 , n2 ), defined in Section (3.3)) and extract S y (I(A′1 , A′2 )) ⊂ S x (I). (4) apply AAFFT on the sample-set S y (I(A′1 , A′2 )) OUTPUT:top m frequencies of x in block y = x[n1 , n2 ] INPUT: 3. 3.1 Continuous Fast Fourier Sampling Sample set construction Let n be the length of signal x which has m dominant tones that vary over time. Let the block length be N . Let K = O(m) and α = log2 (N ). Let (t, σ) be a fixed pair in A1 or A2 . Define a sequence of time points t(0) = t , t(j) = (t(j − 1) + Q(j − 1)σ)modN for j = 1, .., J, where Q(j − 1) = smallest integer such that t(j −1)+Q(j −1)σ ≥ N and J = ⌈ Kσ N ⌉. We call t(j) the “N -wraparound” of t(j − 1). Figure (2) illustrates the calculation of a N -wraparound. The choice of J is such that SAMPTA'09 3.3 Proof of Correctness of CFFS The arbitrary block y has boundaries (n1 , n2 ). To generate samples from this block, we define new sets A′1 and A′2 as follows. For every (t, σ) in A1 and A2 , let 328 i be the smallest integer such that t + iσ > n1 . Define t′ = (t + iσ)modn1 . Note that t′ is simply the n1 wraparound of t. Put A′1 = {(t′ , σ) : (t, σ) ∈ A1 } and similarly A′2 . Note that A′1 and A′2 are still random since A1 and A2 were chosen randomly. To apply AAFFT on block y we can now use samples of y indexed by the sampling pattern defined (as in Section (2.)) from A′1 and A′2 . The following theorems together show that the required samples of y are available in S x (I(A1 , A2 )). Theorem 1 For sets A′1 and A′2 as defined above, S y (I(A′1 , A′2 )) ⊂ S x (I(A1 , A2 )). Theorem 2 AAFFT can be applied on the sample-set S y (I(A′1 , A′2 )), i.e. the index set I(A′1 , A′2 ) has the required structure explained in Section (2.). Rather than giving detailed proofs, we prove a proposition that lies at the heart of the two theorems. Proposition 3 For every (t′ , σ) S y (P (t′ , σ)) ⊂ S x (I(A1 , A2 )). in A′1 or A′2 , Proof: Let (t, σ) be the pair in A1 or A2 from which (t′ , σ) was obtained. We will prove that the arithmetic progressions Ij formed by the sequence of wraparounds t(j),j = 1, .., J as defined in Section (3.1), induce modN arithmetic in the progression P (t′ , σ) (P as defined in Section (2.)). Consider the first few terms in P (t′ , σ), till (t′ + (q0 − 1)σ) mod N where q0 is the smallest integer such that (t′ + q0 σ) ≥ N . From definition of t′ observe that t′ = (t+iσ −n1 ). so y(t′ ) = x(n1 +t′ ) = x(t+σ) ∈ S x (I0 ), where I0 is defined in Equation (1). Similarly it is easy to see that the first q0 terms in S y (P (t′ , σ)) are ′ contained in S x (I0 ). 
Now call the next term l (t ′+ m q0 σ) N −t ′ ′ ′ − N. mod N = t (1). Observe that t (1) = t + σ  N −t  σ Similarly observe that t(1) = t + σ σ − N . Now, ′ Substituting t′ = (t + iσ − n1 ) in the  Nexpression  for t (1) −t+n1 −iσ ′ we get, t (1) = t + iσ − n1 + σ −N = σ   t + iσ − n1 + σ Nσ−t + dσ − N = t(1) + (i + d)σ − n1 , for an appropriately defined d, which can be shown to be positive. So y((t′ + q0 σ) mod N ) = y(t′ (1)) = x(t(1) + (i + d)σ) ∈ S x (I1 ), where again I1 is defined in Equation (1). Let q1 be the smallest integer such that (t′ (1) + q1 σ) ≥ N . Now it is easy to see that the next q1 terms in S y (P (t′ , σ)) are contained in S x (I1 ). Repeat this until all the terms in P (t′ , σ) are covered. Proposition 4 On average, the storage requirement of n CFFS algorithm is O( N m logO(1) N ), which is of the same order as a straightforward, fixed boundary sample set for AAFFT. 4. Results and Discussion The Continuous Fast Fourier Sampling algorithm has been implemented and tested in various settings. In particular, we performed following three experiments. First, we consider a model problem for communication devices which use frequency-hopping modulation schemes. The signal we want to reconstruct has two tones that SAMPTA'09 Figure 3: The Sparsogram for a synthetic frequencyhopping signal consisting of two tones, as computed by AAFFT (S1) and by CFFS. change at regular intervals. We apply both the straightforward AAFFT on S1 and CFFS to identify the location of the tones. Figure (3) shows the obtained sparsogram which is a time-frequency plot that displays only the dominant frequencies in the signal. We get the same sparsogram in both cases, as expected. For N = 220 , S1 samples about 0.94% of the signal whereas CFFS samples about 1.06% of the signal, which is only very slightly larger than S1. This experiment demonstrates the efficiency and similarity of the two methods and supports the proposition made in Section (3.3). Figure 4: Applying CFFS to different blocks of signal x. While S1-AAFFT cannot be applied to compute the dominant tones in any arbitrary block, the CFFS has no such limitation. This is demonstrated in the next experiment as follows. Let y be a signal of length N = 220 , with m = 4 known dominant frequencies. Let x be an arbitrary signal of length n with N ≪ n. Now let x[n1 , n2 ] be an arbitrary block of interest of length N . Set x(n1 + q) = y(q), for q = 0, 1, . . . , N − 1. Thus we have placed a copy of the known signal y in the block of interest. The CFFS was then applied and the four dominant frequencies in the block of interest were computed. The obtained values for frequencies and their coefficients match closely with those of the signal y and satisfy the error guarantees of AAFFT. The whole experiment was repeated with different values for n1 (and corresponding n2 = n1 + N − 1) and the same results were obtained. Figure (4) shows the sketch of a signal x, pre-sampled in a predetermined manner (according to CFFS), with copies of y placed at arbitrary positions. Application of AAFFT to any block with copy of y gives the same results thus demonstrating the correctness of CFFS. 329 In the final experiment, we consider the frequency hopping signal from the first experiment. Let the block size be N = 217 with unknown block boundaries. Let f1 and f2 be the respective frequencies in two adjacent blocks (f 1 in the left block). We consider the problem of finding the block boundary using CFFS with an analysis window of size N. 
The center of the window can be varied and a binary search can be performed for the block boundary in the following manner. If the center is to the left of the actual boundary, then the coefficient of f 1 produced by AAFFT will be higher than that of the f 2. This indicates that the center has to be moved to the right from its current position. Also the search is not strictly binary since the amount by which f 1 coefficient is higher than f 2 can be used to shift the center of the window to the right by an equivalent amount. This step can be iterated a few times to make the center converge to the actual block boundary. We express the error as the distance to the true boundary and determine what percentage of the block this distance is. Table (1) displays the error and how the error increases with decreasing SNR. Note that even in the case SNR(dB) no noise 10 8 %Error 0.39 0.58 0.70 SNR(dB) 6 4 2 %Error 0.78 0.79 1.56 Table 1: Percentage error in boundary identification. of no noise there is some inherent ambiguity in the identification of block boundary. This uncertainty is caused by two factors. First, when the analysis window has portions of both the f 1-block and f 2-block, the net signal is no longer sparse due to a sudden change in frequency and has a slowly decaying spectrum. With m = 2 the AFFT guarantees that the error made in signal approximation is about as much as the error in optimal 2-term approximation [5]. Hence a slowly decaying spectrum implies more error in the approximation. A second and more important factor is the number of samples actually acquired from the region of uncertainty around the block boundary. From the entire block, CFFS acquires about 8% samples from the N = 217 present. Assuming these samples are uniformly distributed (which is not true for CFFS), the number of samples present in the region of uncertainty (0.4%) is about 40. In practice, CFFS contains even fewer samples in the uncertainty region (about 30 on average). In terms of samples actually acquired in CFFS, the boundary estimation is off by only a few samples and hence is negligible, as it does not affect the computations. This will be true for any sparse sampling method like CFFS. Furthermore, if the uncertainty were to be reduced to 0.3% say, the boundary identification would improve by only about 6 samples on average, which again is negligible. Hence the boundary identification through the above method is accurate enough for all practical purposes. 5. Conclusions and Future Work We described and proved a sub-linear time sparse Fourier sampling algorithm called the CFFS which along with AAFFT can be applied to compute the frequency content SAMPTA'09 of sparse digital signals at any point of time. Once the block length N is selected, a sub-nyquist sampling pattern can be pre-determined and the samples can be acquired from the signal (during the runtime if required). The AAFFT can be applied to the samples corresponding to any block of length N of the signal and the dominant frequencies in that block and their coefficients can be computed in sublinear time. The algorithm requires the block length N to be fixed beforehand. Designing or extending the algorithm to work for different values of N can be considered. Adapting the algorithm to further reduce the computational complexity by using known side information about the signal can also be considered. The algorithm is also highly parallelizable and can be adapted for hardware applications. 
Also, we may be able to extend this sample set generation to the deterministic sampling algorithm described in [10]. Acknowledgements The authors have been partially supported by DARPA ONR N66001-06-1-2011. ACG is an Alfred P. Sloan Fellow. References: [1] E.Candes, J.Romberg and T.Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans.Inform.Theory, 52:489–509, 2006. [2] I.F.Akyildiz, W.Y.Lee, M.C.Vuran, and S.Mohanty, Next generation dynamic spectrum access cognitive radio wireless networks: A survey, Computer Networks Journal (Elsevier),50: 2127–2159, Sep 2006. [3] Simon Haykin, Communication systems. Fourth Edition, John Wiley and Sons, 2005. [4] A.C.Gilbert, S.Muthukrishnan, and M.J.Strauss, Improved time bounds for near-optimal sparse Fourier representations. Proc. SPIE Wavelets XI, 59141(A):1– 15, 2005. [5] A.C.Gilbert, M.J.Strauss, and J.A.Tropp. A Tutorial on Fast Fourier Sampling IEEE Signal Processing Magazine, 25(2):57–66, March 2008. [6] G.K. Smith and D.M. Hawkins, Robust frequency estimation using elemental sets. J.Comput.Graph.Stat, 9(1):196–214, 2000. [7] G. Harikumar and Y. Bresler, FIR perfect signal reconstruction from multiple convolutions: minimum deconvolver orders. IEEE Trans.Signal Processing, 46(1):215–218, 1998. [8] G. Cormode and S. Muthukrishnan. Combinatorial algorithms for compressed sensing. Proc.2006 IEEE Int.Conf.Information Sciences Systems, 230– 294, April 2006. [9] A.C.Gilbert, M.J.Strauss, J.A.Tropp, and R.Vershynin, One sketch for all: Fast algorithms for Compressed Sensing. Proc. 39th ACM Symposium on Theory of Computing, 237–246, June 2007. [10] M.A.Iwen, A deterministic sub-linear time sparse Fourier algorithm via non-adaptive compressed sensing methods. SODA ’08, 20–29, Jan 2008. 330 Double Dirichelet Averages and Complex B-Splines Peter Massopust (1,2) (1) Institute for Biomathematics and Biometry, Helmholtz Zentrum München, Neuberberg, Germany. (2) Center of Mathematics, Technische Universität München, Garching, Germany. massopust@ma.tum.de Abstract: A relation between double Dirichlet averages and multivariate complex B-splines is presented. Based on this reationship, a formula for the computation of certain moments of multivariate complex B-splines is derived. 1. 2. Complex B-Splines Let n ∈ N and let △n denote the standard n-simplex in Rn+1 : ( △n := u :=(u0 , . . . , un ) ∈ Rn+1 uj ≥ 0; j = 0, 1, . . . , n; Introduction n X uj = 1 . j=0 Recently, a new class of B-splines with complex order z, Re z > 1, was introduced in [4]. It was shown that complex B-splines generate a multiresolution analysis of L2 (R). Unlike the classical cardinal B-splines, complex B-splines Bz possess an additional modulation and phase factor in the frequency domain: ) The extension of △n to infinite dimensions is done via projective limits. The resulting infinite-dimensional standard simplex is given by   ∞   X N0 △∞ := u := (uj )j ∈ (R+ uj = 1 , 0)   j=0 bz (ω) = B bRe z (ω) ei Im z ln |Ω(ω)| e− Im z arg Ω(ω) , B where Ω(ω) := (1 − e−iω )/(iω). The existence of these two factors allows the extraction of additional information from sampled data and the manipulation of images. In [6] and [9], some further properties of complex Bsplines were investigated. In particular, connections between complex derivatives of Riemann-Liouville or Weyl type and Dirichlet averages were exhibited. 
Whereas in [6] the emphasis was on univariate complex B-splines and their applications to statistical processes, multivariate complex B-splines were defined in [9] using a wellknown geometric formula for classical multivariate Bsplines [7, 10]. It was also shown that Dirichlet averages are especially well-suited to explore the properties of multivariate complex B-splines. Using Dirichlet averages, several classical multivariate B-spline identities were generalized to the complex setting. There also exist interesting relationships between complex B-splines, Dirichlet averages and difference operators, several of which are highlighted in [5]. This short paper presents a generalization of some results found in [3, 12] to complex B-splines. For this purpose, the concept of double Dirichlet average [1] was introduced and its definition extended via projective limits to an infinite-dimensional setting suitable for complex Bsplines. Moments of complex B-splines are defined and a formula for their computation in terms of a special double Dirichlet average presented. SAMPTA'09 and endowed with the topology of pointwise convergence, i.e., the weak∗-topology. We denote by µb = lim µnb ←− the projective limit of Dirichlet measures µnb on the ndimensional standard simplex △n with density Γ(b0 ) · · · Γ(bn ) b0 −1 b1 −1 u1 · · · unbn −1 . u Γ(b0 + · · · + bn ) 0 (1) Here, Γ : C\Z− 0 → C denotes the Euler Gamma function. Let R+ := {x ∈ R | x > 0} and let C+ := {z ∈ C | Re z > 0}. 0 Definition 1 ([6]). Given a weight vector b ∈ CN + and N0 an increasing knot sequence τ := {τk }k ∈ R with the √ property that limk→∞ k τk ≤ ̺, for some ̺ ∈ [0, e), a complex B-spline Bz (• | b; τ ) of order z, Re z > 1, with weight vector b and knot sequence τ is a function satisfying Z Z Bz (t | b; τ )g (z) (t) dt = g (z) (τ · u) dµb (u) (2) R △∞ for all g ∈ S (R). Here, S (R) denotes P the space of Schwartz functions on R, and τ · u = k∈N0 τk uk for u = {uk }k∈N0 ∈ △∞ . In addition, we used the Weyl or Riemann-Liouville fractional derivative [8, 11, 13] of complex order z, Re z > 0, W z : S (R) → S (R), defined by Z ∞ (−1)n dn z (W f )(x) := (t − x)ν−1 f (t) dt, Γ(ν) dxn x 331 with n = ⌈Re z⌉, and ν = n − z. Here ⌈ · ⌉ : R → Z, x 7→ min{n ∈ Z | n ≥ x}, denotes the ceiling function. To simplify notation, we write f (z) for W z f It is easy to show that the univariate complex B-spline Bz (t | b; τ ) is an element of L2 (R) [5]. Remark 2. For finite τ = τ (n) and b = b(n) and z := n ∈ N, (2) defines also Dirichlet splines if g is chosen in C n (R). For, Dirichlet splines Dn ( · | b; τ ) of order n are defined as those functions for which Z Z g (n) (t)Dn (t| b; τ ) dt = g (n) (τ · u) dµb (u), ∆n R holds true for τ ∈ Rn+1 and for all g ∈ C n (R), and thus for g ∈ S (R). To define a multivariate analogue of the univariate complex B-splines, we proceed as follows. Let λ ∈ Rs \ {0} be a direction, and let g : R → C be a function. The ridge function corresponding to g is defined as gλ : Rs → C, gλ (x) = g(hλ, xi) for all x ∈ Rs . Definition 3 ([9]). Let τ = {τ n }n∈N0 ∈ (Rs )N0 be a sequence of knots in Rs with the property that (3) n→∞ The multivariate complex B-spline B z (• | b, τ ) : Rs → C 0 of order z, Re z > 1, with weight vector b ∈ CN + and knot sequence τ is defined by means of the identity Z Z g(t)Bz (t | b, λτ ) dt, g(hλ, xi)B z (x | b, τ ) dx = Rs R (4) where g ∈ S (R), and where λ ∈ Rs \ {0} such that λτ := {hλ, τ n i}n∈N0 is separated. 
As consequence of the fact that Bz (• | b; τ ) ∈ L2 (R), one obtains from the above definition that B z (• | b, τ ) ∈ L2 (Rs ) [5]. Moreover, it follows from the HermiteGenocchi formula for the univariate complex B-splines Bz ( • | b, λτ ) and (4), that B z ( x | b, τ ) = 0, when x ∈ / [τ ], the convex hull of τ . 3. Dirichlet Averages Let Ω to be a nonempty open convex set in Cs , s ∈ N, and 0 let b ∈ CN + . Let f ∈ S (Ω) := S (Ω, C) be a measurable function. For τ ∈ ΩN0 ⊂ (Cs )N0 and uP∈ △∞ , define ∞ τ · u to be the bilinear mapping (τ, u) 7→ i=1 ui τ i . The infinite sum exists if there exists a ̺ ∈ [0, e) so that p lim sup n kτ n k ≤ ̺. (5) n→∞ Here, k · k now denotes the canonical Euclidean norm on Cs . (See also [6].) SAMPTA'09 △∞ where µb = lim µnb is the projective limit of Dirichlet ←− measures on the n-dimensional standard simplex △n . We remark that the Dirichlet average is holomorphic in b ∈ (C+ )N0 when f ∈ C(Ω, C) for every fixed τ ∈ ΩN0 . (See [2] for the finite-dimensional case and [9] for the infinite-dimensional setting.) Definition 5. [1] Let f : Ω ⊂ C → C be continuous. Let b ∈ Ck+1 and β ∈ Cκ+1 + + . Suppose that for fixed k, κ ∈ N, X ∈ C(k+1)×(κ+1) and that the convex hull [X] of X is contained in Ω. Then the double Dirichlet average of f is defined by Z Z F (b; X; β) := f (u · Xv)dµkb (u)dνβκ (v), △k We denote the canonical inner product in Rs by h•, •i and the norm induced by it by k • k. p ∃ ̺ ∈ [0, e) : lim sup n kτ n k ≤ ̺. Definition 4. Let f : Ω ⊂ Cs → C be a measurable N0 0 → C over function. The Dirichlet average F : CN + ×Ω ∞ △ is defined by Z f (τ · u) dµb (u), F (b; τ ) := where u · Xv := Pk i=0 △κ Pκ j=0 ui Xij vj . Note that F (b; X; β) is holomorphic on Ω in the elements of b, β, and X. We again use projective limits to extend the notion of double Dirichlet average to an infinite-dimenional setting. To this end, let u, v ∈ △∞ and let µb = lim µnb and ←− νβ = lim νβn be the projective limits of Dirichlet mea← n− n sures µb and νβ of the form (1) on the n-dimensional 0 standard simplex, where b, β ∈ CN + . Now suppose that N0 ×N0 is a infinite matrix with the property that X ∈ C P ∞ P∞ |X ij | converges. Let i=0 j=0 u · Xv := ∞ X ∞ X ui Xij vj . i=0 j=0 Suppose that Ω ⊂ C contains the convex hull [X] of X and that f : Ω → C is continuous. The double Dirichlet average of f over △∞ is then given by Z Z F (b; X; β) := f (u · Xv)dµb (u)dνβ (v). (6) △∞ △∞ (We use the same symbol for the (double) Dirichlet average over △∞ and its finite-dimensional projections △n .) It is easy to show that Z (7) F (b; X; β) = F (β; uX)dµb (u), △∞ where uX := {hu, Xj i}j∈N0 , with Xj denoting the jcolumn of X. We note that F (b; X; β) is holomorphic in the elements of b, β, and X over △∞ . For z ∈ C+ , we define Z Z F (z) (b; X; β) := f (z) (u · Xv)dµb (u)dνβ (v). △∞ △∞ (See also [9] for the case of a single Dirichlet average.) 332 4. Double Dirichlet Averages and Complex B-Splines Assume now that the matrix X is real-valued and of the form Xij = 0, for i ≥ s and all j ∈ N0 , some s ∈ N. In other words, X ∈ Rs×N0 . Theorem 6. Suppose that β ∈ R∞ + and that Re z > 1. Ps−1 Let b := (b0 , b1 , . . . , bs−1 ) ∈ Rs be such that i=0 bi ∈ / −N0 . Assume that f ∈ S (R+ ). Further assume that uX is separated for all u ∈ △s−1 . Then Z F (z) (b; X; β) = B z (x | β, X) F (z) (b; x)dx. Rs Proof. We prove the formula first for b ∈ Rs+ . To this end, we identify u = (u0 , u1 , . . . , us−1 , 0, 0, . . .) ∈ △∞ with (u0 , u1 , . . . , us−1 ) ∈ △s−1 . 
By the Hermite-Genocchi formula for complex B-splines (see [6] and to some extend [9]), we have that Z F (z) (β; uX) = f (z) (u′ · uX) dµβ (u′ ) ∞ △ Z = f (z) (t)Bz (t | β, uX)dt R Substituting this expression into (7) and using (4) gives F (z) (b;X; β) = Z Z △∞ Rs f (z) (hu, xi)B z (x | β, uX) dx dµb (u). Interchanging the order of integration yields the statement for b ∈ Rs+ . To obtain the general case b ∈ Rs , we note that by Theorem 6.3-7 in [2], the Dirichlet average F can beP holomorphically continued in the b-parameters s−1 provided i=0 bi ∈ / −N0 . Remark 7. Theorem 6 extends Theorem 6.1 in [12] to complex B-splines and the △∞ -setting. 5. (See, [2], (6.6-5).) Now, let p = (p0 , p1 , . . . , ps−1 ) ∈ Rs , s ∈ N, be a multiindex all of whose components satisfy pi < − 21 . The moPs (z) ment M|p| (b; X) of order |p| := i=1 pi of the complex B-spline B z (• | β, X) is defined by Z (z) xp B z (x | β, X) dx. M|p| (b; X) := Rs Note that since B z (• | β, X) ∈ L2 (Rs ) and B z (• | β, X) = 0, for x ∈ / [X], an easy application of the Cauchy-Schwartz inequality shows that the above integral exists provided the multi-index p satisfies the aforementioned condition on its components. Using a result from [8], namely Property 2.5 (b), and requiring that Re z < Re c, we substitute the function −(c−z) into (8) to obtain f := Γ(c−z) Γ(c) (•) (z) R−(c−z) (b; x) = R−c (b; x) = s−1 Y xbi i . i=0 The above considerations together with Theorem 6 immediately yield the next result. Corollary 8. Suppose that β ∈ R∞ + and that Re z > 1. Let b := (b0 , b1 , . . . , bs−1 ) ∈ (−∞, − 12 )s be such that Ps−1 c := / −N0 . Moreover, suppose that Re z < i=0 bi ∈ Re c. Then (z) (z) M−c (b; X) = R−(c−z) (b; X; β). 6. (10) Acknowledgements This work was partially supported by the grant MEXTCT-2004-013477, Acronym MAMEBIA, of the European Commission. References: Moments of Complex B-Splines Following [2], we define the R-hypergeometric function Ra (b; τ ) : Rs+ × Ωs → C by Z (u), (8) Ra (b; τ ) := (τ · u)a dµs−1 b △s−1 where Ω := H, H a half-plane in C \ {0}, if a ∈ C \ N, and Ω := C, if a ∈ N. It can be shown (see [2]) that R−a , a ∈ C+ , has a holomorphic continuation in τ to C0 , where C0 := {ζ ∈ C | − π < arg ζ < π}. Taking in the definition of the double Dirichlet average (6) for f the real-valued function t 7→ t−c , where c := Ps−1 i=0 bi , the resulting double Dirichlet average is denoted by R−c (b; X; β) and generalizes power functions. The corresponding single Dirichlet average R−c (b; x), where x = (x0 , . . . , xs−1 ), is given by R−c (b; x) = s−1 Y i=0 SAMPTA'09 i x−b , i x∈ / [X]. (9) [1] B. C. Carlson. Appell functions and multiple averages. SIAM J. Math. Anal., 2(3):420–430, August 1971. [2] B. C. Carlson. Special Functions of Applied Mathematics. Academic Press, New York, 1977. [3] B. C. Carlson. B-splines, hypergeometric functions, and Dirichlet averages. J. Approx. Th., 67:311–325, 1991. [4] B. Forster, T. Blu, and M. Unser. Complex B-splines. Appl. Comp. Harmon. Anal., 20:261–282, 2006. [5] B. Forster and P. Massopust. Multivariate complex B-splines, Dirichlet averages and difference operators. accepted SAMPTA 2009, 2009. [6] B. Forster and P. Massopust. Statistical encounters with complex B-splines. Constr. Approx., 29(3):325– 344, 2009. [7] S. Karlin, C. A. Micchelli, and Y. Rinott. Multivariate splines: A probabilistic perspective. Journal of Multivariate Analysis, 20:69–90, 1986. 333 [8] A. A. Kilbas, H. M. Srivastava, and J. J. Trujillo. 
Theory and Applications of Fractional Differential Equations. Elsevier B. V., Amsterdam, The Netherlands, 2006. [9] P. Massopust and B. Forster. Multivariate complex B-splines and Dirichlet averages. to appear in J. Approx. Th., 2009. [10] C. A. Micchelli. A constructive approach to Kergin interpolation in Rk : Multivariate B-splines and Lagrange interpolation. Rocky Mt. J. Math., 10(3):485– 497, 1980. [11] K. S. Miller and B. Ross. An Introduction to the Fractional Calculus and Fractional Differential Equations. Wiley, New York, 1993. [12] E. Neuman and P. J. Van Fleet. Moments of Dirichlet splines and their applications to hypergeometric functions. J. Comput. and Appl. Math., 53:225–241, 1994. [13] S. G. Samko, A. A. Kilbas, and O. I. Marichev. Fractional Integrals and Derivatives. Gordon and Breach Science Publishers, Minsk, Belarus, 1987. SAMPTA'09 334 Sampling in cylindrical 2D PET Yannick Grondin(1,2) , Laurent Desbat(1) and Michel Desvignes(2) (1) TIMC-IMAG, UMR CNRS 5525, UJF-Grenoble 1 (GU) In3 S, Faculté de Médecine, 38706 La Tronche France (2) Grenoble-INP/Phelma/ GIPSA-LAB 961 Rue de la houille blanche BP 46 St Martin d’Heres France Yannick.Grondin@imag.fr, Laurent.Desbat@imag.fr, michel.desvignes@gipsa-lab.inpg.fr Abstract: In this paper, we study 2D cylindrical Positron Emission Tomography (2D PET) sampling. We show that rectangular sampling schemes are more efficient than usual square schemes. 1. PET and sampling 1.1 PET The aim of Positron Emission Tomography (PET) is to map the internal nuclear activity of a patient from exterior measurement. Usually, the patient received some nuclear substance by inhalation or injection. In PET this substance is tagged with a radioactive isotope, such as Carbon-11, Fluorine-18, Oxygen-15. This substance has also chemical and biological properties that enable to visualize metabolism and functions of patient organs (such as blood flow). This substance, called radiotracer, emits a positron per decay. The positron annihilates with an electron, which results in the emission of two opposite gamma rays detected in a PET system. Thanks to detectors surrounding the patient and a powerful electronic processing, coincident photon pairs can be sorted, meaning that the emission occurred on the line joining both detectors. that some activity occurs on the line joining the detectors (ψ1 , z1 ) and (ψ2 , z2 ). This line is called a LOR (Line Of Response). In 2D mode, lead rings called septa, see Fig. 2, are used to restrict detected LORs to be essentially perpendicular to the PET cylinder axis. In this case, LORs have only three parameters (ψ1 , ψ2 , z), see Fig. 3. LORs with a small oblicity (crossed LORs) are usually approximated to LORs perpendicular to the axis, between two true detectors rings, creating a virtual detection ring, allowing to improve the sampling rate along the axis direction, see Fig. 2. Detector rings Interpolated LOR Crossed LORs z Septum Figure 2: Crossed LORs interpolated to improve axial sampling . LOR(ψ1, ψ2, z) LOR(ψ1, z1, ψ2, z2) ψ2 r ψ1 ψ1 ρ f support z ψ2 r f support z ρ z2 Detectors ring z z1 Detectors ring Transverse plane Figure 1: Parametrization of a LOR with the variables (ψ1 , z1 , ψ2 , z2 ). Figure 3: Parametrization of a LOR with the variables (ψ1 , ψ2 , z). In a cylindrical PET system of radius r, see Fig. 1, the unitary detectors are distributed on a cylinder surrounding the patient (supposed to lie in a cylinder of radius ρ). 
Each gamma ray detector localization can be parametrized by cylindrical coordinates (ψ, z). When the coincidence on two detectors (ψ1 , z1 ) and (ψ2 , z2 ) is detected, one knows In 2D PET, after the attenuation correction [5] the measure can be modeled by g : [0, 2π] × [0, 2π] × R → R, with Z f (u (ψ1 , z) + tθ (ψ1 , ψ2 )) dt g (ψ1 , ψ2 , z) = SAMPTA'09 R t with u (ψ1 , z) = (r cos ψ1 , r sin ψ1 , z) and θ (ψ1 , ψ2 ) = 335 t 1 (cos ψ2 − cos ψ1 , sin ψ2 − sin ψ1 , 0) . Ob)| viously g satisfies the symmetry relation 2|sin( ψ1 −ψ2 2 g(ψ1 , ψ2 , z) = g(ψ2 , ψ1 , z). (1) 1.2 Sampling 2. 3D Sampling in cylindrical PET 2D mode In [1] we have established the sampling conditions of the 3D Fan-Beam X-ray Transform (3DFBXRT): Z f (u)du, De3 ⊥ f (β, α, z) = Lβ,α,t We want to sample a function g being 2π-periodic in its two first variables and in R in its third variable. This is a particular case of the general framework of sampling of function on groups, see for example [2, 3]. In this case, the Fourier transform of g ∈ C0∞ ([0; 2π[×[0; 2π[×R) can be defined by: Z Z Z 1 √ ĝ(ξ) = g(x)e−ix·ξ dx, (2π)2 2π [0;2π[ [0;2π[ R where u ∈ R3 , Lβ,α,z is the line in the plane perpendicular to e3 at abscissa z (z ∈ R), joining the source at r(cos β, sin β, 0)t + ze3 , β ∈ [0, 2π[ and the detector at angular position α ∈ [−π/2, π/2[, see Fig. 4. This geometry appears in X-ray CT scanner when considering the reconstruction of many 2D slices. Cylindrical PET in 2D mode can be linked with the 3DFBXRT in the following way: g(x) = D3D f (A(x − eπ )) (2) where x = (ψ1 , ψ2 , z)t ∈ [0; 2π[×[0; 2π[×R, ξ = where x = (ψ1 , ψ2 , z)t , eπ = (0, (p1 , p2 , ζ)t ∈ Z × Z × R and ξ · x = p1 ψ1 + p2 ψ2 + ζz.  The inverse Fourier transform defined for G a function on 1 0 Z × Z × R is given by A =  − 12 21 Z 0 0 1 G(ξ)eix·ξ Ǧ(x) = √ 2π Z×Z×R see Fig 4. Z 1 X X i(p1 ψ1 +p2 ψ2 +ζz) =√ G(p1 , p2 , ζ)e dζ. p2 2π p ∈Z p ∈Z ζ∈R 1 X 1 (SW g)(x) = √ | det W | g(y)χ̌K (x − y), 2π y∈LW where χK is the indicator function of the set K. The interpolation error is given by Z 2 ||SW g − g||∞ ≤ √ |ĝ(ξ)|dξ. 2π ξ6∈K R Thus if K is the essential support of ĝ, i.e., ξ6∈K |ĝ(ξ)|dξ can be negligible, then the interpolation error is low. The geometry of the set K can be exploited for the design of efficient sampling schemes, i.e., the choice of W satisfying the Shannon condition with | det W | maximal in order to minimize the number of sampling points. Source LOR(ψ1 , ψ2 , z) Detector 1 Ω 2 (r β ρ Detector 2 ψ1 ψ2 ρ f Ω + ρ) Ωr √ 2 2 Ωr v p1 v − Ω2 (r + ρ) −Ωr Figure 5: Kg : essential support of ĝ for η = ρ/r = 2/3, slices in the planes (p1 , p2 ) (left) and (v, ζ) (right). The 3D set Kg is just at the intersection of two cylinders of respective basis the slices in the (p1 , p2 ) and (v, ζ) and respective axis ζ and the direction perpendicular to (v, ζ) . This link allows to easily estimate the essential support of gb : Z × Z × R → R. Indeed, Z Z Z g(x)e−ix·ξ dx gb(ξ) = [0;2π[ [0;2π[ R Z Z Z D3D f (A(x − eπ ))e−ix·ξ dx = [0;2π[ [0;2π[ R Z Z Z = D3D f (Ax)e−ix·ξ+ip2 π dx = r (b) Figure 4: Fan beam (a) and natural PET (b) parametrization in a transverse plane . Ω 2 (r − Ω2 (r − ρ) = (a) − ρ) [0;2π[ α Detector r ζ 2 Let K ⊂ Z × Z × R, the non-overlapping Shannon condition associated to K for the sampling lattice LW = W Z3 ∩ ([0; 2π[×[0; 2π[×R) generated by the non singular 3 × 3 matrix W is that the sets K + 2πW −t l, l ∈ Z3 are disjoint sets in Z × Z × R. 
The Petersen-Middleton theorem [6, 3] yields the Fourier interpolation formula f π, 0)t , and  0 0  1 = (−1)p2 | det A| (−1)p2 | det A| [0;2π[ Z Z R [0;2π[ [0;2π[ Z Z Z Z [0;2π[ [0;2π[ D3D f (x)e−i(A −1 x)·ξ D3D f (x)e−ix·(A −t ξ) R (−1)p2 \ −t D3D f (A (ξ)) | det A| From this link we see that the essential support of gb is simply a linear transformation of the essential support of SAMPTA'09 dx R 336 dx \ D 3D f . From [1] it can be easily shown that Kg , the essential support of ĝ(p1 , p2 , ζ) when the emission function f is supposed the be essentially Ω band limited, is given by f = χB(c,0.03) + χB(c,0.05) + χB(c,0.07) where χB(c,r) is the indicator function of the ball of radius r centered on c = (0.9, 0, 0). The data are simulated for a PET of radius 1.5 with 32 rings and 300 detectors on each ring. (b) and (d) are based on a Monte Carlo (MC) simulation computed Kg = {(p1 , p2 , ζ) ∈ Z × Z × R, with GATE [4]. The phantom f is built with 5 concen2 2 2 2 2 |p1 − p2 | + r ζ < Ω r ; r|p1 + p2 | < ρ|p1 − p2 |} tric weighted ball sources (of radius r expressed in mm): f = a(χB(c,9) +χB(c,10) +χB(c,11) +χB(c,12) +χB(c,13) ), where the center c = (130, 0, 0) mm and the activity see Fig. 5 for a representation. a = 106 becquerel. The data are simulated for a PET The angles ψ1 and ψ2 parametrize the same detector ring, of radius 402 mm with 32 rings and 576 detectors on thus their sampling must be identical. We consider here each ring, imitating the ECAT EXACT HR+ scanner of only standard sampling, i.e. equidistant sampling along CTI/Siemens. We see that the simulation data are in good each direction. The most efficient diagonal matrix satisfyagreement with the theoretical results. ing the non overlapping Shannon conditions, see Fig. 6, is given by:     r 0 0 1 0 0 2π  0 1 0  2πWS−t = Ω  0 r 0  , WS = rΩ 0 0 2 0 0 2r (3) 50 100 150 200 250 300 350 400 450 500 550 100 √ - 2rΩ 300 (a) (b) (c) (d) 400 500 ζ - p2 200 √ 2 2 rΩ 2Ω 0 √ 2 2 rΩ v rΩ rΩ − Ωr 6 p1 Kg − 5Ωr 6 copies of Kg v Figure 6: Non overlapping conditions for the rectangular sampling scheme . Thus we see that the most efficient sampling distances are ∆ψ1 = 2π/rΩ(= ∆ψ2 ) and ∆z = π/Ω. lz = ∆z would thus be the detector axial length. If we approximate the detector tangential length by lt = r∆ψ1 , we see that the most efficient relation is lz = lt /2, thus the most efficient detectors from the sampling point of view are rectangular detectors. The empirical ring oversampling by rebinning the crossed LORs as in Fig. 2 yields exactly the factor 2 of oversampling in the direction z needed for efficient sampling. This is a theoretical justification of this widely used heuristic rebinning method. Figure 7: In (a) and (c) the emission function f is the sum of 3 concentric indicator functions. In (b) and (c) the data are obtained by a MC simulation of 5 concentric spherical sources. (a) and (b) slice ζ = 0 of |ĝ(p1 , p2 , ζ)| ; (c) and (d) 3D visualization of the isosurface at 1% of maximum of |ĝ(p1 , p2 , ζ)| (|ĝ(p1 , p2 , ζ)| is essentially negligible outside of this surface) . 3.2 Reconstruction resolution In Fig. 8, Fig. 9 and 10, we present the reconstruction of the clock phantom, see [8], from simple line integrals. The simulated cylindrical PET is of radius r = 1.5, the reconstruction region is of radius ρ = 1. We consider two sampling schemes with essentially the same number of data. The square scheme is based on square detectors, with lt = lz = 0.049. The number of ring is 20. The number of detectors on a ring is 190. 
The rectangular scheme is based on rectangular detectors, with lt = 2lz = 0.062. The number of ring is 32. The number of detectors on a ring is 150. We see in these numerical experiments that the rectangular scheme yields better reconstructions than the square scheme. 3. Numerical experiments 3.1 Essential support We have computed from numerical phantom the essential support of |ĝ(p1 , p2 , ζ)| see Fig. 7. In (a) and (c) the simulation is based on simple line integrals of a phantom f built with 3 concentric weighted ball indicator functions: SAMPTA'09 4. Conclusion We have shown the efficiency of the rectangular sampling scheme over the square scheme in 2D mode cylindrical PET. Sampling conditions in fully 3D PET as initiated in [7] are now being investigated. 337 A B 1.2 C 1.2 1 1 0.8 0.8 0.6 0.6 0.4 0.4 1.2 1 0.2 0.2 0.8 0 0.6 0 5 10 0.4 15 20 25 30 0 0 5 10 C 15 20 25 30 20 25 30 D 0.2 1.2 0 0 5 10 A 15 20 25 1.2 30 D 1 1 0.8 0.8 0.6 0.6 0.4 0.4 1.2 1.2 1 1 0.2 0 0.6 0.6 0.4 0.4 0.2 0.2 0 0.2 0.8 0.8 0 10 20 30 40 50 60 70 0 0 5 10 15 20 25 30 E 0 5 10 B 15 20 25 30 E Figure 8: A = Original image: transverse view ; B = Image profile ; C = Original image: axial view ; D = Image profile 1 ; E = Image profile 2 . 0 0 5 10 15 F Figure 10: A = Square scheme image: axial view ; B = Rectangular scheme image: axial view ; C = Square scheme image profile 1 ; D = Rectangular scheme image profile 1 ; E = Square scheme image profile 2 ; F = Rectangular scheme image profile 2 . References: A B 1.2 1.2 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 10 20 30 C 40 50 60 70 0 0 10 20 30 40 50 60 70 D Figure 9: A = Square scheme image: transverse view ; B = Rectangular scheme image: transverse view ; C = Square scheme image profile ; D = Rectangular scheme image profile . SAMPTA'09 [1] L. Desbat, S. Roux, P. Grangeat, and A. Koenig. Sampling conditions of 3D Parallel and Fan-Beam X-ray CT with application to helical tomography. Phys. Med. Biol., 49(11):2377–2390, 2004. [2] A. Faridani. An application of a multidimensional sampling theorem to computed tomography. In AMSIMS-SIAM Conference on Integral Geometry and Tomography, volume 113, pages 65–80. Contemporary Mathematics, 1990. [3] A. Faridani. A generalized sampling theorem for locally compact abelian groups. Math. Comp., 63(207):307–327, 1994. [4] S. Jan and coll. Gate: a simulation toolkit for pet and spect. Phys. Med. Biol, 49:4543–4561, 2004. [5] F. Natterer. The Mathematics of Computerized Tomography. Wiley, 1986. [6] D.P. Petersen and D. Middleton. Sampling and reconstruction of wavenumber-limited functions in Ndimensional euclidean space. Inf. Control, 5:279–323, 1962. [7] T. Rodet, J. Nuyts, M. Defrise, and C. Michel. A study of data sampling in pet using planar detectors. In IEEE Nuclear Science Symp. Conf. Rec, 2003. [8] Henrik Turbell. Cone-Beam Reconstruction Using Filtered Backprojection. PhD thesis, Linkping University, 2001. 338 Significant Reduction of Gibbs’ Overshoot with Generalized Sampling Method Yufang Hao(1) , Achim Kempf (1),(2) (1) Department of Applied Mathematics, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (2) Department of Physics, University of Queensland, St Lucia 4072, QLD, Australia yhao@math.uwaterloo.ca Abstract: As is well-known, the use of Shannon sampling to interpolate functions with discontinuous jump points leads to the Gibbs’ overshoot. In image processing, it can lead to the problem of artifacts close to edges, known as Gibbs ringring. Its amplitude cannot be reduced by increasing the sample density. 
Here we consider a generalized Shannon sampling method which allows the use of timevarying sample densities so that samples can be taken at a varying rate adapted to the behavior of the function. Using this generalized sampling method to approximate a periodic step function, we observe a strong reduction of Gibbs’ overshoot. In a concrete example, the amplitude of the Gibbs’ overshoot is reduced by about 70%. Figure 1: Approximation of the step function by Shannon sam- 1. Introduction The Shannon sampling theorem [6] provides the link between continuous and discrete representations of information and has numerous practical uses in communication engineering and signal processing. For a review on Shannon sampling, see [7, 10, 1]. In addition, the Shannon sampling theorem has been used to interpolate samples to approximate a given function. In the use of Shannon sampling to approximate functions with discontinuous jump points, the well-known Gibbs’ overshoot [2, 3] has remained a persistent problem, leading to, e.g., Gibbs ringing in image compression [5]. The clearest example for the Gibb’s phenomenon is the periodic step function H(t), see Figure 1, where H(t) = 1 on (0, 12 ), H(t) = −1 on ( 12 , 1), H(t) = 0 at t = 0, 12 , 1, and H(t) has a period T = 1. In Figure 1, H(t) is approximated using Shannon’s shifted sinc reconstruction kernel with N = 24 sampling points on one periodic interval [0, 1). Samples are denoted by x in the plot, and the solid line at the top indicates the maximum value of the approximating function, which is 1.0640. Within an error of 0.003, the 6.40% overshoot beyond the maximum amplitude 1 of the step function H(t) can not be further reduced even if we increase the sampling density. However, using the generalized sampling method [4, 8, 9], which allows the reconstruction of a function on a set of non-equidistant sampling points, chosen adaptively according to the behavior of the function, we show that the SAMPTA'09 pling. Gibbs overshoot can be strongly reduced. For an example, see Figure 2. In Figure 2, we use the same number of points N = 24 in one period as in the case of Shannon in Figure 1, but we choose the sampling points to match the behaviour of the step function. Intuitively, the jump in the step function contains high frequencies. Thus more samples are taken near the jump points t = 0, 12 , and 1. In this example, the maximum value of the approximation is reduced to 1.0074 with an error of 0.0003. This is roughly a 70% reduction of Gibbs’ overshoot without increasing the number of samples, but only varying the local sample density. Figure 3 is a zoom-in of Figure 2 near the jump point. The dashed line on the top indicates the maximum values of the approximating function using the generalized sampling, while the solid line indicates the overshoot in the case of Shannon. 2. Generalized Shannon Sampling Method The generalized Shannon sampling theory considered here was not specifically developed for the application of reducing Gibbs’ phenomena. It was originally motivated by some fundamental physics problem in quantum gravity [4] and was introduced to engineering for spaces of functions with a new notion of time-varying Nyquist rate [8, 9]. The starting observation is that each set of Nyquist sampling points in Shannon sampling turns to be the eigen- 339 2.1 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 0 0.2 0.4 0.6 0.8 1 Figure 2: Approximating the step function by the generalized sampling method with non-equidistant sampling points.. 
1.1 1.05 1 0.95 0.9 0.85 0.8 0.4 0.45 0.5 0.55 Figure 3: This is a zoom-in of Figure 2 near the jump point.. values of one of the self-adjoint extensions of a particular simply symmetric multiplication operator T with deficiency indices (1, 1), and the shifted sinc kernels are the corresponding eigenfunctions. The Shannon sampling theorem is the special case when the self-adjoint extensions of T have equidistant eigenvalues. By considering a generic such symmetric operator T , one obtains a generalized sampling method. We can not cover the mathematical derivations of the new generalized sampling method here, but we will review the key features of the generalization along with a comparison to the Shannon sampling theorem. The Shannon sampling theorem states that if a function φ(t) is in the space of Ω-bandlimited functions, i.e., φ(t) has a frequency upper bound Ω, then φ(t) can be perfectly reconstructed from its sample values {φ(tn )}n taken on a set of sampling points {tn }n with an equidistant spacing tn+1 − tn = 1/(2Ω) via: φ(t) = ∞ X n=−∞  G t, tn φ(tn ) (1) The function G(t, tn ) is the so-called reconstruction ker nel, which is the shifted sinc function sinc 2Ω(t − tn ) . The frequency upper bound Ω is called the bandwith, and the sampling rate 1/(2Ω) is the Nyquist sampling rate. SAMPTA'09 One-Parameter Family of Sampling Lattices We will call a set of Nyquist sampling points {tn }n a sampling lattice. The Shannon sampling theorem only specifies the constant spacing between adjacent points in one lattice, but it does not specify an initial sampling point. Therefore, we can parameterize all possible sampling lattices as: n+θ tn (θ) = , 0≤θ<1 (2) 2Ω Hence the Shannon sampling method possesses a natural one-parameter family of sampling lattices, and any function in the function space can be perfectly reconstructed from its values on any fixed lattice via Eq. (1). The generalized sampling method also possesses an analogous one-parameter family of sampling lattices, but the points in each lattice are generally non-equidistant now. To distinguish from the case of Shannon, we use a different parameter α in {tn (α)}n , 0 ≤ α < 1, and assume that {tn (α)}n are differentiable with respect to the parameter α: dtn (α) t′n (α) = dα Shannon’s family of sampling lattices {tn (θ)}n can be generated by a single number, namely, the constant bandwidth Ω. It is so simple because the function space in the case of Shannon has a constant bandwidth Ω. However, in the generalization, since we have a time-varying ‘bandwidth’, in the sense of Nyquist lattices with nonequidistant points, more specification is required. The entire family of sampling lattices is now generated from the knowledge of a given lattice, say {tn (0)}, and a set of corresponding derivatives {t′n (0)}n by solving for t = tn (α) in: X t′ (0) m = π cot(πα) (3) t − tm (0) m This equation implies that one sampling lattice and the correponding derivatives are enough to determine the entire family of sampling lattices, and hence the reconstruction kernel and the function space. This is important for practical purposes, because one usually takes samples of a given signal on only one lattice. 
The family of sampling lattices {tn (α)}n in the generalization shares many important properties of the uniform lattices {tn (θ)} of Shannon: as the parameter α (or β in the case of Shannon) increases from 0 to 1, the sampling lattices specified by the parameter move to the right on the real line simultaneously and continuously with the following continuity condition: tn (1) := lim tn (α) = tn+1 (0), t′n (1) = t′n+1 (0) (4) α→1− Hence, together, these sampling points in all lattices again cover the real line exactly once. Namely, for any t ∈ R, there exists an unique integer n and an unique α in [0, 1) such that t = tn (α). 2.2 The Generalized Reconstruction Kernel From the theory of self-adjoint extensions, if on each fixed but arbitrary lattice {tn (α)}, α fixed, we let tn = tn (α), 340 2.3 1.2 Interpolation Strategy To approximate a given function, depending on the behavior of the function, one must select a sampling lattice for interpolation. Arising from the theory of self-adjoint extensions, the chosen lattice {tn }n must have a minimum and maximum spacing, namely, there must exist positive real numbers δmin and ∆max such that: 1 0.8 0.6 0.4 0.2 0 0 < δmin ≤ ∆tn = tn+1 − tn ≤ ∆max for all n (8) −0.2 −0.4 −0.6 −3 −2 −1 0 1 2 3 Figure 4: An example of generalized sinc function (or reconstruction kernel) on an arbitrary non-equidistant sampling lattice. The stars on the real line indicate the points in an arbitrary nonequidistant sampling lattice, and the circles denote the same set of points with an amplitude 1. t′n = t′n (α), then the reconstruction kernel in the generalized sampling theorem reads: !−1/2 p X t′n t′m z(t, tn ) G(t, tn ) = (−1) | t − tn | (t − tm )2 m (5) where z(t, tn ) is the number of the sampling points {tm }m between t and tn exclusively. As functions in t, for each fixed α, the set of functions o n (6) gn(α) (t) = G(t, tn (α)) n forms a basis of the function space. Thus, indeed, in the generalized sampling theorem, every function in the function space specified by the family of sampling lattices can be expanded using these basis functions. These continuous functions in Eq. (6) have analogous properties to the shifted sinc function of Shannon: they interpolate all the points in the lattice specified by α  gn(α) (tm (α)) = G tm (α), tn (α) = δmn and their maximum values are all 1 at the sampling points about which they are ‘centered’. This is important for the stability of reconstruction. We will refer to these basis functions as generalized sinc functions. See Figure 4 for an example. It to recall that each set of basis functions n is important o (α) gn (t) specified by α spans the same function space. n This property is remarkable since as in Figure 4, the shape of the generalized sinc functions is quite non-trivial. To recover the Shannon sampling theorem as a special case, we choose any uniform sampling lattice {tn }n with 1 tn+1 − tn = 2Ω for all n, together with constant deriva′ tives tn = C. Then the reconstruction kernel  in (5) simplifies to the sinc kernel sinc 2Ω(t − tn ) , by using the following trignometric identity: +∞ X 1 π2 = sin2 (πz) k=−∞ (z − k)2 SAMPTA'09 (7) From Section 2.1 Eq. (3), we know that one also needs a set of corresponding derivatives to apply the generalized sampling method. So the question is, for a given lattice {tn }n , what is a suitable choice of the set of corresponding derivatives {t′n }n ? 
To this end, we notice that the derivative t′n (α) is the velocity with which the sampling points tn (α) are moving to the right along the real line for increasing α at t = tn (α). Hence, a good candidate for t′n is the distance travelled in one period of α, which is the spacing between two adjacent points ∆tn = tn+1 − tn . For symmetry, we set t′n to be the average distance between tn to its previous and successive points: t′n =  1  1 ∆tn + ∆tn−1 = tn+1 − tn−1 2 2 (9) Here a constant prefactor for the derivatives on a fixed lattice does not matter because the reconstruction kernel is independent of a scalarp multiplication of the derivatives: in (5), the prefactor in t′n -term will cancel out the one in t′m on the numerator inside the series. With this set of initial data {tn }n and {t′n }n , we have an explicit expression of the reconstruction kernel (5). Hence we can construct theinterpolating  function φ(t) through all the sample points tn , φ(tn ) n using the reconstruction formula (1). 3. 3.1 Reduction of Gibbs’ Overshoot Reconstruction of Periodic Functions The clearest example to demonstrate the reduction of Gibb’s overshoot using the generalized sampling method is the periodic step function H(t). One of the reasons for choosing a periodic function is that the infinite summations in the both reconstruction kernel (5) and the reconstruction formula (1) will simplify to a finite sum. Hence, we eliminate the truncation error in the summation. To this end, assume that the function φ(t) has a period of T , and we take N sampling points on one period [0, T ), which are denoted by {τ1 , τ2 , . . . , τN } ⊆ [0, T ). Hence, all the sampling points are tnN +k = nT + τk , 1 ≤ k ≤ N, n ∈ N (10) and from the periodictiy, we have t′nN +k = t′k , φ(tnN +k ) = φ(tk ) (11) After a lengthy calculation, the reconstruction kernel (5) 341 amplitude is 1.0193, which is a significant reduction compared to the maximum amplitude 1.0640 in Gibbs’ overshoot (Figure 1). on this periodic lattice now reads: p (−1)z(t,tnN +l ) t′k |t − tnN +K | N −1/2 π X ′ −2 π tl sin (t − τl ) T T G(t, tnN +K ) = (12) l=1 and the reconstruction formula (1) of the T -periodic function φ(t) reads: φ(t) = N X (−1)z(t,tnN +k ) k=1 N X l=1 t′l sin−2 q  π t′k cot (t − τk ) T (13) −1/2 π φ(tk ) (t − τl ) T As discussed in Section 2.3, for using the formulae (12) and (13) to approximate a periodic step function H(t), the only task now left is to find a sampling lattice adapted to the behavior of H(t). With a periodic lattice (10), we only need to pick up a finite number N of them on [0, T ). 3.2 Approximating a Periodic Step Function Before we discuss how to determine a set of nonequidistant sampling points, let us first consider why the uniform lattices of Shannon do not work very well. Intuitively, because of the sudden change in the amplitude of a step function H(t) at its jump points t = 0, 12 and 1, the function can be considered to suddenly oscillate at an “infinite” frequency in a sufficiently small neighborhoods at the jump points, namely to have an ‘infinite’ bandwidth at t = 0, 12 and 1. Recall that the constant Nyquist spacing 1/(2Ω) in the case of Shannon is inversely proportional to the bandwidth Ω. A uniform lattice implies uniform bandwidth. Intuitively, the uniform lattice in the case of Shannon is therefore not matched with the increase of bandwidth in the small neighborhoods of jump points. 
We therefore choose N sampling points with nonequidistant spacings so that the smallest spacing (the highest bandwidth) occurs near the jump points at t = 0, 12 , 1, and the spacing gradually increases away from the jump points (the bandwidth decreases). We used the easiest such increasing change in spacing, which is linear. Due to the symmetry of the jump points at t = 0, 12 , 1, we divide one period [0, 1) into four equal subintervals with length 14 . On the first subinterval, [0, 14 ), we choose K points so that their adjacent spacing is linearly increasing. Let δ be the linear increment in spacing, then τ1 = 0, τ2 = δ, τ3 = 3δ, . . . 1 τK = K(K − 1)δ 2 (14) The (K + 1)st point is 14 . The sampling points on ( 14 , 12 ] are a mirror image of the points on [0, 14 ) with respect to t = 14 , and the points on [ 12 , 1) repeat the ones on [0, 12 ). Therefore, we have in total N = 4K points on [0, 1). The approximation in Figure 2 is obtained in this way with K = 6. Hence it has the same total number of sampling points (N = 24) on [0, 1) as in Figure 1. Its maximum SAMPTA'09 4. Outlook The question arises how far one can ultimately reduce the Gibb’s overshoot? Is the linear change in sampling spacing, as in Eq. (14), the optimal lattice spacing to match the behavior of a step function? This question will be addressed in a longer following-up paper, in which we will pursue an analytical optimization of the Gibbs’ overshoot reduction. To this end, the fact that the closed form of the reconstruction kernel (12) is available in the case of periodic functions has an important advantage: it in effect reduces infinitely many points to a set of finitely many points. We can then analytically study the behavior of the constructed approximating functions. Eventually, we hope such an analytical study can lead us to the ultimately goal, which is to provide solution to design optimally adapted lattices for arbitrary given functions. 5. Acknowledgment This work has been supported by NSERC’s Discovery, Canada Research Chairs and CGS D2 programs. A.K. and Y.H. gratefully acknowledge the kind hospitality at the University of Queensland, where A.K. is currently on sabbatical. References: [1] J.J. Benedetto. Modern Sampling Theory. Birkhauser, Boston, 2001. [2] J.W. Gibbs. Fourier series. Nature, 59:606, 1899. [3] A.J. Jerri. The Gibbs Phenomenon in Fourier Analysis, Splines and Wavelet Approximations. Springer, 1998. [4] A. Kempf. On fields with finite information density. Phys. Rev. D, 69(124014), 2004. [5] A. Gelb R. Archilbald. A method to reduce the gibbs ringing artifact in mri scans while keeping tissue boundary integrity. IEEE Trans. on Medical Imaging, 21(4):305–319, Apr. 2002. [6] C.E. Shannon. Communication in the presence of noise. Proc. IRE, 37:10–21, Jan. 1949. [7] M. Unser. Sampling - 50 years after shannon. Proc. IEEE, 88(4):569–587, Apr. 2000. [8] A. Kempf Y. Hao. On a non-fourier generalization of shannon sampling theory. Proc. of Canadian Workshop on Information Theory, pages 193–196, 2007. [9] A. Kempf Y. Hao. On the stability of a generalized shannon sampling theorem. Proc. of International Symposium on Information Theorem and its Applications, Dec. 2008. [10] A.I. Zayed. Advances in Shannon’s Sampling Theory. CRC Press, Boca Baton, 1993. 342 Optimized Sampling Patterns for Practical Compressed MRI Muhammad Usman(1) and Philip G. 
Batchelor(1) (1) Division of Imaging Sciences, Kings College London, United Kingdom muhammad.3.usman@kcl.ac.uk, philip.batchelor@kcl.ac.uk Abstract: The performance of compressed sensing (CS) algorithms is dependent on the sparsity level of the underlying signal, the type of sampling pattern used and the reconstruction method applied. The higher the incoherence of the sampling pattern used for undersampling, less aliasing will be noticeable in the aliased signal space, resulting in better CS reconstruction. In this work, based on point spread function (PSF) properties, we compare random, Poisson disc and constrained random sampling patterns and show their usefulness in practical compressed sensing applied to dynamic cardiac magnetic resonance imaging (MRI). Introduction One of the main questions that arise in compressed sensing magnetic resonance imaging (CS-MRI) is: which type of sampling is optimal? The basic theory of compressed sensing as proposed by Donoho [1] and Candes [2] requires acquisition of randomized set of measurements. For MRI, this corresponds to the random sampling in Fourier domain (k-space) which results in incoherent aliasing artefacts in image space. However, random sampling requires bigger changes in amplitudes and polarity of MR system gradients, making it infeasible practically in an MR system. Figure 1 shows one dimensional gradient variations for 2D random and uniform lattice sampling patterns. From the figure, it is evident that we have bigger changes in amplitude and polarity of gradients in case of random sampling pattern than uniform lattice. The solution to this problem is to use deterministic sampling patterns. The uniform lattice pattern is a deterministic pattern but yields coherent artefacts in its PSF and hence, it does not satisfy the basic requirements of compressed sensing theory. Our goal is to find deterministic sampling patterns that have incoherent artefacts in the PSF and can yield better CS reconstruction. of PSF of the ideal sampling pattern: The near zero region around the main lobe of the PSF should be as large as possible and outside that region, PSF should resemble white noise. The samples should be placed randomly but with a restricted maximum distance between samples. These two conditions are met by Poisson disc sampling [4]. Recently, Poisson disc sampling has been shown to give good results in parallel MRI due to better reconstruction conditioning [5]. However, it also has impractical gradient requirements. Gamper [6] defined constrained random pattern with incoherent artefacts in its PSF. Constrained random pattern is a normal lattice pattern with samples shifted along one dimension randomly by -1, 0 and +1. Hence, it is a normal lattice with constrained randomization added along one direction and has moderate gradient requirements. Figure 2 shows the three candidate sampling patterns (random, Poisson disc and constrained random) with the corresponding PSFs. Like Poisson disc sampling, the constrained random sampling has a nearzero region around the main lobe in its PSF and it also possesses the uniform density of sampling both locally and globally. 1. 
Candidate Sampling Patterns in CS Figure 1: Gradient variation along one dimension in MR system for (a) random sampling (top) (b) uniform lattice sampling (bottom) To have minimum aliasing due to sampling below the Nyquist rate, Nayak [3] defined the following properties Additionally, due to added constrained randomization, the amplitudes of coherent side lobes in the PSF of constrained random pattern are also suppressed. In CS SAMPTA'09 343 recovery algorithms like OMP [6] which are based on picking the most significant component from the aliased space iteratively, the suppression of coherent artefacts ensures that only the right candidates are picked up in successive iterations. acceleration factors/sampling factors (SF) from 3 to 7. The x-f space corresponding to each frequency encoding index was independently reconstructed by our modified OMP method with adaptive thresholding scheme [9]. The OMP algorithm stops when maximum residual aliasing intensity in x-f space reaches the intensity level of noise. Figure 2: Three candidate sampling patterns and their corresponding PSFs: top to bottom: random, Poisson disc and constrained random 2. Experimental Setup To test the performance of three sampling patterns in dynamic cardiac MRI, two sets of dynamic cardiac data of size (nf x n p x nt, nf: number of frequency encoding indices, np: number of phase encoding indices, nt: number of time frames) (224x155x50) and (336x178x48) were acquired with a Philips MRI scanner 1.5 T, SSFP sequence, FOV 350x350 mm2. We used the jittered grid approximation of the Poisson disc sampling as proposed by Cook [7]. For CS based reconstruction, the x-f space (x: spatial location, f: temporal frequency) is chosen to be the sparse representation [8]. The x-f space representation of the dynamic cardiac data is obtained by taking the Fourier transform of dynamic MR data along the temporal dimension. Figure 3 shows the dynamic cardiac MR data and its sparse representation. For each frequency encoding index, the under-sampled data was simulated by applying the three sampling patterns to the fully sampled dynamic cardiac data in kx-t space (kx: phase encoding index, t: time) with varying SAMPTA'09 Figure 3: Dynamic cardiac MR data (a) and its x-f space representation (b), the frequency axis ‘f’ is centered around dc frequency (f=0) 3. Performance Results The CS reconstruction results for candidate sampling patterns are shown in Figure 4 to Figure 9 with different acceleration factors. For the original cardiac frame shown in Figure 4 (a), the CS reconstruction results by OMP method with under-sampling factor (SF) of 3 are shown in Figure 4 (b), (c) and (d) for random, Poisson disc and constrained random sampling patterns. The corresponding temporal profiles are shown in Figure 5. The CS reconstruction results for acceleration factors of 5 and 7 are shown in Figure 6 to Figure 9. Up to the acceleration factor of 5, the CS reconstructed data has same spatial and temporal resolution for all three sampling patterns with nearly exact signal reconstruction achieved up to SF=3 (Figure 4 and Figure 5). Below SF=5, the temporal resolution of CS reconstructed data 344 with constrained random sampling gets worse than that for the other sampling patterns (Figure 9 d). This is due to the fact that with very high acceleration factors, many locations within the constrained random pattern will have zero probability of being picked up, as the sampling locations are constrained to only one sample shift from the uniform lattice grid. 
pattern (Poisson disc sampling). Up to the acceleration factor of 5, the quality of the reconstructed images and the temporal resolution of the CS reconstructed data are nearly the same for random, Poisson and constrained random sampling patterns. Since the constrained random sampling has moderate gradient requirements when compared with other optimal sampling schemes, it is an excellent choice to be used as an optimal sampling pattern in practical compressed MRI. References: Figure 4: CS reconstruction results with SF=3: (a) original cardiac frame, CS reconstructed data with (b) random sampling (c) Poisson disc sampling (d) constrained random sampling Figure 5: CS reconstruction results with SF=3: (a)original temporal profile, CS Reconstructed temporal profile with (b) random sampling (c) Poisson disc sampling (d) constrained random sampling [1] D.L.Donoho, “Compressed Sensing," IEEE transactions on information theory, vol.52, no. 4, pp. 1289-1306, 2006. [2] E. Candes,”Compressive sampling," in Proceedings of the International Congress of Mathematicians, vol. 3, pp. 1433-1452, Madrid, Spain, 2006 [3] KS Nayak et al, “Randomized trajectories for reduced aliasing artifact”, in: Proceedings of the ISMRM, p 670., Sydney, 1998 [4] J. I. Yellot, “Spectral consequences of photoreceptor sampling in the rhesus retina.” Science 221, 382–385, 1985 [5] M. Lustig et al., “Autocalibrating Parallel Imaging Compressed Sensing using L1 SPIR-iT with PoissonDisc Sampling and Joint Sparsity Constraints ISMRM Workshop on Data Sampling and Image Reconstruction, Sedona '09 [6] U. Gamper et al,”Compressed sensing in dynamic MRI,"MRM, vol. 59, no. 2, pp. 365-373, 2008. [7] R. L. Cook, “ Stochastic sampling in computer graphics”, ACM Transactions on Graphics (TOG), vol.5, no. 1, p. 51-72, Jan 1986 [8] S. J. Malik et al, “x-f Choice: reconstruction of undersampled dynamic MRI by data-driven alias rejection applied to contrast-enhanced angiography. Stochastic sampling in computer graphics”, MRM, vol.56, p. 811-823, 2006 [9] M. Usman et al, “Adaptive thresholding scheme for OMP method applied to dynamic MRI”, Proc. ESMRMB, Valencia, vol. 25, no. 766, pp. 389, Oct 2008 4. Conclusion We showed that the PSF properties of constrained random sampling are similar to the optimal sampling SAMPTA'09 345 Figure 6: CS reconstruction results with SF=5: (a) original cardiac frame, CS reconstructed data with (b) random sampling (c) Poisson disc sampling (d) constrained random sampling Figure 7: CS reconstruction results with SF=5: (a)original temporal profile, CS Reconstructed temporal profile with (b) random sampling (c) Poisson disc sampling (d) constrained random sampling SAMPTA'09 Figure 8: CS reconstruction results with SF=7: (a) original cardiac frame, CS reconstructed data with (b) random sampling (c) Poisson disc sampling (d) constrained random sampling Figure 9: CS reconstruction results with SF=7: (a)original temporal profile, CS Reconstructed temporal profile with (b) random sampling (c) Poisson disc sampling (d) constrained random sampling 346 A Study on Sparse Signal Reconstruction from Interlaced Samples by l1-Norm Minimization Akira Hirabayashi (1) (1) Yamaguchi University, 2-16-1, Tokiwadai, Ube City, Yamaguchi 755-8611, Japan. a-hira@yamaguchi-u.ac.jp Abstract: We propose a sparse signal reconstruction algorithm from interlaced samples with unknown offset parameters based on the l1 -norm minimization principle. A typical application of the problem is superresolution from multiple lowresolution images. 
The algorithm first minimizes the l1 norm of a vector that satisfies data constraint with the offset parameters fixed. Second, the minimum value is further minimized with respect to the parameters. Even though this is a heuristic approach, the computer simulations show that the proposed algorithm perfectly reconstructs sparse signals without failure when the reconstruction functions are polynomials and with more than 99% probability for large dimensional signals when the reconstruction functions are Fourier cosine basis functions. 1. Introduction Sampling theory is at the interface of analog/digital conversion, and sampling theorems provide bridges between the continuous and the discrete-time worlds. A fundamental framework of the sampling theorems consists of data acquisition (sampling) process of a target signal and reconstruction process from the data. Classical studies assumed that both processes are fixed and known. Then, sampling theorems yield in linear formulations [9]. On the other hand, recent studies assume that sampling or reconstruction processes contain unknown factors. Then, sampling theorems become nonlinear. For example, Vetterli et al. discussed problems in which locations of reconstruction functions are unknown [11], [5]. They introduced the notion of rate of innovation, and provided perfect reconstruction procedures for signals with finite rate of innovation. The recent hot topic, compressive sampling, assumes that signals are sparse in the sense that signals are expressed by a small subset of reconstruction functions, but the subset is unknown [3], [1], [4]. It is interesting that the solution is obtained by the l1 -norm minimization. In contrast to the above studies, problems with unknown factors in the sampling process have also been discussed. For example, sampling locations are assumed to be unknown and completely arbitrary in [8] and [2]. A more restricted sampling process is interlaced sampling [7], in which a signal is sampled by a sampling device several times with slightly shifted locations. If the offset parame- SAMPTA'09 ters are unknown, the sampling theorem becomes nonlinear. A typical application is superresolution from a set of multiple low-resolution images. A replacement of a single high-rate A/D converter by multiple lower rate converters also yields within this formulation. To this problem, Vandewalle et al. proposed perfect reconstruction algorithms under a condition that the total number of unknown parameters is less than or equal to the number of samples [10]. We can find, however, practical situations in which the condition is not true. The method proposed in [2] can be applied to such situations. However, it hardly provides a high quality stable result. In order to solve these difficulties, the present author proposed an algorithm that reconstructs the closest function to a mean signal under data constraint assuming that signals are generated from a probability distribution [6]. The mean signal is, however, not always available. Hence, in this paper we propose a signal reconstruction algorithm from interlaced samples with unknown offsets using a relatively weak a priori knowledge, sparsity. The algorithm first minimizes the l1 -norm of a vector that satisfies data constraint with the offset parameters fixed. Then, the minimum value is further minimized with respect to the parameters. 
Even though this is a heuristic approach, the computer simulations show that the proposed algorithm perfectly reconstructs sparse signals without failure when the reconstruction functions are polynomials and with more than 99% probability for large dimensional signals when the reconstruction functions are Fourier cosine basis functions. This paper is organized as follows. Section 2 formulates the fundamental framework and defines the notion of sparsity. Section 3 introduces interlaced sampling and summarizes the conventional studies. In Section 4, we propose the l1 -norm minimization algorithm. Section 5 evaluates the algorithm through simulations, and shows that the algorithm perfectly reconstruct sparse signals with high probability. Section 6 concludes the paper. 2. Sparse Signals A signal f to be reconstructed is defined on a continuous domain D. We assume that f belongs to a Hilbert space H = H(D) of a finite dimension K. The inner product for f and g in H √ is denoted by hf, gi, and the norm is induced as kf k = hf, f i. By using an arbitrarily fixed 347 basis {ϕk }K−1 k=0 , any f in H is expressed as f= K−1 ∑ H ak ϕk . (1) k=0 f= A K-dimensional vector with k-th element ak is denoted by a. Definition 1 A signal f is J-sparse if at most J coefficients of {ak }K−1 k=0 in Eq. (1) are non-zero and the rest are zero. It should be noted that unknown factors in J-sparse signals are not only values of non-zero coefficients but also their locations. Hence, there are 2J unknown factors in a Jsparse signal. If 2J ≥ K, then the number of unknown factors is more than K, which is the number of the original unknown coefficients {ak }K−1 k=0 without sparsity. Hence, in order for sparsity to be meaningful, we assume that J < K/2. In real applications, J is supposed to be much smaller than K/2. 3. Interlaced Sampling Interlaced sampling means that a signal f is sampled M times by an identical observation device with offsets −1 (0) {δ (m) }M = 0. An M -dimensional vector m=0 , where δ (m) with m-th element δ is denoted by δ. The observation −1 device is characterized by sampling functions {ψn }N n=0 , which are given a priori. Then, the sampling function for the n-th sample in the m-th sequence is given by and the sample is expressed as dn(m) = hf, ψn(m) i. (2) (m) Let d be an M N -dimensional vector in which dn is the n+mN -th element. An M N ×K matrix with the n+mN , (m) k-th element hϕk , ψn i is denoted by Bδ . Substituting Eq. (1) into Eq. (2) yields (3) For simplicity, we assume that the column vectors of Bδ are linearly independent. Figure 1 illustrates the formulation of interlaced sampling. In order to reconstruct the signal f from interlaced samples with unknown offsets, we have to determine both (m) M −1 {ak }K−1 }m=1 . To this problem, Vandek=0 and {δ walle et al. proposed perfect reconstruction algorithms under a condition that the number of unknown parameters is less than or equal to the number of samples (m) N −1 M −1 {{dn }n=0 }m=0 , or K + M − 1 ≤ M N. (4) We can find, however, practical situations in which the condition is not true. The method in [2] can be applied SAMPTA'09 ak ϕk k=0 Reconstruction CK  a0   a =  ...  aK−1  Sampling Bδ   d= CM N d0 .. . (M −1) dN −1    Figure 1: Formulation of sampling and reconstruction. The vector a is to be estimated from the vector d. Note that there are unknown offset parameters δ in Bδ . to the situation without Eq. (4). However, the results obtained by the method tend to be unstable. 
The present author proposed an algorithm which uses a mean signal as a prior [6]. However, the mean signal is not always available. Hence, in this paper, we propose perfect reconstruction algorithms using a relatively weak prior, sparsity. 4. l1 -Norm Minimization Algorithm The problem which we are going to solve in this paper is stated as follows. Problem 1 Determine J-sparse vector a and δ which satisfy Eq. (3) under the condition that the column vectors of Bδ are linearly independent. ψn(m) (x) = ψn (x − δ (m) ), Bδ a = d. K−1 ∑ Because of the linear independentness, a vector a that satisfies Bδ a = d is uniquely determined as a = Bδ† d, where Bδ† is the Moore-Penrose generalized inverse of Bδ . Let us define a matrix Bε by setting an arbitrarily fixed parameter ε instead of δ. By using this matrix, a vector cε is defined as cε = Bε† d. (5) Then, our problem becomes a problem of finding a parameter ε such that the vector cε is J-sparse. It is well-known that l1 -norm minimization is effective to promote sparsity as is used in the compressed sensing [3], [1], [4]. Hence, we also employ this principle to find Jsparse vector cε . Now, our problem becomes the following problem. Problem 2 Determine ε that makes column vectors of the matrix Bε linearly independent, and minimizes l1 -norm of cε in Eq. (5): ε̂ = argminε kcε kl1 = argminε kBε† dkl1 . (6) 348 Table 1: Parameters K, J, N and M used in simulations. K J N M 4 1 2 2 6 2 3 2 8 3 4 2 10 4 5 2 12 5 6 2 ✁ ✆ ✁ ☎ ✁ ✄ ✁ ✄ ✞ The solution to Problem 2 is different from that to Problem 1 in general. Similar to the compressed sensing, the former agrees with the latter in some cases. Theoretical analyses for the agreement are still under consideration. Instead, we show simulation results in this paper. ✁ ☎ ✁ ✆ ✁ ✝ ✁ ✂ ✞ ✞ ✞ ✞ ✁ ✂ ✄ ✄ ✁ ✂ ☎ ☎ ✁ ✂ ✆ ✆ ✁ ✂ ✝ 5. Simulations (a) Reconstruction result We show computer simulations which demonstrate that the proposed algorithm perfectly reconstructs sparse signals under certain conditions. We consider two reconstruction functions, polynomial and Fourier cosine basis. ✁ ✁ ✄ ✞ ✁ ✄ ✁ ✁ ☎ ✞ ✁ ☎ ✁ ✁ ✆ ✞ ✁ ✆ ✁ ✁ 5.1 Polynomial reconstruction ✞ Let H be a space spanned by functions ϕk (x) = xk (0 ≤ k < K) for [0, l] where l is a positive real number. The inner prod∫l uct is defined by hf, gi = 1l 0 f (x)g(x)dx. Sampling (m) is assumed to be ideal, i.e., dn = f (xn + δ (m) ). The sample point xn is given by ✝ ✞ ✁ ✝ ✁ ✁ ✞ ✁ ✁ (2n + 1)l xn = (n = 0, 1, . . . , N − 1), 2N which we call the base sequence. Let l = N so that the sampling interval becomes one. Figure 2 (a) shows a simulation result, in which the dimension of H is K = 8, sparsity parameter is J = 3, the number of samples in each sequence is N = 4, and the sequence was used M = 2 times. The offset parameter is δ (1) = −0.4. The black line shows the target signal f , and ‘o’ and ‘x’ respectively show the base and the first sequences. The red line shows the reconstructed signal, from which we can see the target signal is perfectly recovered. Figure 2 (b) shows that the l1 -norm of cε is indeed minimized at ε = −0.4. We repeated the simulation for one thousand target signals with the values shown in Table 1. Then, all of the signals are perfectly recovered as well as the offset parameters. 5.2 Fourier cosine basis reconstruction We used the same setup except that the reconstruction functions are { 1 (k = 0), ϕk (x) = √ kπx 2 cos (0 < k < K). l Under the above defined inner product, {ϕk }K−1 k=0 is an orthonormal basis. 
SAMPTA'09 ✁ ✂ ✄ ✁ ✂ ☎ ✁ ✂ ✆ ✁ ✂ ✝ ✁ ✁ ✂ ✝ ✁ ✂ ✆ ✁ ✂ ☎ ✁ ✂ ✄ (b) l1 -norm of cε Figure 2: Simulation result. The black line shows the target signal f , and ‘o’, ‘x’, and ‘+’ respectively show the base, the first, and the second sequences. The red line shows the reconstructed signal which perfectly matches to the target signal. Figure 3 (a) shows a simulation result, in which the dimension of H is K = 60, sparsity parameter is J = 15, the number of samples in each sequence is N = 20, and the sequence was used M = 3 times. The offset parameters are δ (1) = −0.2 and δ (2) = 0.3. The black line shows the target signal f , and ‘o’, ‘x’, and ‘+’ respectively show the base, the first, and the second sequences. The red line shows the reconstructed signal, from which we can see the target signal is perfectly recovered. Unfortunately, perfect reconstruction is not always achieved. Figure 4 shows failure rates [%] of perfect reconstruction with respect to K. The dotted red and the solid blue lines show the rates when J = K/4 and J = K/6, respectively. The failure rate for J = K/4 arrives at less than or equal to 1% when K > 32, while that for J = K/6 does so when K > 30. Even though these results are only verified through simu- 349 tions are Fourier cosine basis functions. Because of the computational efficiency, the proposed algorithm is very attractive. Theoretical analyses of these results are our most important future task. 6 4 2 Acknowledgment 0 This work was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 20700164, 2008. -2 -4 References: -6 0 2 4 6 8 10 12 14 16 18 20 Figure 3: Simulation result for Fourier cosine basis functions. The black line shows the target signal f , and ‘o’, ‘x’, and ‘+’ respectively show the base, the first, and the second sequences. The red line shows the reconstructed signal which perfectly matches to the target signal. lations, the proposed approach is attractive because of its computational efficiency. It takes less than 0.4 second to find the solution for the case of K = 60, N = 20, and M = 3. [%] 30 25 J = K/4 20 15 10 J = K/6 5 0 0 10 20 30 40 50 60 K Figure 4: Failure rates of signal recovery when reconstruction functions are Fourier cosine basis functions. 6. Conclusion We proposed a sparse signal reconstruction algorithm from interlaced samples with unknown offset parameters. The algorithm is based on the l1 -norm minimization principle: First, it minimizes the l1 -norm with the offset parameters fixed. Second, the minimum value is further minimized with respect to the parameters. Even though this is a heuristic approach, the computer simulations showed that the proposed algorithm perfectly reconstructs sparse signals without failure when the reconstruction functions are polynomials and with more than 99% probability for large dimensional signals when the reconstruction func- SAMPTA'09 [1] R.G. Baraniuk. Compressive sensing [lecture notes]. IEEE Signal Processing Magazine, 24(4):118–121, July 2007. [2] J. Browning. Approximating signals from nonuniform continuous time samples at unknown locations. IEEE Transactions on Signal Processing, 55(4):1549–1554, April 2007. [3] E.J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Thoery, 52(2):489– 509, February 2006. [4] E.J. Candes and M.B. Wakin. 
An introduction to compressive sampling [a sensing/sampling paradigm that goes against the common knowledge in data acquisition]. IEEE Signal Processing Magazine, 25(2):21–30, March 2008. [5] P. L. Dragotti, M. Vetterli, and T. Blu. Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang-Fix. IEEE Transactions on Signal Processing, 55(5):1741–1757, May 2007. [6] Akira Hirabayashi and Laurent Condat. A study on interlaced sampling with unknown offsets. In Proceedings of European Signal Processing Conference 2008 (EUSIPCO2008), volume CD-ROM, 2008. [7] R.J. Marks II. Introduction to Shannon Sampling and Interpolation Theory. Springer-Verlag, New York, 1991. [8] P. Marziliano and M. Vetterli. Reconstruction of irregularly sampled discrete-time bandlimited signals with unknown sampling locations. IEEE Transactions on Signal Processing, 48(12):3462–3471, December 2000. [9] M. Unser. Sampling—50 Years after Shannon. Proceedings of the IEEE, 88(4):569–587, April 2000. [10] P. Vandewalle, L. Sbaiz, J. Vandewalle, and M. Vetterli. Super-resolution from unregistered and totally aliased signals using subspace methods. IEEE Transactions on Signal Processing, 55(7):3687–3703, July 2007. [11] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Transactions on Signal Processing, 50(6):1417–1428, June 2002. 350 Multiresolution analysis on multidimensional dyadic grids Douglas A. Castro(1) , Sônia M. Gomes(1) , Anamaria Gomide(2) , Andrielber S. Oliveira (1), Jorge Stolfi(2) (1) IMECC-Unicamp, Caixa Postal 6065, CEP 13083-859 Campinas-SP, Brazil. (2) IC-Unicamp, Caixa Postal 6176, CEP 13081-970 Campinas-SP, Brazil. {douglas,andriel,soniag}@ime.unicamp.br {anamaria,stolfi}@ic.unicamp.br Abstract: We propose a modified adaptive multiresolution scheme for representing d-dimensional signals which is based on cell-average discretization in dyadic grids. A dyadic grid is an hierarchy of meshes where a cell at a certain level is partitioned into two equal children at the next refined level by hyperplanes perpendicular to one of the coordinate axes which varies cyclically from level to level. Adaptivity is obtained by interrupting the refinement at the locations where appropriate scale (wavelet) coefficients are sufficiently small. One important aspect of such multiresolution representation is that we can use a binary tree data structure in all dimensions, that helps to compress data while still being able to navigate through it. Dyadic grids provide a more gradual refinement as compared with traditional multiresolution analyses that use, for instance, different quad-trees or oct-trees in 2D or 3D multiresolution applications. The cells may have different scales in different directions, this property can be explored to improve data compression of signals having anisotropic aspects. 1. dyadic grids to multiresolution analysis, using cell averaging as the discretization method. The paper is organized as follows. In Section 2 we define dyadic grids and related concepts. In Section 3 we present a general overview of multiresolution analysis. Section 4 contains numerical results on sample problems to show the efficiency of the proposed scheme. 2. Dyadic grids Let the coordinates of Rd be indexed from 0 to d − 1. 
An infinite dyadic grid is a hierarchy of meshes that begins with a d-cube at level k = 0, and, for each higher level k > 0, is the result of dividing each cell of level k into two equal children by a hyperplane perpendicular to the coordinate axis (k mod d) [2]. Figure 1 illustrates five steps of the refinement process for d = 3. Introduction In recent years, many multiscale techniques have been used to provide more efficient algorithms than those that use just one level of resolution. In such frameworks, the differences between the information at consecutive levels of refinement are computed, and only the significant coefficients are stored. These are the principles of wavelet compression which have being successfully applied in many different contexts [3]. For example, multiresolution finite volume schemes of Müller [6] and Domingues et al. [4] use adaptive grids that are dynamically obtained by taking local regularity information indicated by wavelet coefficients in the context of multiresolution analysis for cell averages of signals. Such adaptive discretizations allow the efficient solution of problems with vastly different scales of detail in different parts of the domain. For computational efficiency, one important aspect of such multiresolution methods is the topology of the mesh and data structure used to represent it. Often quad-grids and oct-grids are used for 2D and 3D domains, respectively, represented by quad-tree and oct-tree data structures [1]. We describe here a type of mesh, the dyadic grid, that can be efficiently represented by a binary tree, in domains of arbitrary dimension. For illustration, we apply adaptive SAMPTA'09 Figure 1: 3D dyadic grids. In practice, one uses only finite segments of this grid, where the subdivision stops at a maximum level. In a regular dyadic grid, the refinement stops at the same level everywhere. In an irregular grid, the maximum level varies from place to place. The topology of a dyadic grid can be represented by a 02 binary tree. This is a data structure consisting of a set 351 R Detail coefficients. In the structure we do not store the averages (f¯ck or fˆck ), but only the details or wavelet coeficients. Each detail dkc is the difference between the exact average in the cell c and the value predicted for it by formulas (3) and (4) from the cell’s parent and its neighbors: d−1 c cr c c dk+1 = f¯ck+1 − fˆck+1 . c i Note that the detail of the root cell is not defined. Figure 2: Definition of c− , c+ , c, cr . of elements named nodes, among which there is a special node r, the root; every node has either zero or two children nodes; and every node, except the root, has exactly one parent node. A node that has no children is called a leaf node. The children of a non-leaf node t are called the left child tℓ and the right child tr . Each node of this tree represents a cell that appeared at some level of the subdivision; the leaf nodes represent the cells that weren’t divided. By convention, the left child tℓ of a non-leaf node t in level k represents the “lower” half cℓ of the cell c represented by t; that is, the half whose projection on the axis i = k mod d has smallest i-coordinates. 3. Multiresolution analysis In mutiresolution analysis, signals can be represented in two ways, as ordinary samples at each scale, or as differences between two consecutive scales. Connecting these two views are the prediction and the restriction operators. 
The prediction operator Pkk+1 takes information from a coarse level k and gives an estimate for the information at the next finer level k + 1. Conversely, the restriction opk erator Pk+1 takes information from a fine level k + 1 and gives an estimate of the information at a coarser level k. In this paper, the samples of a d−dimensional signal f are averages computed over the cells of a d-dimensional dyadic grid. That is, the sample associated with a cell c in level k of the grid is Z 1 f (x)dx, (1) f¯ck = |c| c where |c| is the volume of c. The restriction operation is therefore (trivially and exactly) the sum of the averages in the children cells, i 1h f¯ck = f¯ck+1 + f¯ck+1 . (2) r ℓ 2 In the other direction, we predict the cell average of a child cell cr or cℓ by the formulas f¯ck+1 ≈ fˆck+1 r r f¯ck+1 ≈ fˆck+1 ℓ ℓ i 1h = f¯ck + f¯ck+ − f¯ck− 8 i h 1 k = f¯c − f¯ck+ − f¯ck− 8 (3) (4) where c− and c+ are the two closest neighbor cells of c at level k in the direction of refinement. See Figure 2. These estimators are exact for quadratic polynomials. SAMPTA'09 (5) Analysis and synthesis. The analysis algorithm computes the details of every cell, given the average values f¯ck for every cell c. It scans the tree bottom-up, level by level. For each non-leaf cell c in level k, it executes h i ¯k+1 ; δ̄ ← 12 f¯ck+1 − f cℓ h r i δ̂ ← 18 f¯ck+ − f¯ck− ; (6) δ ← δ̄ − δ̂ dkcr ← +δ; dkcℓ ← −δ. Once the detail dkc of a cell has been computed, its average f¯ck is no longer needed, so we may store the detail in its place. In the root node r, however, we must still keep the average f¯r0 of the function over the whole domain. The inverse of the analysis algorithm is the syntesis algorithm, which recomputes the averages f¯ck from the details. It scans the tree top down, level by level. At each cell c in level k, it executes h i δ̂ ← 18 f¯ck+ − f¯ck− ; δ̄ ← δ̂ − dk+1 cr ; ¯k + δ̄ f¯ck+1 = f c r f¯ck+1 = f¯ck − δ̄. ℓ (7) After this step, the details dk+1 and dk+1 of the children cr cℓ are no longer needed, and can be overwritten with the reconstructed averages f¯ck+1 and f¯ck+1 . r ℓ Compact representation. These algorithms show that knowledge of the cell averages for all leaves is equivalent to knowledge of the average value f¯r0 for the root cell together with the detail of every right child cell. To save space, we could store the detail of the right child in its parent’s node (and keep the domain average f¯r0 in variable external to the tree). Then the leaf nodes would carry no information, and could be omitted from the structure. We will refer to this variant (which is an ordinary binary tree) as the compact tree representation. Adaptive resolution grid. As in any wavelet representation, we can save space and processing time by pruning all sub-trees which do not contribute significantly to the reconstructed signal. If we start with a tree of sufficient depth, we can eliminate all sibling leaf nodes cℓ and cr such that dkc falls below a prescribed tolerance ǫk . This will condition implies that the predictions fˆck+1 and fˆck+1 r ℓ k+1 ¯ be very close to the actual averages fcℓ and f¯ck+1 . Here r we use Harten’s thresholds [5], ǫk = (1 − q)q (L−k) ǫ, (8) 352 where q and ǫ are specified by the user, with ǫ > 0 and 0 < q < 1, and L is the maximum level of the initial tree. 4. Numerical results In order to compare the efficiencies of dyadic grids and quad-grids, we performed the multiresolution analyses of two different examples in 2D, using cell-average discretization. 
√In all tests, the root cell was the rectangle [0, 1] × [0, 22 ], and the starting tree was a uniform grid with 210 × 210 = 220 = 1, 048, 576 leaf cells. This corresponds to tree structures with L = 20 and L = 10 levels for dyadic grid and quad-grid frameworks, respectively. The cell averages f¯cL were computed for every leaf cell c by Gaussian quadrature with 5 × 5 sampling points. The trees were pruned as described in the previous section, with threshold parameters ǫ = 0.1 and the q = 0.5.PThe number of non-leaf nodes in the initial 19 tree was i=0 2i = 1, 048, 575 for the dyadic grid, and P9 i i=0 4 = 349, 525 for the quad-grid. In the first test, we used the signals Figure 3: Pruned dyadic tree (top) and dyadic grid (bottom) for the first signal at t = 0.1. f (x, y) = 1−tanh(100(x−0.2−t)+0.001(y −1)), (9) for t varying from 0 to 0.6 in steps of 0.1. Equation (9) describes a 2D smooth step function with an almost vertical straight front, moving form left to right. Figure 3 shows the dyadic grid and the corresponding tree at t = 0.1, after pruning cells with small details. Figure 4 shows the corresponding quad-grid and quad-tree. Figure 5 shows the number of leaf cells in both grids for each time step, as a percentage of the number of leaves in the uniform grid. For the second test, we used the signals f (x, y) =  p 1 if (x − 0.5)2 + (y − 0.35)2 < t, 0 otherwise. (10) for t varying between 0.05 and 0.35 in steps of 0.05. Equation (10) describes a step function with a sharp circular front, expanding from the center of the domain. Figure 6 shows the dyadic grid and its tree at t = 0.2, and Figure 7 shows the corresponding quad-grid and its quad-tree. The number of leaves is plotted in Figure 8. Space efficiency. If leaves are explicitly represented in the tree structure, and all nodes have the same fields, then the space E used by the structure is E = (pA + B)n, where n is the number of nodes, p is the number of pointers in each node, A is the size of a pointer in bytes, and B is the size of any additional information stored in each node (such as the detail coefficients dkc ). In all these trees we have n = (pm − 1)/(p − 1) ≈ mp/(p − 1), where m is the number of leaf nodes. From the plots in Figure 5, we see that, in the first test, the quad-grid (p = 4) had about 8 times as many leaf cells as the dyadic grid (p = 2), and therefore about 5 times as many tree nodes, for the same accuracy. Assuming A = 4 and B = 8 bytes, we conclude that the quad-tree used 5(24/16) ≈ 7.5 times as much storage as the dyadic grid. SAMPTA'09 Figure 4: Pruned quad-tree (top) and quad-grid (bottom) for the first signal at t = 0.1.. 5 quad−grid dyadic grid 4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 0 0.1 0.2 0.3 0.4 0.5 0.6 Figure 5: Leaf count in the pruned trees for the first test.. 353 In the second test, the quad-grid had about 1.32 times as many leaf cells as the dyadic grid, and therefore about 0.88 as many tree nodes. With the same A and B, the quad grid still used 0.88(24/16) ≈ 1.32 times as much space as the dyadic grid. Had we used the compact representation of the tree, with omitted leaves, the storage cost would be E = (pA + (p − 1)B)(n − m). The quad-tree would use 7.5 times as much storage as the dyadic tree in the first example, and 1.1 times as much in the second example. 5. 
Conclusions Our tests show that adaptive dyadic grids are substantially more efficient than quad-grids for the same level of accuracy, both in terms of space needed to store the topology (tree structure) of the grid, and in the number of leaf cells retained — which determines the time cost of most adaptive numeric algorithms. 6. Acknowledgments Figure 6: Pruned dyadic tree (top) and dyadic grid (bottom) for the second signal at t = 0.2.. The authors thank CNPq (grants 06631/07-5, 472402/072, and 142191/06-0) and FAPESP (07/52015-0) for financial support. References: [1] B. L. Bihari and A. Harten. Multiresolution schemes for the numerical solution of 2-d conservation laws. SIAM J. Sci. Comput., 18(2):315–354, 1997. [2] C. G. S. Cardoso, M. C. Cunha, A. Gomide, D. J. Schiozer, and J. Stolfi. Finite elements on dyadic grids with applications. Mathematics and Computers in Simulation, 73:87–104, 2006. [3] A. Cohen. Wavelet Methods in Numerical Analysis. Handbook of Numerical Analysis. in: Ph. Ciarlet and J. L. Lions (Eds.), Handbook of Numerical Analysis, Vol VII, Elsevier, Amsterdam, 2000. [4] M. O. Domingues, S. M. Gomes, O. Roussel, and K. Schneider. An adaptive multiresolution scheme with local time-stepping for evolutionary pdes. Journal of Computational Physics, 227:3758–3780, 2008. [5] A. Harten. Multiresolution representation of cellaveraged data. Technical Report CAM/Report/94-21, UCLA, Los Angeles, US, July 1994. [6] S. Muller. Adaptive Multiscale Schemes for Conservation Laws. Vol. 27 of Lecture Notes in Computational Science and Engineering, Springer, Heidelberg, 2003. Figure 7: Pruned quad-tree (top) and quad-grid (bottom) for the second signal at t = 0.2.. 2.5 quad−grid dyadic grid 2 1.5 1 0.5 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 Figure 8: Leaf count in the pruned trees for the second test.. SAMPTA'09 354 Adaptive and Ultra-Wideband Sampling via Signal Segmentation and Projection Stephen D. Casey (1) , Brian M. Sadler(2) (1) Department of Mathematics and Statistics, American University, Washington, DC, USA . (2) Army Research Laboratory, Adelphi, MD, USA. scasey@american.edu, bsadler@arl.army.mil Abstract: Adaptive frequency band (AFB) and ultra-wide-band (UWB) systems require either rapidly changing or very high sampling rates. Conventional analog-to-digital devices have non-adaptive and limited dynamic range. We investigate AFB and UWB sampling via a basis projection method. The method decomposes the signal into a basis over time segments via a continuous-time inner product operation and then samples the basis coefficients in parallel. The signal may then be reconstructed from the basis coefficients to recover the signal in the time domain. We develop the procedure of this method, analyze various methods for signal segmentation and close by creating systems designed for binary signals. 1. Introduction Adaptive frequency band (AFB) and ultra-wide-band (UWB) systems, requiring either rapidly changing or very high sampling rates, stress classical sampling approaches. At UWB rates, conventional analog-to-digital devices have limited dynamic range and exhibit undesired nonlinear effects such as timing jitter. Increased sampling speed leads to less accurate devices that have lower precision in numerical representation. This motivates alternative sampling schemes that use mixed-signal approaches, coupling analog processing with parallel sampling, to provide improved sampling accuracy and parallel data streams amenable to lower speed (parallel) digital computation. 
We investigate AFB and UWB sampling via a basis projection method. The method was introduced as a means of UWB parallel sampling by Hoyos et. al. [7] and applied to UWB communications systems [8, 9, 10]. The method first decomposes the signal into a basis over time segments via a continuous-time inner product operation and then samples the basis coefficients in parallel. The signal may then be reconstructed from the basis coefficients to recover time domain samples, or further processing may be carried out in the new domain [7]. We address several issues associated with the basisexpansion and sampling procedure, including choice of basis, truncation error, rate of convergence and segmentation of the signal. We develop a mathematical model of the procedure, using both standard (sine, cosine) basis elements and general basis elements, and give this rep- SAMPTA'09 resentation in both the time and frequency domains. We compute exact truncation error bounds, and compare the method with traditional sampling. We close by developing the method for binary signals. 2. Sampling via Projection Let f be a signal of finite energy whose Fourier transform fb has compact support, i.e., f, fb ∈ L2 , with supp(fb) ⊂ [−Ω, Ω]. The signal is in the Paley-Wiener class P W (Ω). For a block of time Tc , let X f (t) = f (t)χ[(k)Tc ,(k+1)Tc ] (t) . k∈Z For this original development, we keep Tc fixed. We later let Tc be adaptive and will denote the adaptive time segmentation as τc (t). A given block fk (t) = f (t)χ[(k)Tc ,(k+1)Tc ] (t) can be Tc − periodically continued, getting (fk )◦ (t) = (f (t)χ[(k)T ,(k+1)T ] (t))◦ . c c ◦ Expanding (fk ) (t) in a Fourier series, we get X (2πint/Tc ) ◦ [ , where (fk )◦ (t) = (f k ) [n]e n∈Z 1 ◦ [ (f k ) [n] = Tc Z (k+1)Tc f (t)e(−2πint/Tc ) dt . (k)Tc Given that the original function f is Ω band-limited, we can estimate the value of n for which fk [n] is non-zero. At minimum, fk [n] is non-zero if n ≤ Ω , or equivalently, n ≤ Tc · Ω . Tc Let N = ⌈Tc · Ω⌉ . (Note that the truncated block functionsfk are not bandlimited. We discuss this in section 3.) For this choice of N , we compute f (t) = X f (t)χ[(k)Tc ,(k+1)Tc ] (t) k∈Z  X ◦ (fk ) (t) χ[(k)Tc ,(k+1)Tc ] (t) = k∈Z ≈ X n=N X k∈Z n=−N  (2πint/Tc ) χ ◦ [ (fk ) [n]e [(k)Tc ,(k+1)Tc ] (t) . 355 Given this choice of the standard (sines, cosines) basis, we can, for a fixed value of N , adjust to a large bandwidth Ω by choosing small time blocks Tc . Also, after a given set of time blocks, we can deal with a increase or decrease in bandwidth Ω by again adjusting the time blocks, e.g., given an increase in Ω, decrease the time blocks adaptively to τc (t), and vice versa. There is, of course, a price to be paid. The quality of the signal, as expressed in the accuracy the representation of f , depends on N , Ω and Tc . Theorem : [The Projection Formula] Let f , fb ∈ L2 (R) and f ∈ P WΩ , i.e. supp(fb) ⊂ [−Ω, Ω]. Let Tc be a fixed block of time. Then, for N = ⌈Tc · Ω⌉, f (t) ≈ fP (t), where fP (t) = N X X (2πint/Tc ) fk [n]e k∈Z n=−N  χ[kT c ,(k+1)Tc ] (t). the other two to fluctuate. The easiest and most practical parameter from the design factor to fix is N . For situations in which the bandwidth does not need flexibility, it is possible to fix Ω and Tc by the equation N = ⌈Tc · Ω⌉. However, if greater bandwidth Ω is need, choose shorter time blocks Tc . The Projection Method adapts to general orthonormal systems, much as Kramer-Weiss extends sampling to general orthonormal bases. 
Given a function f such that f ∈ P WΩ , let Tc be a fixed time block. Define f (t), fk (t) and fk ◦ (t) as in the beginning of the computation above. Now, let {ϕn } be a general orthonormal system for L2 [0, Tc ]. Then, ∞ X fk ◦ (t) = fk [n]ϕn (t), where fk [n] = hfk ◦ , ϕn i. n=−∞ (1) The Projection Method can adapt to changes in the signal. Suppose that the signal f (t) has a band-limit Ω(t) which changes with time. For example, let f be a signal from a cell phone which changes from voice to a highly detailed musical piece. This change effects the time blocking τc (t) and the number of basis elements N (t). This, of course, makes the analysis more complicated, but is at the heart of the advantage the Projection Method has over conventional methods. During a given τc (t), let Ω(t) = sup {Ω(t) : t ∈ τc (t)}. For a signal f that is Ω(t) band-limited, we can estimate the value of n for which fk [n] is non-zero. At minimum, fk [n] is non-zero if n ≤ Ω(t) , or equivalently, n ≤ τc (t) · Ω(t) . τc (t) Since f ∈ P WΩ , there exists N = N (Tc , Ω) such that fk [n] = 0 for all n > N . Therefore, f (t) ≈ fP (t), where fP (t) = ∞  X N X k=−∞ n=−N  fk [n]ϕn (t) χ[kTc ,(k+1)Tc ] (t). (3) Given characteristics of the input class signals, the choice of basis functions used in the the Projection Method can be tailored to optimal representation of the signal or a desired characteristic in the signal. We develop a Walsh system for binary signals in section 4. We close this section with a different system of segmentation for the time domain. This was created because it is relatively easy to implement, cuts down on frequency error and has no loss of data in time. It was developed by studying the de la Vallée-Poussin kernel used in Fourier series. Let 0 < r < Tc /2 and let Let TriL (t) = max{[((Tc /(4r)) + r) − |t|/(2r)], 0} , N (t) = ⌈τc (t) · Ω(t)⌉ . For this choice of N (t), we have the following. Theorem : [The Adaptive Projection Formula] Let f , fb ∈ L2 (R) and f have a variable but bounded bandlimit Ω(t). Let τc (t) be an adaptive block of time and given τc (t), let Ω(t) = sup {Ω(t) : t ∈ τc (t)}. Then, for N (t) = ⌈τc (t) · Ω(t)⌉ , f (t) ≈ fP (t), where fP (t) = (t) X N X k∈Z n=−N (t)  fk [n]e(2πint/τc ) χ[kτc ,(k+1)τc ] (t). (2) In comparison, Shannon Sampling examines the function at specific points, then uses those individual points to recreate the signal. The Projection Method breaks the signal into segments in the time domain and then approximates their respective periodic expansions with a Fourier series. This process allows the system to individually evaluate each piece and base its calculation on the needed bandwidth. The individual Fourier series are then summed, recreating a close approximation of the original signal. It is important to note that instead of fixing Tc , the method allows us to fix any of the three while allowing SAMPTA'09 TriS (t) = max{[((Tc /(4r)) + r − 1) − |t|/(2r)], 0} and Trap(t) = TriL (t) − TriS (t) . The Trap function has perfect overlay in the time domain and 1/ω 2 decay in frequency space. When one time block is ramping down, the adjacent block is ramping up at exactly the same rate. This leads to the Projection formula N X X k∈Z n=−N 3.  ((f ·Trap)k [n]e(2πint/(Tc +r)) Trap(t−k(Tc /2)) . Error Analysis To compute truncation error, we first calculate the Fourier transform of both sides of the equation. Let f ∈ P W (Ω), so f ∈ L2 and Ω band-limited. 
For N = ⌈Tc · Ω⌉, fP (t) = N X X k∈Z n=−N (2πint/Tc ) fk [n]e  χ[kT c ,(k+1)Tc ] (t) 356 Taking the transform of both sides and evoking the relationship between the transform and convolution gives  N    X X (2πint/Tc ) b c fP (ω) = (ω) ∗ fk [n] e k∈Z n=−N  χ[kT   (t) b(ω) c ,(k+1)Tc ] Performing the indicated transforms using the definition results in   N X X n c fk [n] δ(ω − ) ∗ fP (ω) = Tc k∈Z n=−N  1 sin(πTc ω) e(2πi(k− 2 )Tc ω) πω It is important to note that f · χ[kTc ,(k+1)Tc ] is no longer band-limited, but it does decay at a rate less than or equal to ω1 in frequency. Using the relationship between translation and modulation, we get the following. Theorem : [The Fourier Transform of the Projection Formula] Let f , fb ∈ L2 (R) and f ∈ P WΩ , i.e. supp(fb) ⊂ [−Ω, Ω]. Let Tc be a fixed block of time. Then, for N = ⌈Tc · Ω⌉, ∞  X N X 1 n fc (ω) = fk [n]e(2πi(k− 2 )Tc (ω− Tc ) P k=−∞ n=−N n c sin(π( ωT 2 − 2 )) π(ω − Tnc ) ! This replaces the sinc term in the equation above. The Fourier coefficients are also different, and are computed in the same method as the de la Vallée-Poussin kernel used in Fourier series. In the formula for the Projection Method, there is a reliance on a number N , representative of the number of Fourier series components. In order to ensure maximum utility from the formula, the difference between the infinitely summed series and the truncated must be made a minimum. To do this, the mean square error must be calculated. We compute this as a truncation error on the number of Fourier coefficients used to represent a given block fk . For a fixed N , the mean square error is Computing and then simplifying gives Z (k+1)Tc X 1 |fk ◦ (t) − fk [n]e(2πitn/Tc ) |2 dt e2N = Tc kTC |n|≤N = 1 Tc Z (k+1)Tc kTC SAMPTA'09 | X |n|>N fk [n]e(2πitn/Tc ) |2 dt . |n|>N ≤ X |n|>N 1 |fk [n]| · Tc 2 Z (k+1)Tc kTc 12 dt = X |fk [n]|2 |n|>N This demonstrates that the value of N has to be chosen carefully. This truncation error perpetuates over all the blocks. The Projection Method experiences error due to truncation in two separate categories: time and frequency. The error in frequency is a function of the errors on each block due to the choice of N . By duality, this gives us errors in time. We can also get an error in time by loss of a given block or blocks of information. This is easier to compute. Given any lost or partially transmitted block fk,L , error is simply kfk − fk,L k2 . Error over the entire signal is computed by simply adding up the blocks. Cell phone users are used to lost information blocks, which gives rise to the following frequently used phrase – “Can you hear me now?” 4. Binary Signals (4) The system using overlapping Trap functions has the advantage of 1/ω 2 decay in frequency. Let βL = p pTc /(4r) + r, αL = Tc /(4r) + r/2, βS = Tc /(4r) + r − 1, αS = Tc /(4r) − r/2. The Fourier transform of Trap is  2  2 sin(2παL (ω) sin(2παS (ω) (βL ) − (βS ) . π(ω) π(ω) 2 e2N = kfk − fk,N k22 = kfbk − fd k,N k2 . Applying the triangle inequality to the right side and then exploiting the fact that e(2πitn/Tc ) is an orthonormal system, thus |e(2πitn/Tc ) | = 1, we arrive at the following: Z (k+1)Tc X 1 | fk [n]e(2πitn/Tc ) |2 dt (5) e2N = Tc kTC The Walsh functions {ωn } form an orthonormal basis for L2 [0, 1]. The basis functions have the range {1, −1}, with values determined by a dyadic decomposition of the interval. The Walsh functions are of modulus 1 everywhere. 
The functions are give by the rows of the unnormalized Hadamard matrices, which are generated recursively by   1 1 H(2) = 1 −1   H(2k ) H(2k ) (k+1) k . ) = H(2) ⊗ H(2 ) = H(2 H(2k ) −H(2k ) We point out that although the rows of the Hadamard matrices give the Walsh functions, the elements have to be reordered into sequency order. Walsh arranged the components in ascending order of zero crossings (see [1]). The Walsh functions can also be interpreted as the characters of the group G of sequences over Z2 , i.e., G = (Z2 )N . The Walsh basis is a well-developed system for the study of a wide variety of signals, including binary. The Projection Method works with the Walsh system to create a wavelet-like system to do signal analysis. First assume that the time domain is covered by a uniform block tiling χ[kTc ,(k+1)Tc ] (t). Translate and scale the function on this kth interval back to [0, 1] by a linear mapping. Denote the resultant mapping as fk , which is an element of L2 [0, 1]. Given that f ∈ P W (Ω), there exists an N > 0 (N = N (Ω)) such that hfk , ωn i = 0 for all n > N . The decomposition of fk into Walsh basis elements is N X hfk , ωn i ωn . n=0 357 Translating and summing up gives the Projection representation fP fP (t) = N X X k∈Z n=0  hfk , ωn i ωn χ[kTc ,(k+1)Tc ] (t). fP (t) = (6) Next assume that the time domain is covered by a uniform overlapping trapezoidal tiling Trap(t − k(Tc /2)). Note that the construction of the trapezoidal system results in the loss of no signal data, for just as a given block is ramping down, the subsequent block is ramping up at exactly the same rate. Again translate and scale the function on this kth interval back to [0, 1] by a linear mapping. Denote the resultant mapping as fkT . The resultant function is an element of L2 [0, 1]. Given that f ∈ P W (Ω), there exists an M > 0 (M = M (Ω)) such that hfkT , ωn i = 0 for all n > M . The decomposition of fkT into Walsh basis elements is M X hfk , ωn i ωn . n=0 Translating and summing up gives the Projection representation fPT fPT (t) = N X X k∈Z n=0 5.  hfkT , ωn i ωn Trap(t−k(Tc /2)). (7) Conclusions The Projection Method gives a method for analog-todigital encoding which is an alternative to Shannon Sampling. Projection gives a procedure for the sampling of a signal of variable or ultra-wide bandwidth Ω by varying the time blocks Tc . If f is Ω band-limited, we can estimate the value of n for which the Fourier coefficients fk [n] of a given time block are non-zero. At minimum, fk [n] is non-zero if Tnc ≤ Ω, or equivalently, n ≤ Tc · Ω. If N = ⌈Tc · Ω⌉, then, f (t) ≈ fP (t), where fP (t) = N X X k∈Z n=−N (2πint/Tc ) fk [n]e  χ[kT c ,(k+1)Tc ] (t). For fixed N , if greater bandwidth Ω is need, choose shorter time blocks Tc . The price paid for this flexibility is in signal error, which has been computed above. The Projection Method can also adapt to changes in the signal, e.g., f (t) has a band-limit Ω(t) which changes with time. This change effects the time blocking τc (t) and the number of basis elements N (t). During a given τc (t), let Ω(t) = sup {Ω(t) : t ∈ τc (t)}. For a signal f that is Ω(t) band-limited, we can estimate the value of n for which fk [n] is non-zero. At minimum, fk [n] is non-zero if n ≤ Ω(t) , or equivalently, n ≤ τc (t) · Ω(t) . τc (t) We let N (t) = ⌈τc (t) · Ω(t)⌉ , SAMPTA'09 and have (t) X X N (2πint/τc ) fk [n]e k∈Z n=−N (t)  χ[kτ c ,(k+1)τc ] (t). 
This adaptable time segmentation makes the analysis more complicated, but is at the heart of the advantage the Projection Method has over conventional methods. Subsequent work on this method will focus on minimizing error, creating systems based on the Projection Method tailored to different types of signals and optimizing signal reconstruction in a noisy environment. References: [1] Beauchamp, K. G., Applications of Walsh and Related Functions, Academic Press, London, 1984. [2] Benedetto, J. J., Harmonic Analysis and Applications, CRC Press, Boca Raton, FL, 1997. [3] Casey, S. D., “Sampling and reconstruction on unions of non-commensurate lattices via complex interpolation theory,” 1999 International Workshop on Sampling Theory and Applications, 48– 53, 1999. [4] Casey, S. D., and Sadler, B. M., “New directions in sampling and multi-rate A-D conversion via number theoretic methods,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), 3, 1417–1420, 2000. [5] Casey, S. D., and Walnut, D. F., “Residue and sampling techniques in deconvolution,” Chapter 9 in Modern Sampling Theory: Mathematics and Applications, Birkhauser Research Monographs, ed. by P. Ferreira and J. Benedetto, 193–217, Birkhauser, Boston, 2001. [6] Casey, S. D., “Two Problems from Industry and Their Solutions via Harmonic and Complex Analysis, to appear in The Journal of Applied Functional Analysis, 31 pp., 2009. [7] Hoyos, S., and Sadler, B. M. “Ultra wideband analog-to-digital conversion via signal expansion,” IEEE Transactions on Vehicular Technology, Invited Special Section on UWB Wireless Communications, vol. 54, no. 5, pp. 1609–1622, September 2005. [8] Hoyos, S., Sadler, B. .M., and Arce, G., “Broadband multicarrier communication receiver based on analog to digital conversion in the frequency domain,” IEEE Transactions on Wireless Communications, vol. 5, no. 3, pp. 652–661, March 2006. [9] Hoyos, S., and Sadler, B. .M. “Frequency domain implementation of the transmitted-reference ultrawideband receiver,” IEEE Transactions on Microwave Theory and Techniques, Special Issue on Ultra-Wideband, vol. 54, no. 4, Part II, pp. 1745– 1753, April 2006. [10] Hoyos, S., and Sadler, B. M. “UWB mixed-signal transform-domain direct-sequence receiver,” IEEE Transactions on Wireless Communications, vol. 6, no. 8, pp. 3038-3046, August 2007. 358 Non-Uniform Sampling Methods for MRI Steven Troxler (1) Arizona State University, Tempe AZ 85287-1804 USA. Steven.Troxler@asu.edu 1. Introduction Simple Cartesian scans, which collect Fourier transform data on a uniformly-spaced grid in the frequency domain, are by far the most common in MRI. But non-Cartesian trajectories such as spirals and radial scans have become popular for their speed and for other benefits, like making motion-correction easier [12]. A major problem in such scans, however, is reconstructing from nonuniform data, which cannot be performed by a standard fast Fourier transform (FFT) as in the Cartesian case. Here, we briefly describe the most common reconstruction methods and the non-uniform fast Fourier transform (NFFT) needed to complete the computations quickly. We then give an overview of several current methods for choosing a density compensation function (DCF) and suggest some possible improvements. 2. 
Reconstruction Methods The most common method for nonuniform reconstruction in MRI is the Riemann approach, which approximates the integral difining the inverse (continuous) Fourier transform using a Riemann sum fw (x) = J X wj fˆ(ξ j )e2πiξj ·x , (1) j=1 where x ∈ ZdN are the pixel locations and ξ j , j = 1, ..., J, are the frequency locations at which we measure the Fourier transform (we assume J ≥ N d ). As the subscript w suggests, this approach requires finding appropriate weights wj for each sample point in the reconstruction, a major theoretical problem. An alternative method, called implicit discretization (ID), assumes that the image itself is a sum of evenly spaced delta impulses at the pixel points of the final image, so that its Fourier transform is a finitedimensional, harmonic trigonometric polynomial. We can then find a least-squares solution to the resulting system of equations X (2) f (x)e−2πix·ξj fˆ(ξ j ) = x∈Zd N This model, which is known to have negligible error (the model error is the Gibb’s error that would appear in a SAMPTA'09 Cartesian reconstruction), has the important advantage of not depending on our arbitrary choice of weights. These two approaches can be described in terms of matrix algebra as follows: Let G be a J × N d matrix given by Gj,x = e−2πiξj ·x . Then we see immediately that f w = G∗ W f˜, (3) where f w is the N d × 1 vector, indexed by ZdN , whose xth entry is fw (x), f˜ is the J × 1 vector of measurements whose jth entry is fˆ(ξ j ), and W is the N d × N d diagonal matrix with diagonal equal to w. Once we have w, whose determination is the main problem of interest, the remaining issue is one of computational complexity, since G∗ is a very large unstructured matrix. Fortunately, there is a fast method for computing products called the nonuniform fast Fourier transform (NFFT), based on the approximate factorization G ≈ C φ F Dφ , (4) where C φ is a sparse, banded N d × J matrix of convolution interpolation coefficients which depends on our choice of convolution kernel φ, F is the uniform M d ×M d DFT matrix for some M > N, products of which are rapidly computed via the FFT, and D φ is an M d × N d modified diagonal deconvolution matrix, also depending on φ, whose extra rows are zero. Since it is easy to compute products with all three factors, this algorithm can be used to quickly approximate matrix products involving either G or G∗ . The theory of the NFFT, as applied to MRI, was first laid out in [11] and [8]. Later, [4] found bounds on the errors for Gaussian interpolation, and [23] and [5] gave general estimates and gave sharper bounds for Gaussian kernels. The most complete discussion of NFFT theory is given in [16], while [15] presents many of the proofs. Practical considerations like computational load and numerical stability were addressed in [3] and [17], while [1] and [6] presented two methods of efficient interpolation using Kaiser-Bessel and Gaussian kernels. In matrix form, the ID problem attempts to find a leastsquares solution to the problem f˜ = Gf . 359 The ordinary least squares solution f OLS satisfies the normal equation (5) G∗ Gf OLS = G∗ fˆ. Although the matrix G∗ G is far too large to invert, it is symmetric, so we may use iterative methods like conjugate gradients to find the solution. The resulting solution typically has excellent quality, but convergence is often slow, making ordinary least squares expensive. 
Conjugate gradients converges fastest when G∗ G is close to the identity, which is unfortunately rarely the case unless the sampling density is reasonably close to unity. In order to improve the convergence of conjugate gradients, we introduce the weighted least squares problem, which finds the least squares solution to W 1/2 f˜ = W 1/2 Gf by solving the normal equations G∗ W bvecGf W LS = G∗ W fˆ, where W is the modified diagonal density compensation matrix used for the Riemann method. We expect an improvement in convergence because we know that the Riemann method gives much better results with W than without, which means G∗ W G approximate the identity much better than G∗ G. From a signal processing perspective, this has the additional benefit that we weight errors heavier at highly isolated observations of the Fourier transform, which heuristically contain more information about the objective function than less isolated observations. For either method, then, determining an appropriate value of w is important. It is more essential in the Riemann approach, where a poor choice of w will lead to useless results. The ID method is known to converge quite well after only a few iterations, even when a very rough approximation to w is used, but the better w, the fewer iterations are required. It is worth noting that the first iteration, which always moves in the direction of the residual, is actually just a rescaling of the Riemann solution. 3. Determination of an optimal DCF 3.1 Algebraic and Analytic Approaches Since the equation The weighted conjugate gradient method described at the end of the previous section, whose first iteration performs best when the matrix is as close to the identity as possible, leads to a similar but slightly simpler condition, that G∗ W G ≈ I, in the sense that the eigenvalues of G∗ W G be as closely clustered as possible. Several techniques have been proposed to use these conditions to find an algebraically ideal DCF via use of a singular value decomposition or some similar approach [22], [20]. These methods, however, tend to have high computational complexity. This is a problem if the same trajectory is not always used, as is the case in many MRI applications in which iterative reconstruction is used to compensate for field inhomogeneities and other measurement imperfections. Moreover, although such algebraic methods generally give workable results, other methods which take analytic considerations into account often perform better empirically. Possible reasons why the theoretically optimal algebraic solutions fail to give the best results include numerical instability and illconditioning. In some cases, the algebraic approaches even result in DCF’s with negative weights at some points. This contradicts our intuition, and empirical studies indicate that such DCF’s tend to perform relatively poorly. The simplest analytic approaches to determining w are based on the fact that the goal of the Riemann method is to approximate a Riemann sum. For radial and analytic spiral trajectories, which may be smoothly parameterized, methods have been proposed which use the Jacobian of a change-of-coordinates [10], [7]. These techniques give very good results for certain spirals, although for radial trajectories they tend to underweight points near the center. 
An alternative analytic method, which works for arbitrary nonuniform sampling schemes, is to construct a Voronoi diagram, which partitions the sampled part of frequency space into polygons about each sample point, and weight the samples according to the area or volume of those polygons [19]. This typically results in a good image for radial trajectories. With other trajectories, the results are generally inferior to alternative point-spreadfunction methods, although it was demonstrated in [9] that performing a few iterations of the weighted conjugate gradient method using Voronoi weights produces an excellent image. f˜ = Gf , used directly in the CG reconstruction, provides an accurate mathematical model for the measurements which does not depend on the choice of a sampling density w, the clearest method of evaluating a DCF w is to require that f˜ ≈ Gf w , where f w = G∗ W f˜. This is the same as requiring that ∗ G W approximate the pseudoinverse (G∗ G)−1 G∗ of G. SAMPTA'09 3.2 The Point Spread Function Most of the best-performing methods for determining the DCF when the trajectory is anything other than an analytic spiral are based on analysis of the point-spread-function (PSF). The PSF is defined as the inverse Fourier transform w̌ of the DCF, where we P view the DCF as a distribution on Rd defined by w := j wj δξj . The PSF w̌ is then given by J X w̌(x) = (6) wj e2πix·ξj . j=1 This is what the algorithm would produce if the true object were a delta impulse located at zero. The observed data would be a vector of all ones, so the reconstruction would 360 be the result of applying G∗ to w itself, i.e., the function defined by (6). If f is a more general object, it follows from the convolution theorem (for distributions) that the reconstructed function fw will be equal to the convolution f ∗ w̌ of the actual object f with the PSF. The more closely the PSF resembles a delta impulse, the better the reconstruction. It is important to note that, since the PSF is a (nonharmonic) trigonometric polynomial, it will not decay at infinity. Clearly, then, the best that we can hope for is that w̌ will resemble a delta impulse in some compact neighborhood of the origin. Recall that, by accepting the ID model as having negligible error, we are assuming that f is a finite-dimensional vector defined on ZdN which we associate with a distribution supported on ZdN for notational convenience when dealing with convolutions. Since the terms w̌(z)f (x − z) defining X fw (x) = w̌ ∗ f (x) = w̌(z)f (x − z) (7) z∈Zd ZdN , are nonzero only if (x − z) ∈ and we only want to find the reconstruction fw (x) for x ∈ ZdN , we conclude that the only values of z for which w̌(z) matter are z ∈ Zd2N . It is also worth noting that not all points z ∈ Zd2N appear equally often in the convolution defining f W . The origin will appear in one term of every sum, whereas values of z near the edge of Zd2N will appear only occasionally. For notational convenience, let A be the field of view [−N/2, N/2]d and let B be the region of optimization [−N, N ]d . PSF optimization techniques find some computational way of minimizing the error E = w̌ − δ over this region of optimization B. By carefully looking at (7), we see that the frequency with which a PSF error at x actually occurs in the final image is proportional to p = χA∗χA. Since errors are unavoidable and we would like to minimize the important errors, we introduce the weighted error, given by E = pw̌ − δ, (8) where p is called the error profile. 
This error can be expressed in the Fourier domain as Ê = p̂ ∗ w − 1 = χ̂2A ∗ w − 1. (9) Our goal is to minimize these errors, thereby minimizing the error in the final reconstruction fw = w̌ ∗ f. Although this optimal kernel p was suggest only recently in [13], convolution techniques for minimizing the Fourier domain PSF error Ê have been used for some time. In one of the early gridding papers, Jackson et. al. proposed taking w to be equal to w1 = w0 , φ ∗ w0 (10) where w0 is a DCF of unity (in distributional form) and φ is the gridding kernel [8]. This method predates PSF SAMPTA'09 techniques, and was instead motivated by the intuitive idea that φ ∗ w0 would give a reasonable, estimate of the sampling density. Later researchers noted, however, that if we φ with p̂, we would expect this ratio correction to make w1 ∗ p̂ closer to unity than w0 ∗ p̂ regardless of the initial density w0 [14]. An iterative technique, based on this observation, starts with a constant DCF w0 and takes wi+1 = wi . p̂ ∗ wi (11) Since p̂ can be effectivly truncated, each iteration can be computed quickly, particularly if an efficient sorting algorithm is used to avoid time-consuming searches for the nonzero terms p̂(ξk − ξ j )w(ξ j ) in the convolution [13]. Another iterative algorithm, aimed at the same goal of achieving Ê = 1, uses an additive correction instead of a ratio-based correction, taking wi = wi−1 + σ(1 − p̂ ∗ wi−1 ), where σ ∈ (0, 1) is a parameter controlling convergence [18]. Taking σ close to 1 may result in the fastest convergence, but could also lead to instability and a failure to converge. The advantage of these iterative techniques is that they are conceptually simple, computationally fast, and empirically give results as good as any current methods when the correct error profile p is used and the number of iterations is determined experimentally. A disadvantage is that, although they work conceptually and empirically, there is no theoretical basis for claiming that they converge to the optimal solution, and, in fact, experimental evidence indicates that the mean square error in fw can actually rises if the algorithm is allowed to run too long. This may be due to numerical instability, or to a failure of the mathematical algorithm itself to technically converge. An algebraic method of optimizing the PSF, which has more theoretical grounding than convolution-based methods, attempts to directly solve the inverse problem GG∗ w = u, where u is a vector of all ones. The direct solution to this problem via conjugate gradients using the NFFT was proposed in [21], but as with the algebraic solutions for w based on the least-squares method, this can result in a w with wide variations and sometimes even negative entries, which does not match our expectation for a density and empirically gives inferior results. A regularization of this method was proposed in [2] which instead attempts to solve (GG∗ + σ 2 I)w = u + σ 2 w1 , where w1 is an initial nonnegative and smoothly varying estimate of the density, say, Jackson’s weight (10), or, more optimally, the result of one or two iterations of (11). This second approach ensures that the solution behaves as we would expect a DCF to behave, and empirically gives better results than the unregularized method. The algorithm given in [2] also incorporates Jacobi preconditioning to speed convergence of the conjugate gradient iterations. 
Knowing that Pipe and Johnson’s error profile p provides an optimal weight on errors in the point-spread function, 361 it might be preferable to modify the approach in [2] in two ways. The first is that, since we need to minimize PSF errors over twice the support of f, we replace the NDFT matrix G with G1 , where the uniform grid has twice the radius of that used by G. This avoids the risk that we might ignore PSF errors which, according to the convolution defining f w , appear in the Riemann reconstruction. The second is replacing G1 G∗1 , which treats all PSF errors as equally important, with G1 P G∗1 , where P contains the values of the optimal error profile p. To the author’s knowledge, this has never been tried, but in light of experiments reported by [13] indicating that the approaches taken in [2] and [13] both yield the some of the best results of methods proposed to date for arbitrary trajectories, combining their methods might produce the best results seen yet. 4. Acknowledgments This work was part of an undergraduate research project under the supervision of Dr. Svetlana Roudenko. The author would also like to thank Ken Johnson and Dr. Jim Pipe for providing graphical illustrations and for helpful discussions of this content, as well as Dr. Doug Cochran for his assistance and input. The project was partially supported by NSF-DUE # 0633033 and NSF-DMS # 0652853. References: [1] Philip J. Beatty, Dwight G. Nishimura, and John M. Pauly. Rapid gridding reconstruction with a minimal oversampling ratio. IEEE Trans. Med. Imag., 24(6):799–808, June 2005. [2] Mark Bydder, Alexey A. Samsonov, and Jiang Du. Evaluation of optimal density weighting for regridding. Magnetic Resonance Imaging, 25:695–702, 2007. [3] S. Dunis and D. Potts. Time and memory requirements of the nonequispaced fft. Sampling Theory in Signal and Image Processing, 7:77–100, 2008. [4] A. Dutt and V. Rokhlin. Fast fourier transforms for nonequispaced data. SIAM Journal of Scientific Computing, 14(6):1368–1393, 1993. [5] B. Elbel and G. Steidl. Fast fourier transform for nonequispaced data. In C. K. Chui and L. L. Schumaker, editors, Approximation Theory IX. Vanderbilt University Press, Nashville, 1998. [6] Leslie Greengard and June-Yub Lee. Accelerating the nonuniform fast fourier transform. SIAM Review, 46(3):443–454, 2004. [7] R.D Hodge, R.K.S. Kwan, and G.B. Pike. Density compensation functions for spiral MRI. Magnetic Resonance in Medicine, 38:117–128, 1997. [8] J. Jackson, C. Meyer, D. Nishimura, and A. Macovski. Selection of a convolution function for fourier enversion using gridding. IEEE Trans. Med. Imag., 10(3):473–478, Sep. 1991. SAMPTA'09 [9] Tobias Knopp, Stefan Kunis, and Daniel Potts. A note on the iterative mri reconstruction from nonuniform k-space data. International Journal of Biomedical Imaging, 2007. [10] C. Meyer, B. S. Hu, D. Nishimura, and A. Macovski. Fast spiral coronary artery imaging. Magnetic Resonance in Medicine, 28:202–213, 1992. [11] J. O’sullivan. A fast sinc function gridding algorithm for fourier inversion in computerized tomography. IEEE Transactions on Medical Imaging, MI-4:200– 207, 1985. [12] James G. Pipe. [13] J.G. Pipe and Kenneth Johnson. Convolution kernel design and efficient algorithm for sampling density correction. Preprint, 2008. [14] J.G. Pipe and P. Menon. Sampling density compensation in MRI: Rationale and an iterative numerical solution. Magnetic Resonance in Medicine, 41:799– 808, June 2005. [15] D. Potts. 
Schnelle Fourier-Transformationen für nichtäquidistante Daten und Anwendungen. Habilitation, Universität zu Lübeck, 2003. [16] D. Potts, G. Steidl, and M. Tasche. Fast fourier transforms for nonequispaced data: A tutorial. In J. J. Benedetto and P. J. S. G. Ferreira, editors, Modern Sampling Theory: Mathematics and Applications, pages 247–270. Birkhäuser, Boston, 2001. [17] D. Potts and M Tasche. Numerical stability of nonequispaced fast fourier transforms. Journal of Computational Applied Mathematics, 222:655–674, 2008. [18] Y. Qian, J. Lin, and D. Jin. Reconstruction of mr images from data acquired on an arbitrary k-space trajectory using the same-image weight. Magnetic Resonance in Medicine, 48:306–311, 2002. [19] V. Rasche, R. Proksa, R. Sinkus, P. Bornert, and H. Eggers. Resampling of data between arbitrary grids using convolution interpolation. IEEE Trans. Med. Imag., 18:427–434, 1999. [20] D. Rosenfeld. An optimal and efficient new gridding algorithm using singular value decomposition. Magnetic Resonance in Medicine, 40:14–23, 1998. [21] A.A. Samsonov, E.G. Kholmovski, and C.R. Johnson. Determination of the sampling density compensation function using a point spread function modeling approach and gridding approximation. volume 11, 2003. [22] Hossein Sedarat and Dwight G. Nishimura. On the optimality of the gridding reconstruction algorithm. IEEE Transactions on Medical Imaging, 19(4):306– 317, 2000. [23] G. Steidl. A note on fast fourier transforms for nonequispaced grids. Advanced Computational Mathematics, 9:337–353, 1998. 362 On approximation properties of sampling operators defined by dilated kernels Andi Kivinukk (1) and Gert Tamberg (2) (1) Dept. of Mathematics, Tallinn University, Narva Road 25, 10120 Tallinn, Estonia. (2) Dept. of Mathematics, Tallinn University of Technology, Ehitajate tee 5 19086 Tallinn, Estonia. andik@tlu.ee, gert.tamberg@mail.ee Abstract: In this paper we consider some generalized Shannon sampling operators, which are defined by band-limited kernels. In particular, we use dilated versions of some previously known kernels. We give also some examples of using sampling operators with dilated kernels in imaging applications. 1. Introduction For the uniformly continuous and bounded functions f ∈ C(R) the generalized sampling series with a kernel function s ∈ L1 (R) are given by (t ∈ R; W > 0) ∞ X (SW f )(t) := k=−∞ where ∞ X k f ( )s(W t − k) W s(u − k) = 1, [16]), in Signal Analysis in particular. Many kernels can be defined by (3), e.g. 1) λ(u) = 1 defines the sinc function; 2) λ(u) = 1 − u defines the Fejér kernel (cf. [15]) sF (t) = 3) λH (u) := cos2 kernel (see [7]) sH (t) := |s(u − k)| < ∞ sC,b (t) := (2) (4) m  1X  bj sinc(t − j) + sinc(t + j) 2 j=0 (5) provided m ⌊X 2 ⌋ (u ∈ R). λ(u) cos(πtu) du = r π ∧ λ (πt). 2 0 (3) These types of kernels arise in conjunction with window functions widely used in applications (e.g. [1], [2], [11], SAMPTA'09 bj cos jπu defines the Blackman-Harris kernel (see [9]) If the kernel function is s(t) = sinc (t) := sinπtπt , we get the classical (Whittaker-Kotel’nikov-)Shannon operasinc tor SW . The idea to replace the sinc kernel (sinc (·) 6∈ 1 L (R)) by another kernel function s ∈ L1 (R) appeared first in [15], where the case s(t) = (sinc (t))2 was considered. A systematic study of sampling operators (1) for arbitrary kernel functions s was initiated at RWTH Aachen by P. L. Butzer and his students since 1977 (see [3], [4], [14] and references cited there). 
In this paper we consider the generalized sampling series with even band-limited kernels s, defined as the Fourier transform of an even window function λ ∈ C[−1,1] , λ(0) = 1, λ(u) = 0 (|u| > 1) by the equality s(t) := s(λ; t) := m X j=0 (1) k=−∞ Z1 1 sinc t = O(|t|−3 ); 2 1 − t2 4) the general cosine window and their operator norms are kSW k = = 12 (1 + cos πu) defines the Hann λC,b (u) := k=−∞ ∞ X πu 2 1 t sinc 2 = O(|t|−2 ); 2 2 b2j = j=0 ⌊ m+1 2 ⌋ X b2j−1 = j=1 1 . 2 (6) From approximation theory point of view at least two problems for the generalized sampling operators SW : C(R) → C(R) have some interest: 1) to calculate the operator norms kSW k = sup u∈R ∞ X |s(u − k)|; (7) k=−∞ 2) to estimate the order of approximation kf − SW f kC 6 M ωk (f, 1 ) W (8) in terms of the k-th modulus of smoothness ωk (f, δ). 2. Interpolating generalized sampling operators with dilated kernels Let us consider the dilated kernel sα (t) = αs(αt). The Shannon operators with sinc kernel satisfy the interpolatory conditions sinc (SW )( k k ) = f ( ) (k ∈ Z). W W (9) 363 When we replace the sinc kernel with a band-limited one (3), we may lose the interpolatory property (9), but using the dilated kernel s̃(t) = 2s(2t), we can recover the interpolatory property. If s ∈ Bπ1 , then sα ∈ Bα1 π , and the condition (2) is valid for 0 < α 6 2, therefore we get ∞ the sampling operator SW,α : C(R) → BαπW ⊂ C(R). p Here Bσ stands for the Bernstein class consisting of those bounded functions f ∈ Lp (R) (1 6 p 6 ∞) which can be extended to an entire function f (z) (z ∈ C) of exponential type σ. Using the Nikolskii inequality [13], we get the bounds for the operator norm. ∞ Theorem 1. Let the operators SW : C(R) → BW π ⊂ ∞ C(R), SW,α : C(R) → BαW π ⊂ C(R) are defined by (1) with kernels s and sα , respectively. Then ksk1 6 kSW,α k 6 (1 + απ)kSW k (0 < α 6 2). The order of approximation by operators SW,α we can estimate via modulus of smoothness ωk (f, σ). Next theorem generalizes slightly the result in [10] (Th. 1.3). Theorem 2. Let SW : C(R) → C(R), SW,α : C(R) → ∞ BαW π ⊂ C(R) be sampling operators defined by (1) with kernel functions s ∈ Bπ1 , sα ∈ Bα1 π , respectively. 1) If 0 < α 6 1, then there exist positive constants C1,α and C2,α such that C1,α kSαW f −f kC 6 kSW,α f −f kC 6 C2,α kSαW f −f kC . 2) Moreover, if 0 < α < 2, then kSW f − f kC 6 Mk ωk (f, 1 ), W (10) implies Example. The Blackman-Harris sampling operator CW,b is defined by the window function bj cos(πju). j=0 In [9] we proved that for some values of the parameters b = (b0 , b1 , . . . , bm ) ∈ Rm+1 we can estimate the order ∞ of approximation by operators CW,b : C(R) → BW π ⊂ 1 C(R) via the modulus of continuity ω2ℓ (f, W ) (ℓ 6 m). More precisely (see [9], Th. 3), let ℓ, 1 6 ℓ 6 m, be fixed. If for every k = 0, . . . , ℓ − 1 m X j 2k bj = 0 (00 = 1), (11) j=0 then 1 ). (12) W Now by Theorem 2 we obtain for the corresponding di∞ lated sampling operator CW,b;α : C(R) → BαW π ⊂ C(R) with 0 < α < 2 the estimate kf − CW,b f kC 6 Mb,ℓ ω2ℓ (f, kf − CW,b;α f kC 6 Mb,ℓ,α ω2ℓ (f, SAMPTA'09 1 ). W ∞ then S̃W : C(R) → B2W π ⊂ C(R) is an interpolating sampling operator. Examples. For the Hann window function λH (u) the condition (15) holds and we get the interpolating Hann sam∞ pling operator H̃W : C(R) → B2W π ⊂ C(R). Taking b0 = 1/2, b2j = 0(j ∈ N) in (11) gives us the BlackmanHarris window function for which the condition (15) is fullfilled (see [10]). 
1 In the case when s ∈ Bβπ , 0 < β < 1 and (15) holds for the corresponding window function we can prove the following theorem. Theorem 4. Let the sampling operator S̃W be defined by (1) using the kernel s̃(t) := 2s(2t), where the kernel s ∈ 1 ⊂ L1 (R), 0 < β < 1, is generated by (3) with a Bβπ window function λ. If (15) is valid, then for every k ∈ N there exist a constant Mk such that 1 ). W Example. So-called Lanczos n-kernels for some constant Mk,α > 0. λC,b := Theorem 3. Let the sampling operator S̃W be defined by (1) using the kernel s̃(t) := 2s(2t), where the kernel s ∈ Bπ1 ⊂ L1 (R) is generated by (3) with a window function λ. If λ(u) + λ(1 − u) = 1 (u ∈ [0, 1]) (15) kS̃W f − f kC 6 Mk ωk (f, 1 kSW,α f − f kC 6 Mk,α ωk (f, ) W m X The case m = ℓ = 1 gives the Hann sampling operator HW : C(R) → C(R), which often has been used in practise. For the corresponding dilated operator ∞ HW,α : C(R) → BαW π ⊂ C(R) for 0 < α < 2 we obtain 1 kf − HW,α f kC 6 Mα ω2 (f, ). (14) W See Figure 2 for corresponding kernels. The next theorem gives hints how to construct the interpolating sampling series. (13) s̃L,n (t) := sinc t sinc t, n which has been often used in image processing. The Lanczos 3-kernel is especially popular in imaging ((see [16] and references cited there). They are defined by De la Vallée Poussin window function  1, 0 6 u 6 n−1  2n , 1 n−1 (1 + n(1 − 2u)), 2n < u < n+1 λL,n (u) := 2 2n ,  0, u > n+1 2n . If n > 1, then the De la Vallée Poussin window function λL,n satisfies the conditions (15) and s̃L,n ∈ B(1n+1 )π , 2n hence Theorem 4 is applicable. If n = 1, then we get the Fejér sampling operator (cf. [15]), for which we do not have even an estimate via the modulus of continuity ω1 . 3. Applications in 2D imaging A natural application of sampling operators with dilated kernels is imaging. We can represent an discrete 2D image f as a continuous function using sampling series X f (j, k)s1 (x − j)s2 (y − k). (16) (Sf )(x, y) := j,k 364 Figure 1: Original image, derivatives with Hann kernel s̃H (t) = 2sH (2t) and sH,1/4 (t) = 12 sH ( 14 t) (ϕ = Many image resizing (resampling) algorithms use such type of representation (see [16], [12], [6]). If the image data is exact, then we can take interpolating kernels s1 and s2 , like interpolating Hann, Blackman-Harris or Lanczos, and enlarge (up-sample) image, having (Sf )(j, k) = f (j, k). If we want to reduce the image size (downsample) (magnification γ < 1) then, for eliminating artifacts, we can choose a dilated kernel sα with in some sense optimal value of α = 2γ (see Figure 2). The artifacts in down-sampled images appear, because details that are resized to smaller than one pixel will be misrepresented by larger aliases (see [5], [6]). Depending on the choice of ∞ the parameter value α we have SW,α : C(R) → BαW π i.e. a function belonging to a class for bandlimited functions, for which the Fourier’ transform vanishes outside of the interval [−αW π, αW π]. This approach eliminates higher spatial frequencies, being equivalent to the use of low-pass filter. Also in the case, when the resolution of the optical system is less than the resolution of the sensor, we can choose the value of the dilation parameter α accordingly. Using the representation (16) we can apply different imaging technics. For image enhancement we can use the unsharp masking (see [5], [6]), i.e. to subtract a blurred version of an image from the image itself. 
For the representation of original image f (x, y) we can choose in (16) the interpolating kernels (dilation by α = 2), but to get blurred version fb (x, y), we choose in (16) the dilated kernels with small parameter α, like sH,1/2 in Figure 2. We can control the amount of unsharp masking choosing the parameter a < 0: fusm (x, y) = (1 − a)f (x, y) + afb (x, y). Another well-known image enhancement method uses the derivatives of image. First derivatives in image processing are implemented using the magnitude of the gradient. The representation (16) gives us a natural way to implement derivatives. Indeed X f (j, k)s′1 (x − j)s2 (y − k), fx (x, y) := j,k fy (x, y) := X j,k SAMPTA'09 f (j, k)s1 (x − j)s′2 (y − k). 2π 3 ). Surprisingly, if we choose Hann kernel s1 = s2 = sH and x, y ∈ Z, then the discrete convolution fx (p, q) ≈ p+1 X q+1 X f (j, k)s′H (p−j)sH (q −k) (17) j=p−1 k=q−1 gives us the well-known Sobel filter (see [5], [6])     1 0 −1 1   2  1 0 −1 =  2 0 −2  . 1 0 −1 1 Indeed, sH (k) = 0 (k ∈ Z) if |k| > 1 (see Figure 1) and we get 41 (1, 2, 1). For s′H we use the first 3 values only, i.e. 38 (1, 0, −1). We can easily compute a directional derivative   X fϕ (x, y) := f (j, k)s′1 (x − j) cos ϕ − (y − k) sin ϕ × j,k   ×s2 (y − k) cos ϕ + (x − j) sin ϕ , To get the edges with different spatial frequency, we choose the dilation parameter (see Figure 1). Second derivatives in image processing are implemented using the Laplacian. Using the representation (16) we get △ f (x, y) := fxx (x, y) + fyy (x, y) = X  f (j, k) s′′ (x − j)s(y − k) + s(x − j)s′′ (y − k) . j,k In image processing we use the derivatives for edge detection. Changing the dilation parameter α for the kernel sα (t) = αs(αt) we can detect edges with different spatial frequencies. In calculations we must use the truncated sampling series (p, q ∈ Z) (Sf )mn (p, q) := p+m X q+n X f (j, k)s1 (p − j)s2 (q − k) j=p−m k=q−n and have the truncation error. We can use kernels with finite support like the combinations of B-splines, considered in [4], to get rid of the truncation error, but in some 365 grants 6943, 7033, and by the Estonian Min. of Educ. and Research, projects SF0132723s06, SF0140011s09. The second author wants to thank Archimedes Foundation for support. References: Figure 2: Unsharp mask with Hann kernel sH,1/2 = 1 1 2 sH ( 2 t), a = −1.7. 1.0 0.8 0.6 0.4 0.2 -4 2 -2 4 -0.2 Figure 3: Hann kernel s̃H (t) = O(|t|−3 ), Lanczos kernel sL,3 (t) = O(|t|−2 ) and sinc(t) = O(|t|−1 ) . cases other types of kernels are more suitable. For minimizing the truncation error the kernel s(t) must decrease rapidly when |t| → ∞. The sinc function does not belong even to L1 . Therefore using the kernels in form s(t) = θ(t) sinc t, where θ(t) is some window function (see [11]), is well-known. In most cases of we lose the important property (2) and do not get a generalized sampling series anymore. The kernels in our approach, i.e. kernels defined via Fourier transform of window functions, allow us to get good approximation properties and are rapidly decreasing. In Figure 3 we take the Hann kernel s̃H (t) = O(|t|−3 ) and compare it with the Lanczos kernel sL,3 (t) = O(|t|−2 ), which is one of the most used kernels in imaging (see [16]). In the case of BlackmanHarris kernels (5), considered more precisely in [9], we have sC,b = (|t|−2ℓ−1 ) if for every k = 0, . . . , ℓ − 1 m X j 2k bj = 0. j=0 We defined many rapidly decreasing kernels also in [8], [7], [10]. 4. 
Acknowledgments This research was supported by European Social Fund Funds Doctoral Studies and Internationalisation Programme DoRa, by the Estonian Science Foundation, SAMPTA'09 [1] H. H. Albrecht. A family of cosine-sum windows for high resolution measurements. In IEEE International Conference on Acoustics, Speech and Signal Processing, Salt Lake City, Mai 2001, pages 3081– 3084. Salt Lake City, 2001. [2] R. B. Blackman and J. W. Tukey. The measurement of power spectra. Wiley-VCH, New York, 1958. [3] P. L. Butzer, G. Schmeisser, and R. L. Stens. An introduction to sampling analysis. In F Marvasti, editor, Nonuniform Sampling, Theory and Practice, pages 17–121. Kluwer, New York, 2001. [4] P. L. Butzer, W. Splettster, and R. L. Stens. The sampling theorems and linear prediction in signal analysis. Jahresber. Deutsch. Math-Verein, 90:1–70, 1988. [5] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Second Edition. Prentice-Hall, 2002. [6] B. Jähne. Digital Image Processing: Concepts, Algorithms, and Scientific Applications. Springer Verlag, Basel, Stuttgart, 1997. [7] A. Kivinukk and G. Tamberg. On sampling operators defined by the Hann window and some of their extensions. Sampling Theory in Signal and Image Processing, 2:235–258, 2003. [8] A. Kivinukk and G. Tamberg. Blackman-type windows for sampling series. J. of Comp. Analysis and Applications, 7:361–372, 2005. [9] A. Kivinukk and G. Tamberg. On Blackman-Harris windows for Shannon sampling series. Sampling Theory in Signal and Image Processing, 6:87–108, 2007. [10] A. Kivinukk and G. Tamberg. Interpolating generalized Shannon sampling operators, their norms and approximation properties. Sampling Theory in Signal and Image Processing, 8:77–95, 2009. [11] R. J. Marks. Fourier Analysis and Its Applications. Oxford University Press, New York, 2009. [12] E. Meijering and et al. Quantitative comparison of sinc-approximating kernels for medical image interpolation. In C. Taylor and A. Colchester, editors, Medical Image Computing and Computer-Assisted Intervention, pages 210–217. Springer, Berlin, 1999. [13] S. M. Nikolskii. Approximation of Functions of Several Variables and Imbedding Theorems. Springer, Berlin, 1975. (Orig. Russian ed. Moscow, 1969). [14] R. L. Stens. Sampling with generalized kernels. In J. R. Higgins and R. L. Stens, editors, Sampling Theory in Fourier and Signal Analysis: Advanced Topics. Clarendon Press, Oxford, 1999. [15] M. Theis. Über eine interpolationsformel von de la Vallee-Poussin. Math. Z., 3:93–113, 1919. [16] K. Turkowski. Filters for common resampling tasks. In A. S. Glassner, editor, Graphics Gems I, pages 147–165. Academic Press, 1990. 366 Reconstruction of signals in a shift-invariant space from nonuniform samples Junxi Zhao College of Mathematics and Physics, Nanjing university of posts and telecommunications, Nanjing,21003, P.R. China junxi_zhao@163.com practical algorithm for implementing the optimal signal reconstruction. In Section , some numerical examples of reconstructing signals demonstrate the effectiveness of the proposed method. Abstract- In this paper, we consider the method for reconstructing a signal from a finite number of samples. In shift-invariant space framework, we derive an aproximately min-max optimal interpolator to to reconstruct a signal on an interval. An effective non-iterative algorithm for signal reconstruction is given also. Numerical examples show the effectiveness of the proposed method. 
Index Terms–sampling, signal reconstruction, scaling function, shift-invariant space .THE PROBLEM FORMULATION Throughout the paper, we focus on one-dimensional signals and denote the space of signals of finite energy on R by L2 (R) . Let || f || 2 = ∫ | f (t ) |2 dt be the energy of a signal Given K scaling functions ϕ1 (t ), ϕ2 (t ),A , ϕ K (t ) ∈ L (R ) , the shiftinvariant space V (ϕ1 , ϕ2 ,A , ϕ K ) is a Hilbert space defined as V (ϕ1 ,ϕ2 ,A,ϕK ) = close f (t ) ∈ L2 (R) : . INTRODUCTION The problem of signal reconstruction is pervasive in many areas of signal processing, such as in designs of nonuniform antenna arrays, sparse array beamforming, the restoration of signals with missing samples, image acquisition, etc [1-3]. The classical Shannon’s sampling theorem was extended theoretically to general shift-invariant subspaces, and various generalized sampling theorems concerning band-limited and nonband-limited signals have been proposed [4-9]. However, the problem of reconstructing a continuous-time signal from its finite number of nonuniform samples is often encountered in practical applications, and truncating infinite reconstruction leads to errors. Bandlimited and non bandlimited signals are often modeled by shift-invariant spaces. Some authors have proposed interpolation methods and iterative methods for reconstructing signals in shift-invariant spaces in the signal processing literatures[10-14]. A non-iterative reconstruction method is effective to reconstruct continuous-time signals from a finite number of samples by using a suitable interpolator. The Yen interpolator is well known to reconstruct band-limited signals in both minimal energy and least squares senses [13]. Some interpolation methods in shift-invariant spaces were given[9-10,12-14]. In this paper, we are interested in optimally constructing signals in a shift-invariant space from a finite number of nonuniform samples, and develop a new method for reconstructing continuous-time signal on a interval. The upper bound of reconstruction error is derived. We also propose a practical reconstruction algorithm. The method of signal reconstruction can be regarded as a generalization of Yen’s in general shift-invariant spaces. The paper is organized as follows: in Section we formulate the optimal reconstruction problem in shift-invariant spaces, and Section derives a new method to reconstruct a signal from a finite number of arbitrarily distributed samples. Section propose a SAMPTA'09 f (t ) ∈ L (R ) . R 2 2 { f (t ) = ∑∑ ck (n)ϕk (t − n) K , ( ck (n) ) ∈ l 2 (Z), k = 1, 2,A, K} k =1 n We assume that {ϕk (t − n) |1 ≤ k ≤ K , n ∈ Z } forms a frame of V (ϕ1 , ϕ2 ,A , ϕ K ) , i.e., there exist two constants A > 0 and B < +∞ such that A || f ||2 ≤ ∑ ∑ | ( f , ϕ k (t − n)) |2 ≤ B || f ||2 K k =1 (1) for any f ∈ V (ϕ1 , ϕ 2 ,A , ϕ K ) . To make the sampling of functions in V (ϕ1 , ϕ2 ,A , ϕ K ) well-defined, we additionally assume that there exists a constant C such that n ∑ ∑|ϕ K k =1 t ∈ [0, 1] k (t − n) |< C (2) n f (t ) = ∑ ∑ ck (n)ϕk (t − n) with ( ck (n) ) ∈ l 2 (Z) . for any . To see this, let K k =1 From n (2) and (3) it follows that ⎛ K ⎞ 2⎛ K ⎞ 2 | f (t ) |≤ ⎜ ∑ ∑ | ϕk (t − n) |2 ⎟ ⎜ ∑ ∑|ck (n) |2 ⎟ ⎝ k =1 n ⎠ ⎝ k =1 n ⎠ (3) ≤ C ' || f ||, t ∈ R where C ' is a constant. It is known from [8] that the assumption (3) implies that V (ϕ1 , ϕ 2 ,A , ϕ K ) is a reproducing kernel Hilbert space. 
For a function f (t ) in V (ϕ1 , ϕ2 ,A , ϕ K ) , we adopt the 1 1 1 367 E (τ ) = ∑ || ek ,τ −∑ hm (τ )ek , m ||2 = f# (t ) = ∑ hm (t ) f (tm ) following interpolator K M k =1 ∑ || e τ || −2∑ e m =1 K to reconstruct f (t ) on the interval containing sampling where Ak = | f (t ) − f# (t ) |2 inf sup , h || f ||2 f interpolator, we discuss the optimization D. DERIVATION OF SIGNAL RECONSTRUCTION t1 , t2 ,A , tM ∈ [a, b] , let a function nonuniform τ ∈ [a, b] and f (τ ) is determined by appropriate coefficients hm (τ ) ’s. So ,we study the following optimization Letting | f (τ ) − f# (τ ) |2 || f ||2 f ∈V (ϕ1 ,A,ϕ K ) sup k k {ϕ#k (t − n) |1 ≤ k ≤ K , n ∈ Z } is the {ϕk (t − n) |1 ≤ k ≤ K , n ∈ Z } . We have where, and ≤ ∑∑| ck,n (n) | n k =1 n k =1 m=1 m k where eTk ,τ = (ϕk (τ − n))n m =1 , k = 1, 2,A , K . So, it follows that − n) | 2 m be K M k =1 m =1 approximate f (t ) at τ . seen that T k e k ,τ . (9) aij( k ) = ϕ k (t j − i ) ⎞ ⎟ −1 ⎟ ⎠ H(τ ) = ( h1 (τ ) , h2 (τ ) ,A, hM (τ ) ) = X∑ ATk ek ,τ = ∑ X(∑ϕ (t −n)ϕ (τ −n),A,∑ϕ (t k =1 k 1 k k k =1 M −n)ϕk (τ −n))T = (∑∑∑ x1lϕk ( tl − n ) ϕk (τ − n),A, n K n (11) M k =1 l =1 n ∑∑∑ x K M k =1 l =1 Ml n ϕk ( tl − n ) ϕk (τ − n))T . So, the optimal reconstruction of f (t ) can be expressed as , f# (t ) = ∑ f (tm )hm (t ) M m =1 (12) ⎡K M ⎤ = ∑ f (tm ) ⎢ ∑∑∑ xmlϕ k ( tl − n ) ϕ k (t − n) ⎥. m =1 ⎣ k =1 l =1 n ⎦ Let us take a look at the case K = 1 in detail. Given M samples of x(t ) ∈ V (ϕ ) at instants t1 , t2 ,A , t M for a M minimizing make K T f# (τ ) proper scaling function ϕ (t ) ∈ L2 (R ) , we can express the To minimize E (τ ) , we further express E (τ ) as SAMPTA'09 with K eTk ,m = (ϕk (tm − n)) n E (τ ) = ∑ || ek ,τ − ∑ hm (τ )ek , m || 2 can can ∑A ⎞ −1 K ⎟ ⎟ ⎟ k =1 ⎠ interpolating vector as (6) K M | f (τ ) − f# (τ ) |2 −1 e sup || ≤ − A hm (τ )e k ,m || 2 ∑ ∑ k , τ 2 || f || f k =1 m =1 . It Ak A k = (aij( k ) ) ⎜ ⎝ k =1 | f (τ ) − f# (τ ) |2 ≤ A−1 || f ||2 ∑ || ek ,τ − ∑ hm (τ )ek , m || 2 , (7) k =1 T k the others for each f ∈ V (ϕ1 ,A , ϕ M ) . The above can furthermore be expressed explicitly in vector form as M ∑A ⎛ K ⎜ ⎜ ⎜ ⎝ k =1 M m=1 K = 0 yields that exist one sample in { f (tm )}m =1 which can be expressed by , M n k =1 (8) samples { f (tm )}m =1 is indepentant, that is, there doesn’t (5) ≤ A−1 || f || 2 ∑∑| ϕk (τ − n) − ∑ hm (τ )ϕk (tm − n) |2 . K ∂E (τ ) ∂H (τ ) ⎛ K ⎜ M k k = 1, 2,A, K . It is for and X = ( xij ) = ⎜ ∑ ATk A k ⎟ , we then rewrite the optimal ∑∑| ϕ (τ − n) − ∑ h (τ )ϕ (t 2 and 1 Letting m=1 K ⎞ k , M ⎟⎠ M M K H (τ ) = ( h1 (τ ), h2 (τ ), A , hM (τ ))T K dual frame of =| ∑∑ ck (n)[ϕk (τ − n) − ∑hm (τ ) ϕk (tm − n)] |2 n k =1 ⎟ ⎠ Note that the matrix ∑ ATk A k is invertible when the | f (τ ) − f# (τ ) |2 K ⎜ ⎝ k =1 k =1 ck (n ) = ( f , ϕ#k (t − n )) , k = 1, 2, A , K , n ∈ Z k =1 ⎞ ⎟ K K ⎡K ⎤ 2 (10) A || f || ⎢ ∑ || e k ,τ −A k (∑ ATl A l )−1 ∑ ATm em,τ ||2 ⎥ . l =1 m =1 ⎣ k =1 ⎦ K n ⎛ K ⎜ −1 (4) ∑ ∑ c (n )ϕ (t − n ), f (t ) = Ak H(τ ) + HT (τ ) ⎜ ∑ATk Ak ⎟ H(τ ). k ,τ and the minimal error between f (τ ) and f# (τ ) is given by | f (τ ) − f# (τ ) |≤ r (τ ) = m =1 h (τ ) k =1 e , e k , 2 ,A , e ⎛ ⎜ k ,1 ⎝ m =1 T H (τ ) = M inf 2 Therefore, solving f# (τ ) = ∑ hm (τ ) f (tm ) . The optimal estimation f# (τ ) of instants K followed that K K ∂E (τ ) = −2∑ ATk ek ,τ + 2∑ ATk A k H (τ ) . ∂H (τ ) k =1 k =1 which yields a min-max type interpolator. of at k, k =1 instants t1 ,A , t M . 
In order to obtain an optimal M samples Given f (t ) ∈ V (ϕ1 , ϕ2 ,A, ϕ K ) M optimal interpolating vector as H (τ ) = ( AT A)−1 AT eτ , 2 368 where A = (e1 , e2 ,A, e M ) and eτ = (ϕ (τ − n))Tn reconstruction error is strongly related to the sampling pattern. As pointed in [18], the reconstruction errors are smaller in the neighborhood of the sampling instants. So, the quality of reconstruction should be evaluated in a pointwise manner. From (10) we know that the min-max reconstruction error is pointwise upper-bounded by , em = (ϕ (tm − n))Tn for m = 1, 2,A, M . It is easy to see that the optimal interpolating vector H (τ ) is exactly the orthogonal projection of vector eτ onto the subspace e1 , e2 ,A, e M by and hence from # # (12) x(t ) ∈ V (ϕ ) with x(tm ) = x(tm ) for m = 1, 2,A, M . This implies that the reconstructed signal best fits the sampling data. Especially, for σ >0 and # ϕ (t ) = sinc(σ t ) , the optimal reconstruction x(t ) of K K ⎡K ⎤2 r(τ ) ≤ A−1/ 2|| f || ⎢∑ || ek ,τ −Ak (∑ ATl Al )−1∑ ATmem,τ ||2 ⎥ (14) l =1 m=1 ⎣ k =1 ⎦ spanned x(t ) ∈ V (ϕ ) from samples x(t1 ), x(t2 ),A , x(tM ) is also x(tm ) = x# (tm ) for band-limited to σ with m = 1, 2,A, M . It is easy to show that the interpolator obtained here is just Yen’s for band-limited signals. 1 and it can estimate the quality of reconstruction when the sampling instants are known. F. DEMONSTRATIVE EXAMPLES Some numerical examples are given to demonstrate the performance of the proposed method, where signals are selected randomly in shift-invariant subspaces, and the sampling instants are generated by adding random perturbation, distributed uniformly in the interval [−u , u ] for u > 0 , to each equally-spaced sampling instant, i.e., the sampling instants are mT + um , where um randomly distributes in [−u , u ] for m = 1, 2,A , M . Example 1 For the first example, we choose arbitrarily a signal of band [−π , π ] . We reconstruct it on [0, 40] from 42 samples. The average sampling period is T = 0.995 s and u = 0.7T . It is clear that the average sampling period is almost critical. The reconstructed signal, its reconstruction from its nonuniform samples and the errors between the original signal and its reconstructions were plotted in Fig.1. From this experiment, it can be seen that under such a relaxed condition, the reconstruction of a signal is quite satisfying, Example 2 For non band-limited signals, we choose the cubic spline [19] as a scaling function, and randomly choose a signal in the shift-invariant space. The average sampling period is T = 1.0s and the maximum of irregular perturbation is u = 0.5T . The signal to be reconstructed, its reconstruction and the reconstruction error (in dB) were plotted in Fig.2, respectively. Note that, although the sampling density is much lower than that estimated in [5](see the examples therein for details), the quality of signal reconstruction is still considerably high. In addition, although the cubic spline is supported compactly, the method given in [10] could not be used in this case because the maximal gap between adjacent sampling instants is too large. In fact, the local reconstruction methods in [10] required the condition that the maximal gap of the sampling instants must be less 1 and the number of the samples must be bigger than the length of the reconstruction interval, but our method doesn’t rely on any sampling condition. In contrast to the results in [10], we also give another example to show the performance of the proposed method in Fig.3. 
In this example, we chose the same scaling function, a Gaussian function, and the similar sampling condition as in [10]. E. ALGORITHM AND DISCUSSION In the previous section we have derived an interpolator for signal reconstruction. However, because computing the min-max interpolator requires calculating the inverse of a matrix with possibly larger dimension, the reconstruction formula (12) wound be unfeasible when the number of samples is much large. To circumvent this problem, we reshape (12) as ⎛M K M ⎞ f# (τ ) = ∑ ⎜ ∑∑∑ f (tm ) xmlϕk ( tl − n ) ⎟ϕk (t − n) . (13) n ⎝ l =1 k =1 m =1 ⎠ From this, a non-iterative reconstruction algorithm can be given as follows. Algorithm: (1) Let f = ( f (t1 ), f (t2 ),A , f (tM ))T , Ek (t ) = (A , ϕ k (t − n), ϕ k (t − n + 1),A)T , (k ) (k ) A k = (bmn ) with bmn = ϕk (tm − n) for i = 1, 2,A , M and k = 1, 2,A , K ; (2) Compute T = ∑ A k A Tk K k =1 (3) Solve Th = f ; (4) f# (t ) = h T ∑ A k Ek (t ) . K k =1 The most crucial step is solving the equation system of equations Th = f . This can be done effectively by computing the Cholesky factorization of the matrix T when T is invertible. In fact, the cholesky factorization of T gives a upper triangular matrix S such that T = ST S . Then the solution of the system Th = f can be obtained by sequentially solving the systems ST b = f and Sh = b by Gaussian elimination. This procedure is faster and more robust than directly computing the inverse of T . When T is not invertible, the equation can be solved effectively by the least squares method. Note that although the proposed method has no restriction on sampling locations, the obtained SAMPTA'09 3 369 Fig. 1 Top : original signal and sampling points marked by dots; middle: reconstructed signal obtained by the proposed method; bottom: normalized errors between the original signal the its reconstruction obtained by the proposed method and Yen interpolator. Fig. 2 original signal with sampling points marked by stars, reconstructed signal obtained by the proposed method, normalized error(in dB) between the original signal the its reconstruction. SAMPTA'09 4 370 Fig. 3 original signal with sampling points marked by stars, reconstructed signal obtained by the proposed method, normalized error(in dB) between the original signal the its reconstruction with scaling function ϕ (t ) = e −t 2 / 2σ 2 , σ = 0.81 , and sampling density 0.85. Fig. 4 original signal with sampling points marked by stars, reconstructed signal obtained by the proposed method, normalized error(in dB) between the original signal the its reconstruction. ϕ1 (t ) = a1e Example 3. −t 2 / 4 and ϕ 2 (t ) = a2 (t + t )e Finally, we select 3 two −t 2 / 4 functions as scaling functions, where a1 and a2 are normalized constants. Here the average sampling period is T = 0.8 s and the maximum of irregular perturbation u = 0.6T . The SAMPTA'09 simulation results were showed in Fig.4. This example also indicates the feasibility of the proposed method for signal reconstruction in a shift-invariant spaced with several scaling functions . CONCLUSION 5 371 The proposed method of reconstructing a signal from its finite nonuniform samples has the following advantages: (a) the method doesn’t require the usual hypotheses on the maximal gap between adjacent sampling instants and the compactness of the scaling functions of the shift-invariant space as in the literature, and therefore can be applied in various shift-invariant spaces with sampling locations distributed arbitrarily. 
(b) The reconstruction error function as sensitivity functions [17] can measure the quality of the reconstruction prior to the practical implementation. (c) the method can be used effectively in a multi-wavelet space and can be extended straightforward to two-dimensional spaces. However, our method does not incorporate the case when samples are noisy, which we will investigate in future. [9] C. Ford and D.M. Etter, “Wavelet basis reconstruction of nonuniformly sampled data”, IEEE Trans on Circuits and Systems II, vol.45, pp.1165–1168, 1998. REFERENCES [12] P. J. S. G. Ferreira, “Noniterative and faster iterative methods for interpolation and extrapolation, ” IEEE Trans. on Signal Processing, vol. 42(11), pp. 3278-3282, 1994. [10] K. Grochenig and H. Schwab, “Fast local reconstruction methods for nonuniform sampling in shift-invariant spaces,” SIAM Journal of Matrix Analysis and Applications, vol.24 , no.4, pp.899-913, 2003. [11] A. Aldroubi and H. Feichtinger, “Complete iterative reconstruction algorithms for irregular sampled data in spline-like spaces”, in IEEE Acoustics, Speech and Signal Process International Conf.(ICASSP-97), vol.3, pp.1857-1860, 1997. [1] S. D. Berger, “Nonuniform sampling reconstruction Applied to sparse array beamforming,” Proc. IEEE Radar Conf. 2002, pp.98–103, 2002, [13] Hyeokho Choi and D. C. Munson, “Analysis and Design of minimax-optimal interpolators”, IEEE Trans. on Signal Processing, vol.46, pp.1571–1579, 1998. [2] D. S. Early and D.G. Long, “Image reconstruction and enhanced resolution imaging from irregular samples,” IEEE Trans. on Geoscience and Remote Sensing, vol.39, pp.291–302, 2001. [14] Y. Rolain, J. Schoukens and G. Vandersreen, “Signal reconstruction for non-equidistant finite length sample sets: a KIS approach,” IEEE Trans. on Instrument and Measurement, vol.47, no.5, pp.1046-1052,1998. [3] R. Stasinski and J. Konrad, “POCS-based image reconstrunction from nonuniform samples,” http:// iss.bu.edu/jkonrad/Publications/local/cpapers/Stas00ici p.pdf. [15]I. A. Aldroubi and M. Unser, “Sampling procedures in function spaces and asymptotic equivalence with Shannon sampling,” Numer. Funct. Anal. Optimiz., vol.15, pp. 1-21,1994. [4] G. G. Water, “A sampling theorem for wavelet subspaces,” IEEE Trans. on Inform. Theory, vol. 38, pp.881-884, 1992. [16] P.P. Vaidyanathan, “Generalizations of the sampling theorem: Seven decades after Nyquist”, IEEE, Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol.48, pp.1094–1109, 2001. [5] Wen Chen, S. Itoh and J. Shiki, “On Sampling in Shift Invariant Spaces,” IEEE Trans. on Information Theory, vol.48, pp.2802–28010, 2002. [6] I. Djokovic and P.P. Vaidyanathan, “Generalized sampling theorem in multiresolution subspaces,” IEEE, Trans. on Signal Process, vol.45, pp.583–599, 1997. [17] R. G. Shenoy and T.W. Parks, “An optimal recovery approach to interpolation”, IEEE, Trans. on Signal Processing, Vol.40, pp.1987-1996, 1992. [7] I. W. Selesnick, “Interpolating multiwavelet bases and sampling theory,” IEEE Trans. on Signal Process, vol.47, pp.1615–11621, 1999. [18]A. Tarczynski, “Sensitivity of signal reconstruction”, IEEE Signal Processing Letter, vol.4, pp.192–194, 1997. [8] A. Aldroubi and K. Grochenig, “Nonuniform sampling and reconstruction in shift-invariant spaces”, SIAM Rev., 2001, no.4, pp.585-620. SAMPTA'09 [19]C. K. Chui, An Introduction to wavelets, Academic Press, Inc. 1992. 
6 372 Spline Interpolation in Piecewise Constant Tension Masaru Kamada(1) and Rentsen Enkhbat(2) (1) Ibaraki University, Hitachi, Ibaraki 316-8511, Japan. (2) National University of Mongolia, P. O. Box 46/635, Ulaanbaatar, Mongolia. kamada@mx.ibaraki.ac.jp, renkhbat46@ses.edu.mn Abstract: Locally supported splines in tension are constructed where the tension, which has ever been constant over the entire domain, is allowed to change at sampling points. ∫ ∞ (f (2) (t))2 + p(t)2 (f (1) (t))2 dt of its squared second derivative f (2) and squared first derivative f (1) subject to the constraints f (tk ) = fk , k = 0, ±1, ±2, · · · . 1. Introduction A cubic spline gives the interpolation of data that minimizes the square integral of its second derivative [3, 5, 9] and is crowned as the smoothest interpolation in this sense. A linear spline gives the piecewise linear interpolation that is most straight but nonsmooth. The linear spline is characterized as minimizing the square integral of its first derivative [3, 9]. A spline in tension [1, 10] was devised as a generalization of those two splines. It minimizes the integral of a weighted sum of the squared second derivative and the squared first derivative. By increasing the weight called tension, we can make a spline in tension approach the most straight linear spline while retaining smoothness similar to that of the cubic spline. The spline in tension has been known for more than 40 years. It has been extended even to the multidimensional cases [2, 7] and is now supported by a standard software library [8]. But the tension has ever been a single constant over the entire domain. In this paper, we look at the splines as the output of a linear dynamical system with a series of delta functions input. That is the same way as how the exponential splines and their locally supported basis were successfully constructed in [12, 13]. In addition, attending to that the linear dynamical system theory [6] allows for time-varying dynamical parameters, we shall place different tension in each sampling interval. Then we will obtain locally supported splines in piecewise constant tension that can change the interpolation characteristics from a sampling interval to another. (1) −∞ (2) In the case p = 0, f is identical with the cubic spline [3, 5, 9]. By increasing p, f approaches the most straight linear spline as if the curve were pulled from both ends. That is why p is called tension [1, 10]. The tension p has originally been a single constant over the entire domain [10]. We shall now relax the tension to be a non-negative constant in each sampling interval, i.e., p(t) = pk ≥ 0, for t ∈ [tk , tk+1 ), (3) which can change at the sampling points. By the calculus of variation, the minimization problem is reduced to solution of the Euler-Lagrange differential equation f (4) (t) − 2p(t)p(1) (t)f (1) (t) − p(t)2 f (2) (t) = w(t), (4) where w is a series of the Dirac delta functions w(t) = ∞ ∑ wn δ(t − tn ) n=−∞ to be determined so that (2) holds good. We do not have, however, a practical means to decide the coefficients {wn } for given {(tk , fk )}. In practice, it is convenient to have locally supported functions {yn } satisfying yn (t) = 0, for t ∈ / [tn , tn+4 ] (5) of which linear combination f (t) = ∞ ∑ cn yn (t) (6) n=−∞ 2. 
Preliminaries A spline f in tension interpolating the data {(tk , fk )}∞ k=−∞ given at strictly increasing sampling points (· · · < t−2 < t−1 < t0 < t1 < t2 < · · · ) on the real line is defined as the twice-differentiable function that minimizes the integral of a weighted sum SAMPTA'09 represents any possible f . This yn can be constructed by yn(4) (t) − 2p(t)p(1) (t)yn(1) (t) − p(t)2 yn(2) (t) = un (t) (7) for some appropriately chosen un (t) = 4 ∑ ul,n δ(t − tn+l ) (8) l=0 373 as long as the sampled data system (7) with the impulse input (8) is completely controllable [4]. Once we obtain yn (t), we have only to determine the coefficients {cn } by the linear equations fk = ∞ ∑ cn yn (tk ), k = 0, ±1, ±2, · · · n=−∞ from {(tk , fk )}. Although infinitely many coefficients and data are involved in the equations, we can solve the linear equations for finitely many {cn } from finitely many {(tk , fk )} in practice because {yn } are locally supported. 3. Construction of locally supported splines in piecewise constant tension A state-space representation of the differential equation (7) is x(1) n (t) = F (t)xn (t) + gun (t), yn (t) = hxn (t), (9) where  0 1  0 0 F (t) =   0 0 0 2p(t)p(1) (t)    yn  y (1)    n  xn (t) =  (2)  , g =    yn  (3) yn 0 1 0 p(t)2   0 0  , 1  0 0 0   , h = [1 0 0 0] . (10) 0  1 The state xn can be expressed by ∫ t xn (t) = Φ(t, v)xn (v) + Φ(t, τ )gun (τ ) dτ, (11) v (i) In the open interval (tn+l , tn+l+1 ), (11) with v = tn+l +0 is reduced to xn (t) = Φ(t, tn+l +0)xn (tn+l +0), l = 0, 1, 2, 3 (14) because un (t) = 0 for t ∈ in (10) is a constant matrix  0  0 F (t) =   0 0 (tn+l , tn+l+1 ). Besides, F (t) 1 0 0 0 0 1 0 p2n+l  0 0   1  0 (15) because of (3) so that we can calculate the state-transition matrix by the matrix exponential function [11] as follows: Φ(t, tn+l +0) Rt F (υ) dυ = etn+l +0   (t−tn+l )2 (t−tn+l )3  1 t − t n+l  2 6    (t−tn+l )2     0  1 t − t n+l  2       0 0 1 t − tn+l      0 0 0 1     if pn+l = 0       cosh(pn+l (t−tn+l ))−2    1 t − tn+l p2n+l    sinh(pn+l (t−tn+l ))  0 1 pn+l =   0  (t − tn+l )) 0 cosh(p  n+l    sinh(p (t − tn+l )) 0 0 p  n+l n+l     sinh(pn+l (t−tn+l ))−2pn+l (t−tn+l )    p3n+l     cosh(pn+l (t−tn+l ))−2    2   p  n+l    sinh(pn+l (t−tn+l ))     pn+l    cosh(pn+l (t − tn+l ))   if pn+l > 0. (16) for any real numbers t and v, in terms of the statetransition matrix function Φ and the input un [11]. In the special case that t = tn+l+1 -0, we have the state transition from xn (tn+l +0) to xn (tn+l+1 -0) as follows: Since un (t) = 0 for t ∈ / {tn , tn+1 , tn+2 , tn+3 , tn+4 }, it follows from (11) that xn (tn+l+1 -0) = Φ(tn+l+1 -0, tn+l +0)xn (tn+l +0), xn (t)  0, t < tn    Φ(t, tn+l +0)xn (tn+l +0), = (12) tn+l < t < tn+l+1 , (l = 0, 1, 2, 3)    Φ(t, tn+4 +0)xn (tn+4 +0), tn+4 < t. Because of the top and bottom lines of (12), yn = hxn is locally supported as (5) if xn (tn+4 +0) = 0. In order to avoid the trivial case u0,n = u1,n = u2,n = u3,n = u4,n = 0 that would result in un ≡ yn ≡ 0, let us fix one of them as u0,n = 1. Then the problem of constructing a locally supported yn becomes a dead-beat control problem of finding u1,n , u2,n , u3,n , and u4,n that make the terminal state dead as xn (tn+4 +0) = 0. (13) Once the terminal state is controlled to 0, it will stay at 0 forever for t > tn+4 without any beats. 
We shall consider two types of state transitions: (i) Those within each sampling interval (tn+l , tn+l+1 ), and (ii) one across each sampling point tn+l . SAMPTA'09 l = 0, 1, 2, 3. (17) The matrix Φ(tn+l+1 -0, tn+l +0) can be evaluated by the right hand side of (16) with t replaced by tn+l+1 . (ii) The state transition from xn (tn+l -0) to xn (tn+l +0) across the sampling point tn+l , (l = 0, 1, 2, 3, 4) finds a trouble that F (t) in (10) contains a derivative of the function p being discontinuous at tn+l as defined by (3). We had better consider this transition by way of the original differential equation (7). An equivalent form of (7) is ) d ( p(t)2 yn(1) (t) = un (t) (18) yn(4) (t) − dt and its integration gives ∫ t un (τ )dτ + c, (19) yn(3) (t) = p(t)2 yn(1) (t) + tn −0 where c is an integral constant. Substituting tn+l +0 and tn+l -0 for t of (19), we have yn(3) (tn+l +0) = p(tn+l +0)2 yn(1) (tn+l +0) +u0,n + · · · + un+l,n + c (20) 374 and yn(3) (tn+l -0) = p(tn+l -0)2 yn(1) (tn+l -0) +u0,n + · · · + un+l−1,n + c, (21) respectively. Recall that the spline in tension is sought among the twice-differentiable functions and attend to the definition (3) of p. Then we can reduce (20) and (21) to yn(3) (tn+l +0) = p2n+l yn(1) (tn+l ) +u0,n + · · · + un+l,n + c and (22) yn(3) (tn+l -0) = p2n+l−1 yn(1) (tn+l ) +u0,n + · · · + un+l−1,n + c, (23) respectively. Subtracting (23) from (22), we have y (3) (tn+l +0) − y (3) (tn+l -0) = (p2n+l − p2n+l−1 )y (1) (tn+l ) + ul,n , (24) which tells how to update the state variable y (3) at tn+l and implies that the other state variables y (2) , y (1) , and y are continuous at tn+l . So we can write the state transition across the sampling point tn+l as follows: xn (tn+l +0) = Φ(tn+l +0, tn+l -0)xn (tn+l -0) + gul,n , l = 0, 1, 2, 3, 4, (25) where  1  0 Φ(tn+l +0, tn+l -0) =   0 0 0 1 0 p2n+l − p2n+l−1 0 0 1 0  0 0   . (26) 0  1 The two types of state transitions (17) and (25) can be combined into the recurrence formulae xn (tn +0) = gu0,n = g, xn (tn+l +0) = Ψn+l xn (tn+l−1 +0) + gul,n , l = 1, 2, 3, 4, (27) where we have set Ψn+l = Φ(tn+l +0, tn+l -0)Φ(tn+l -0, tn+l−1 +0), l = 1, 2, 3, 4, (28) and used our choice u0,n = 1 and the initial state x(tn -0) = 0. By these recurrence formulae, we can write the terminal state as follows: xn (tn+4 +0) = Ψn+4 Ψn+3 Ψn+2 Ψn+1 g +Ψn+4 Ψn+3 Ψn+2 gu1,n +Ψn+4 Ψn+3 gu2,n +Ψn+4 gu3,n +gu4,n . (29) Then we can determine u1 , u2 , u3 , and u4 that makes the terminal state xn (tn+4 +0) be zero by   u1,n u2,n −1   u3,n= − [Ψn+4 Ψn+3 Ψn+2 g Ψn+4 Ψn+3 g Ψn+4 g g] Ψn+4 Ψn+3 Ψn+2 Ψn+1 g. u4,n (30) Existence of the inverse matrix is equivalent to the complete controllability of the sampled-data system with the SAMPTA'09 impulse control un input. We do not have the condition in a simpler form due to the complication caused by timevarying dynamics and non-uniform sampling. Even the uniform sampling case is yet to be investigated. For the numerical evaluation of yn , we first compute the states {xn (tn+l +0)}3l=0 by (27) from {ul,n }4l=0 . Then we can evaluate yn by  0, t ≤ tn    hΦ(t, tn+l +0)xn (tn+l +0), yn (t) = (31) tn+l < t ≤ tn+l+1 , (l = 0, 1, 2, 3)    0, tn+4 ≤ t and hΦ(t, tn+l +0) ] [ (t−tn+l )2 (t−tn+l )3  if pn+l = 0 1 t − t  n+l 2 6   [ n+l ))−2 (32) = 1 t − tn+l cosh(pn+lp(t−t 2   n+l ]   sinh(pn+l (t−tn+l ))−2pn+l (t−tn+l )  if pn+l > 0 p3 n+l which follow from (12), (16), and the continuity of yn over the entire domain. 4. 
Numerical examples Test data were prepared by concatenating a sampled smooth curve and a sampled polygonal line. Their interpolation was computed as a linear combination of the locally supported splines in tension. The cubic spline interpolation (equivalent to the case p(t) ≡ 0) is shown in Fig. 1. The cursive part is reproduced in a good shape but the polygonal part suffers from inter-sample vibration. The linear spline interpolation (equivalent to the case p(t) → ∞) in Fig. 2 behaves in the opposite way. Reproduction of the polygonal part is perfect but there is no smootheness. Interpolation by a spline in constant tension (p(t) ≡ 10) in Fig. 3 provides a good compromise between the cubic and linear spline interpolation. It is fairly smooth and has less vibration. Some may say that the cursive part is not smooth enough and rather polygonal in Fig. 3. In this case, we can obtain a better interpolation by varying the tension in time. Figure 4 is an interpolation by a spline in piecewise constant tension. Higher tensions are imposed on the polygonal part to suppress the vibration. The interpolation is kept smooth elsewhere. The locally supported splines used to construct this curve are plotted in Fig. 5 where the plots are vertically scaled to have a common peak value. 5. Conclusions Locally supported splines in tension were constructed where the tension is constant within each sampling interval and variable at the sampling points. They will hopefully contribute to the variety of curve drawing modules in the graphical design tools. Another application may be image enlargement tools which allow users to put higher tension manually at the portions where they want to suppress ringing effects. 375 f(t) t0 f(t) t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t Figure 1: Interpolation by a cubic spline (p(t) ≡ 0). t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t Figure 2: Interpolation by a linear spline (p(t) → ∞). References: f(t) [1] J. H. Ahlberg, E. N. Nilson and J. L. Walsh. The Theory of Splines and Their Applications. Academic Press, London, 1967. [2] M. N. Benbourhim and A. Bouhamidi. Approximation of vector fields by thin plate splines with tension. J. Approx. Theory, 136:198–229, 2005. [3] C. de Boor. Best approximation properties of spline functions of odd degree. J. Math. Mech., 12:747– 750, 1963. [4] Y. C. Ho, R. E. Kalman and K. S. Narendra. Controllability of linear dynamical systems. Contrib. Diff. Eqs., 1:189–213, 1963. [5] J. C. Holladay. Smoothest curve approximation. Math. Tables and Aids to Comput., 11:223–243, 1957. [6] R. E. Kalman. A new approach to linear filtering and prediction problems. Trans. ASME, 82(Series D):35–45, 1960. [7] H. Mitasova and L. Mitas. Interpolation by regularized spline with tension: I. theory and implementation. Mathematical Geology, 25:641–655, 1993. [8] A. Polyakov and V. Brusentsev. Graphics Programming with GDI+ & DirectX. A-List Publishing, Wayne, PA, 2005. [9] I. J. Schoenberg. On interpolation by spline functions and its minimal properties. In P. L. Butzer and J. Korevaar, editors, On Approximation Theory, pages 109–129, June 1964. [10] D. G. Schweikert. An interpolation curve using a spline in tension. J. Math. Phys., 45:312–317, 1966. [11] E. D. Sontag. Mathematical Control Theory. Springer, New York, 1990. [12] M. Unser and T. Blu. Cardinal exponential splines: Part I—Theory and filtering algorithms. IEEE Transactions on Signal Processing, 53(4):1425–1438, April 2005. [13] M. Unser. Cardinal exponential splines: Part II— Think analog, act digital. 
IEEE Transactions on Signal Processing, 53(4):1439–1449, April 2005. t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t Figure 3: Interpolation by a spline in constant tension (p(t) ≡ 10). f(t) t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t Figure 4: Interpolation by a spline in piecewise constant tension (p(t) = 0 for the cursive part (t < t4 ), p(t) = 10 for the straight parts (t4 ≤ t < t6 and t7 ≤ t), and p(t) = 30 for the breaking part (t6 ≤ t < t7 ) ). yn(t) t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t Figure 5: Locally supported splines used to construct the curve in Fig. 4. SAMPTA'09 376 The Effect of Sampling Frequency on a FFT Based Spectral Estimator Saeed Ayat Payame Noor University, Najafabad, Iran. dr.ayat@pnu.ac.ir Abstract: This paper reviews the effect of sampling frequency on a FFT-based spectral estimator. In signal processing applications usually a fix window size is used for obtaining the current frame spectral. For an application like speech enhancement this accuracy of this estimation has a great influence in the quality of the system, because listener feeling is very important in this subject. In our proposed method we divided the well-known spectral subtraction method in two phases. Then by using different frame sizes that we used in these two phases the overall quality of the system has increased in different sampling frequencies. Now with subtracting this estimation of noise spectrum from the spectrum of each noisy speech frames we can achieve enhanced speech signal. The paper is organized as follows: In section 2 we have a review on spectral subtraction method. In section 3 we proposed our method and in section 4 we present the simulation results. 2. Spectral Subtraction There are many different versions for spectral subtraction. In a generalized spectral subtraction [4] we have: 1 ) ⎧ ⎫ α α S (w) = max⎨( S (w) − β N (w) ) α , γ N (w) ⎬ ⎩ ⎭ ) S (w) (1) 1. Introduction Where S (w) , N (w) One of the first methods introduced for speech enhancement is spectral subtraction. Till now, different versions of spectral subtraction have been proposed to increase the performance of this method, for example [1, 2, 3]. Despite of its high noise removal, it can cause an annoying noise called musical noise and hence it can reduce overall quality. Musical noise is produced because, we don’t have the needed spectra exactly, so we have to use their estimations. In signal processing applications usually a fix window size is used for obtaining the current frame spectral. As we know if the frame length is L the frequency resolution in Fourier spectral analysis is Fs/L. For example if Fs=11025Hz and L=256 then Fs/L is 43Hz and this resolution may not be enough for speech signal. As we know a clean speech signal consists of some sections that have speech and some others that have no speech and we call them silences. In a noisy speech signal these silence sections have only noise and other sections have noisy speech signals. If the noise is stationary we can estimate its spectrum in the noise sections. In spectral subtraction method, after framing the noisy speech signal we use a silence detector or a voice activity detector for separating noisy speech frames and noise frames. After that with applying, FFT we have the spectrum of each frame. By calculating the average of the noise frames spectra we have estimation for noise spectra. spectrum of noisy speech, estimation of noise and enhanced speech. β is the oversubtraction factor and γ SAMPTA'09 and are magnitude is spectral floor. 
Both β and γ are adjusted to improve the quality of the enhanced speech.

Assuming the noise is stationary, a good estimate is obtained by averaging the spectra of the silence frames; we call this average W(w). In the presence of nonstationary noise, an adaptation technique can be used. Given an initial value W_0(w), if the current frame is silence, W_m(w) is updated as

W_m(w) = (1 − f) W_{m−1}(w) + f Y_m(w),   (2)

where Y_m(w) is the spectrum of the current silence frame and f is a coefficient called the forgetting factor, chosen according to the rate at which the noise changes.

The main problem of the spectral subtraction method is the production of musical noise, which arises because the exact spectrum of the noise signal is not available.

3. Proposed Method

In our method, which estimates the spectrum better than the basic averaging method, we first separate speech and silence frames in the noisy signal using a basic analysis frame; we can then increase the analysis frame length until it covers the whole current silence region. As in the periodogram estimator, accuracy improves with the number of signal samples, so this adaptive analysis frame length yields a better spectral estimate for the noise and the noisy signal, and the system can produce a better enhanced signal with less musical noise.

Concretely, we first apply a silence detection (SAD) algorithm with L = 256 and an overlap of L/2 = 128 points to detect the silence frames. We then increase the analysis frame length until it covers the whole current silence region; the larger window length gives a better frequency resolution. If the new frame length spans several silence areas, their average is taken as the overall noise spectrum. This method yields a better noise spectrum estimate with less musical noise; the experimental results in Section 4 confirm this improvement clearly.

4. Simulation Results

In this section we describe our simulations. The speech signal used in these tests was chosen from the TIMIT database and pronounced by a female speaker. The sentence was converted to different sampling frequencies with the Cool Edit software. All these sentences were degraded by additive white Gaussian noise to obtain noisy signals at the required SNR, here 5 dB. To evaluate our method we compute the SNR improvement as follows. If s(n) is the clean speech, y(n) the noisy signal, ŝ(n) the enhanced signal and w(n) the noise, then

y(n) = s(n) + w(n),   (3)

and the SNR improvement is computed as [5]

SNR_imp = SNR_out − SNR_in,   (4)

in which SNR_in and SNR_out are the SNRs of the noisy and the enhanced signal:

SNR_in = 10 log10 [ Σ s(n)^2 / Σ (y(n) − s(n))^2 ],   (5)

SNR_out = 10 log10 [ Σ s(n)^2 / Σ (ŝ(n) − s(n))^2 ].   (6)
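A minimal NumPy sketch of the quantities (2) and (4)-(6); the function names are ours, and eps is an added guard against division by zero:

import numpy as np

def update_noise_estimate(W_prev, Y_silence, f=0.1):
    # Eq. (2): recursive noise spectrum update with forgetting factor f
    return (1.0 - f) * W_prev + f * Y_silence

def snr_db(s, x):
    # SNR of x against the clean reference s, as in Eqs. (5)-(6)
    eps = 1e-12
    return 10.0 * np.log10(np.sum(s**2) / (np.sum((x - s)**2) + eps))

def snr_improvement(s, y, s_hat):
    # Eq. (4): SNR_imp = SNR_out - SNR_in
    return snr_db(s, s_hat) - snr_db(s, y)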
In this experiment a listener listens to the enhanced signal and increases β until musical noise appears in the enhanced signal; at that point, β and the SNR improvement are recorded. This is done for an input SNR of 5 dB and frame lengths of 256, 512, 1024 and 2048 samples, and the test is repeated for sampling frequencies of 8000 Hz, 11025 Hz and 16000 Hz. α is fixed to 1.0 and γ to 0.0. Note that the frame length in the silence detection step is 256. Tables 1 to 3 show β and the SNR improvement at the onset of musical noise in the enhanced signal.

L        256    512    1024   2048
SNR_imp  0.8    1.0    1.4    1.44
β        0.1    0.15   0.25   0.45

Table 1: β and SNR improvement at the start of musical noise (Fs = 8000 Hz).

L        256    512    1024   2048
SNR_imp  1.0    1.8    2.3    3.1
β        0.15   0.3    0.5    0.9

Table 2: β and SNR improvement at the start of musical noise (Fs = 11025 Hz).

L        256    512    1024   2048
SNR_imp  1.2    2.3    3.1    3.8
β        0.2    0.5    0.7    1.0

Table 3: β and SNR improvement at the start of musical noise (Fs = 16000 Hz).

As the tables show, the SNR improvement increases with the frame length at all tested sampling rates. This indicates that musical noise arises from inaccurate noise estimation and is reduced as the frame length increases, and this holds across the different sampling frequencies. With a larger frame length we can therefore choose a larger β without producing musical noise, which removes more noise from the enhanced signal and yields a larger SNR improvement.

5. Conclusions

In this paper we studied the effect of the sampling frequency on an FFT-based spectral estimator, and we proposed an improved spectral subtraction method based on a more accurate spectral estimator. This adaptive estimator gives a better spectral estimate by increasing the analysis frame length in silence regions: a basic analysis frame is used to separate the silence frames, and an adaptive frame length, grown until it covers the whole current silence region, is used to estimate the spectrum. In this way we obtain a better spectral estimate for the noise and the noisy signal, and the system produces a better enhanced signal with less musical noise.

References:
[1] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-27(2):113–120, 1979.
[2] H. Hu, F. Kuo, and H. Wang. Supplementary schemes to spectral subtraction for speech enhancement. Speech Communication, 2002.
[3] H. Gustafsson and S. Nordholm. Spectral subtraction using reduced delay convolution and adaptive averaging. IEEE Trans. Speech and Audio Processing, 9(8):799–807, 2001.
[4] J. S. Lim and A. V. Oppenheim. Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE, 67:1586–1604, 1979.
[5] S. Ayat. Enhanced human-computer speech interface using wavelet computing. In IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems, 2008.

Nonlinear Locally Adaptive Wavelet Filter Banks

Gerlind Plonka (1) and Stefanie Tenorth (1)
(1) Department of Mathematics, University of Duisburg-Essen, 47048 Duisburg, Germany.
gerlind.plonka@uni-due.de, stefanie.tenorth@uni-due.de

Abstract:
In this paper we introduce a new construction of nonlinear locally adaptive wavelet filter banks by connecting the lifting scheme with the idea of image smoothing by nonlinear diffusion methods. We construct a new invertible nonlinear wavelet filter bank by connecting these two concepts, the lifting scheme and discrete nonlinear diffusion. The main goal is to adapt suitably to the local geometry of images, in order to obtain highly efficient sparse image representations.

1. Introduction

A crucial problem in data analysis is to construct efficient low-level representations of data, thereby providing a precise characterization of the features that compose them, such as edges and texture components.
Fortunately, in many relevant applications the components of given multidimensional data are not independent, and the strong correlation between neighboring data points can be suitably exploited. In the two-dimensional case, tensor-product wavelets are not optimal for representing geometric structures, because their support is not adapted to directional geometric properties. Instead of choosing a basis or a frame a priori to approximate the image, one can try to adapt the approximation scheme to the image geometry. Within the last years, different approaches have been developed in this direction, see e.g. [1, 4, 5, 7, 10, 12, 13]. In particular, the construction of nonlinear filter banks by the lifting scheme has already been proposed in [4, 8]. Since then, there have been different attempts to construct adaptive and directional lifting-based invertible transforms for sparse image representation, see [2, 5, 6, 9, 12].

The lifting scheme for the representation of wavelet filter banks was originally suggested and analyzed by Sweldens [16]. It provides a flexible tool for the construction of new nonlinear wavelet filter banks. The main feature of lifting is that it provides an entirely spatial-domain interpretation of the transform.

Besides wavelet shrinkage, other approaches like regularization techniques and PDE-based methods (such as nonlinear diffusion) have been shown to be powerful tools in signal and image restoration, e.g., for denoising purposes. In particular, nonlinear diffusion filters lead to impressive results by removing insignificant, small-scale variations while preserving important features such as discontinuities [3, 11, 17, 18]. In [15], certain connections between explicit discrete one-dimensional schemes for nonlinear diffusion and shift-invariant Haar wavelet shrinkage have been established.

2. Lifting and Nonlinear Diffusion

2.1 The Lifting Scheme

The typical lifting scheme consists of three steps: split, predict and update.

1. Split. Usually, in this step, the given data is split into even and odd components. Let N ∈ N be of the form N = 2^l r with l, r ∈ N. For a given digital image of the form a = (a(i,j))_{i,j=0}^{N−1} ∈ R^{N×N}, we split the data into the following two sets of equal size,

a^e := (a_{i,j})_{i,j=0, i+j even}^{N−1},    a^o := (a_{i,j})_{i,j=0, i+j odd}^{N−1},

and we denote the components of a^e and a^o by a^e_{i,j} and a^o_{i,j}, respectively. The data sets a^e and a^o split the image a like a checkerboard.

2. Predict. The goal of the prediction step is to find a good approximation ã^o of the data a^o of the form ã^o = P_1(a^o) + P_2(a^e), where P_1 and P_2 may be nonlinear operators. Afterwards, we consider the residual

d^o := a^o − ã^o = a^o − (P_1(a^o) + P_2(a^e)).

We have to assume that the mapping (a^e, a^o) ↦ (a^e, d^o) is invertible, i.e., the operator I − P_1 needs to be invertible for arbitrary data a^o. The operators P_1 and P_2 should be chosen such that the residual d^o is very small.

3. Update. In the third step, we aim to find a smoothed approximation of the data a^e that can be regarded as a low-pass filtered and subsampled version of the original image a. The general update has the form ã^e := U_1(d^o) + U_2(a^e) with (possibly nonlinear) operators U_1 and U_2, where we again assume the invertibility of the mapping (a^e, d^o) ↦ (ã^e, d^o), i.e., U_2 is assumed to be invertible, so that a^e = U_2^{−1}(ã^e − U_1(d^o)).
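A minimal NumPy sketch of the split step and of the generic predict/update skeleton above. The simplifying choices P_1 = 0 and U_2 = I, which make invertibility trivial, and all function names are ours; the paper's diffusion-based operators are derived in Section 3.

import numpy as np

def split_checkerboard(a):
    # Split step: separate pixels with i+j even from those with i+j odd.
    idx = np.add.outer(np.arange(a.shape[0]), np.arange(a.shape[1]))
    even = (idx % 2 == 0)
    # returns a^e, a^o as flat arrays, plus the mask needed to merge back
    return a[even], a[~even], even

def lifting_analysis(ae, ao, P2, U1):
    # Predict/update with P1 = 0 and U2 = identity:
    #   d^o = a^o - P2(a^e),   ã^e = a^e + U1(d^o)
    do = ao - P2(ae)
    ae_tilde = ae + U1(do)
    return ae_tilde, do

def lifting_synthesis(ae_tilde, do, P2, U1):
    # Exact inversion of the two lifting steps, applied in reverse order.
    ae = ae_tilde - U1(do)
    ao = do + P2(ae)
    return ae, ao

For instance, the toy choices P2 = lambda x: x and U1 = lambda d: 0.5 * d already give a perfectly invertible (if geometrically naive) transform on the flattened even/odd samples.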
The complete scheme is illustrated in Figure 1.

Figure 1: Illustration of the nonlinear filter bank using the lifting scheme.

2.2 Nonlinear Diffusion

Nonlinear diffusion has been shown to be a very successful model for image denoising. For Ω = (0, N_1) × (0, N_1) we consider the diffusion equation

∂u/∂t = div( g(|∇u|) ∇u )   on Ω × (0, ∞)   (1)

with a given noisy image a as initial state, u(x, 0) = a(x) for x ∈ Ω, and with Neumann boundary conditions ∂u/∂n = 0 on ∂Ω. Here ∇u = (u_{x_1}, u_{x_2})^T = (∂u/∂x_1, ∂u/∂x_2)^T denotes the gradient of u, and |∇u| := (u_{x_1}^2 + u_{x_2}^2)^{1/2}. The time t in (1) is a scale parameter: increasing t corresponds to stronger filtering. The diffusivity function g(|∇u|) is a non-negative function that determines the amount of diffusion. It is decreasing in |∇u|, in order to ensure that strong edges are hardly blurred by the diffusion filter while small variations (noise) are smoothed much more strongly. Frequently used bounded diffusivities are the Perona-Malik diffusivity

g(x) := 1 / (1 + x^2/λ^2),

and the Weickert diffusivity

g(x) := 1 for x = 0,    g(x) := 1 − exp(−3.315/(x/λ)^4) for x > 0,

see [14, 17]. One may also take a "robust" diffusivity of the form

g(x) := 1 for 0 ≤ x < θ,    g(x) := 0 for |x| ≥ θ,

as used in [14] with a suitably chosen threshold θ. Replacing g(|∇u|) by g(|∇u_σ|), where u_σ := K_σ ⋆ u denotes the image slightly smoothed by convolution with a Gaussian kernel, existence and uniqueness of a solution of (1) have been shown in [3].

For the application of the diffusion approach to digital images we follow [11] and replace (1) by the slightly modified equation

∂u/∂t = ∂_{x_1}( g(|∂_{x_1} u|) ∂_{x_1} u ) + ∂_{x_2}( g(|∂_{x_2} u|) ∂_{x_2} u ).

We use a discretization of the form

(u^{k+1}_{ij} − u^k_{ij}) / τ = g(|u^k_{i+1,j} − u^k_{i,j}|)(u^k_{i+1,j} − u^k_{i,j}) − g(|u^k_{i,j} − u^k_{i−1,j}|)(u^k_{i,j} − u^k_{i−1,j})
                              + g(|u^k_{i,j+1} − u^k_{i,j}|)(u^k_{i,j+1} − u^k_{i,j}) − g(|u^k_{i,j} − u^k_{i,j−1}|)(u^k_{i,j} − u^k_{i,j−1}),   (2)

where u^0_{i,j} := a_{ij} for i, j = 0, …, N − 1. Here k denotes the iteration step and τ is the step size of the time discretization. In our numerical examples we will use the step size τ = 1/4.
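A minimal NumPy sketch of one explicit diffusion step (2) with the Perona-Malik diffusivity; the function names are ours, and the boundary is handled by edge replication, which makes the flux across the boundary vanish (a discrete Neumann condition):

import numpy as np

def perona_malik(x, lam):
    # Perona-Malik diffusivity g(x) = 1 / (1 + x^2 / lambda^2)
    return 1.0 / (1.0 + (x / lam)**2)

def diffusion_step(u, tau=0.25, lam=28.0):
    # One explicit step of Eq. (2) on the whole image: for each pixel,
    # sum g(|n - u|)(n - u) over its four direct neighbors n.
    up = np.pad(u, 1, mode='edge')           # replicate edges (Neumann-like)
    c = up[1:-1, 1:-1]
    nbrs = [up[2:, 1:-1], up[:-2, 1:-1], up[1:-1, 2:], up[1:-1, :-2]]
    flux = sum(perona_malik(np.abs(n - c), lam) * (n - c) for n in nbrs)
    return u + tau * flux

Note that the two terms per axis in (2) collapse to the symmetric neighbor sum used here, since −g(|u − n|)(u − n) = g(|n − u|)(n − u).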
3. The Nonlinear Diffusion Filter Bank

We now want to apply the nonlinear diffusion filter to the construction of prediction and update operators in the lifting scheme, in order to obtain a new sparse representation of images. The nonlinear filter bank should satisfy the following demands.

1. For linear (bivariate) polynomials, the residual d^o found in the prediction step should vanish. This condition is equivalent to two vanishing moments of the high-pass filter in a wavelet filter bank.

2. Near discontinuities (edges) of u, the residual d^o should remain small.

3. The data ã^e should be a suitable (downsampled) approximation of the image a, with good low-pass filter properties in smooth areas of a and without blurring of edges.

3.1 Choice of the Prediction Operator

Using equation (2) with the notations a^o_{i,j} := u^0_{i,j}, ã^o_{i,j} := u^1_{i,j} for i + j odd, and a^e_{i,j} := u^0_{i,j} for i + j even, we obtain

ã^o_{i,j} = a^o_{i,j} + τ [ g(|a^e_{i+1,j} − a^o_{i,j}|)(a^e_{i+1,j} − a^o_{i,j}) + g(|a^e_{i−1,j} − a^o_{i,j}|)(a^e_{i−1,j} − a^o_{i,j})
          + g(|a^e_{i,j+1} − a^o_{i,j}|)(a^e_{i,j+1} − a^o_{i,j}) + g(|a^e_{i,j−1} − a^o_{i,j}|)(a^e_{i,j−1} − a^o_{i,j}) ].

A prediction could now be of the form

d^o_{i,j} = a^o_{i,j} − ã^o_{i,j} = −τ Σ_{μ,ν=−1, |μ|+|ν|=1}^{1} g(|a^e_{i+μ,j+ν} − a^o_{i,j}|)(a^e_{i+μ,j+ν} − a^o_{i,j}).

Unfortunately, with this choice of prediction the desired invertibility of the mapping (a^e, a^o) ↦ (a^e, d^o) is not guaranteed, since the nonlinear diffusivity g depends on the data a^o_{i,j}. Therefore, we replace the values a^o_{i,j} that are used for the computation of the function values of g by the median of the four direct neighbors,

a^o_{i,j} ≈ median{ a^e_{i,j+1}, a^e_{i,j−1}, a^e_{i+1,j}, a^e_{i−1,j} } =: med a^o_{i,j}.

A normalization with

g_{ij} := Σ_{μ,ν=−1, |μ|+|ν|=1}^{1} g(|a^e_{i+μ,j+ν} − med a^o_{i,j}|)

now yields the prediction

d^o_{i,j} := −(τ/g_{ij}) Σ_{μ,ν=−1, |μ|+|ν|=1}^{1} g(|a^e_{i+μ,j+ν} − med a^o_{i,j}|)(a^e_{i+μ,j+ν} − a^o_{i,j})
           = τ a^o_{i,j} − (τ/g_{ij}) Σ_{μ,ν=−1, |μ|+|ν|=1}^{1} g(|a^e_{i+μ,j+ν} − med a^o_{i,j}|) a^e_{i+μ,j+ν}.

Now the invertibility of the prediction is ensured for τ > 0, and we have

a^o_{i,j} = d^o_{i,j}/τ + (1/g_{ij}) Σ_{μ,ν=−1, |μ|+|ν|=1}^{1} g(|a^e_{i+μ,j+ν} − med a^o_{i,j}|) a^e_{i+μ,j+ν}.

Observe that g_{ij} is positive for all i, j if we take the Perona-Malik or the Weickert diffusivity. At the boundary of the image, where not all four neighbors of a data point are given, we slightly change the operator and use only the three available neighbors in the sum (or only two neighbors at a vertex). Because of the normalization with the correspondingly defined constants g_{ij}, the properties of the prediction operator do not change.

3.2 Choice of the Update Operator

As update operator we simply apply a linear operator of the form

ã^e_{i,j} = √2 a^e_{i,j} + (1/4)(d^o_{i+1,j} + d^o_{i−1,j} + d^o_{i,j+1} + d^o_{i,j−1}).

Invertibility is obviously satisfied, and we find

a^e_{i,j} = (1/√2)( ã^e_{i,j} − (1/4)(d^o_{i+1,j} + d^o_{i−1,j} + d^o_{i,j+1} + d^o_{i,j−1}) ).

At the boundary, where a^e_{i,j} has only three neighbors, we slightly change the operator; for example, for 0 < i < N − 1 and j = 0, we take

ã^e_{i,0} := √2 a^e_{i,0} + (1/3)(d^o_{i+1,0} + d^o_{i−1,0} + d^o_{i,1}),

etc. Analogously, at vertices only two neighbors are taken into account. Observe that the low-pass filtered values ã^e_{i,j} are amplified by √2 here (as is usual also for orthogonal wavelet filter banks).

3.3 Iterative Application of the Filter Bank

In order to obtain a suitable sparse representation of the digital image a, we now apply the nonlinear filter bank described above iteratively, and we use a hard threshold procedure to suppress small residual values d^o_{i,j}. After the first application of the filter bank, the (small) residual data d^o_{i,j}, i, j = 0, …, N − 1, i + j odd, are stored, and we consider only the N^2/2 values ã^e_{i,j}, i, j = 0, …, N − 1, i + j even. For a second application of the filter bank to ã^e_{i,j}, we rename these data by a_{k,l} := ã^e_{k−l,k+l}, where k = 0, …, N − 1 and l = −min{k, N − 1 − k}, …, min{k, N − 1 − k}, and apply the filters to this data set, etc.

As usual, the complete procedure involves three steps. First, we decompose the image by iterative application of the diffusion filter bank. Secondly, we apply the shrinkage function

S_θ(x) := x for |x| ≥ θ,    S_θ(x) := 0 for |x| < θ,

to the residual coefficients; in our numerical experiments we take a level-independent threshold θ. Finally, we reconstruct the image with the modified residual coefficients.
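The following NumPy sketch implements the normalized median-based prediction, its exact inversion, and the hard shrinkage S_θ for a single interior pixel; the function names are ours, and boundary handling is omitted for brevity. Here ae_img is the full image array, so that the four direct neighbors of an odd-parity pixel (i, j) are even samples.

import numpy as np

def predict_residual(ae_img, ao_val, i, j, g, tau=0.25):
    # d^o_{i,j}: weights g are evaluated at the median surrogate med a^o_{i,j}
    nbrs = np.array([ae_img[i, j+1], ae_img[i, j-1], ae_img[i+1, j], ae_img[i-1, j]])
    med = np.median(nbrs)              # med a^o_{i,j}, replaces a^o inside g
    w = g(np.abs(nbrs - med))          # diffusivity weights
    gij = w.sum()                      # normalization g_{ij} > 0
    return -tau / gij * np.sum(w * (nbrs - ao_val))

def invert_prediction(ae_img, do_val, i, j, g, tau=0.25):
    # Recover a^o_{i,j} from d^o_{i,j}: the weights depend only on even samples.
    nbrs = np.array([ae_img[i, j+1], ae_img[i, j-1], ae_img[i+1, j], ae_img[i-1, j]])
    med = np.median(nbrs)
    w = g(np.abs(nbrs - med))
    return do_val / tau + np.sum(w * nbrs) / w.sum()

# hard shrinkage S_theta from Section 3.3
hard_shrink = lambda x, theta: np.where(np.abs(x) >= theta, x, 0.0)

The key design point is visible in the code: because med a^o_{i,j} is computed from even samples only, the weights w are available to both the analysis and the synthesis side, which is exactly what makes the nonlinear prediction invertible.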
4. Properties of the Diffusion Filter Bank

We can show the following theorem.

Theorem 1. Let g be a diffusivity function satisfying 0 < g(|x|) ≤ 1 for x ∈ R. Then the diffusion filter bank determined in Section 3 reproduces linear polynomials.

Proof. We consider a linear polynomial of the form a(x_1, x_2) = a_0 + b_0 x_1 + c_0 x_2 with a_0, b_0, c_0 ∈ R, and let the digital image be given by a_{i,j} = a(ih, jh) = a_0 + b_0 ih + c_0 jh. Then we obtain, for data that are not at the boundary,

med a^o_{i,j} = median{ a_0 + b_0(i−1)h + c_0 jh, a_0 + b_0(i+1)h + c_0 jh, a_0 + b_0 ih + c_0(j−1)h, a_0 + b_0 ih + c_0(j+1)h }
             = a_0 + b_0 ih + c_0 jh + median{ −b_0 h, b_0 h, −c_0 h, c_0 h }
             = a_0 + b_0 ih + c_0 jh = a^o_{i,j},

and

d^o_{i,j} = −(τ/g_{ij}) Σ_{μ,ν=−1, |μ|+|ν|=1}^{1} g(|a^e_{i+μ,j+ν} − a^o_{i,j}|)(a^e_{i+μ,j+ν} − a^o_{i,j})
          = −(τ/g_{ij}) [ g(b_0 h)(a^e_{i+1,j} + a^e_{i−1,j} − 2a^o_{i,j}) + g(c_0 h)(a^e_{i,j+1} + a^e_{i,j−1} − 2a^o_{i,j}) ] = 0.

Hence the prediction operator yields d^o_{i,j} = 0, and the update gives ã^e_{i,j} = √2 a^e_{i,j} for all i, j with i + j even. ∎

Further, one can show in case studies that the proposed filter bank behaves well at vertical, horizontal and diagonal edges, i.e., the residual values obtained with the nonlinear prediction operator remain small.

5. Numerical Results

We apply the nonlinear diffusion filter bank described above in order to achieve sparse image representations. In the experiment we consider the Monarch image. We use the Perona-Malik diffusivity with λ = 28 and τ = 0.25, and we apply 8 levels of the nonlinear filter bank, so that 16 low-pass coefficients remain. For thresholding we use the hard shrinkage function with θ = 13. Figure 2 (left) shows the original image, and Figure 2 (middle) the compressed image with 449 remaining coefficients obtained with the new diffusion filter bank. For comparison, we apply 8 decomposition levels of the two-dimensional biorthogonal wavelet shrinkage with the 7-9 filter and the same number of 449 remaining nonzero coefficients, see Figure 2 (right). The nonlinear filter bank not only gives an optically better result but also achieves a better PSNR value (26.41 dB, versus 24.73 dB for the biorthogonal filter bank). We remark that our method is especially designed for constructing efficient low-level representations and does not work well for image denoising.

Figure 2: Original image Monarch (left), sparse image representation with 449 coefficients using the proposed nonlinear diffusion filter bank (middle, PSNR = 26.41 dB), and the biorthogonal filter bank with the 7-9 filter (right, PSNR = 24.73 dB).
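For completeness, a minimal sketch of the PSNR figure of merit quoted above, under the usual definition for 8-bit images (the normalization is our assumption; the paper does not spell it out):

import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB for images with peak value `peak`.
    mse = np.mean((ref.astype(float) - test.astype(float))**2)
    return 10.0 * np.log10(peak**2 / mse)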
6. Acknowledgement

The research in this paper is supported by the project PL 170/13-1 of the German Research Foundation (DFG). This is gratefully acknowledged.

References:
[1] F. Arandiga, A. Cohen, R. Donat, and N. Dyn. Interpolation and approximation of piecewise smooth functions. SIAM J. Numer. Anal., 43:41–57, 2005.
[2] N.V. Boulgouris, D. Tzovaras, and M.G. Strintzis. Lossless image compression based on optimal prediction, adaptive lifting, and conditional arithmetic coding. IEEE Trans. Image Process., 10:1–14, 2001.
[3] F. Catté, P.-L. Lions, J.-M. Morel, and T. Coll. Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal., 29:182–193, 1992.
[4] R.L. Claypoole, G.M. Davis, W. Sweldens, and R.G. Baraniuk. Nonlinear wavelet transforms for image coding via lifting. IEEE Trans. Image Process., 12:1449–1459, 2003.
[5] A. Cohen and B. Matei. Compact representation of images by edge adapted multiscale transforms. In Proc. IEEE Int. Conf. on Image Process. (ICIP), Thessaloniki, pages 8–11, 2001.
[6] W. Ding, F. Wu, X. Wu, S. Li, and H. Li. Adaptive directional lifting-based wavelet transform for image coding. IEEE Trans. Image Process., 16:416–427, 2007.
[7] D.L. Donoho. Wedgelets: Nearly minimax estimation of edges. Ann. Stat., 27:859–897, 1999.
[8] F.J. Hampson and J.-C. Pesquet. A nonlinear subband decomposition with perfect reconstruction. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., pages 1523–1526, 1996.
[9] H.J.A.M. Heijmans, B. Pesquet-Popescu, and G. Piella. Building nonredundant adaptive wavelets by update lifting. Appl. Comput. Harmon. Anal., 18:252–281, 2005.
[10] S. Mallat. Geometrical grouplets. Appl. Comput. Harmon. Anal., 26:161–180, 2009.
[11] P. Perona and J. Malik. Scale space and edge detection using anisotropic diffusion. In Proc. IEEE Computer Society Workshop on Computer Vision, pages 16–22, 1987.
[12] G. Piella, B. Pesquet-Popescu, H.J.A.M. Heijmans, and G. Pau. Combining seminorms in adaptive lifting schemes and applications in image analysis and compression. J. Math. Imaging Vis., 25:203–226, 2006.
[13] G. Plonka. The easy path wavelet transform: A new adaptive wavelet transform for sparse representation of two-dimensional data. Multiscale Model. Simul., 7:1474–1496, 2009.
[14] G. Plonka and J. Ma. Convergence of an iterative nonlinear scheme for denoising of piecewise constant images. Int. J. Wavelets Multiresolut. Inf. Process., 5:975–995, 2007.
[15] G. Steidl, J. Weickert, T. Brox, P. Mrázek, and M. Welk. On the equivalence of soft wavelet shrinkage, total variation diffusion, total variation regularization, and SIDEs. SIAM J. Numer. Anal., 42(2):686–713, 2004.
[16] W. Sweldens. The lifting scheme: A construction of second generation wavelets. SIAM J. Math. Anal., 29:511–546, 1997.
[17] J. Weickert. Anisotropic Diffusion in Image Processing. Teubner, Stuttgart, 1998.
[18] M. Welk, G. Steidl, and J. Weickert. A four-pixel scheme for singular differential equations. In R. Kimmel, N. Sochen, and J. Weickert, editors, Scale-Space and PDE Methods in Computer Vision, Lecture Notes in Computer Science, pages 610–621. Springer, Berlin, 2005.