IEEE Signal Processing Magazine - November 2023

Call for Papers
IEEE Signal Processing Magazine
Special Issue on Near-Field Signal Processing: Communications, Sensing and Imaging
Signal processing technologies are moving toward using small and densely packed sensors to create large aperture arrays. This
allows for higher angular resolution and beamforming gain. However, with the extended aperture and small wavelength, when
the receiver is in the near-field, i.e., it is closer to the transmitter than the Fraunhofer distance, the signal wavefront is no longer
planar. Therefore, a spherical wavefront must be considered since the system's performance depends on both the propagation
distance and the direction of the signal of interest. As a result, near-field signal processing has recently become an essential
technique for both radar sensing and wireless communications to achieve spatial multiplexing with increased degrees of
freedom and high resolution with a range-dependent, very narrow beamwidth. Although near-field processing is a relatively
new concept in wireless communications and sensing, it has already been extensively studied in other fields, mainly
computational imaging, where the propagation distance is short, e.g., microscopy, holography, and optics, where the far-field
assumption fails.
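As a rough illustration of the near-field condition described above, the short sketch below computes the Fraunhofer distance using the commonly quoted expression d_F = 2D²/λ, where D is the largest aperture dimension and λ is the wavelength. The call itself does not state this formula, and the function names and the example values (a 0.5 m aperture at 28 GHz) are assumptions chosen only for illustration.

```python
# Minimal sketch, assuming the common far-field criterion d_F = 2 * D**2 / wavelength.
# Function names and example values are hypothetical, not taken from the call for papers.

SPEED_OF_LIGHT = 299_792_458.0  # propagation speed in free space, m/s


def fraunhofer_distance(aperture_m: float, frequency_hz: float) -> float:
    """Return the Fraunhofer distance (m) for an aperture of largest dimension aperture_m."""
    if aperture_m <= 0 or frequency_hz <= 0:
        raise ValueError("aperture and frequency must be positive")
    wavelength_m = SPEED_OF_LIGHT / frequency_hz
    return 2.0 * aperture_m ** 2 / wavelength_m


def is_near_field(distance_m: float, aperture_m: float, frequency_hz: float) -> bool:
    """True when the receiver is closer to the transmitter than the Fraunhofer distance."""
    return distance_m < fraunhofer_distance(aperture_m, frequency_hz)


if __name__ == "__main__":
    # Assumed example: a 0.5 m array operating at 28 GHz.
    d_f = fraunhofer_distance(0.5, 28e9)
    print(f"Fraunhofer distance: {d_f:.1f} m")
    print("Receiver at 10 m in near field:", is_near_field(10.0, 0.5, 28e9))
```

Because the distance grows with the square of the aperture and linearly with frequency, large arrays at short wavelengths can place typical users well inside the near field, which is the regime this special issue targets.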
The aim of this special issue is to provide a venue for a wide and diverse audience of researchers from academia, government,
and industry to survey the recent research advances in major near-field applications such as wireless communications, sensing
and imaging. Topics of interest include but are not limited to:
▪ Reactive and radiative near-field signal processing for massive MIMO communications
▪ Short-range THz communications
▪ Active/passive near-field beamforming techniques
▪ Near-field integrated sensing and communications
▪ Near-field sensor array/reflecting surface processing
▪ Near-field automotive radar sensing and imaging
▪ Signal processing for near-field acoustics
▪ Signal processing on the spherical manifolds
▪ Signal processing for mixed near-field and far-field observations
▪ Antenna array calibration for near-field applications
▪ Electromagnetics/physics of near-field beamforming
▪ Near-field spatially varying channels, e.g., Rydberg sensors, holographic surfaces, and frequency diverse arrays
▪ Information metasurfaces for near-field imaging, signal processing, and wireless communications
▪ Signal processing for near-field localization, direction-of-arrival estimation and sensing
▪ Near-field synthetic aperture sounding and radar imaging
▪ Machine learning techniques to enable near-field systems
▪ Near-field wireless power transfer for 5G and IoT
▪ Recent advances in mid- and near-field optics via coded diffraction patterns
▪ Modeling and prototyping in microscopy, holography, Raman spectroscopy, crystallography and optics
Submission Guidelines: White papers are required, and full articles will be invited based on the review of white papers. The
white paper format is up to 4 pages in length, including the proposed article title, motivation and significance of the topic, an
outline of the proposed paper, and representative references. An author list with contact information and short bios should also
be included. Submitted articles must be of tutorial/overview/survey nature, accessible to a broad audience, with significant
relevance to the scope of the special issue. Authors are invited to submit their contributions by following the detailed
instructions given at: https://signalprocessingsociety.org/publications-resources/ieee-signal-processing-magazine/information-authors-spm.
Manuscripts should be submitted online via http://mc.manuscriptcentral.com/spmag-ieee
Important Dates:
White paper due: 1 March 2024
Invitation notification: 1 April 2024
Full manuscripts due: 15 June 2024
First review to authors: 1 September 2024
Revision due: 1 November 2024
Second review completed: 1 January 2025
Final manuscript due: 1 February 2025
Publication: May 2025
Guest Editors:
Ahmet M. Elbir (Lead), University of Luxembourg, Luxembourg (ahmetmelbir@ieee.org)
Ana Isabel Perez-Neira, Centre Tecnológic de Telecomunicaciones de Catalunya, Spain (ana.perez@cttc.cat)
Henry Arguello, Universidad Industrial de Santander, Colombia (henarfu@uis.edu.co)
Martin Haardt, Ilmenau University of Technology, Ilmenau, Germany (martin.haardt@tu-ilmenau.de)
Moeness G. Amin, Villanova University, USA (moeness.amin@villanova.edu)
Tie Jun Cui, Southeast University, China (tjcui@seu.edu.cn)
Digital Object Identifier 10.1109/MSP.2023.3313848
Contents Volume 40 | Number 7 | November 2023
FEATURES
18 POLYNOMIAL EIGENVALUE DECOMPOSITION FOR MULTICHANNEL BROADBAND SIGNAL PROCESSING
Vincent W. Neo, Soydan Redif, John G. McWhirter, Jennifer Pestana, Ian K. Proudler, Stephan Weiss, and Patrick A. Naylor
38 A SIGNAL PROCESSING INTERPRETATION OF NOISE-REDUCTION CONVOLUTIONAL NEURAL NETWORKS
Luis Albert Zavala-Mondragón, Peter H.N. de With, and Fons van der Sommen
[Figure thumbnails for PG. 18 and PG. 38. Recoverable axis labels: Normalized Angular Frequency Ω/π, Angle of Arrival ϕ/[°], Gain/[dB].]
ON THE COVER
This issue discusses several topics including Fourier and the early days of sound analysis, multichannel broadband SP, and noise-reduction with CNN.
COVER IMAGE: BACKGROUND, ©SHUTTERSTOCK.COM/IOAT; FOURIER IMAGE, WIKIMEDIA.ORG
COLUMNS
11 DSP History
Fourier and the Early Days of Sound Analysis
Patrick Flandrin
64 Tips & Tricks
Tricks for Designing a Cascade of Infinite Impulse Response Filters With an Almost Linear Phase Response
David Shiung, Jeng-Ji Huang, and Ya-Yin Yang
Super-Resolving a Frequency Band
Ruiming Guo and Thierry Blu
Implementing Moving Average Filters Using Recursion
Shlomo Engelberg
Sub-Nyquist Coherent Imaging Using an Optimizing Multiplexed Sampling Scheme
Yeonwoo Jeong, Behnam Tayebi, and Jae-Ho Han
89 SP Education
Data Science Education: The Signal Processing Perspective
Sharon Gannot, Zheng-Hua Tan, Martin Haardt, Nancy F. Chen, Hoi-To Wai, Ivan Tashev, Walter Kellermann, and Justin Dauwels
94 SP Competitions
Synthetic Image Detection
Davide Cozzolino, Koki Nagano, Lucas Thomaz, Angshul Majumdar, and Luisa Verdoliva
IEEE SIGNAL PROCESSING MAGAZINE (ISSN 1053-5888) (ISPREG) is published bimonthly by the Institute of Electrical and Electronics Engineers, Inc., 3 Park Avenue, 17th Floor, New York,
NY 10016-5997 USA (+1 212 419 7900). Responsibility for the contents rests upon the authors and not the IEEE, the Society, or its members. Annual member subscriptions included in Society fee.
Nonmember subscriptions available upon request. Individual copies: IEEE Members US$20.00 (first copy only), nonmembers US$248 per copy. Copyright and Reprint Permissions: Abstracting
is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright Law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom
of the first page, provided the per-copy fee is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without fee. Instructors are
permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to IEEE Service Center, 445 Hoes Lane,
Piscataway, NJ 08854 USA. Copyright © 2023 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY, and at additional mailing offices.
Postmaster: Send address changes to IEEE Signal Processing Magazine, IEEE, 445 Hoes Lane, Piscataway, NJ 08854 USA. Canadian GST #125634188 Printed in the U.S.A.
Digital Object Identifier 10.1109/MSP.2023.3317580
IEEE Signal Processing Magazine
DEPARTMENTS
4 From the Editor
SPS Members, You Are All Heirs of Fourier!
Christian Jutten
7 President’s Message
Reflections on the Poland Chapter Celebration
Athina Petropulu
Cover 3 Dates Ahead
The IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024) will be held in Seoul, Korea, 14–19 April 2024. (Image: ©SHUTTERSTOCK.COM/SAYAN URANAN)
EDITOR-IN-CHIEF
Christian Jutten, Université Grenoble Alpes, France
AREA EDITORS
Feature Articles
Laure Blanc-Féraud, Université Côte d’Azur, France
Special Issues
Xiaoxiang Zhu, German Aerospace Center, Germany
Columns and Forum
Rodrigo Capobianco Guido, São Paulo State University (UNESP), Brazil
H. Vicky Zhao, Tsinghua University, R.P. China
e-Newsletter
Hamid Palangi, Microsoft Research Lab (AI), USA
Social Media and Outreach
Emil Björnson, KTH Royal Institute of Technology, Sweden
EDITORIAL BOARD
Massoud Babaie-Zadeh, Sharif University of Technology, Iran
Waheed U. Bajwa, Rutgers University, USA
Caroline Chaux, French Center of National Research, France
Mark Coates, McGill University, Canada
Laura Cottatellucci, Friedrich-Alexander University of Erlangen-Nuremberg, Germany
Davide Dardari, University of Bologna, Italy
Mario Figueiredo, Instituto Superior Técnico, University of Lisbon, Portugal
Sharon Gannot, Bar-Ilan University, Israel
Yifan Gong, Microsoft Corporation, USA
Rémi Gribonval, Inria Lyon, France
Joseph Guerci, Information Systems Laboratories, Inc., USA
Ian Jermyn, Durham University, U.K.
Ulugbek S. Kamilov, Washington University, USA
Patrick Le Callet, University of Nantes, France
Sanghoon Lee, Yonsei University, Korea
Danilo Mandic, Imperial College London, U.K.
Michalis Matthaiou, Queen’s University Belfast, U.K.
Phillip A. Regalia, U.S. National Science Foundation, USA
Gaël Richard, Télécom Paris, Institut Polytechnique de Paris, France
Reza Sameni, Emory University, USA
Ervin Sejdic, University of Pittsburgh, USA
Dimitri Van De Ville, Ecole Polytechnique Fédérale de Lausanne, Switzerland
Henk Wymeersch, Chalmers University of Technology, Sweden
ASSOCIATE EDITORS: COLUMNS AND FORUM
Ulisses Braga-Neto, Texas A&M University, USA
Cagatay Candan, Middle East Technical University, Turkey
Wei Hu, Peking University, China
Andres Kwasinski, Rochester Institute of Technology, USA
Xingyu Li, University of Alberta, Edmonton, Alberta, Canada
Xin Liao, Hunan University, China
Piya Pal, University of California San Diego, USA
Hemant Patil, Dhirubhai Ambani Institute of Information and Communication Technology, India
Christian Ritz, University of Wollongong, Australia
ASSOCIATE EDITORS: e-NEWSLETTER
Abhishek Appaji, College of Engineering, India
Subhro Das, MIT-IBM Watson AI Lab, IBM Research, USA
Behnaz Ghoraani, Florida Atlantic University, USA
Panagiotis Markopoulos, The University of Texas at San Antonio, USA
IEEE SIGNAL PROCESSING SOCIETY
Athina Petropulu, President
Min Wu, President-Elect
Ana Isabel Pérez-Neira, Vice President, Conferences
Roxana Saint-Nom, VP Education
Kenneth K.M. Lam, Vice President, Membership
Marc Moonen, Vice President, Publications
Alle-Jan van der Veen, Vice President, Technical Directions
IEEE SIGNAL PROCESSING SOCIETY STAFF
Richard J. Baseil, Society Executive Director
William Colacchio, Senior Manager, Publications and Education Strategy and Services
Rebecca Wollman, Publications Administrator
IEEE PUBLISHING OPERATIONS
Sharon M. Turk, Journals Production Manager
Katie Sullivan, Senior Manager, Journals Production
Gail A. Schnitzer, Associate Art Director
Theresa L. Smith, Production Coordinator
Mark David, Director, Business Development - Media & Advertising
Felicia Spagnoli, Advertising Production Manager
Peter M. Tuohy, Director, Production Services
Kevin Lisankie, Director, Editorial Services
Dawn M. Melley, Senior Director, Publishing Operations
SCOPE: IEEE Signal Processing Magazine publishes tutorial-style articles on signal processing research and applications as well as columns and forums on issues of interest. Its coverage ranges from fundamental principles to practical implementation, reflecting the multidimensional facets of interests and concerns of the community. Its mission is to bring up-to-date, emerging, and active technical developments, issues, and events to the research, educational, and professional communities. It is also the main Society communication platform addressing important issues concerning all members.
IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

2024 IEEE Conference on Computational Imaging Using Synthetic Apertures (CISA)
Advances in Theory, Engineering Practice, and Standardization
National Institute of Standards and Technology | Boulder | Colorado | 20-23 May 2024
Call for Papers
The IEEE Signal Processing Society, the IEEE Synthetic Aperture Standards Committee, and the IEEE Synthetic Aperture Technical Working Group
enthusiastically invite you to the scenic campus of the National Institute of Standards and Technology (NIST) in Boulder, Colorado to a unique
gathering of researchers and engineers engaged in cutting-edge research on computational imaging and sensing using synthetic apertures
(SAs). The term SA refers generically to a discrete measurement scheme together with an inverse problem solution that yields imaging or sensing performance better than the hardware system is inherently capable of, e.g., wider field-of-view, higher angular resolution. An SA may sample a propagating wavefield or environmental parameters in the signal domain via linear motion of an antenna or transducer, as in synthetic aperture radar (SAR), sonar (SAS), or channel sounding. Alternatively, an SA may sample in the k-space domain via different look angles around an object or scene, as in computed tomography, spotlight SAR or Fourier ptychography. Lastly, an SA may be constructed from a sparse array of sensors as in radiometry, seismology, or radio astronomy. The front end of an SA may be a conventional antenna, acoustic transducer, or a quantum sensor, such as a Rydberg probe, in advanced implementations. CISA will highlight advances in the theoretical development, engineering practice, and standardization of all aspects of SA imaging and sensing.
Important Dates
Special session proposals due: October 20, 2023
Initial 2-page abstract submissions due: January 19, 2024
Tutorial proposals due: January 26, 2024
Acceptance notification: March 1, 2024
Camera-ready 4+1-page paper due: April 5, 2024
Suggested topics for CISA are listed below.
Radar: Automotive SAR, mmWave and THz SAR, polarimetric SAR, ISAR, 3-D imaging, High-dimensional feature processing using tensors
Sonar: Micronavigation and position uncertainty, Bathymetry, Wideband regimes
Optics: Phase retrieval, Ptychography, Holography, Coded diffraction imaging, Coded aperture imaging, Wirtinger flow, Deep learning techniques
5G: Channel sounding, Over-the-air calibration, MIMO antenna testbeds, Intelligent reflecting surfaces, Near-field beam focusing
Seismology: Wave migration and localization techniques
Inverse problems: Deconvolution and hardware de-embedding, Neuromorphic computing methods
Data-driven signal processing: SAR focusing techniques
Magnetic resonance imaging: Image reconstruction from under-sampled measurements
Ultrasound: Flow and velocity estimation
Distributed sensors: Networked coherent radars, sonars
Power beaming: Wireless power transfer to UAVs
Radiometry and remote sensing: 5G signal interference
Quantum receivers: Rydberg probes, Lithium-niobate piezoelectric sensors
Integrated sensing and communications: Coherent UAV swarms
Radio astronomy: Low-noise receivers, Satellite interference mitigation
Point cloud processing: LiDAR, 4D mmWave radar in robotics, autonomous driving
Model-based image reconstruction: Regularization
Prospective authors should visit https://2024.ieeecisa.org/ for more details and to submit manuscripts. All manuscripts must adhere to IEEE
formatting guidelines and accepted papers will appear in IEEE Xplore. The 2024 CISA conference will be an in-person event and authors must
attend to present their papers live at NIST. For additional questions, please contact the co-chairs, Alexandra Artusio-Glimpse (alexandra.artusio-glimpse@nist.gov),
Paritosh Manurkar (paritosh.manurkar@nist.gov), Samuel Berweger (samuel.berweger@nist.gov), Peter Vouras
(synthetic_aperture_twg@ieee.org), or Kumar Vijay Mishra (kvm@ieee.org).

FROM THE EDITOR
Christian Jutten | Editor-in-Chief | christian.jutten@grenoble-inp.fr
SPS Members, You Are All Heirs of Fourier!
Date of current version: 3 November 2023
My three years of service as the editor-in-chief (EIC) of Signal Processing Magazine (SPM) are now coming to a close. During the past three years, many of us were deeply affected by serious political, social, and environmental events such as the war in Ukraine; protests for freedom in Iran; coups d’état in Africa; the COVID-19 pandemic; seisms in Turkey, Syria, and Morocco; huge floods in Libya and India; gigantic fires in North America and Southern Europe; and an avalanche of stones in the Alps, to name a few. In such a context, I believe that the IEEE slogan, “Advancing Technology for Humanity,” is incredibly relevant and timely. It also must be viewed in a wider sense, including the preservation of Earth and sustainable development. In point of fact, what would become of humanity without Earth? I believe that we must always have this in mind when contemplating our future projects, asking for funding, and while teaching.
The year 2023 also marks the 75th anniversary of the IEEE Signal Processing Society (SPS), and this too offers us an opportunity to think about the signal processing domain and ponder its roots and its dazzling evolution. It is also interesting to think about the early contributions that are of the highest importance in our domain and became its pillars. During ICASSP 2023 in Rhodes, Alan Oppenheim, Ron Schafer, and Tony Constantinides recounted the adventure of digital signal processing in the 1970s. Such ideas and the book Digital Signal Processing [1] were revolutionary at a time when computers were in their infancy. In fact, the concept of digital signal processing was met with mixed reviews and skepticism.
But long before this came the contributions of Jean-Baptiste Joseph Fourier who developed for our understanding the propagation of heat. His most famous book [2], published 201 years ago, in 1822, contains the basics of the Fourier series and transform and their ability to represent a large range of signals. Fourier’s ideas were also “out of the box,” and they were also received with reservations from eminent scientists who could not understand how and why a sum of continuous functions could approximate noncontinuous functions. Later, in 1829, Dirichlet presented the theoretical results concerning the convergence of Fourier series [A1]. Fourier’s life is a real novel, which the curious reader can discover in this well-documented and fun work (unfortunately, only in French) [3].
In this issue
During this year in which we celebrated the 75th anniversary of the SPS, it was mandatory to recall Fourier, and I warmly thank Patrick Flandrin for his article [A1], which gives many historical details on some of Fourier’s contributions and their impact on sound analysis and recordings. The article also highlights tricks for implementing computation before the computer era with amazing machines. As obvious proof of the importance of Fourier in signal and image processing (SIP), note that all the articles in this issue explicitly mention Fourier’s legacy.
You all know what an eigenvalue decomposition (EVD) is and some of its uses, but do you know what a polynomial EVD (PEVD) is? In feature article [A2], you will learn about PEVD and its application in many problems involving multichannel broadband signals. Denoising is an essential task in SIP. Currently, many methods for image denoising use convolutional neural networks (CNNs). Feature article [A3] proposes an in-depth understanding of encoding-decoding CNN architectures (convolution, down/upsampling structures, activation functions, etc.) following signal processing principles.
This issue contains four “Tips and Tricks” columns. In [A4], the authors propose two tricks for approaching a perfect filter with reduced complexity. In [A5], the authors present a trick for robust estimation of the frequency of a single complex exponential using the magnitude of only two samples of its discrete-time Fourier transform. In [A6], the author shows that coding numbers as integers rather than as floating points can avoid rounding errors in the implementation of moving average filters. Note that a copy of the code for this trick can be obtained by sending an e-mail to the author. Finally, [A7] presents a trick for realizing sub-Nyquist coherent imaging based on an optimized multiplexing hologram scheme.
In SP Education column [A8], the authors reflect on education in data science (DS), including signal processing and machine learning, with the objective that their ideas and guidelines can inspire educators to develop new teaching programs. I appreciated that in these new programs, the authors highlight the consideration of ethical aspects and the sustainability of our global environment. I believe that, in the evaluation of DS methods, educators must also propose metrics, including both performance and complexity terms.
Finally, [A9] reports on the 2022 VIP Cup, which took place in October 2022 at ICIP. The aim of the competition was the detection of synthetic images, i.e., telling the difference between real and fake images, which is a very important task in combating fraudulent design and the use and diffusion of fake images.
Many thanks…
These three years as the EIC of SPM were a very enriching experience, requiring a lot of work but giving me great pleasure. Of course, the EIC is just a link in the chain.
It has been my great luck and pleasure to work with a very nice and efficient Editorial Board. In the first circle, I warmly thank the area editors: Laure Blanc-Féraud for FAs, Xiaoxiang Zhu for special issues (SIs), Rodrigo Guido and Vicky Zhao for columns and forums (C&Fs), Emil Björnson for social media and outreach, and Behnaz Ghoraani and Hamid Palangi for the e-newsletter. I appreciated their friendly interactions and their hard work during these three years. They were always involved in promoting high-quality articles with the specific tutorial nature that is the signature of SPM, and whose target audience covers all SPS members and beyond.
I also thank the great team of associate editors of the C&F articles and the e-newsletter. Their role is essential for managing the different categories of articles but is not limited to handling reviews, as in the transactions, since they are also in charge of developing content.
The team of senior members is of the highest importance. In fact, since SPM fully covers SIP, and due to the tutorial style of SPM articles, the expertise of the team of senior members must be very large. Each proposal for a Special Issue or a Feature Article white paper must be reviewed by a large set of scientists, not all experts in the domain, to represent the SPM target audience. Usually, the decision on proposals is based on at least 10 reviews, which are required in less than three weeks. I thank you all for your service to SPM.
All the members of the Editorial Board are ambassadors of SPM, and in addition to their reviewing tasks, their roles include the detection, stimulation, and invitation of potential scientists to submit articles or Special Issues to SPM. This everyday task is essential for providing compelling and attractive content.
SPM is a fully edited journal. This means that all the articles, after acceptance, are edited, laid out, and illustrated by the IEEE editorial team. For each issue, the cover is also created by the design team after exchanges between the EIC and the journal production managers, Jessica Welsh (up to the end of 2021) and Sharon Turk. I warmly thank the IEEE editorial team, who contributed to the quality and the attractiveness of SPM, and especially Sharon and Jessica for their leadership and the quality of our interactions, both friendly and professional.
The reviewing process is based in ScholarOne, and I warmly thank Rebecca Wollman for the valuable, efficient, and timely help she provided to the authors, teams of guest editors (GEs), and members of the Editorial Board. I won’t forget Rupal Blatt, webpage manager, for her reactivity in updating the SPM webpages, adding templates, adding calls for articles for Special Issues etc.
I would like to thank all of the authors who contributed feature articles and columns and forums. And finally, I warmly thank the guest editors who proposed exciting Special Issues and thank them for their efforts in managing the reviews from white papers to full articles and for providing the final manuscripts in due time. SPM needs high-level tutorial-like contributions covering SIP methods and applications, following trends in DS and machine learning but always under the SIP umbrella. Keynote speakers and organizers of tutorials and special sessions in conferences and workshops, you are all potential candidates for SPM articles. Don’t hesitate to contact the area editors to refine and concretize your draft article or idea for a Special Issue. Following the ideas of previous EICs, I added the covers of the 19 SPM issues published during the last three years (Figure 1), illustrating the diversity of articles and Special Issues and also the quality of the work done by the design and editorial teams.
FIGURE 1. A montage of the 19 SPM issues published while Christian Jutten served as EIC.
Professor Tulay Adali, from the University of Maryland, Baltimore County, will be taking over as EIC on 1 January 2024. You will be able to read about her vision for the magazine in her editorial in the January 2024 issue. I know her very well; she is a great scientist, and she has also served the SPS in different positions for many years. She is now inviting scientists to join her as area editors, and she will present herself and her team in more detail in her first editorials. With her as EIC, I know that SPM is in good hands.
Appendix: Related Articles
[A1] P. Flandrin, “Fourier and the early days of sound analysis,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 11–16, Nov. 2023, doi: 10.1109/MSP.2023.3297313.
[A2] V. W. Neo, S. Redif, J. G. McWhirter, J. Pestana, I. K. Proudler, S. Weiss, and P. A. Naylor, “Polynomial eigenvalue decomposition for multichannel broadband signal processing,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 18–37, Nov. 2023, doi: 10.1109/MSP.2023.3269200.
[A3] L. A. Zavala-Mondragón, P. H. N. de With, and F. van der Sommen, “A signal processing interpretation of noise-reduction convolutional neural networks,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 38–63, Nov. 2023, doi: 10.1109/MSP.2023.3300100.
[A4] D. Shiung, J.-J. Huang, and Y.-Y. Yang, “Tricks for designing a cascade of infinite impulse response filters with an almost linear phase response,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 64–73, Nov. 2023, doi: 10.1109/MSP.2023.3290772.
[A5] R. Guo and T. Blu, “Super-resolving a frequency band,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 73–77, Nov. 2023, doi: 10.1109/MSP.2023.3311592.
[A6] S. Engelberg, “Implementing moving average filters using recursion,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 78–80, Nov. 2023, doi: 10.1109/MSP.2023.3294721.
[A7] Y. Jeong, B. Tayebi, and J.-H. Han, “Sub-Nyquist coherent imaging using an optimizing multiplexed sampling scheme,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 81–88, Nov. 2023, doi: 10.1109/MSP.2023.3310710.
[A8] S. Gannot, Z.-H. Tan, M. Haardt, N. F. Chen, H.-T. Wai, I. Tashev, W. Kellermann, and J. Dauwels, “Data science education: The signal processing perspective,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 89–93, Nov. 2023, doi: 10.1109/MSP.2023.3294709.
[A9] D. Cozzolino, K. Nagano, L. Thomaz, A. Majumdar, and L. Verdoliva, “Synthetic image detection: Highlights from the IEEE video and image processing cup 2022 student competition,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 94–100, Nov. 2023, doi: 10.1109/MSP.2023.3294720.
References
[1] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1975.
[2] J. Fourier, Théorie Analytique de la Chaleur. Paris, France: Firmin Didot, 1822. [Online]. Available: https://gallica.bnf.fr/ark:/12148/bpt6k1045508v.texteImage
[3] E. Marie and E. Cerisier, Les Oscillations de Joseph Fourier. Nantes, France: Editions Petit à Petit, 2018.

PRESIDENT’S MESSAGE
Athina Petropulu | IEEE Signal Processing Society President | a.petropulu@ieee.org
Reflections on the Poland Chapter Celebration
M
y end of term as IEEE Signal Pro- Our Society has made many strides one should join the SPS in an era with
cessing Society (SPS) president to level that playing field by providing abundant freely available informa-
is fast approaching. It has been many initiatives to grow and diversify t ion, nu merous platforms for shar-
an incredible experience that has pro- our membership. I’ve discussed these ing scientific work, and a multitude of
vided me with so many opportunities initiatives in my past messages, and conference options? Some argued for
to engage with our members around I’ll detail some recent programs be- the significance of in-person network-
the globe, forge relationships with other low, but there are still many ques- ing and professional development, un-
IEEE Societies, and meet a diverse tions that require novel solutions. derlining the importance of providing
range of people that I hope will become conference discounts and travel grants.
active members of our Society in the War and peace Ultimately, the paramount value that
future. It has been a great privilege to This past September, I had the privilege the SPS provides is the assurance of
be at the helm of a Society that garners of visiting Poland to commemorate high quality—in publications, confer-
such a high level of worldwide respect the 20th anniversary of the IEEE SPS ences, technical activities, educational
and recognition. It has also provided Poland Chapter. During this event, I offerings, and high ethical standards
me with the chance to learn, identify presented the history of the SPS and and guidelines.
the challenges we still face, anticipate its impactful role in signal processing.
future challenges, and work to find Additionally, I had the opportunity to Attracting young minds
solutions that will make our Society, learn about the journey of the Poland There was also a discussion on how
and the world, a better place. Chapter and its various activities. The to engage students and young pro-
The SPS has a unique dual role. We anniversary celebration coincided with fessionals in SPS initiatives. While
strive to grow and advance technologi- the Signal Processing Symposium older generations viewed Society
cal innovation and problem-solve at (SPSympo). Since its inception in 2003, membership as their only option to
the scientific level—from the bench SPSympo has consistently attracted be connected with the outside sci-
to the applications of these technolo- attendees from Poland and neighboring entific community, today’s students
gies in the real world. We need to be countries, particularly Ukraine. Unfor- may need special encouragement to
mindful that our scientific pursuits tunately, due to geopolitical events, become members.
don’t exist in a vacuum, that they have researchers from Ukraine were unable The SPS is providing several events
many social, political, and ethical to travel abroad. It begs the question: designed to spark the interest of young-
implications, and that their very ex- Can we find innovative methods to er generations and foster visibility and
istence is often shaped by an uneven help our members and nonmembers growth. Those include the Signal Pro-
playing field—for scientists that are in countries in the midst of conflict, cessing Cup, the Video and Image Pro-
isolated within their research silos or warfare, and humanitarian crises? Per- cessing Cup, the 5-Minute Video Clip
by geopolitical events, for women and haps we could implement humanitar- Contest, hackathons, and society level
ethnic minorities, for citizens of so- ian programs or grants for accessing awards in the areas of Best PhD Disser-
called low-income countries, and for conferences via online attendance or tation Awards and Young Author Best
young people with economic or cul- open access to postconference tran- Paper Awards and at the conference
tural restraints. scripts and other options to help them level there are also best student paper
overcome these barriers? awards, all of which bring recogni-
During discussions at the Poland tion to students and generate a lot of
Date of current version: 3 November 2023 Chapter meeting, some wondered why excitement. There are also opportunities

designed to empower students and vation behind looking into expanding bility careers. When I asked her if she
enhance their professional growth. the educational offerings of the SPS. really believed such a bias, she said she
Students benefit from job opportu- A couple of years ago, the SPS Educa- did not, but the society does, and who
nities through events like the Student tion Board conceived the idea of the was she to go against the society? The
Job Fair and Luncheon, creating a SPS Academy. The SPS would deliver student was not aware of the SPS initia-
bridge between academia and in- education-oriented short courses, pro- tives that disavow such perceptions, and
dustry. The low-cost membership viding deep understanding of critical I spoke to her about our efforts to reach
fee of just US$1 for IEEE Student and topics in the field. Unlike traditional out to women students, with several
Graduate Student Members further tutorials, the SPS’s education-oriented empowering opportunities.
opens the door to a wealth of resources. short courses delve into subjects with That conversation was a stark re-
Networking takes center stage through more depth, starting from the basics minder that SPS still has a long way to
events at SPS conferences, fostering and providing a comprehensive and go in our continuing efforts to foster di-
connections with industry profession- multisided perspective on each topic. versity and inclusivity. I am now more
als. Additionally, students gain ac- The courses are already being offered confident than ever that our initiatives
cess to webinars, career and soft skills at ICASSP and ICIP and have are addressing critical problems. Women
training, and mentorship opportunities, proved very popular. They consist of in Signal Processing (WISP) provides
contributing to their holistic develop- parallel tracks of 10-h sessions conducted mentoring opportunities and networking
ment. The SPS also provides travel in three segments, offering events, where women from around the
grants to students in developing coun- participants an immersive learning world can share experiences and strate-
tries on a competitive basis and based experience. Upon successful com- gies in balancing family and career.
on need to support travel to ICASSP pletion of the course and quiz, par- As in Poland, the presence of women
and ICIP. ticipants are awarded professional and minorities on the faculties of uni-
Also important are the SPS Schol- development hours and continuing versities around the world is very small,
arship and Seasonal Schools Pro- education unit certificates, recogniz- and this deprives students of diverse
grams. The Scholarship Program ing their commitment to continuous role models and also limits the diversity
offers financial assistance to under- learning and growth in their respec- of perspectives in academic research.
graduate and graduate students who tive fields. The SPS Education Board Role models play a crucial role in in-
are dedicated to pursuing education is taking additional steps to make spiring students, instilling confidence
and careers in signal processing. those short courses available to wid- in their abilities, and demonstrating
Eligibility is open to students with er audiences, and it is working with the potential for success in their cho-
a minimum B grade point average a professional company to enhance sen fields. The SPS Promoting Diver-
(or international equivalent). Over a these educational courses. The Soci- sity in Signal Processing (PROGRESS)
span of three years, recipients have ety also offers free access to the SPS Workshop recognizes the significance
the opportunity to receive a total Resource Center for SPS members; of diverse representation and seeks to
prize of up to US$7,000. This ini- this is an online library of tutorials, bridge this gap through its empower-
tiative aims to support and encour- lectures, presentations, and more, ing initiatives. Since its inception in
age students with a strong academic and it spans the breadth of the signal 2020, the PROGRESS Workshop has
commitment to signal processing, processing field. gained substantial momentum. It is now
providing a financial boost to their part of ICASSP and ICIP conferences.
educational journey. The gender gap The seventh PROGRESS Workshop
The Seasonal Schools’ primary objec- The low numbers of women among stu- took place at ICIP 2023 in Kuala
tives are development of students dents and faculty at the Poland Signal Lumpur, Malaysia, and was success-
interested in signal processing, orga- Processing Conference SPSympo was fully led by Dr. Zaid Omar, of the Uni-
nizing opportunities to network with quite evident, as was the feeling of isola- versiti Teknologi Malaysia.
professors and established practitio- tion and hopelessness among the women The PROGRESS Workshop offers
ners, and engaging in hands-on tutori- In speaking with women fac- an online participation option, recognizing
als in signal processing. ulty and students, there was a consensus that travel expenses may pose
that women shoulder more caregiving economic challenges for some stu-
The SPS Academy responsibilities than their male counter- dents. The SPS provides financial sup-
Another interesting topic discussed at parts, impacting their career choices. It port to students who choose to attend
the Poland Chapter meeting delved into came as a shock to me when I asked a in person. For instance, at the 2023
the difficult mathematical concepts bright young graduate student in Signal PROGRESS meeting during ICASSP,
that arise in signal processing and find- Processing if she was interested in an the SPS granted eight travel awards of
ing ways to convey them in an easily academic position after graduation, and US$1,000 each on a reimbursement ba-
digestible form. This is indeed a very she answered that women are too emo- sis. Similarly, the 2023 PROGRESS at
important issue, and it was the moti- tional and cannot pursue high-responsi- ICIP offered 20 travel grants of US$500

each. These funds do not mandate SPS Societies to educate members and jointly Summer School on Biomedical Imag-
membership in an effort to reach out to address technical challenges. ing, which covered applications of deep
students who do not traditionally attend The SPS is forging strategic partner- learning and AI in medical imaging.
SPS conferences. ships among multiple IEEE Societies This event was organized by Jean-Christophe
Other diversity and inclusion initia- in the ISAC area. Initial activities in- Olivo-Marin, Elsa Angelini, and
tives include the Mentoring Experiences clude the first 2023 Summer School Arrate Munoz Barrutia in Cartagena,
for Underrepresented Young Researchers on ISAC, sponsored jointly by the SPS, Colombia, this past April.
Program (ME-UYR) [1] and K-12 Out- the AESS, and the European Associa- In Poland, I also participated in a
reach Initiatives [2]. ME-UYR provides tion for Signal Processing, which took panel discussion, which posed anoth-
mentoring experiences in the form of a place in June 2023, in Baiona, Spain. er interesting question related to the
nine-month collaboration for young re- The event was led by Nuria Gonzalez contrast of model-based signal pro-
searchers from underrepresented groups Prelcic, of North Carolina State Uni- cessing to data-driven machine learn-
together with an established researcher in versity, and attracted 50 students, and ing (ML), which seems to be in the
signal processing from a different insti- was also supported by Qualcomm, center of discussions in signal pro-
tute, and typically another country. Remcom, and Gradient. Another was cessing events. Despite ML’s success
The K-12 Outreach Initiatives Pro- the 2023 SPS–IEEE Communications in various applications, it falls short
gram strives to increase the visibility of Society (ComSoc) Summer School on in offering performance guarantees
SPS and the signal processing discipline ISAC, which was held in Shenzhen, and lacks transparency in revealing
to K-12 students worldwide by develop- China, and led by the SPS ISAC TWG how solutions are derived. This limi-
ing exciting, impactful educational pro- in cooperation with the ComSoc. This tation has hindered its application in
grams that utilize tools and applications was organized by Tsung-Hui Chang, several key areas, including medical
with hands-on signal processing experi- Feng Yin, and Jie Xu, from the Chi- diagnosis. Research showcased in
ences. The program is intended to bring nese University of Hong Kong–Shen- SPS venues concentrates on leverag-
the awareness of signal processing to zhen; Fan Liu, from the Southern ing models and domain knowledge
students who belong to groups that are University of Science and Technology; to design ML algorithms that are
underrepresented in STEM fields re- and Xiao Han, from Huawei Technolo- both reliable and explainable. Thus,
gionally and/or globally. gies. The event attracted 180 students greater synergies between SPS and
and researchers from mainland China, EMBS have the potential to unlock
Inter-Society initiatives while the online streaming of the event more dependable applications of ML
The world is facing complex problems reached 10,000 viewers. and AI in the field of medicine.
whose solutions require cross-disciplin- Another area that cuts across mul-
ary approaches, and strengthening inter- tiple areas is brain research. SPS is one Developing novel technologies:
Society initiatives is another key goal of of four core member societies of IEEE Synthetic apertures
the SPS. During SPSympo, I had the Brain Technical Community (TC), which Diversity is also key to scientific inno-
opportunity to meet with Mark Davis, is an IEEE-wide effort that unites engi- vation and progress, and our Society is
the president of the IEEE Aerospace neering and computing expertise across continually forging expertise in various
and Electronic Systems Society (AESS) IEEE Societies and Councils relevant new and evolving technological fields.
who was also in attendance. We both to neuroscience. IEEE Brain facili- On that note, I would like to share my
delivered plenary talks on the topic of tates cross-disciplinary collaboration excitement about the developments in
integrated sensing and communication and coordination to advance research, synthetic apertures (SAs) led by the SA
(ISAC) systems. We had the opportu- standardization, and development of TWG, establishing the SPS as the
nity to discuss the need to enhance and engineering and technology to improve pole of attraction of research in the
integrate opportunities for inter-Soci- understanding of the brain in order to treat critical SA area. SAs work by moving
ety engagement. diseases and improve the human condi- an antenna along a predetermined path
ISAC is a naturally cross-disciplinary tion. As a core member, the SPS is respon- via mechanical means. As the antenna
topic, encompassing technologies that sible for chairing the TC along with other moves, it measures the strength and
combine sensing and communication core members, and this year Tulay Adali, direction of signals. This information
systems to utilize wireless resources ef- former SPS Technical Activities Vice helps reconstruct various properties of
ficiently, realize wide area environment President, is the chair. the scattered electromagnetic waves,
sensing, and even pursue mutual benefits. This year, at ICASSP, a satellite like power, arrival directions, delays,
Realizing the great potential for research workshop was organized on the topic and polarization. SAs can measure sig-
developments and standardization op- of Unravelling the Brain, which was nals over a very wide frequency band-
portunities, the ISAC Technical Work- very well attended and introduced new width and with an almost arbitrarily
ing Group (TWG) has been established blood in the area to ICASSP. large aperture size, enabling high angu-
to bring together academic and industrial The SPS partnered with IEEE EMBS lar and delay resolution to resolve close-
researchers in the SPS and related to offer the 2023 IEEE EMBS-SPS ISBI ly spaced scatterers. Further, SAs are

SPS President Athina Petropulu joining the Poland SPS Chapter chair, Konrad Jędrzejewski, past chair Piotr Augustyniak, IEEE AESS President Mark E.
Davis, and other Chapter members in marking the 20th Anniversary of the Poland SPS Chapter.
cost-effective compared to digital multi- for the SPS. Going forward, we need United States and some other coun-
channel phased arrays while delivering to make extra efforts to safeguard that tries, which features negative ads and
comparable estimation performance. quality and the climate that holds the disinformation.
The two-pronged goal of the SA TWG SPS to such high standards. Despite the many challenges, SPS
is to support theoretical and empirical Alongside the need to continually membership growth has been strong
techniques that underpin the estima- grow inclusivity, diversity, and inter- throughout 2023. In October 2023, the
tion of parameters of propagating waves connectedness in our membership and membership of SPS soared past our
through various media using SAs and in our scientific pursuits, we must con- goal of 20,000, reaching the highest
also identify novel applications for SAs tinually adapt and promote high ethical point in SPS history. This underscores
that are enabled by the precise measure- standards and guidelines with both our the enduring relevance and value that
ment and estimation of environmental technological innovations and within SPS has to offer.
parameters. The SA TWG, under the the SPS leadership. With my term as SPS president con-
leadership of Dr. Peter Vouras, is work- As innovation advances at an expo- cluding this year, I’m optimistic that the
ing on developing IEEE standards on nential rate, ethical concerns surround- SPS will strive to turn challenges into
SAs, establishing a shared repository for ing current and emerging technologies opportunities, so that we can grow and
data and algorithms, delivering webinars grow proportionally. It is crucial to con- diversify our membership, and provide
on the topic, organizing special issues front the impact of technology on pri- even more value to our members and
in journals, and providing challenges vacy, security, and the environment. The our communities.
and competitions that promote the urgency to cultivate researchers and en-
adoption of SAs in engineering school gineers with strong ethical foundations
curricula as well as job training for has never been greater; they may serve
graduating students. as a crucial line of defense in navigating
In an exciting development this these complex challenges.
year, the SA TWG, working with the Another challenge involves lead- Acknowledgment
IEEE Synthetic Aperture Standards ership. In an effort to energize the I would like to thank Theresa Argi-
Committee, will offer the inaugural members, our Society has embraced a ropoulos and Rich Baseil for their help
NIST–IEEE Conference on Compu- member-driven election for the presi- with this article.
tational Imaging Using Synthetic Ap- dent-elect. Yet the strength and agility
ertures. The conference will be held of this approach require a continued
References
20–23 May 2024, at the scenic campus effort to increase membership diver- [1] “Mentoring Experiences for Underrepresented
of the National Institute of Standards sity by growing our global appeal. Young Researchers (ME-UYR) Program,” in IEEE
Signal Process. Soc., 2023. [Online]. Available: https://
and Technology, in Boulder, CO, USA. We should also put policies into place signalprocessingsociety.org/community-involvement/
to safeguard the election process and me-uyr-mentoring-experiences-underrepresented
-young-researchers-program
Ethical standards help prevent negative electioneering [2] “K-12 Outreach Initiatives,” in IEEE Signal
In my tenure as the SPS president, it has campaigns that increase internal divi- Process. Soc., May 2022. [Online]. Available: https://
signalprocessingsociety.org/community-involvement/
been amazing to experience firsthand siveness. If unchecked, electioneering k-12-outreach-initiatives
that people around the world have such can lead to the same behaviors ob-
high levels of respect and recognition served in the political climate of the SP
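
As a rough illustration of the synthetic aperture principle described in this message (one antenna moved along a predetermined path, with the collected measurements used to recover arrival directions), the following Python sketch simulates such a sweep and scans over candidate directions. It is a minimal sketch under stated assumptions, not the SA TWG's method: it assumes a single narrowband, far-field source that stays fixed during the sweep, a straight path with half-wavelength steps, and plain delay-and-sum processing. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def sa_samples(positions_m, angle_deg, wavelength_m, snr_db=20.0, seed=0):
    """Complex samples recorded by one antenna at each stop along a straight path.

    Assumption: one narrowband far-field source that does not change
    while the antenna is being moved.
    """
    rng = np.random.default_rng(seed)
    k = 2.0 * np.pi / wavelength_m  # wavenumber
    phase = k * positions_m * np.sin(np.deg2rad(angle_deg))
    noise_std = 10.0 ** (-snr_db / 20.0) / np.sqrt(2.0)
    noise = noise_std * (rng.standard_normal(positions_m.size)
                         + 1j * rng.standard_normal(positions_m.size))
    return np.exp(1j * phase) + noise


def scan_directions(samples, positions_m, wavelength_m, grid_deg):
    """Delay-and-sum scan: combined power for every trial arrival angle."""
    k = 2.0 * np.pi / wavelength_m
    # One steering vector per trial angle, shape (num_angles, num_positions).
    steering = np.exp(1j * k * np.outer(np.sin(np.deg2rad(grid_deg)), positions_m))
    return np.abs(steering.conj() @ samples) ** 2


def mainlobe_width_deg(power, grid_deg):
    """Angular extent around the peak where power stays above half the peak."""
    peak = int(np.argmax(power))
    half = power[peak] / 2.0
    lo = peak
    while lo > 0 and power[lo - 1] >= half:
        lo -= 1
    hi = peak
    while hi < power.size - 1 and power[hi + 1] >= half:
        hi += 1
    return grid_deg[hi] - grid_deg[lo]


wavelength = 0.1                      # hypothetical 10 cm wavelength
true_angle = 20.0                     # hypothetical source direction in degrees
grid = np.linspace(-90.0, 90.0, 721)  # trial angles, 0.25 degree spacing

results = {}
for num_stops in (16, 64):
    # Half-wavelength steps along the path avoid ambiguous directions.
    positions = np.arange(num_stops) * wavelength / 2.0
    samples = sa_samples(positions, true_angle, wavelength)
    power = scan_directions(samples, positions, wavelength, grid)
    results[num_stops] = (grid[np.argmax(power)], mainlobe_width_deg(power, grid))

for num_stops, (estimate, width) in results.items():
    print(f"{num_stops} stops: estimated angle {estimate:.2f} deg, "
          f"half-power width {width:.2f} deg")
```

Under these assumptions, the longer path should give a narrower peak around the source direction, which is the sense in which a larger aperture yields higher angular resolution. A practical system would also have to handle positioning errors, wideband signals, and sources that change during the sweep, none of which is modeled here.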

DSP HISTORY
Patrick Flandrin
Fourier and the Early Days of Sound Analysis
Joseph Fourier’s methods (and their from the outset. This turned out not to Fourier theory, from heat
variants) are omnipresent in audio be the case, the whole project of Fouri- to sound
signal processing. However, it turns er being devoted to a different physical
out that the underlying ideas took some problem, namely, the theory of heat, Fourier
time to penetrate the field of sound anal- and to mathematical developments As mentioned in the preceding, the
ysis and that different paths were first attached to it. Whereas many attempts scientific work of Fourier culminated
followed in the period immediately fol- had been made before Fourier (by Ber- in his Théorie Analytique de la Chal-
lowing Fourier’s pioneering work, with noulli, d’Alembert, Euler, Lagrange, eur (Analytical Theory of Heat) [2] that
or without reference to him. This illus- and others) to solve the problem of was eventually published in its final
trates the interplay between mathematics vibrating strings and express solutions form in 1822, i.e., 11 years after hav-
and physics as well as the key role played by means of sine/cosine expansions, ing been first presented as a memoir
by instrumentation, with notable inven- Fourier himself seemed to have devel- to the French Academy of Sciences.
tions by outsiders to academia, such as oped almost no interest in applying Although its value was recognized at
Rudolph Koenig and Édouard-Léon his results in this direction. Indeed, that time by awarding Fourier a prize,
Scott de Martinville. while his 1822 treatise on the analyti- this contribution was received by the
cal theory of heat is more than 600 examiners (including Lagrange) with
Introduction pages long, there is only one sentence some reservations concerning rigor,
Fourier analysis, Fourier series, (Fast) evoking such a possibility: “If we apply raising convergence issues that were
Fourier transform. … Fourier has today those principles to the question of the eventually resolved in full generality
something of a common name. If his motion of vibrating strings, we shall by Dirichlet and others. Fourier’s semi-
presence is now ubiquitous in almost overcome the difficulties first encoun- nal work established, nevertheless, the
all fields of science and technol- tered in Daniel Bernoulli’s analysis.” foundations of modern harmonic analy-
ogy, the name of Fourier is especially It was only 20 years later that Fourier sis, a branch of mathematics that flour-
unavoidable for all those interested ideas entered explicitly the field of ished in the 19th and 20th centuries and
in the theory and practice of signal acoustics, thanks to Georg Simon Ohm proved to be of utmost importance in
processing. In particular, the meth- (most famous for his law of electrical numerous applications. Starting from a
ods he developed—and the attached conductivity, established in 1827). This problem in physics and considering that
fundamental concepts, such as that of was, however, not a fully shared recog- [2, p. 13] “the profound study of nature
spectral representation—are the cor- nition, and, between theory and experi- is the most fertile source of mathemati-
nerstone of audio signal processing ments, the following years witnessed a cal discovery,” Fourier is generally
(speech, music, and so on). This might number of developments aimed at ana- considered the creator of mathematical
suggest that they were developed in lyzing sounds, with or without a refer- physics [3]. Eager to solve physics prob-
connection with the idea of analyzing ence to Fourier. This is what this text is lems by giving solutions based on firm
and/or synthesizing sounds or at least about. In complement to the immedi- mathematical grounds, Fourier was also
that such an application was envisaged ate post-Fourier influences in acoustics deeply concerned with effective calcu-
discussed here, a comprehensive study lations, claiming explicitly that [2, p. 11]
of the (pre-Fourier) acoustics origins of “the [proposed] method does not leave
Date of current version: 3 November 2023 harmonic analysis can be found in [1]. anything vague and indefinite in its

solutions; it drives them to their ulti- ry of heat. He also played a key role in in 1856, with Hermann von Helmholtz
mate numerical applications, necessary the creation of the University of Greno- [7], who made acoustics fully enter
condition for any research, and without ble and became a mentor and close experimental physics while taking into
which we would only end up with friend of Jean-François Champollion, account physiological considerations.
useless transformations.” This focus whom he encouraged in his research to Helmholtz gave credit to Ohm for his
on what we would now call algorith- decipher hieroglyphs. Subjected to the introduction of Fourier methods in
mic efficiency also makes of Fourier an vicissitudes of Napoleon’s resignation acoustics, and he followed him in pro-
actual father of signal processing. in 1814 and attempt to return to power posing to consider the inner ear a Fou-
Jean-Baptiste Joseph Fourier (1768– in 1815, Fourier was reassigned as gov- rier analyzer sensitive to the intensities
1830) was a French mathematician, ernor from Grenoble to Lyon, but he of Fourier components, or proper modes
physicist, and political figure who has resigned before the battle of Waterloo (what he referred to as “Ohm’s law”). As
had more than one life (Figure 1). Orphaned and went to Paris in June 1815, hav- a side note, it is worth mentioning that
at the age of nine and spotted ing no position at all. Being eventually the question of whether Fourier proper
for his intellectual abilities, he was elected a member of the Académie des modes (or their variations) are a physi-
taken in charge by a religious educa- sciences, in 1817 (and secrétaire perpé- cal reality or a mathematical construct
tional institution, where he developed tuel, in 1822), he devoted entirely the has been recurrent since then. One can
a particular interest in mathematics. final period of his life to his scientific quote, in this respect, Louis de Broglie,
He thus became a teacher in various activities. (An authoritative presenta- who once claimed that [8] “if we con-
domains and finally in mathematics. tion of the life and works of Fourier can sider a quantity that can be represented
After having taken an active part in be found in, e.g., [3].) in the manner of Fourier, by a superpo-
the French Revolution, for which he Whereas Fourier theory is now cen- sition of monochromatic components, it
was imprisoned twice, he was select- tral in acoustics, speech, and signal pro- is the superposition that has a physical
ed as one of the first students of the cessing, it seems that its first explicit use meaning, and not the Fourier compo-
newly created École Normale, where in sound studies was due to Georg Simon nents considered in isolation,” or refer to
he quickly became an assistant profes- Ohm, who claimed, in his 1843 seminal [9] for data-driven versus model-based
sor before succeeding Joseph-Louis paper aimed at defining what a “tone” is, approaches to beating phenomena.
Lagrange as a professor at École Poly- that he used [4, p. 519] “Fourier’s theo- In parallel, Helmholtz conduct-
technique, in 1797. One year later, he rem, which has become famous through In parallel, Helmholtz conducted
was designated to join the Egyptian its multiple and important applications.” experiments with high-quality
expedition of Napoléon Bonaparte Ohm’s paper was devoted to specific for the production of well-controlled
and became secretary of the Institut sound systems, namely, sirens, whose “pure” tones and resonators made of
d’Égypte, conducting there scientific physical construction clearly departed cavities of different sizes for identify-
and political activities until the Brit- from more classical vibrating strings ing frequency components within com-
ish victory. Back in France, in 1801, (for which sine/cosine descriptions were plex sounds. Helmholtz’s contributions
he thought of resuming his academic well accepted) and whose understand- have been of primary importance in
position but was appointed governor ing was an open question. Sirens had the development of modern acoustics
of Isère by Napoléon. While supervis- been previously investigated by August and psychoacoustics. His approach was
ing various road and sewerage works, Seebeck, who conducted a number of also emblematic of the key role played
it was during this period that he began experiments, ending up with puzzling by instruments in addressing scien-
his masterwork on the analytical theo- questions (combination tones, missing tific questions and challenging theo-
fundamental, and so on). Ohm proposed ries (as once said by the philosopher
to interpret Seebeck’s findings in Fouri- Gaston Bachelard, “Instruments are
er terms, but Seebeck raised objections, reified theories”). To this end, he was
and a controversy followed [5]. Ohm’s in close contact with a gifted instru-
approach was essentially mathematical ment maker settled in Paris: Rudolph
and disconnected from hearing issues Koenig (Figure 2).
(Ohm even claimed to have an “unmusi-
cal ear” [6]). Seebeck, on the contrary, Koenig
noted contradictions between Ohm’s Born in Königsberg (Prussia), Karl
predictions and actual perceptions by Rudolph Koenig settled in Paris in 1852
a trained ear. After Ohm lost inter- and died there in 1901. While devel-
est in those questions, the controversy oping a special interest in acoustics,
stopped, in 1849, when Seebeck passed Koenig was not part of any academic
away, and the fundamental question institution, but he was a prolific inven-
of confronting mathematical descrip- tor—with 272 items in his 1889 catalog
FIGURE 1. Joseph Fourier. tions with physical realities resurfaced, [10]—and a successful businessman

who manufactured and sold his own intensity is reported by a stylus on an
products all around the world. He was electrically sensitive paper (Figure 4).
especially famous for the quality of his Today, these acoustic or electrome-
tuning forks, and he contributed, with chanical devices have been replaced by
his experiments, to the debates and dis- computers to perform time-frequency
putes about beats and combination tones analysis of digitized data, with routine
[11]. Koenig happened to be, for a long techniques, such as short-time Fou-
time, the main maker of Helmholtz’s rier analysis, operating in time rather
instruments, and his workshop in Paris than frequency. The basic principles of
was a busy meeting place, where the these modern approaches are nonethe-
ideas of Helmholtz were popularized less similar, the difference being essen-
and spread in Parisian scientific circles, tially one of implementation.
maintaining a vivid relationship with Whereas Helmholtz elaborated
his native Germany [12]. In particular, on the findings of Ohm, who himself
exploiting the potentialities of Helm- referred to Fourier, no explicit refer-
holtz resonators and combining them ence to Fourier can be found in the writ-
with his own invention of “manometric ten productions of Koenig nor in the
flames,” he designed a “sound analyz- description of his sound analyzer [10].
FIGURE 2. Rudolph Koenig.
er” [10], [13] that allowed for a visu- Indeed, the motivation of Koenig was
alization of the frequency content of of frequency components. In the elsewhere, far from the rooting of his
a sound. To visualize sound waves, he sound analyzer, such intensities instruments on mathematical bases. It
first designed, in 1862, an apparatus— are evaluated acoustically and in par- turns out that this was an attitude shared
the so-called manometric flame—that allel (with all resonators acting simul- by most physicists of the immediate
consists of a flexible membrane encap- taneously for selecting frequencies), post-Fourier period (say, 1822–1850).
sulated in a chamber. When exposed to and they are visualized by the modu- Fourier is now perceived as the pioneer
a sound, the vibration of the membrane lations of the manometric flames. In of modern mathematical physics, but in
modulates the flow of a flammable the sound spectrograph, the acoustic the years that followed its main publica-
gas passed to a Bunsen burner, and the signal is first recorded on a magnetic tion, Fourier’s treatise attracted mostly
size of the flame is, in turn, modulated tape, and the frequency intensities are the attention of mathematicians who
accordingly. The final visualization is evaluated electrically and sequentially, substantially contributed to consolidat-
made possible in a stroboscopic way, thanks to a heterodyne filtering that ing and extending Fourier’s seminal
thanks to a four-faceted rotating mir- acts in a synchronous manner with the work, and, with the notable exception of
ror. Koenig later designed, in 1867, a rotation of the disk on which the tape is Ohm, its importance for physics seems
more complete “sound analyzer” by fixed and that of a drum on which the to have escaped physicists [15]. As an
plugging such capsules at the acoustician and an instru-
output of a family of Helm- ment-maker, Koenig was,
holtz resonators (i.e., cavities in fact, primarily interested
tuned to specific frequencies) in visualizing sounds and
playing the role of a filter bank in achieving this program
(Figure 3). The overall system through a more experimen-
permits, therefore, a Fourier- tal and empirical approach.
like frequency analysis and, This leads to another cor-
in cases where the impinging nerstone on the way to sound
sound is time varying, a time- analysis. It involved Koenig
frequency analysis. too but preceded the inven-
One can remark that Koe- tion of his sound analyzer.
nig’s apparatus has very much
the flavor of an electromechan- Scott and “the problem
ical system that would appear of speech writing itself”
almost one century later: the If we think of visualizing
so-called sound spectrograph sounds, there are at least
[14]. This instrument—which, two possibilities. The first
ironically, would be due to one, which corresponds to
another Koenig—shares with Koenig’s approach with his
the “sound analyzer” the idea sound analyzer, is indirect in
of visualizing the intensity FIGURE 3. Koenig’s sound analyzer [10]. the sense that what is given

to see are features resulting from some transformation upon the waveform (namely, intensities of Fourier modes). Another, more direct, approach could, however, be imagined, which would deliver a graphical representation of the waveform itself. Such a track was indeed followed in the middle of the 19th century by another outsider of academia: Édouard-Léon Scott de Martinville (Figure 5).

Scott
Édouard-Léon Scott de Martinville (1817–1879) was a French inventor who, by profession, was a typographer. In the early 1850s, he conceived the idea of drastically improving upon stenography for keeping track of spoken words and other sounds by developing a system that would solve, in his own words, “the problem of speech writing itself.” This question obsessed Scott until his final days [16], and it is worth quoting his agenda, as reported in the sealed manuscript he sent to the French Academy of Sciences, in 1857 [17] (English translation by P. Feaster [18]): “Is there a possibility of reaching in the case of sound a result analogous to that attained at present for light by photographic processes? Can one hope that the day is near when the musical phrase, escaped from the singer’s lips, will be written by itself and as if without the musician’s knowledge on a docile paper and leave an imperishable trace of those fugitive melodies which the memory no longer finds when it seeks them? Will one be able to have placed between two men brought together in a silent room an automatic stenographer that preserves the discussion in its minutest details while adapting to the speed of the conversation? Will one be able to preserve for the future generation some features of the diction of one of those eminent actors, those grand artists who die without leaving behind them the faintest trace of their genius?” To achieve this ambitious goal, Scott took his inspiration from the hearing process and proposed to make use of a membrane (mimicking the eardrum) at the output of a horn designed to collect and concentrate the sounds to be analyzed. The vibrations of the membrane were transmitted to a stylus attached to it, whose movements were inscribed as tracings on a sliding lampblacked glass plate. Scott gave to his invention the name “phonautograph” and started making experiments in 1853–1854, waiting until 1857 to submit it to the French Academy of Sciences [17] and patenting it [19].

Looking at those first attempts (with either speech or guitar sounds [17]), one must admit that the tracings he recorded are extremely erratic and unlikely to be interpretable. Scott’s first phonautograph was the work of an amateur, and to hope that it could be turned into a reliable instrument required the professionalism of an expert. The best expert one could think of in this regard at that time was Koenig, and it was only natural that Scott approached him to perfect his device. Their collaboration resulted in a second-generation phonautograph (Figure 6) with far better performance than the initial prototype. Koenig replaced the sliding plate with a rotating cylinder, allowing for much longer recordings. He also supplemented the sound recording with the reference trace of a tuning fork, thus making it easier to read the unavoidable irregularities in the rotation of the hand-cranked cylinder.

In 1860, reasonably neat tracings were obtained this way, leading to the fundamental issue: How to interpret them? The very purpose of Scott was to consider phonautograms as graphical representations of sounds and to uncover from their reading the content of what had been recorded. He was, thus, interested in finding specific features in the tracings, which could be considered elementary recognizable components of speech or natural sounds. This quest was not without echoes of some recent advances in signal processing, with representations built upon “waveform dictionaries” [20]. If we consider, for instance, the commented hand-drawn tracings of Figure 7, we see that Scott tried to identify elementary sounds from their graphical representation, that we would today refer to as tones with low/high frequency (“la voix grave/la voix aiguë”), downgoing/upgoing chirps (“une voix aiguë descendant au grave/une voix grave montant à l’aigu”), different amplitudes (“une voix intense/moyenne/faible”), a plosive (“l’explosion de la voix”).

FIGURE 4. Koenig’s sound spectrograph [14].
FIGURE 5. Édouard-Léon Scott de Martinville.
Scott’s phonautograph versus Edison’s phonograph
Reading tracings from complicated sounds, such as speech or songs, proved, however, to be tricky, and, short of support and encouragement, Scott abandoned his project in the early 1860s, becoming a librarian after having granted Koenig an exclusive license to manufacture and sell the phonautograph. Far from Scott’s dream of “speech writing itself,” Koenig advocated the use of the phonautograph as a less ambitious scientific instrument aimed at mostly analyzing tuning forks or organ pipes, before leaving it and turning to his sound analyzer. Scott’s interest in his phonautograph was rekindled, however, in 1878, when Edison’s phonograph was demonstrated at a memorable session of the French Academy of Sciences. Having heard of it, Scott could not help but find elements in the phonograph (such as the system of recording by means of a membrane, a stylus, and a rotating drum) that seemed to him to be directly inspired by his phonautograph, without making any reference to it. Bitter about this lack of recognition as well as the contrast between the enthusiastic reception of Edison’s invention and the poor interest that his own invention had received 20 years earlier, Scott self-edited a long plea for his rights and his vision of speech analysis just before his death the following year [16]. Of course, one of the main reasons why the phonograph attracted so much more attention than the phonautograph was that the former allowed for the replay of recorded sounds, which the latter did not. As Scott had said many times, reproducing sound was not part of his program at all, his only goal being to decipher phonautograms. One can imagine that he would have found little interest in regenerating real sounds from his phonautograms, and yet this is what happened … in 2008.

FIGURE 6. Scott’s phonautograph [10].
FIGURE 7. Scott’s “waveform dictionary” [19]. (FIRSTSOUNDS.ORG)

Hearing Scott
Indeed, before deciding not to pursue his project any further, Scott deposited a number of annotated phonautograms with the French Academy of Sciences, in 1861 [21]. These recordings were properly archived and preserved, yet forgotten until 2007, when David Giovannoni, a historian specializing in old recordings, who had learned of their existence, had the idea of transforming them into truly audible sounds. This was made possible, within the First Sounds project [22] and thanks, in particular, to Patrick Feaster, by getting high-quality scans of the tracings and transforming them into digital files through modern signal processing techniques. This is how the folk song “Au Clair de la Lune” (“By the Light of the Moon”) (Figure 8), registered by Scott himself on 9 April 1860, can now be heard [23], the first recording of a human voice, 17 years before Edison.

Conclusions
The purpose of this text was not to discuss Fourier’s achievements per se: this can be found in many textbooks, from different perspectives (see, e.g., [24] for a classical introduction, [25] for a more mathematically oriented treatise, or [26] for a modern treatment, including recent variations). What was at stake was to see how Fourier’s ideas, which today seem indissociable from sound analysis, were not immediately adopted

in this context and that parallel pathways, based on different approaches, have been followed. It is striking to observe that some of the options considered then are still relevant today. For instance, an approach à la Koenig is based, in a first step, on the extraction of some features (in his case, the Fourier modes, even if not named as such) upon which the analysis is performed in a second step, whereas an approach à la Scott bypasses such a preprocessing and relies directly on the raw data, as can be the case in modern end-to-end recognition systems. Another important issue is the quest for interpretability when confronting experimental results with formal descriptions, i.e., physics with mathematics (this was at the heart of the Ohm–Seebeck dispute). Yet, under different forms extended to algorithms and computational issues, such a question of understanding is today of paramount importance, e.g., in deep neural networks when one wants to go beyond a black box.

We choose to close the piece of history that has been outlined here in 1878, when Edison opened a new chapter. Many things happened in the following years, with progressive and more and more pervasive use of Fourier techniques in many domains. In the same year, 1878, Lord Kelvin constructed his harmonic analyzer [27] that proved instrumental for decades for analyzing and predicting tides. Other mechanical [28], electromechanical [14], and, later, electronic [29] systems followed, eventually giving Fourier the full credit he deserves in the information era, but this is another story.

FIGURE 8. Scott’s phonautogram of the folk song “Au Clair de la Lune” [19]. (a) The complete recording and (b) an enlargement showing (on three successive revolutions of the drum) plots of the recorded voice and the reference oscillation given by the tuning fork. (FIRSTSOUNDS.ORG)

Author
Patrick Flandrin (flandrin@ens-lyon.fr) received his Ph.D. degree from INP Grenoble, France, in 1982. He is currently a CNRS emeritus research director, working within the Department of Physics, ENS de Lyon, Lyon, France, since 1991. His research interests include nonstationary signal processing, time-frequency/wavelet methods, scaling stochastic processes, and complex systems. He was awarded the SPIE Wavelet Pioneer Award (2001), the CNRS Silver Medal (2010), and a Technical Achievement Award from the IEEE Signal Processing Society (2017) and European Association for Signal Processing (EURASIP) (2023). He was elected to the French Academy of Sciences in 2010 and served as its president in 2021–2022. He is a Fellow of IEEE (2002) and EURASIP (2009).

References
[1] O. Darrigol, “The acoustics origins of harmonic analysis,” Arch. Hist. Exact Sci., vol. 61, no. 4, pp. 343–424, Jul. 2007, doi: 10.1007/s00407-007-0003-9.
[2] J. Fourier, Théorie Analytique de la Chaleur. Paris, France: Firmin Didot, 1822. [Online]. Available: https://gallica.bnf.fr/ark:/12148/bpt6k1045508v.texteImage
[3] J. Dhombres and J.-B. Robert, Fourier, Créateur De la Physique Mathématique. Paris, France: Belin, 1998.
[4] G. S. Ohm, “Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen,” Ann. Phys. Chem., vol. 135, no. 8, pp. 513–565, 1843, doi: 10.1002/andp.18431350802.
[5] R. S. Turner, “The Ohm-Seebeck dispute, Hermann von Helmholtz, and the origins of physiological acoustics,” Brit. J. Hist. Sci., vol. 10, no. 1, pp. 1–24, Mar. 1977, doi: 10.1017/S0007087400015089.
[6] M. J. Kromhout, “The unmusical ear: Georg Simon Ohm and the mathematical analysis of sound,” Isis, vol. 111, no. 3, pp. 471–492, Sep. 2020, doi: 10.1086/710318.
[7] H. von Helmholtz, “Über combinationstöne,” Ann. Phys. Chem., vol. 175, no. 12, pp. 497–540, 1856, doi: 10.1002/andp.18561751202.
[8] L. de Broglie, Certitudes et Incertitudes de la Science. Paris, France: Albin Michel, 1966.
[9] G. Rilling and P. Flandrin, “One or two frequencies? The empirical mode decomposition answers,” IEEE Trans. Signal Process., vol. 56, no. 1, pp. 85–95, Jan. 2008, doi: 10.1109/TSP.2007.906771.
[10] R. Koenig, Catalogue des Appareils d’Acoustique construits par Rudolph Koenig. Paris, France: Chez l’auteur, 1889. [Online]. Available: https://soundandscience.de/text/catalogue-des-appareils-dacoustique-construits-par-rudolph-koenig
[11] R. Koenig, Quelques Expériences d’acoustique. Paris, France: Chez l’auteur, 1882. [Online]. Available: https://gallica.bnf.fr/ark:/12148/bpt6k5688601m.texteImage
[12] D. Pantalony, “Rudolf Koenig’s workshop of sound: Instruments, theories, and the debate over combination tones,” Ann. Sci., vol. 62, no. 1, pp. 57–82, 2005, doi: 10.1080/00033790410001712183.
(continued on page 88)


Vincent W. Neo, Soydan Redif, John G. McWhirter, Jennifer Pestana, Ian K. Proudler, Stephan Weiss, and Patrick A. Naylor

Polynomial Eigenvalue Decomposition for Multichannel Broadband Signal Processing
A mathematical technique offering new insights and solutions

This article is devoted to the polynomial eigenvalue decomposition (PEVD) and its applications in broadband multichannel signal processing, motivated by the optimum solutions provided by the EVD for the narrowband case [1], [2]. In general, we would like to extend the utility of the EVD to also address broadband problems. Multichannel broadband signals arise at the core of many essential commercial applications, such as telecommunications, speech processing, health-care monitoring, astronomy and seismic surveillance, and military technologies, including radar, sonar, and communications [3]. The success of these applications often depends on the performance of signal processing tasks, including data compression [4], source localization [5], channel coding [6], signal enhancement [7], beamforming [8], and source separation [9]. In most cases and for narrowband signals, performing an EVD is the key to the signal processing algorithm. Therefore, this article aims to introduce the PEVD as a novel mathematical technique suitable for many broadband signal processing applications.

Date of current version: 3 November 2023
IEEE SIGNAL PROCESSING MAGAZINE | November 2023 | 1053-5888/23©2023IEEE

Introduction

Motivations and significance
In many narrowband signal processing applications, such as beamforming [8], signal enhancement [7], subband coding [6], and source separation [9], the processing is performed based on the covariance matrix. The instantaneous spatial covariance matrix, computed using the outer product of the multichannel data vector, can capture the phase shifts among narrowband signals arriving at different sensors. In the narrowband case, diagonalization of the spatial covariance matrix often leads to optimum solutions. For example, the multiple signal classification algorithm uses an EVD of the instantaneous spatial covariance matrix to perform super-resolution direction finding [5], [10].

The defining feature of a narrowband problem is the fact that a time-delayed version of a signal can be approximated by the undelayed signal multiplied by a phase shift. The success of narrowband processing therefore depends on the accuracy of this approximation, which varies from problem to problem. It is well known that as this approximation degrades, various issues start to occur when using narrowband algorithms. In array processing problems, this is often because some quantity in the algorithm that is related to direction of arrival (DOA) starts to depend on the frequency of the signal. For example, in DOA algorithms, a wideband source can appear to be spatially distributed. Another issue is that of multipath. Reflections can cause problems, as different multipath signals are derived from a single source but arrive at the sensors at different times. This leads to various issues, which can be advantageous or disadvantageous depending on one’s point of view. In beamforming and DOA estimation, this causes a problem, as the bearing to the source is clearly not well defined. However, in signal recovery problems, such as speech enhancement and communication systems, multipath is advantageous, as the signals can be combined to improve the signal-to-noise ratio (SNR). This is, however, possible only if the multipath signals are coherent. With narrowband processing, multipath signals appear to decorrelate as the delay increases. Multipath signals can also cause frequency-dependent fading, whereas narrowband processing can deal only with flat fading. Hence, for some problems, it is desirable to depart from narrowband processing and introduce some form of frequency-dependent processing.

For the broadband case, one common approach is to divide each broadband signal into multiple narrowband signals. While these narrowband signals are often processed independently by well-established and optimal narrowband techniques that are typically based on the EVD, splitting the broadband signal into independent frequency bins neglects spectral coherence and thus ignores correlations among different discrete Fourier transform (DFT) bins [11], [12]. As a result, optimal narrowband solutions applied in independent DFT bins give rise to suboptimal approaches to the overall broadband problem [13]. Broadband optimal solutions in the DFT domain need to consider the cross-coupling among DFT bins via cross terms, but the number of terms depends on the SNR and cannot be determined in advance [14], [15]. Another approach uses tapped delay line (TDL) processing [16], [17], [18], but the performance depends on the filter length, which is challenging to determine in practice. These approaches highlight the lack of generic tools to solve broadband problems directly.

Polynomial matrices are widely used in control theory and signal processing. In the control domain, these matrices are used to describe multivariable transfer functions for multiple-input multiple-output (MIMO) systems [19]. Control systems are usually designed for continuous-time systems and are analyzed in the Laplace domain. There, factorizations, such as the Smith and Smith–McMillan decompositions, of matrices in the Laplace variable $s$ target unimodularity, which is critical in the control context for invertibility, and spectral factorizations with minimum phase components to minimize time delays [20]. More recently, within digital signal processing (DSP), multirate DSP exploits polynomial matrices to describe lossless filter bank systems using polyphase notation [6], [20]. In multichannel broadband arrays and convolutively mixed signals, the array signals are generally correlated in time across different sensors. Therefore, the time delays for broadband signals cannot be represented only by phase shifts but need to be explicitly modeled. The relative time shifts are captured using the space-time covariance polynomial matrix, where decorrelation over a range of time shifts can be achieved using a PEVD [21].

While the initial work on the PEVD was a numerical algorithm [21], the existence of the decomposition of an analytic positive semidefinite para-Hermitian matrix, such as the space-time covariance matrix, has only recently been proven [22], [23], [24]. In most cases, unique para-Hermitian eigenvalues and paraunitary eigenvectors for a para-Hermitian matrix EVD exist but are of infinite length. However, being analytic, they permit good approximations by finite length factors, which are still helpful in many practical applications, such as beamformers [25], MIMO communications [26], source coding [27], signal enhancement [28], [29], source separation [30], source identification [31], and DOA estimation [32].

Outline of the article
This article is organized as follows. The “Mathematical Background” section provides a primer on the relevant mathematical concepts. The “Preliminaries: Representing Broadband Signals” section introduces the notations and gives a background on multichannel array processing, including the use of spatial and space-time covariance matrices and the inadequacies of two common approaches. The “Polynomial Matrix EVD” section first introduces the PEVD, whose analytic eigenvalues and eigenvectors are described before their approximations by numerical algorithms are presented. The “Example Applications Using PEVD” section demonstrates the use of the PEVD for some multichannel broadband applications, namely, adaptive beamforming, subband coding, and speech enhancement. Concluding remarks and future perspectives are provided in the “Conclusions and Future Perspectives” section.

Mathematical background

Analytic functions
In the time domain, the key to describing the propagation of a broadband signal through a linear time-invariant (LTI) system is the difference equation, where the system output $y[n]$ depends on a weighted average of the input $x[n]$ and past values of both $y[n]$ and $x[n]$. This difference equation

$$y[n] = \sum_{\nu \ge 0} b[\nu]\, x[n-\nu] + \sum_{\mu > 0} a[\mu]\, y[n-\mu] \qquad (1)$$

is straightforward to implement but does not lend itself to simple algebraic manipulations. For example, the difference equation for the concatenation of two LTI systems is not easily expressed in terms of the difference equations for the two component systems. For this reason, the z transform $x(z) = \sum_n x[n] z^{-n}$, with $z \in \mathbb{C}$, or for short, $x(z) \;\bullet\!\!-\!\!\circ\; x[n]$, can
Algebra of Functions
We are interested in matrices whose entries are more general than complex numbers. Specifically, we are interested in entries that are analytic functions. Matrices whose entries are analytic functions rather than real and complex numbers, and the algebraic manipulation of these matrices, may at first seem a little exotic, but many operations for real and complex numbers carry over to this setting.

There are several different classes of functions, depending on what properties they have. For example, there are discontinuous functions, continuous but nondifferentiable functions, functions that are continuous and differentiable up to a certain order, and functions that are continuous and differentiable for all orders. The class of analytic functions, by definition, has locally convergent power series. Consequently, the functions are infinitely differentiable and easier to work with than other types of functions. These series might have a finite number of terms, but in general, there are infinitely many. The truncation of these series results in polynomial approximations of the underlying analytic function.

Analytic functions can be algebraic or transcendental. An algebraic function $f(x)$ is a function that is a root of a polynomial equation. More specifically, $f$ is algebraic if it satisfies $p(x, f(x)) = 0$ for some irreducible polynomial $p(x, y)$ with coefficients in some field. Examples of algebraic functions include rational functions and $n$th roots of polynomials. Note that the inverse function of an algebraic function (if it exists) is also algebraic. An analytic function that is not algebraic is called a transcendental function. Examples include $e^x$, $\sin(x)$, and $\cos(x)$. Such functions have power series representations with an infinite number of terms.

Let us first consider analytic functions on their own. A function $f(z)$ is (complex) analytic in a domain (an open set) $D \subset \mathbb{C}$ if at each point it can be written as a locally convergent Taylor series. (Note that this means that such a function is infinitely differentiable.) The set $D$ is known as the domain of analyticity of $f(z)$. We note that two different analytic functions $f(z)$ and $g(z)$ may have different domains of analyticity, say, $D_f$ and $D_g$. When we operate on these functions, we assume that $D_f$ and $D_g$ overlap, i.e., that they have a nontrivial intersection, and restrict $f(z)$ and $g(z)$ to this common domain $D = D_f \cap D_g$.

Then, we can perform certain fundamental operations on analytic functions, and the result will also be an analytic function with the same domain of analyticity $D$. In particular, if $f(z)$ and $g(z)$ are analytic on a domain $D \subset \mathbb{C}$, then $f(z) + g(z)$ is analytic; that is, it can be expressed as a locally convergent power series for any $z \in D$. Similarly, $f(z) - g(z)$ and $f(z) \cdot g(z)$ are analytic. Things become a little more complicated when we consider quotients of the form $f(z)/g(z)$, but the result is analytic everywhere except at zeros of $g(z)$, as might be expected. This “closure” is important since it means that as we manipulate analytic functions, we do not need to worry if the result is also analytic. Note, as well, that if the product $f(z) \cdot g(z) \equiv 0$ on $D$, then $f(z) \equiv 0$ on $D$ or $g(z) \equiv 0$ on $D$.

If we now restrict our attention to polynomials in $z$, which are analytic everywhere, and Laurent polynomials, which are analytic everywhere except $z = 0$, then we can say something more. Indeed, if $f(z)$ and $g(z)$ are (Laurent) polynomials, then $f(z) + g(z)$, $f(z) - g(z)$, and $f(z) \cdot g(z)$ are not just analytic but are also (Laurent) polynomials. Now, however, we must exercise some care when considering quotients $f(z)/g(z)$ since the result will be analytic in $D$ [except at the zeros of $g(z)$] but not be a (Laurent) polynomial in general. However, $f(z)/g(z)$, and, indeed, any analytic function, can be arbitrarily well approximated by polynomials, as discussed in the preceding.

Let us now consider matrices $R(z)$ whose entries are analytic functions in $D \subset \mathbb{C}$. We start by noting that for any fixed $z_0 \in \mathbb{C}$, the matrix $R(z_0)$ is simply a matrix of complex numbers that can be manipulated in the usual ways. For example, we can multiply $R(z_0)$ by another (conformable) matrix or vector, and we can compute the eigenvalue decomposition (EVD) of $R(z_0)$. When we instead allow $z$ to vary, it is still possible to form, say, matrix–matrix and matrix–vector products with $R(z)$. Indeed, using the arguments in the previous paragraphs, if $R(z)$ has analytic (polynomial) entries, then the resulting matrix or vector will also have analytic (polynomial) entries. However, it is not immediately obvious that we can write down a single $z$-dependent EVD of $R(z)$ that holds for all values of $z \in D$. That this is true in certain circumstances is proved in a remarkable result from Rellich [S1].

Reference
[S1] F. Rellich, “Störungstheorie der Spektralzerlegung. I. Mitteilung. Analytische Störung der isolierten Punkteigenwerte eines beschränkten Operators,” Mathematische Annalen, vol. 113, pp. 600–619, 1937, doi: 10.1007/BF01571652. [Online]. Available: https://eudml.org/doc/159886
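The closure properties and the pointwise EVD described in this sidebar can be illustrated with a minimal numerical sketch. It is not taken from the article: the helper names `poly_mul`, `truncated_inverse`, and `evaluate_matrix`, the coefficient convention, and the example polynomials and matrix are all assumptions chosen for illustration. The sketch shows that a product of polynomials is again a polynomial, that the quotient $1/g(z)$ is not a polynomial but is well approximated by a truncated series, and that $R(z_0)$ at a fixed $z_0$ admits an ordinary EVD.

```python
import numpy as np

# Convention assumed for this sketch: a polynomial in z^{-1} is stored as a
# coefficient array c, where c[n] multiplies z^{-n}.

def poly_mul(f, g):
    """Product of two polynomials in z^{-1} (coefficient convolution)."""
    return np.convolve(f, g)

def truncated_inverse(g, order):
    """First `order` power-series coefficients of 1/g(z); requires g[0] != 0."""
    h = np.zeros(order, dtype=complex)
    h[0] = 1.0 / g[0]
    for n in range(1, order):
        # Coefficient n of g(z) * h(z) must vanish for n >= 1.
        acc = sum(g[k] * h[n - k] for k in range(1, min(n, len(g) - 1) + 1))
        h[n] = -acc / g[0]
    return h

def evaluate_matrix(coeffs, lags, z0):
    """Evaluate R(z0) = sum_tau R[tau] z0^{-tau} at a fixed complex z0."""
    return sum(R_tau * z0 ** (-tau) for R_tau, tau in zip(coeffs, lags))

f = np.array([1.0, 2.0])       # f(z) = 1 + 2 z^{-1}
g = np.array([1.0, -0.5])      # g(z) = 1 - 0.5 z^{-1}

product = poly_mul(f, g)       # closure: 1 + 1.5 z^{-1} - z^{-2}
h = truncated_inverse(g, 20)   # 1/g(z) is an infinite series, truncated here
residual = poly_mul(g, h)      # equals 1 up to a small truncation term

# Illustrative 2x2 matrix R(z) = R[-1] z + R[0] + R[1] z^{-1} (hypothetical).
lags = [-1, 0, 1]
coeffs = [np.array([[0.0, 0.0], [0.5, 0.0]]),
          np.array([[2.0, 0.0], [0.0, 1.0]]),
          np.array([[0.0, 0.5], [0.0, 0.0]])]
z0 = np.exp(1j * 0.3)                  # one fixed point on the unit circle
R0 = evaluate_matrix(coeffs, lags, z0)
eigvals, eigvecs = np.linalg.eigh(R0)  # ordinary EVD of a constant matrix
```

The truncation order of 20 is arbitrary; a zero of $g(z)$ closer to the unit circle would make the series decay more slowly and call for a longer truncation. The sketch only evaluates the EVD at one point and says nothing about whether the factors vary analytically with $z$, which is the question addressed by the result of Rellich cited above.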

be used to turn the time-domain convolution into the multiplicative expression $y(z) = h(z) \cdot x(z)$, which is easy to manipulate [33], [34].

The z transform exists as long as the time-domain quantities are absolutely summable; i.e., we require for $x(z)$, $\sum_n |x[n]| < \infty$. Values of $z$ for which the z transform is finite define the region of convergence (ROC), which, therefore, must include at least the unit circle since $\left|\sum_n x[n] e^{-j\Omega n}\right| \le \sum_n |x[n]| < \infty$, where $j = \sqrt{-1}$ is the imaginary number. For values of $z$ within this ROC, the function $x(z)$ is complex analytic, which has profound consequences. Analytic functions mathematically belong to a ring such that any addition, subtraction, and multiplication will produce an analytic result. These operations potentially reduce the ROC. Dividing by an analytic function also results in an analytic function as long as the divisor does not have spectral zeros; again, this operation may shrink the ROC. For example, with $b(z)$ and $a(z)$ analytic and the latter without any zeros on the unit circle, then $h(z) = b(z)/a(z)$ is also guaranteed to be analytic. Note that the same cannot be said for nonanalytic functions. This is important since nonanalytic functions can be difficult to approximate optimally in practice (see the following; for more on the algebra of analytic functions, see “Algebra of Functions.”)

Laurent series, power series, and polynomials
Throughout this article, we often represent z transforms by series, i.e., by expressions of the form

$$h(z) = \sum_{n=N_1}^{N_2} h[n]\, z^{-n}. \qquad (2)$$

This is motivated by the fact that analytic functions can be represented by a Taylor (or, equivalently, power) series within the ROC. More generally, we are interested in Laurent series, power series, Laurent polynomials, and polynomials, which we distinguish in the following.

For finite $N_1$ and $N_2$ in (2), $h(z)$ is a Laurent polynomial if $N_1$ and $N_2$ have opposing signs. If $N_1$ and $N_2$ share the same sign, i.e., if $h(z)$ is purely an expression in powers of either $z^{-1}$ or $z$, it is a polynomial. Typically, by a polynomial, we refer to an expression that contains powers in $z^{-1}$. If interpreted as a transfer function, a polynomial $h(z)$ in $z^{-1}$ refers to a causal finite-impulse response filter. If it possesses finite coefficients, then a polynomial or Laurent polynomial $h(z)$ will always be absolutely summable and hence be analytic.

A Laurent series is characterized by $N_1 \to -\infty$ and $N_2 \to \infty$, while for a power series, $h(z)$ strictly contains only powers in either $z^{-1}$ (for $N_1 \ge 0$ and $N_2 \to \infty$) or $z$ (for $N_1 \to -\infty$ and $N_2 \le 0$). Both Laurent and power series possess a generally infinite coefficient sequence $\{h[n]\}$. Such sequences can be used to represent rational functions, where $h(z) = b(z)/a(z)$ is a ratio of two polynomials; with respect to (1), such a power series can describe an infinite-impulse response filter. Further and more generally, Laurent and power series can also represent transcendental functions, which are absolutely convergent but may not be representable by a finite number of algebraic operations, such as a ratio.

In signal processing, polynomials and convergent power series can represent quantities, such as finite- and infinite-impulse responses, of either causal or anticausal stable systems. In contrast, Laurent series and Laurent polynomials appear as a result of correlation operations. First, assume that a zero-mean unit variance uncorrelated random signal $x[n]$ excites a system with impulse response $h[n]$. With the input autocorrelation sequence $r_x[\tau] = \mathcal{E}\{x[n]\, x^*[n-\tau]\} = \delta[\tau]$, the output autocorrelation is $r_y[\tau] = \sum_n h^*[-n]\, h[\tau - n]$ [35], where $\mathcal{E}\{\cdot\}$ and $[\cdot]^*$ are the expectation and complex conjugate operators, respectively. Then, its z transform, the power spectral density $r_y(z) \;\bullet\!\!-\!\!\circ\; r_y[\tau]$, is given by $r_y(z) = h(z)\, h^*(1/z^*)$ and will be a Laurent series if $h(z)$ is a power series and a Laurent polynomial if $h(z)$ is a polynomial.

Polynomial approximation and polynomial arithmetic
By the Weierstrass theorem, any continuous function can be arbitrarily well approximated by a polynomial of sufficient degree, but, in general, it can be nontrivial to construct the approximating polynomials. However, for analytic functions, such as Laurent and power series $h(z)$, this approximation can be easily obtained by truncating $h[n] \;\circ\!\!-\!\!\bullet\; h(z)$ to the required order. If the result is a Laurent polynomial as in (2), describing, e.g., the impulse response of a noncausal system, then a polynomial (or causal system) can be obtained by a delay by $|N_1|$ sampling periods. Thus, by delay and truncation, all the preceding expressions describing analytic functions (Laurent series, power series, and Laurent polynomials) can be arbitrarily closely approximated by polynomials.

Key Statement
While operations on analytic functions tend to yield analytic functions, the same is not true for polynomials; e.g., the ratio of polynomials generally yields a rational function but not a polynomial. Nonetheless, since the resulting function is analytic, it can be approximated arbitrarily closely by a polynomial via appropriate delay and truncation operations.

Matrices of analytic functions and polynomial matrices
In this article, we consider matrices whose entries are analytic functions in general and their close approximation by polynomial matrices in particular. The mathematical theory of polynomial matrices that depend on a real parameter has been studied in, e.g., [36]. This has found application, for example, in the control domain [37]. Within signal processing, polynomial matrices have been used in filter bank theory. Specifically, polyphase notation [38] has been utilized to allow efficient implementation. Here, polynomial matrices in the form of polyphase analysis and synthesis matrices describe networks of filters operating on demultiplexed single-channel data. More generally,

polynomial matrices have been used to define space-time covariance matrices on demultiplexed data streams [20] and directly for multichannel data [21].

Polynomial matrix factorizations
A number of polynomial matrix factorizations have been introduced in the past. Since we are particularly interested in diagonalizations of matrices, these prominently include the Smith and Smith–McMillan forms for matrices of polynomials and rational functions, respectively [20]. Popular in the control domain, these allow a decomposition into a diagonal term and two outer factors that are invertible but generally nonorthogonal polynomial matrices. Further, spectral factorizations [37], [39] involve the decomposition of a matrix into a product of a causal stable matrix and its time-reversed, complex-conjugated, and transposed version. These are matrix-valued extensions of Wiener's factorization of a power spectral density into minimum and maximum phase components, and they are supported by numerical tools, such as PolyX [40].

In control theory, minimizing the delay of a system is a critical design issue, and hence, many of the existing matrix decompositions, such as spectral factorization, emphasize the minimum phase equivalent of the resulting matrix factors. In signal processing, the delay is often a secondary issue, while, e.g., energy preservation (or unitarity) of a transform is crucial. Therefore, in the following, we explore the diagonalization of an analytic matrix by means of energy-preserving transformations.

Preliminaries: Representing broadband signals

Signal model
The received signal at the $m$th of a total of $M$ sensors for the discrete-time index $n$ is

$$x_m[n] = \sum_{\ell=1}^{L} \sum_{\tau=0}^{T} a_{m,\ell}[\tau]\, s_\ell[n-\tau] + v_m[n], \qquad m = 1, \ldots, M, \qquad (3)$$

where $a_{m,\ell}[n]$ models the channel from the $\ell$th source signal $s_\ell[n]$ to the $m$th sensor and is an element of $\mathbf{A}[n] \in \mathbb{C}^{M \times L \times (T+1)}$, $v_m[n]$ is the additive noise at the $m$th sensor assumed to be uncorrelated with the $L$ source signals, and $T$ is the maximum order of any of the channel impulse responses. The received data vector for $M$ sensors is $\mathbf{x}[n] = [x_1[n], \ldots, x_M[n]]^T \in \mathbb{C}^M$, and each element has a mean of $E\{x_m[n]\} = 0\ \forall m$, where $[\cdot]^T$ represents the transpose operator. Similarly, the source and noise data vectors are $\mathbf{s}[n] \in \mathbb{C}^L$ and $\mathbf{v}[n] \in \mathbb{C}^M$, respectively.

Broadband source signals, which naturally arise in, for example, audio, speech, communications, sonar, and radar, are directly reflected by (3). This is also applicable for narrowband systems. Here, the source signal is often described by a complex exponential, $e^{j\Omega n}$, where $\Omega$ is the normalized angular frequency. This means that (3) can be simplified by setting $T = 0$. As an alternative to (3), as shown in Figure 1, the $L$ source signals, $s_\ell[n]$, $\ell = 1, \ldots, L$, could be generated using spectral-shaped noise obtained by filtering uncorrelated zero-mean unit variance complex Gaussian random variables, $u_\ell[n] \in \mathcal{N}(0, 1)$, through some innovation filters, $f_\ell[n]$ [35].

The channel model in (3) can describe systems in diverse scenarios, for example, instantaneous and convolutive mixtures, near-field and far-field sources, and anechoic and reverberant environments. The signal model in (3) is often simplified by taking the z transform. However, care is needed, as the z transform of a random signal does not exist. Nonetheless, in the case of deterministic absolutely summable signals, the z transform of (3) may be written in matrix-vector form as

$$\mathbf{x}(z) = \mathbf{A}(z)\, \mathbf{s}(z) + \mathbf{v}(z), \qquad (4)$$

where $\mathbf{A}[n]$ ∘–• $\mathbf{A}(z) \in \mathbb{C}^{M \times L}$, $\mathbf{s}[n]$ ∘–• $\mathbf{s}(z) \in \mathbb{C}^{L}$, and $\mathbf{v}[n]$ ∘–• $\mathbf{v}(z) \in \mathbb{C}^{M}$ are z-transform pairs of the channel matrix, source, and noise vectors, respectively. The well-known equivalence of convolution in the time domain and multiplication in the z domain [33] is expressed in (3) and (4).

Covariance matrices
The covariance matrix used in many narrowband subspace-based approaches [5], [8], [10] is described by

$$\mathbf{R} = E\{\mathbf{x}[n]\, \mathbf{x}^H[n]\} \qquad (5)$$

using the data vector obtained from (3). The $(m, \ell)$th element of $\mathbf{R}$ is $r_{m,\ell} = E\{x_m[n]\, x_\ell^*[n]\}$, and the expectation operation is performed over $n$. In practice, the expectation is approximated using the sample mean, where the inner product between the received signals at the $m$th and $\ell$th sensor is computed before normalizing by the total number of samples $N$. Because the inner product is calculated sample-wise, the covariance matrix instantaneously captures the spatial relationship among different sensors. This article calls it the instantaneous (or spatial) covariance matrix.

When the system involves convolutive mixing and broadband signals, time delays among signals at different sensors need to be modeled. This spatiotemporal relationship is explicitly captured by the space-time covariance matrix parameterized by the discrete-time lag parameter $\tau \in \mathbb{Z}$, defined as [21]

$$\mathbf{R}[\tau] = E\{\mathbf{x}[n]\, \mathbf{x}^H[n-\tau]\}. \qquad (6)$$

The $(m, \ell)$th element of $\mathbf{R}[\tau]$, arising from sensors with a fixed geometry, is $r_{m,\ell}[\tau] = E\{x_m[n]\, x_\ell^*[n-\tau]\}$, and, again, the expectation operation is performed over $n$, where wide-sense temporal stationarity is assumed. The autocorrelation and cross-correlation sequences are obtained when $m = \ell$ and

[Figure 1: block diagram. Innovation signals $u_1[n], \ldots, u_L[n]$ pass through innovation filters $f_1[n], \ldots, f_L[n]$ to give the $L$ source signals $s_1[n], \ldots, s_L[n]$; the channel $\mathbf{A}[n]$ maps these to the $M$ sensor signals $x_1[n], \ldots, x_M[n]$, which a processor turns into the outputs $y_1[n], \ldots, y_M[n]$.]
FIGURE 1. The multichannel system model for L spectral-shaped source signals and M sensors. Uncorrelated noise signals $\mathbf{v}[n]$, not drawn in the figure, are optionally added to each sensor based on (3).
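As a rough illustration of (5) and (6), the sample-mean estimate of the space-time covariance can be sketched in plain Python. The function name `space_time_covariance`, the nested-list data layout, the toy data, and the normalization by the record length N at every lag are assumptions made for this sketch; they are not taken from the article.

```python
def space_time_covariance(x, max_lag):
    """Sample estimate of R[tau] = E{x[n] x^H[n - tau]} for |tau| <= max_lag.

    x is a list of M equal-length sample sequences, one per sensor.
    Returns a dict that maps each lag tau to an M x M nested list.
    """
    M, N = len(x), len(x[0])
    R = {}
    for tau in range(-max_lag, max_lag + 1):
        # keep only the samples n for which both n and n - tau lie in the record
        n_first, n_last = max(0, tau), min(N, N + tau)
        R[tau] = [[sum(x[m][n] * complex(x[l][n - tau]).conjugate()
                       for n in range(n_first, n_last)) / N
                   for l in range(M)]
                  for m in range(M)]
    return R


# toy record (illustrative values only): M = 2 sensors, N = 3 samples
x = [[1, 2, 3],
     [0, 1j, 0]]
R = space_time_covariance(x, max_lag=1)
print(R[0])   # instantaneous (spatial) covariance, as in (5)
print(R[1])   # lag-one term of the space-time covariance, as in (6)
```

With this choice of summation range, the estimate satisfies R[-tau] = R^H[tau] exactly, which is the time-domain counterpart of the para-Hermitian property.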

$m \ne \ell$, respectively. Furthermore, (5) can be seen as a special case of (6) when only the instantaneous lag is considered; i.e., $\mathbf{R}[0]$ is the coefficient of $z^0$ when $\tau = 0$, as demonstrated in Figure 2.

The z transform of the space-time covariance matrix in (6),

$$\mathbf{R}(z) = \sum_{\tau=-\infty}^{\infty} \mathbf{R}[\tau]\, z^{-\tau}, \qquad (7)$$

known as the cross spectral density (CSD) matrix, is a para-Hermitian polynomial matrix satisfying the property $\mathbf{R}^P(z) = \mathbf{R}(z)$. (The symbol $[\cdot]^P$ denotes the para-Hermitian operator, $\mathbf{R}^P(z) = \mathbf{R}^H(1/z^*)$, which involves a Hermitian transpose followed by a time reversal operation [20], where $[\cdot]^H$ denotes the Hermitian transpose operator.) The polynomial matrix can be interpreted as a matrix of polynomials (functions) as well as a polynomial with matrix coefficients; i.e., $\mathbf{R}[\tau]$ is the matrix coefficient of $z^{-\tau}$. This is visualized in Figure 2(b), which describes the temporal evolution of the spatial relationship across the entire array. Equivalently, the same polynomial matrix can also be interpreted as a matrix with polynomial elements, representing the temporal correlation in the z domain between sensor pairs, for example, element $r_{3,1}(z)$ for sensors 3 and 1 in Figure 2(c).

Key Statement
The space-time covariance matrix completely captures the second-order statistics of multichannel broadband signals via auto- and cross-correlation functions. Its z transform has the useful property of being para-Hermitian.

Comparison with other broadband signal representations
The multichannel signal model introduced in the "Signal Model" section is compared against two signal representations commonly encountered in array processing. They are the TDL and short-time Fourier transform (STFT) approaches.

TDL processing
The relative delays with which broadband signals arrive at different sensors cannot be sufficiently modeled by phase shifts because they can be accurate only at a single frequency. Therefore, these delays need to be implemented by filters that possess, at the very least, frequency-dependent phase shifts. Such filters must rely on processing a temporal window of the signals; the access to this window can, in the finite-impulse response case, be provided by TDLs that are attached to each of the $M$ array signals. The length $T$ of these TDLs will determine the accuracy with which such delays, often of a fractional nature [18], are realized.

Based on the array signal vector $\mathbf{x}[n]$, a $T$-element TDL provides data that can be represented by a concatenated vector $\boldsymbol{\chi}[n] = [\mathbf{x}^T[n], \ldots, \mathbf{x}^T[n-T+1]]^T \in \mathbb{C}^{MT}$, which holds both spatial and temporal samples. For the covariance matrix of $\boldsymbol{\chi}[n]$, $\mathbf{R}_\chi = E\{\boldsymbol{\chi}[n]\, \boldsymbol{\chi}^H[n]\}$, we have

$$\mathbf{R}_\chi = \begin{bmatrix} \mathbf{R}[0] & \cdots & \mathbf{R}[T-1] \\ \vdots & \ddots & \vdots \\ \mathbf{R}[-T+1] & \cdots & \mathbf{R}[0] \end{bmatrix}. \qquad (8)$$

Although of different dimensions, the covariance $\mathbf{R}_\chi$ contains, as submatrices, the same terms that also make up the space-time covariance matrix $\mathbf{R}[\tau]$. However, it is not necessarily clear prior to processing how large or small $T$ should be selected. Apart from its impact on the accuracy of a delay implementation, if $T$ is selected smaller than the coherence time of the signal, then some temporal correlations for lags $\tau \ge T$ in the signals are missed, leading to a potentially insufficient characterization of the signals' second-order statistics. If $T$ is set too large, then no extra correlation information is included, but additional noise may be added.

The EVD of the covariance matrix $\mathbf{R}_\chi = \mathbf{Q}_\chi \boldsymbol{\Lambda}_\chi \mathbf{Q}_\chi^H$ gives access to $MT$ eigenvalues in $\boldsymbol{\Lambda}_\chi$. In inspecting these eigenvalues, there no longer is any separation between space and time, and, for example, a single broadband source that is captured by the array in its data vector $\mathbf{x}[n]$ can generate, depending on its bandwidth, anything between one and $T + \Delta$ nonzero eigenvalues, where $\Delta$ is the maximum propagation delay across the array that any source can experience. Hence, tasks such as source enumeration can become challenging.

Furthermore, in narrowband processing, a common procedure is to project the received signals $\mathbf{x}$ onto the so-called signal subspace, as this suppresses some of the noise [7]. The signal subspace is defined by partitioning the eigenvalues by magnitude and selecting the subset of eigenvectors corresponding to the larger eigenvalues. Mimicking this in the broadband case would mean partitioning $\mathbf{Q}_\chi = [\mathbf{Q}_s\ \mathbf{Q}_n]$, where $\mathbf{Q}_s$ corresponds to the larger eigenvalues. In the narrowband case, it is well known that, in general, the projected signals

[Figure 2: (a) the zero-lag matrix $\mathbf{R}[0]$ with rows (1, 0.7, 0), (0.7, 1, -0.3), and (0, -0.3, 1); (b) a cube of matrix slices for the coefficients $z^{2}$ to $z^{-2}$; (c) the same cube read as polynomial elements $r_{11}(z)$ to $r_{33}(z)$.]
FIGURE 2. (a) A typical spatial covariance matrix for zero lag, i.e., $\tau = 0$, is the matrix coefficient of $z^0$. (b) In general, each matrix slice corresponds to a coefficient of the polynomial. (c) The same polynomial matrix is also a matrix consisting of polynomial elements represented by tubes in the same cube.
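To make the block structure of (8) concrete, the following minimal sketch assembles the TDL covariance from a handful of space-time covariance lags. The helper name `tdl_covariance`, the dict-of-lags layout, and the numerical values are hypothetical and chosen only for illustration.

```python
def tdl_covariance(R, T):
    """Assemble the MT x MT covariance of (8) from space-time covariance lags.

    R maps a lag tau to an M x M nested list; block (i, k) of the result is
    R[k - i]. Lags that are missing from R are treated as zero matrices.
    """
    M = len(R[0])
    zero = [[0j] * M for _ in range(M)]
    return [[R.get(k - i, zero)[m][l] for k in range(T) for l in range(M)]
            for i in range(T) for m in range(M)]


# hypothetical lags for M = 2 sensors; R[-1] is the Hermitian transpose of R[1]
R = {0: [[2, 0.5j], [-0.5j, 1]],
     1: [[0.3, 0.1], [0, 0.2j]],
     -1: [[0.3, 0], [0.1, -0.2j]]}
R_chi = tdl_covariance(R, T=3)
for row in R_chi:
    print(row)
```

Every lag R[tau] appears several times along a block diagonal of the result, which shows how the TDL representation repeats the information already held by the space-time covariance while mixing the spatial and temporal indices.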

$\mathbf{y}[n] = \mathbf{Q}_s^H \boldsymbol{\chi}[n]$ are not the source signals but merely span the same space. However, if only one source signal is present, then the projected signal is the source signal. In the broadband case, not even this is true since, as noted in the preceding, we might have more than one eigenvalue per signal.

STFT
If we take a $T$-point DFT $\mathbf{W}$ of each of the TDLs in $\boldsymbol{\chi}[n]$, we evaluate $\boldsymbol{\xi}[n] = (\mathbf{W} \otimes \mathbf{I}_M)\, \boldsymbol{\chi}[n]$, with $\otimes$ being the Kronecker product. The DFT domain covariance matrix $\mathbf{R}_\xi = E\{\boldsymbol{\xi}[n]\, \boldsymbol{\xi}^H[n]\} = (\mathbf{W} \otimes \mathbf{I}_M)\, \mathbf{R}_\chi\, (\mathbf{W} \otimes \mathbf{I}_M)^H$ is generally nonsparse due to cross-coupling between DFT bins. This cross-coupling does not subside even as $T$ is increased. For bin-wise processing, i.e., processing each of the frequency bins across the array independently of other frequency bins, many of the terms in $\mathbf{R}_\xi$ are neglected, leading to processing that can be very low cost but generally is suboptimal. To achieve optimality, time-domain criteria must be embedded in the processing, which generally leads to cross terms between bins [14], [41]. The generally dense nature of $\mathbf{R}_\xi$ can be relaxed when employing more frequency-selective subband methods over DFT processing, but cross terms, at least between adjacent subbands, still remain [42]. Together with the increased computational cost of such filters over the DFT, this negates the low-complexity aspiration of this approach.

Key Statement
Broadband processing requires accurately representing fractional time delays. Previous approaches do not lead to proper generalizations of the narrowband algorithms and are often suboptimal.

Polynomial matrix EVD
As discussed in the "Comparison With Other Broadband Signal Representations" section, conventional approaches to processing broadband signals have some shortcomings. Arguably, this is because the incorrect signal representation is used. Specifically, the use of a TDL, with either time-domain or frequency-domain processing, mixes up the spatial and temporal dimensions. This section builds on the signal model in the "Preliminaries: Representing Broadband Signals" section, representing the broadband system using z transforms and "polynomials." Guided by the successful use of linear algebra, i.e., EVD, in narrowband systems, this section focuses on the decomposition of para-Hermitian polynomial matrices, such as the space-time covariance matrix. That is, given a para-Hermitian polynomial matrix $\mathbf{R}(z) = \mathbf{R}^P(z)$, does a decomposition $\mathbf{R}(z) = \mathbf{Q}(z)\, \boldsymbol{\Lambda}(z)\, \mathbf{Q}^P(z)$ exist where $\boldsymbol{\Lambda}(z)$ is diagonal and $\mathbf{Q}(z)$ is paraunitary?

Note that the EVD can diagonalize a para-Hermitian matrix $\mathbf{R}[\tau]$ for only one specific lag $\tau = \tau_0$ and, alternatively, $\mathbf{R}(z)$ •–∘ $\mathbf{R}[\tau]$ for one specific value $z = z_0$. The unitary matrices that accomplish the diagonalization at that value are unlikely to diagonalize the matrix at other values $\tau \ne \tau_0$ and $z \ne z_0$. We therefore require a decomposition that diagonalizes $\mathbf{R}[\tau]$ for all values of $\tau$ and $\mathbf{R}(z)$ for all values of $z$ within the ROC. We address the existence of such a decomposition via the analytic EVD in the "Analytic EVD" section and provide some comments on numerical algorithms in the "PEVD Algorithms" section.

Analytic EVD
The key to a more general EVD is the work by Rellich [43], who, in the context of quantum mechanics, investigated a matrix-valued function $\mathbf{A}(t)$ that is self-adjoint, i.e., $\mathbf{A}(t) = \mathbf{A}^H(t)$, and analytic in $t$ on some real interval. Matrix-valued functions of this type admit a decomposition $\mathbf{A}(t) = \mathbf{U}(t)\, \boldsymbol{\Gamma}(t)\, \mathbf{U}^H(t)$ with matrix-valued functions $\mathbf{U}(t)$ and $\boldsymbol{\Gamma}(t)$ that are also analytic in $t$ and where $\boldsymbol{\Gamma}(t)$ is diagonal and $\mathbf{U}(t_0)$ is unitary for any specific value $t = t_0$. These results were obtained through perturbation analysis [44], where for the EVD of a matrix $\mathbf{A}(t_0) = \mathbf{U}(t_0)\, \boldsymbol{\Gamma}(t_0)\, \mathbf{U}^H(t_0)$, a change of $\mathbf{A}(t_0)$ by some small Hermitian matrix results in only a limited perturbation of both the eigenvalues and eigenvectors. There is no such guarantee if $\mathbf{A}(t)$ is not analytic in $t$; even infinite differentiability is not sufficient [44].

To decompose a matrix $\mathbf{R}(z)$ that is analytic in the complex-valued parameter $z$, it suffices to investigate $\mathbf{R}(z)$ on the unit circle for $z = e^{j\Omega}$. This is due to the uniqueness theorem for analytic functions, which guarantees that if two functions are identical on some part of their ROC (here, the unit circle, which must always be included), they must be identical across the entire ROC. Although $\Omega \in \mathbb{R}$, Rellich's results do not directly apply, as they do not imply a $2\pi$ periodicity. Without such periodicity, it is not possible to reparameterize the EVD factors by replacing $e^{j\Omega}$ with $z$ and hence produce an EVD that is analytic in $z$. However, it has recently been shown that Rellich's result admits $2\pi N$-periodic eigenvalue functions, and, furthermore, $N = 1$ unless the data generating $\mathbf{R}(z)$ emerge from $N$-fold multiplexing or block filtering [23], [24]. Analytic eigenvector functions, then, exist with the same periodicity as the eigenvalues [45]. Therefore, an analytic EVD for this $N$-fold multiplexed system

$$\mathbf{R}(z^N) = \mathbf{Q}(z)\, \boldsymbol{\Lambda}(z)\, \mathbf{Q}^P(z) \qquad (9)$$

exists with analytic factors such that $\boldsymbol{\Lambda}(z)$ is diagonal. The matrix $\mathbf{Q}(z)$ contains the eigenvector functions and for $z = e^{j\Omega_0}$ is unitary. For a general $z$, $\mathbf{Q}(z)$ is paraunitary such that $\mathbf{Q}(z)\, \mathbf{Q}^P(z) = \mathbf{Q}^P(z)\, \mathbf{Q}(z) = \mathbf{I}$. Paraunitarity is an extension of the orthonormal and unitary properties of matrices from the real- and complex-valued cases to matrices that are functions in a complex variable [20]. For ease of exposition in the following, we talk of "analytic matrices $\mathbf{X}(z)$," with the understanding that $\mathbf{X}(z)$ is a matrix-valued analytic function.

In the analytic EVD of (9), the eigenvalue function is $\boldsymbol{\Lambda}(z) = \mathrm{diag}\{\lambda_1(z), \ldots, \lambda_M(z)\}$, where $\mathrm{diag}\{\cdot\}$ forms a diagonal matrix from its argument. When evaluated on the unit

circle, the eigenvalues $\lambda_m(e^{j\Omega})$, $m = 1, \ldots, M$ are real valued and unique up to a permutation. If there are $M$ distinct eigenvalues, i.e., $\lambda_m(e^{j\Omega}) = \lambda_n(e^{j\Omega})$ only for $m = n$ for any $m, n = 1, \ldots, M$, then the corresponding eigenvectors $\mathbf{q}_m(z)$ in $\mathbf{Q}(z) = [\mathbf{q}_1(z), \ldots, \mathbf{q}_M(z)]$ are unique up to an arbitrary all-pass function; i.e., $\mathbf{q}'_m(z) = \psi_m(z)\, \mathbf{q}_m(z)$ is also a valid analytic eigenvector of $\mathbf{R}(z^N)$, where $\psi_m(z)$ is all-pass.

As an example, consider a system from [23],

$$\mathbf{R}(z) = \begin{bmatrix} \frac{1-j}{2} z + 3 + \frac{1+j}{2} z^{-1} & \frac{1+j}{2} z^{2} + \frac{1-j}{2} \\ \frac{1+j}{2} + \frac{1-j}{2} z^{-2} & \frac{1-j}{2} z + 3 + \frac{1+j}{2} z^{-1} \end{bmatrix}, \qquad (10)$$

which is constructed from eigenvalues $\boldsymbol{\Lambda}(z) = \mathrm{diag}\{z + 3 + z^{-1},\ -jz + 3 + jz^{-1}\}$ and their corresponding eigenvectors $\mathbf{q}_{1,2}(z) = [1, \pm z^{-1}]^T / \sqrt{2}$. The evaluation of the eigenvalues on the unit circle $\lambda_{1,2}(e^{j\Omega})$ is presented in Figure 3(a). Figure 3(b) shows the Hermitian angle of the eigenvectors, $\varphi_m(e^{j\Omega}) = \arccos(|\mathbf{q}_1^H(e^{j0})\, \mathbf{q}_m(e^{j\Omega})|)$. Note that due to the analyticity of the EVD factors, all these quantities evolve smoothly with the normalized angular frequency $\Omega$. An all-pass modification of the eigenvectors might be as simple as imposing a delay; while this will not affect $\varphi_m(e^{j\Omega})$, it can increase the support of $\mathbf{Q}[n]$ ∘–• $\mathbf{Q}(z)$.

While in the preceding example, the factorization yields polynomial factors, this does not have to be the case: they could be Laurent and power series. For example, modifying the previous eigenvectors by arbitrary all-pass functions does not invalidate the decomposition, but it may change the order of $\mathbf{q}_m(z)$ to $\infty$, i.e., a power series. More generally, Laurent polynomial matrices $\mathbf{R}(z)$ are likely to lead to algebraic and even transcendental functions as EVD factors [23], [24]. Nonetheless, recall from the "Introduction" section that analyticity implies absolute convergence in the time domain. Therefore, the best least-squares approximation is achieved by truncation. Further, as the approximation order increases, the approximation error can be made arbitrarily small.

The components of the analytic PEVD have some useful properties. The matrix of eigenvectors $\mathbf{Q}[n]$ can be viewed as a lossless filter bank. Clearly, it transforms the input time series into another set of time series. However, because the filter bank is paraunitary, the energy in the output signals is the same as that of the input signals. Furthermore, the output signals are strongly decorrelated. That is, any two signals have zero cross-correlation coefficients at all lags. Significantly, the signals are not temporally whitened; i.e., they do not have an impulse as their autocorrelation function. Note that the order of a z transform is connected to the time-domain support of the corresponding time series. Thus, the computational cost of implementing such a filter bank is related to the order of $\mathbf{Q}(z)$. In general, the eigenvalues of a narrowband covariance matrix have differing magnitudes, with the presence of small values indicating approximate linear dependency among the input signals. Similarly, the eigenvalues on the diagonal of $\boldsymbol{\Lambda}[n]$ can show linear dependence but in a frequency-dependent manner.

Key Statement
The pioneering work of Rellich showed that an analytic EVD exists for a matrix function. Applying this to a space-time covariance matrix on the unit circle introduces some additional constraints but results in the existence of an analytic PEVD.

PEVD algorithms
The first attempt at producing a PEVD algorithm began with the second-order sequential best rotation (SBR2) [21], which was motivated by Jacobi's method for numerically computing the EVD [2]. The PEVD of $\mathbf{R}(z)$, i.e., (7), as given by (9) for $N = 1$ and established in the "Analytic EVD" section, can be approximated using an iterative algorithm and is expressed as [21], [46]

$$\mathbf{R}(z) \approx \mathbf{U}(z)\, \boldsymbol{\Lambda}(z)\, \mathbf{U}^P(z), \qquad (11)$$

where the columns of the polynomial matrix, $\mathbf{U}(z) \in \mathbb{C}^{M \times M}$, correspond to the eigenvectors with their associated eigenvalues on the diagonal polynomial matrix, $\boldsymbol{\Lambda}(z) \in \mathbb{C}^{M \times M}$. The Laurent polynomial matrix factors $\mathbf{U}(z)$ and $\boldsymbol{\Lambda}(z)$ are necessarily analytic, being of finite order. However, under certain circumstances, the theoretical factors in the PEVD might not be analytic [23], [24], in which case, the Laurent polynomial matrix factors $\mathbf{U}(z)$ and $\boldsymbol{\Lambda}(z)$ are only approximations of the true factors, hence the approximation in (11).

Rewriting (11) as

$$\boldsymbol{\Lambda}(z) \approx \mathbf{U}^P(z)\, \mathbf{R}(z)\, \mathbf{U}(z), \qquad (12)$$

the diagonalization of $\mathbf{R}(z)$ can be achieved by generalized similarity transformations, with $\mathbf{U}(z)$ satisfying the paraunitary, or lossless, condition [20]

[Figure 3: two panels plotted against the normalized angular frequency $\Omega$ from 0 to $2\pi$: (a) $\lambda_m(e^{j\Omega})$ and (b) $\varphi_m(e^{j\Omega})$, each for $m = 1$ and $m = 2$.]
FIGURE 3. (a) Analytic eigenvalues on the unit circle and (b) Hermitian angles of the corresponding analytic eigenvectors, measured against a reference vector [23].
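The example in (10) can be checked numerically. The sketch below, in plain Python, evaluates the matrix entries of (10) on the unit circle and confirms that the eigenvectors [1, ±z^-1]^T/√2 are paired with the eigenvalues z + 3 + z^-1 and -jz + 3 + jz^-1, which equal 3 + 2cos Ω and 3 + 2sin Ω on the unit circle. The function names are assumptions of this sketch and do not come from [23].

```python
import cmath
import math


def R_of(z):
    """The 2 x 2 para-Hermitian matrix of (10), evaluated at a complex z."""
    a, b = (1 - 1j) / 2, (1 + 1j) / 2
    r11 = a * z + 3 + b / z
    return [[r11, b * z**2 + a],
            [b + a / z**2, r11]]


def analytic_evd(z):
    """Eigenvalues and eigenvectors [1, +-1/z]^T / sqrt(2), evaluated at z."""
    lam = [z + 3 + 1 / z, -1j * z + 3 + 1j / z]
    q = [[1 / math.sqrt(2), s / (z * math.sqrt(2))] for s in (1, -1)]
    return lam, q


def residual(z):
    """Largest entry of R(z) q_m(z) - lambda_m(z) q_m(z) over m = 1, 2."""
    R = R_of(z)
    lam, q = analytic_evd(z)
    return max(abs(sum(R[i][k] * q[m][k] for k in range(2)) - lam[m] * q[m][i])
               for m in range(2) for i in range(2))


omegas = [2 * math.pi * n / 16 for n in range(16)]
worst = max(residual(cmath.exp(1j * w)) for w in omegas)
print("largest eigen-equation residual on the unit circle:", worst)
```

Both eigenvalue functions are smooth in Ω and intersect where cos Ω = sin Ω, i.e., at Ω = π/4 and Ω = 5π/4.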

[Figure 4: four cubes of matrix slices, panels (a) to (d), with lag planes labeled from $z^{4}$ to $z^{-4}$.]
FIGURE 4. Each PEVD iteration involves the following four steps. (a) The polynomial matrix is first searched for the maximum off-diagonal element across all lags. (b) The second delay step brings the largest element to the principal $z^0$-plane. (c) The third is the zeroing step, which transfers energy from the off-diagonal elements to the diagonal. (d) The final trimming step discards negligibly small coefficients in the outer matrix slices.
$$\mathbf{U}^P(z)\, \mathbf{U}(z) = \mathbf{U}(z)\, \mathbf{U}^P(z) = \mathbf{I}, \qquad (13)$$

where $\mathbf{I}$ is the identity matrix. The similarity transform $\mathbf{U}(z)$ may be calculated via an iterative algorithm, such as the SBR2 or sequential matrix diagonalization (SMD) [46]. Here, a sequence of elementary paraunitary transformations $\mathbf{G}_i(z)$ $(i = 1, \ldots)$ is applied to $\mathbf{R}(z)$ until the polynomial matrix becomes approximately diagonal; i.e., starting from $\tilde{\mathbf{R}}_0(z) = \mathbf{R}(z)$, the expression

$$\tilde{\mathbf{R}}_i(z) = \mathbf{G}_i^P(z)\, \tilde{\mathbf{R}}_{i-1}(z)\, \mathbf{G}_i(z) \qquad (14)$$

is iterated until $\tilde{\mathbf{R}}_{N_I}(z)$ is approximately diagonal for some $N_I$. An elementary paraunitary transformation takes the form of the product of a unitary transformation and a polynomial delay matrix, $\mathrm{diag}\{1, \ldots, 1, z^{n}, 1, \ldots, 1\}$.

Figure 4 gives the steps involved during every iteration of the SBR2. At each iteration, the algorithm searches for the off-diagonal element with the largest magnitude across all z-planes, as marked in red in Figure 4. If the magnitude exceeds a predefined threshold, a delay polynomial matrix is applied to bring the element to the principal $z^0$-plane, as shown in Figure 4. A unitary matrix, designed to zero out two elements on the zero-lag plane, is applied to the entire polynomial matrix. Note that applying one elementary paraunitary transformation may make some previously small off-diagonal elements larger, but overall, the algorithm converges to a diagonal matrix. As observed in Figure 4, the delay step can increase the polynomial order and make it unnecessarily large. Therefore, a trimming procedure [21] is used to control the growth of the polynomial order by discarding negligibly small coefficients in the outer planes, e.g., $z^{-4}$ and $z^{4}$ in Figure 4. Furthermore, the similarity transformations in (12) affect a pair of dominant elements so that the search space can be halved due to the preservation of symmetry. The algorithm terminates when the magnitudes of all off-diagonal elements fall below the preset threshold or when a user-defined maximum number of iterations is reached.

This has led to a family of time-domain algorithms based on the SBR2 [21] and SMD [46]. The computational complexity of these numerical algorithms is at least $O(M^3 T)$ due to matrix multiplication applied to every lag [47]. The additional complexity incurred over the EVD approach is essential for the temporal decoupling of broadband signals. Furthermore, some promising efforts using parallelizable hardware [48] and numerical tricks [49] have been proposed, and the decomposition can be computed in a fraction of a second. These algorithms are also guaranteed to produce polynomial paraunitary eigenvectors but tend to generate spectrally majorized eigenvalues, which may not be analytic. Two functions $f_1(z)$ and $f_2(z)$ are said to be spectrally majorized if, on the unit circle, one function's magnitude is always greater than the other's. Figure 5 presents the results of using the SMD algorithm to process the matrix used to generate Figure 3. In Figure 5, the eigenvalue in blue is always greater than the one in red. In contrast, the (analytic) eigenvalues in Figure 3 intersect and are not spectrally majorized. As shown in Figure 5(b), forcing a spectrally majorized solution for the eigenvalues leads to the eigenvectors having discontinuities that are difficult to approximate with polynomials. To get an accurate result, high-order polynomials are required. This, in turn, has consequences for the implementation cost of any signal processing based on the output of these algorithms. Note, however, that spectral majorization can be advantageous in some situations; see the "PEVD-Based Subband Coding" section.

[Figure 5: two panels plotted against the normalized angular frequency $\Omega$ from 0 to $2\pi$: (a) the extracted eigenvalues and (b) the Hermitian angles of the extracted eigenvectors, each for $m = 1$ and $m = 2$.]
FIGURE 5. The results of using SMD to decompose the matrix in Figure 3: (a) eigenvalues on the unit circle and (b) Hermitian angles of the corresponding analytic eigenvectors, measured against a reference vector [23].
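A single SBR2-style iteration, covering the search, delay, and zeroing steps but omitting trimming, might look as follows. This is a simplified sketch under stated assumptions rather than the reference implementation of [21]: the dict-of-lags data layout and the names `matmul`, `energy`, and `sbr2_step` are hypothetical, the input must contain lag zero, and the rotation is derived here from the 2 x 2 Hermitian block at lag zero. The test input is the matrix of (10).

```python
import cmath
import math


def matmul(A, B):
    """Product of two nested-list matrices."""
    return [[sum(A[i][n] * B[n][l] for n in range(len(B))) for l in range(len(B[0]))]
            for i in range(len(A))]


def energy(P):
    """Sum of squared magnitudes of all coefficients over all lags."""
    return sum(abs(v) ** 2 for Pt in P.values() for row in Pt for v in row)


def sbr2_step(R):
    """One search, delay, and zeroing step on a para-Hermitian matrix.

    R maps each lag tau to an M x M nested list, the coefficient of z^(-tau).
    Returns the transformed dict and the selected (tau0, j, k).
    """
    M = len(R[0])
    # 1) search: largest off-diagonal coefficient over all lags
    tau0, j, k = max(((t, a, b) for t in R for a in range(M) for b in range(M) if a != b),
                     key=lambda e: abs(R[e[0]][e[1]][e[2]]))
    # 2) delay: move column k by -tau0 lags and row k by +tau0 lags, which
    #    brings the selected coefficient to lag zero and leaves (k, k) in place
    S = {}
    for t, Rt in R.items():
        for a in range(M):
            for b in range(M):
                t_new = t - tau0 * ((b == k) - (a == k))
                S.setdefault(t_new, [[0j] * M for _ in range(M)])[a][b] = Rt[a][b]
    # 3) zeroing: complex Jacobi rotation built from the Hermitian lag-zero entries
    g = S[0][j][k]
    phi = cmath.phase(g)
    theta = 0.5 * math.atan2(2 * abs(g), (S[0][k][k] - S[0][j][j]).real)
    c, s = math.cos(theta), math.sin(theta)
    Q = [[complex(a == b) for b in range(M)] for a in range(M)]
    Q[j][j], Q[j][k] = c, s
    Q[k][j], Q[k][k] = -s * cmath.exp(-1j * phi), c * cmath.exp(-1j * phi)
    QH = [[complex(Q[b][a]).conjugate() for b in range(M)] for a in range(M)]
    # the same unitary rotation is applied to every lag
    return {t: matmul(QH, matmul(St, Q)) for t, St in S.items()}, (tau0, j, k)


# coefficients of (10): R[tau] multiplies z^(-tau)
a, b = (1 - 1j) / 2, (1 + 1j) / 2
R = {-2: [[0, b], [0, 0]], -1: [[a, 0], [0, a]], 0: [[3, a], [b, 3]],
     1: [[b, 0], [0, b]], 2: [[0, 0], [a, 0]]}
S, (tau0, j, k) = sbr2_step(R)
print("selected lag and element:", tau0, j, k)
print("off-diagonal magnitude at lag zero:", abs(S[0][j][k]))
print("total energy before and after:", energy(R), energy(S))
```

The delay step enlarges the lag support, which is the growth that the trimming step is meant to contain, while the total energy is unchanged because both the delay and the rotation are paraunitary.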

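The difference between analytic and spectrally majorized eigenvalues can be reproduced for the matrix in (10) without running SBR2 or SMD. The sketch below is only an illustration under that simplification: it compares the analytic eigenvalues implied by the entries of (10), 3 + 2cos Ω and 3 + 2sin Ω, with a bin-wise EVD whose eigenvalues are sorted at every frequency. All function names are assumptions.

```python
import cmath
import math


def analytic_eigenvalues(omega):
    """Analytic eigenvalues of (10) on the unit circle."""
    return 3 + 2 * math.cos(omega), 3 + 2 * math.sin(omega)


def binwise_eigenvalues(omega):
    """Eigenvalues of R(e^{j omega}) from (10), sorted in descending order."""
    z = cmath.exp(1j * omega)
    a, b = (1 - 1j) / 2, (1 + 1j) / 2
    r11 = (a * z + 3 + b / z).real   # both diagonal entries; real on the unit circle
    r12 = b * z**2 + a
    # closed form for a 2 x 2 Hermitian matrix with equal diagonal entries
    return r11 + abs(r12), r11 - abs(r12)


# half-sample offset keeps the grid away from the crossing frequencies
grid = [2 * math.pi * (n + 0.5) / 64 for n in range(64)]
swaps = 0
for w_prev, w in zip(grid, grid[1:]):
    d_prev = analytic_eigenvalues(w_prev)[0] - analytic_eigenvalues(w_prev)[1]
    d = analytic_eigenvalues(w)[0] - analytic_eigenvalues(w)[1]
    swaps += d_prev * d < 0
print("crossings of the analytic eigenvalues:", swaps)
```

The sorted eigenvalues equal the pointwise maximum and minimum of the two analytic functions, so they never cross but have kinks at the crossing frequencies, which is why their polynomial approximation requires a high order.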
Unlike in the case of the SBR2 algorithm [21], there is no proof that the SMD algorithm will always produce spectrally majorized eigenvalues, although evidence from the use of this algorithm strongly supports this conjecture. Given the issues that spectral majorization produces in terms of exploiting $\mathbf{Q}(z)$ as a filter bank and for identifying subspaces, recent work has been directed at designing an algorithm that can produce a PEVD whose components are guaranteed to be analytic. One such approach [50] involves working in the frequency domain and taking steps to ensure that the spectral coherence is not lost.

A number of algorithms have been designed for decompositions of fixed order and without proven convergence. This includes the approximate EVD (AEVD) algorithm [51], which applies a fixed number of elementary paraunitary operations in the time domain in an attempt to diagonalize $\mathbf{R}(z)$. In the DFT domain, the approach in [52] aims to extract maximally smooth eigenvalues and eigenvectors, which can target the extraction of the analytic solution.

Key Statement
Approximating analytic functions by polynomials allows the development of PEVD algorithms based on an elementary paraunitary operator. The resulting algorithms are guaranteed to produce polynomial paraunitary eigenvectors but tend to generate spectrally majorized eigenvalues. This property has benefits as well as drawbacks.

Example applications using PEVD
This section highlights three application cases that demonstrate key examples where PEVD-based approaches can offer advantages over state-of-the-art processing. In the "PEVD-Based Adaptive Beamforming" section, we demonstrate how, for adaptive beamforming, the computational complexity is decoupled from the TDL length that otherwise determines the cost of a broadband adaptive beamformer. The "PEVD-Based Subband Coding" section shows how, in subband coding, the PEVD can generate a system with optimized coding gain and helps to formulate optimum compaction filter banks that previously could be stated only for the two-channel case. Finally, the "Polynomial Subspace Speech Enhancement" section addresses how the preservation of spectral coherence can provide perceptually superior results over DFT-based speech enhancement algorithms.

PEVD-based adaptive beamforming
To explore PEVD-based beamforming, we first recall some aspects of narrowband beamforming before defining a linearly constrained minimum variance (LCMV) beamformer using both TDL- and PEVD-based formulations. We work with an arbitrary geometry of $M$ sensors but, for simplicity, assume free-space propagation and that the array is sufficiently far field to neglect any loss in amplitude across its sensors.

Spatial filtering and steering vector
Spatial filtering uses the fact that wavefronts arriving from different sources have a different delay profile when arriving at the sensors. If there are $L$ spatially separated sources, then for the $\ell$th source, $\ell = 1, \ldots, L$, let this delay profile be $\{\tau_{\ell,1}, \ldots, \tau_{\ell,M}\}$, where $\tau_{\ell,m}$ is the delay at the $m$th sensor with respect to some common reference point. We further define a vector of transfer functions $\mathbf{a}_\ell(z) = [d_{\tau_{\ell,1}}(z), \ldots, d_{\tau_{\ell,M}}(z)]^T$ containing fractional delay filters, where $d_\tau[n]$ ∘–• $d_\tau(z)$ implements a delay by $\tau \in \mathbb{R}$ samples [18]. We refer to $\mathbf{a}_\ell(z)$ as a broadband steering vector since, when evaluated at a fixed frequency $\Omega_\ell$, the $\ell$th source can be regarded as a narrowband signal with center frequency $\Omega_\ell$, in which case this vector of functions reduces to the well-known steering vector $\mathbf{a}_\ell(z)|_{z = e^{j\Omega_\ell}} = \mathbf{a}_\ell(e^{j\Omega_\ell}) \in \mathbb{C}^M$. The latter contains the phase shifts that each sensor experiences with respect to the $\ell$th source. If at least two sensors satisfy the spatial sampling theorem, then, for a particular frequency $\Omega_\ell = \Omega_0$, this steering vector is unique with respect to the DOA of the $\ell$th source.

We want to process the array data $\mathbf{x}[n] \in \mathbb{C}^M$ by a vector of filters $\mathbf{w}[n]$ ∘–• $\mathbf{w}(z)$, with $\mathbf{w}^P(z) = [w_1(z), \ldots, w_M(z)]$, where $w_m(z)$ •–∘ $w_m[n]$ is the filter processing the $m$th sensor signal such that the array output $y[n]$ is the sum of the filtering operations, $y[n] = \sum_{\nu} \mathbf{w}^H[-\nu]\, \mathbf{x}[n-\nu] = \sum_{m,\nu} w_m[\nu]\, x_m[n-\nu]$. The definition of the filter vector $\mathbf{w}[n]$, with its time-reversed and conjugated weights, may seem cumbersome, but it follows similar conventions for complex-valued data [53] and will later simplify the z-transform notation.

Narrowband beamforming
In the narrowband case, the delay filters can be replaced by complex coefficients in a vector $\mathbf{w}^H = [w_1, \ldots, w_M] \in \mathbb{C}^M$ that implement phase shifts. To generate a different gain $f_\ell$, $\ell = 1, \ldots, L$ with respect to each of the $L$ sources, the beamformer defined by $\mathbf{w}$ must satisfy the constraint equation

$$\underbrace{\begin{bmatrix} \mathbf{a}_1^{H}(e^{j\Omega_1}) \\ \vdots \\ \mathbf{a}_L^{H}(e^{j\Omega_L}) \end{bmatrix}}_{\mathbf{C}} \mathbf{w} = \underbrace{\begin{bmatrix} f_1 \\ \vdots \\ f_L \end{bmatrix}}_{\mathbf{f}}, \qquad (15)$$

where $\mathbf{C} \in \mathbb{C}^{L \times M}$ and $\mathbf{f} \in \mathbb{C}^{L}$ are the constraint matrix and associated gain vector for the $L$ constraints. In the presence of spatially white noise, the minimum mean-square error (MMSE) solution is the quiescent beamformer $\mathbf{w}_q = \mathbf{C}^{\dagger} \mathbf{f}$ [53], where $\mathbf{C}^{\dagger}$ is the pseudo-inverse of $\mathbf{C}$. If the noise is spatially correlated, then the LCMV formulation

$$\min_{\mathbf{w}} E\{|y[n]|^2\} \quad \text{s.t.} \quad \mathbf{C}\mathbf{w} = \mathbf{f} \qquad (16)$$

provides the MMSE solution, now constrained by (15). Solutions to (16) include, for example, the Capon beamformer, $\mathbf{w}_{\mathrm{opt}} = (\mathbf{R}[0])^{-1} \mathbf{C}^H [\mathbf{C} (\mathbf{R}[0])^{-1} \mathbf{C}^H]^{-1} \mathbf{f}$, and the generalized sidelobe canceller (GSC). For the GSC, a "quiescent beamformer" $\mathbf{w}_q$ implements the constraints in $\mathbf{C}$, and a "blocking matrix" $\mathbf{B}$ is constructed such that $\mathbf{C}\mathbf{B} = \mathbf{0}$. When operating on the array data $\mathbf{x}[n]$, the blocking matrix output is free of any desired signal components protected by the constraints. All that

remains now is to suppress any undesired signal components in the quiescent beamformer output that correlate with the blocking matrix output. This unconstrained optimization problem for the vector $\mathbf{w}_a$ in Figure 6(a) can be addressed by adaptive filtering algorithms via a noise cancellation architecture [53]. The overall response of the adapted GSC is $\mathbf{w} = \mathbf{w}_q - \mathbf{B}\mathbf{w}_a$.

TDL-based GSC
In the broadband case, each sensor is followed by a TDL of length $T$ to implement a finite-impulse response filter that can resolve explicit time delays [54]. This leads to the concatenated data vector $\mathbf{X}[n] = [\mathbf{x}^{\mathrm{H}}[n], \ldots, \mathbf{x}^{\mathrm{H}}[n-T+1]]^{\mathrm{H}} \in \mathbb{C}^{MT}$ presented in the “TDL Processing” section. The weight vector $\mathbf{v} \in \mathbb{C}^{MT}$ now performs a linear combination across this spatiotemporal window such that the beamformer output becomes $y[n] = \mathbf{v}^{\mathrm{H}}\mathbf{X}[n]$. Analogous to (15), a constraint equation $\mathbf{C}_b\mathbf{v} = \mathbf{f}_b$ defines the frequency responses in a number of directions. The constraint formulation for a linear array with a look direction toward broadside is as straightforward as in the narrowband case [55]. For linear arrays with off-broadside constraints and for arbitrary arrays, the formulation of constraints becomes trickier and can be based on stacked narrowband constraints across a number of DFT bins, akin to the single-frequency formulation that leads to (15). Since it may not be clear how many such constraints should be stacked, robust approaches start with a large number, which is then trimmed to a reduced set of linearly independent constraints by using, e.g., a QR decomposition [56]. Overall, with respect to the narrowband case, the dimensions of the constraint matrix and constraining vector will increase approximately $T$-fold such that $\mathbf{C}_b \in \mathbb{C}^{TL \times TM}$ and $\mathbf{f}_b \in \mathbb{C}^{TL}$.

For the broadband GSC [57], a quiescent beamformer $\mathbf{v}_q = \mathbf{C}_b^{\dagger}\mathbf{f}_b \in \mathbb{C}^{TM}$ will generate an output that still contains any structured interference that is not addressed by the constraint equation. A signal vector correlated with this remaining interference is produced by the blocking matrix $\mathbf{B}_b \in \mathbb{C}^{TM \times T(M-L)}$, whose columns, akin to the narrowband case, must span the null-space of $\mathbf{C}_b$ such that $\mathbf{C}_b\mathbf{B}_b = \mathbf{0}$. Its output is then linearly combined by an adaptive filter $\mathbf{v}_a \in \mathbb{C}^{T(M-L)}$ such that the overall beamformer output in Figure 6(b) is minimized in the MSE sense. Note that the TDL length determines the dimensions of all GSC components, with the overall adapted response of the beamformer, with respect to the input $\mathbf{x}[n]$ extended to the TDL representation in $\mathbf{X}[n]$, being $\mathbf{v} = \mathbf{v}_q - \mathbf{B}_b\mathbf{v}_a$.

FIGURE 6. The GSC for (a) the narrowband (black quantities in boxes) and PEVD-based cases (blue quantities in boxes) and (b) the TDL-based case.

PEVD-based GSC
In the PEVD-based approach, we replace narrowband quantities in the narrowband formulation by their polynomial equivalents to address the broadband case. This includes substituting the Hermitian transpose $\{\cdot\}^{\mathrm{H}}$ by a para-Hermitian transposition $\{\cdot\}^{\mathrm{P}}$. Thus, the constraint equation becomes

\[
\underbrace{\begin{bmatrix} \mathbf{a}_1^{\mathrm{P}}(z) \\ \vdots \\ \mathbf{a}_L^{\mathrm{P}}(z) \end{bmatrix}}_{\mathbf{C}(z)} \mathbf{w}(z) = \underbrace{\begin{bmatrix} f_1(z) \\ \vdots \\ f_L(z) \end{bmatrix}}_{\mathbf{f}(z)}. \tag{17}
\]

The constraint matrix $\mathbf{C}(z)$ is therefore made up of broadband steering vectors, and the gain vector $\mathbf{f}(z)$ contains the transfer functions $f_\ell(z)$, $\ell = 1, \ldots, L$ that should be imposed on the $L$ sources at the beamformer output. Both quantities are of the same dimensions as in the narrowband case but are now functions of the complex variable $z$. Writing the beamformer output as $y[n] = \sum_{\nu} \mathbf{w}^{\mathrm{H}}[-\nu]\,\mathbf{x}[n-\nu]$ allows the broadband LCMV problem to be formulated as [25]

\[
\min_{\mathbf{w}(z)} \oint_{|z|=1} \mathbf{w}^{\mathrm{P}}(z)\,\mathbf{R}(z)\,\mathbf{w}(z)\,\frac{\mathrm{d}z}{z} \quad \text{s.t.} \quad \mathbf{C}(z)\,\mathbf{w}(z) = \mathbf{f}(z), \tag{18}
\]

where $\mathbf{R}(z)$ is the CSD matrix of $\mathbf{x}[n]$. The evaluation of (18) at a single frequency $\Omega_0$ leads back to the narrowband formulation via the substitution $z = e^{j\Omega_0}$.

The solution to the broadband LCMV problem can be found as the equivalent of the Capon beamformer $\mathbf{w}_{\mathrm{opt}}(z) = \mathbf{R}^{-1}(z)\,\mathbf{C}^{\mathrm{P}}(z)\,\{\mathbf{C}(z)\,\mathbf{R}^{-1}(z)\,\mathbf{C}^{\mathrm{P}}(z)\}^{-1}\,\mathbf{f}(z)$, which is a direct extension of the narrowband formulation. To access this solution, the inversion of the para-Hermitian matrices $\mathbf{R}(z)$ and, subsequently, $\mathbf{C}(z)\,\mathbf{R}^{-1}(z)\,\mathbf{C}^{\mathrm{P}}(z)$ can be accomplished via PEVDs [58]. Once factorized, the resulting paraunitary matrices are straightforward to invert, and it remains to invert the individual eigenvalues; for this, recall the comment on analytic functions as divisors in the “Mathematical Background” section. Alternatively, to avoid the nested matrix inversions of the Capon beamformer and to exploit iterative schemes for their general numerical robustness, an iterative unconstrained optimization can be performed via a broadband PEVD-based GSC, whereby, with respect to Figure 6(a), the quiescent beamformer is $\mathbf{w}_q(z) = \mathbf{C}^{\mathrm{P}}(z)\,\{\mathbf{C}(z)\,\mathbf{C}^{\mathrm{P}}(z)\}^{-1}\,\mathbf{f}(z)$. The pseudo-inverse of a polynomial matrix for the quiescent solution can again be

obtained via a PEVD of the para-Hermitian term $\mathbf{C}(z)\,\mathbf{C}^{\mathrm{P}}(z)$ [58]. Furthermore, its subspace decomposition also reveals the null-space of $\mathbf{C}(z)$ that can be used to define the columns of the blocking matrix $\mathbf{B}(z)$ such that $\mathbf{C}(z)\,\mathbf{B}(z) = \mathbf{0}$. It remains only to operate a vector $\mathbf{w}_a(z)$ of $(M - L)$ adaptive filters on the output of the blocking matrix to complete the optimization of this PEVD-based GSC. Note that the overall response of the beamformer is $\mathbf{w}(z) = \mathbf{w}_q(z) - \mathbf{B}(z)\,\mathbf{w}_a(z)$.

Key Statement
Using polynomial matrix notations and the PEVD, narrowband approaches, such as the Capon beamformer and the GSC, can be directly extended to the broadband case.

Compared to the narrowband GSC in Figure 6(a), all quantities have retained their dimensions but are now functions of $z$. It now remains to set the polynomial orders of the different components for an implementation. The quiescent vector $\mathbf{w}_q(z)$ depends on the constraint formulation, and its order $J_1$ determines the accuracy of the fractional delay filters. The order $J_2$ of the blocking matrix $\mathbf{B}(z)$ needs to be sufficiently high such that no source signal components covered by the constraint equation leak into the adaptive part $\mathbf{w}_a(z)$. The order $J_3$ of the latter has to be sufficient to minimize the power of the output, $y[n]$. Thus, unlike in the TDL-based broadband beamformer case, the orders of the components are somewhat decoupled.

If the optimization of the adaptive part is addressed by inexpensive least-mean-squares type algorithms, the computational cost in both the TDL- and PEVD-based approaches is governed by the blocking matrix. In the TDL-based case, it requires $T^2(M^2 - ML)$ multiplications and additions, while the PEVD-based blocking matrix expends only $J_2(M^2 - ML)$ such operations. With typically $J_2 \approx T$, the PEVD-based realization is less expensive by a factor of approximately the length of the TDL, $T$.

Numerical example
A linear array with $M = 8$ elements spaced by half the wavelength of the highest-frequency component has a look direction toward $\vartheta = 30^\circ$, which is protected by a constraint. Three “unknown” interferers with directions $\vartheta = \{-40^\circ, -10^\circ, 80^\circ\}$ are active over a frequency range of $0.2 \le \Omega/\pi \le 0.9$ at a −20-dB signal-to-interference-plus-noise ratio and need to be adaptively suppressed. The data are further corrupted by spatially and temporally white additive noise 50 dB below the signal levels of the interferers. A TDL-based GSC operates with a TDL length of $T = 175$. For a PEVD-based GSC, the adaptive filter uses the same temporal dimension $J_3 = T$, but to match the MSE performance of the TDL-based version, a length of $J_1 = 51$ for the fractional delay filters in the quiescent beamformer and a temporal dimension of $J_2 = 168$ for the blocking matrix suffice. The adaptive filter is adjusted by a normalized least-mean-squares algorithm [53]. Note that $J_2 < T$. Overall, per iteration, the PEVD-based GSC takes 12.3 kMACs, while the TDL-based GSC requires 3.46 MMACs, which is indeed more than a factor of $T$ higher.

FIGURE 7. The (a) directivity pattern for the adapted PEVD-based GSC and (b) gain response in the look direction ($\varphi = 30^\circ$) for the PEVD- and TDL-based GSC.

To evaluate the beamformer performance, we determine the gain response or directivity pattern of the beamformer by probing the adapted overall beamformer response with a broadband steering vector $\mathbf{a}_i(z)$ that is swept across a set of angles $\{\varphi_i\}$ with a corresponding delay profile. For the directivity pattern, the angle-dependent transfer function $G(z, \varphi_i) = \mathbf{w}^{\mathrm{P}}(z)\,\mathbf{a}_i(z)$ can be evaluated on the unit circle. For the PEVD-based GSC, this directivity pattern is displayed in Figure 7(a); the response (not displayed) for the TDL-based GSC is very similar. A difference can, however, be noted in the look direction, which, in the case of the TDL-based GSC, is protected by a number of point constraints along the frequency axis, as highlighted in Figure 7(b). The gain response satisfies these point constraints, but it shows significant deviations from the ideal flat response between the constrained frequencies. In contrast, the PEVD-based GSC beamformer is based on a single broadband constraint equation, which shows a significantly lower deviation from the desired look direction gain. This is due to the formulation in the time domain, which preserves spectral coherence. There are downsides, and the gain response will break down closer to $\Omega = \pi$, due to the imperfections that

are inherent in fractional delay filters operating at close to half the sampling rate [18].

Key Statement
The PEVD-based GSC can implement the constraint equation more easily and precisely than the TDL-based version and possesses a significantly lower complexity when addressing nontrivial constraints.

PEVD-based subband coding

Data representation and task
Although this article addresses techniques for array signals, in many circumstances, multichannel signal representations are derived from a single-channel signal by demultiplexing [20], [59], [60], [61]. Let $x[\nu]$ be such a single-channel signal. Demultiplexing by $M$ and an implicit decimation operation by the same factor, or serial-to-parallel conversion, is performed to obtain a data vector $\mathbf{x}[n] = [x[nM], x[nM-1], \ldots, x[nM-M+1]]^{\mathrm{T}}$. This demultiplexed vector $\mathbf{x}[n]$ possesses the same form as the data vectors considered in the “Signal Model” section. While the number and type of samples that are held in $\mathbf{x}[n]$ remain unaltered from those in $x[\nu]$, the representation in $\mathbf{x}[n]$ allows clever data reduction and coding schemes through filter bank-based processing, for which we ultimately exploit the PEVD.

Principal component filter banks and subband coding
Generally, we want to process the data $\mathbf{x}[n]$ through a transformation such that $\mathbf{y}[n] = \sum_{\nu} \mathbf{Q}^{\mathrm{H}}[-\nu]\,\mathbf{x}[n-\nu]$. Specifically, we wish this transformation to be lossless, i.e., for $\mathbf{Q}(z) \bullet\!\!-\!\!\circ\; \mathbf{Q}[n]$ to be paraunitary, such that a perfect reconstruction via $\mathbf{x}[n] = \sum_{\nu} \mathbf{Q}[\nu]\,\mathbf{y}[n-\nu]$ is possible. With respect to the original single-channel signal prior to demultiplexing, the transformation $\mathbf{Q}^{\mathrm{H}}$ represents the analysis filter bank, whereas the transformation $\mathbf{Q}$ implements the synthesis (reconstruction) filter bank [20]. The matrices $\mathbf{Q}^{\mathrm{P}}(z)$ and $\mathbf{Q}(z)$ are known as the analysis polyphase matrix and synthesis polyphase matrix, respectively, and the paraunitarity of $\mathbf{Q}(z)$ guarantees perfect reconstruction of the overall filter bank system when operating back-to-back.

The polyphase matrix $\mathbf{Q}(z)$ can be designed to implement a series of low-pass, bandpass, and high-pass filters to split the signal $x[\nu]$ into signal components with different spectral content. However, the filter bank $\mathbf{Q}(z)$ can also be signal dependent. Chief among such systems are principal component, or optimum compaction, filter banks (PCFBs), which aim to assign as much power of $x[\nu]$ into as few successive components of $\mathbf{y}[n]$ as possible. The purpose of this is to discard some components of $\mathbf{y}[n]$, thus producing a lower-dimensional representation of the data. A closely related task is subband coding, where a quantization is performed on $\mathbf{y}[n]$ rather than on $x[\nu]$. Thus, a higher bit resolution is dedicated to those subbands of $\mathbf{y}[n]$ that possess higher power. By not increasing the overall number of bits with respect to $x[\nu]$, judicious distribution of the coding effort results in an increase in the coding gain measure: the ratio between the arithmetic and geometric means of the variances of the subband signals in $\mathbf{y}[n]$ [59]. A coding gain greater than one can be exploited as an increased signal-to-quantization-noise ratio under constant word length or in terms of a reduction in the number of bits required for quantization while retaining the same quality for the quantized signals.

Optimum coding gain and PEVD
To maximize the coding gain under the constraint of the paraunitarity of $\mathbf{Q}(z)$, two necessary and sufficient conditions on $\mathbf{y}[n]$ have been identified [59]: 1) the subband signals in $\mathbf{y}[n]$ must be strongly decorrelated such that $\mathbf{R}_y(z)$ is diagonal, and 2) they must be spectrally majorized such that for the elements $S_m(z)$ along the diagonal, on the unit circle, we have $S_m(e^{j\Omega}) \ge S_{m+1}(e^{j\Omega})$ $\forall\,\Omega$ and $m = 1, \ldots, (M-1)$. Due to Parseval’s theorem, this implies that the powers of the subband signals are also ordered in a descending fashion. While, under the constraint of paraunitarity, this does not change the arithmetic mean, it minimizes the geometric mean of the subband variances and thus maximizes the coding gain. Optimum subband coders $\mathbf{Q}(z)$ have been derived for the case of a zeroth-order filter bank, where they reduce to the Karhunen-Loève transform (KLT), and for the infinite-order filter bank case [59], [61]. Executing the PEVD (described in the “PEVD Algorithms” section) on $\mathbf{R}[\tau]$ leads to $\mathbf{R}_y[\tau] = \boldsymbol{\Lambda}[\tau]$ that directly satisfies the preceding conditions and thus provides a solution to a subband coder of finite order [27] whose theoretical evaluation otherwise eluded the research community except for the case of $M = 2$ [60].

As discussed in the “PEVD Algorithms” section, at each SBR2 algorithm iteration, the parameters of the elementary paraunitary operator are selected such that the most dominant cross-covariance term of the input space-time covariance matrix is zeroed. There are two problems with this “greedy” optimization approach: 1) cross-correlation energies spread among subbands of the weakest power can end up being ignored, which limits the extent to which spectral majorization is performed, and 2) there is a stronger tendency to annihilate cross-correlations due to noise in powerful subbands rather than true cross terms related to weak subbands, which causes a degradation in strong decorrelation performance. The coding gain variant of the SBR2, namely, SBR2C [27], alleviates these problems because it uses a cost function based on the coding gain measure, which is proportionately equally receptive to cross-correlations among any of the subbands.

Numerical example
Consider a signal $x[\nu]$ described by a fourth-order autoregressive model [27], [35]; its PSD $S(e^{j\Omega})$ appears in Figure 8(a). Demultiplexing by $M = 4$ produces a pseudocirculant matrix $\mathbf{R}(z)$ whose analytic eigenvalues $S_m(\Omega) = S(e^{j(\Omega - 2\pi(m-1))/M})$, $m = 1, \ldots, M$, are $8\pi$-periodic modulated versions of this PSD [20]. For $0 \le \Omega \le 2\pi$, they are depicted as gray-highlighted

curves in Figure 8(b). Although the $8\pi$ periodicity of these functions means that $\mathbf{R}(z)$ has no analytic eigenvalues, an estimated CSD matrix $\hat{\mathbf{R}}(z)$, here based on $10^4$ samples of $x[\nu]$, does possess an analytic EVD due to the perturbation by the estimation error [24]. Applying the SMD [27] to $\hat{\mathbf{R}}(z)$ will generate a strongly decorrelated signal vector $\mathbf{y}[n]$ via a paraunitary operation $\mathbf{Q}(z)$. The eigenvalues $\hat{\lambda}_m(e^{j\Omega})$ extracted by the SMD algorithm are also in Figure 8(b). These closely match the folded PSD of $x[\nu]$ highlighted in gray but are spectrally majorized.

FIGURE 8. The (a) PSD of input $x[\nu]$ and (b) eigenvalues extracted by the SMD algorithm for the subband coding problem; the $M = 4$-times-folded PSD of the input signal is shown in gray.

Interpreting $\mathbf{Q}(z)$ as a polyphase analysis matrix, the associated four-channel filter bank is characterized in Figure 9. The theoretically optimum infinite-order PCFB [59] is also shown. Its filters are obtained by assigning every demultiplexed frequency component of $x[\nu]$ to one of four filters, in descending magnitude. This yields a binary mask in the Fourier domain, which would require the implementation of infinite sinc functions in the time domain. In contrast, the finite-order filters computed by the SMD algorithm, each derived from an eigenvector in $\mathbf{Q}(z)$ corresponding to the eigenvalues in Figure 8, very closely approximate the PCFB except where the input PSD is small and arguably unimportant.

FIGURE 9. The magnitude responses $|H_m(e^{j\Omega})|$, $m = 1, \ldots, 4$ of the $M = 4$-channel filter bank equivalent to the polyphase analysis matrix $\mathbf{Q}(z)$, with the theoretical PCFB of infinite order shown in gray.

Ensemble results
To demonstrate the wider benefit of the proposed subband coder design, a randomly generated ensemble of 100 moving average processes of order 14 [MA(14)] produces signals $x[\nu]$ that are demultiplexed by $M = 4$. For each ensemble probe, the space-time covariance matrix is estimated from $2^{11}$ samples of $x[\nu]$ as a basis for the subband coder design. To average the subband coding results across this ensemble, we normalize the coding gain obtained for each instance in the ensemble by the maximum coding gain of the infinite-order PCFB; the latter can be derived from each of the MA(14) processes [27]. This ensemble-averaged normalized coding gain versus the order of the polynomial matrix $\mathbf{Q}(z)$ is detailed in Figure 10. The figure shows results for the KLT, the AEVD algorithm in [51], and the SBR2C and SMD. The KLT is the optimum zeroth-order subband coder. The AEVD algorithm is a fixed-order technique that aims to generate a PEVD but without proven convergence. Note that, like the AEVD algorithm, the SMD algorithm for zeroth-order systems (i.e., length-one polynomials) reduces to an ordinary EVD that is equivalent to the KLT and optimum for narrowband source signals, as shown in the figure. Both the SBR2C and SMD converge toward the optimum performance of an infinite-order PCFB as the polynomial order of $\mathbf{Q}(z)$ increases. This is indeed what would be expected since the PEVD is effectively the broadband generalization of the KLT. Due to its specific targeting of the coding gain and the resulting enhanced spectral majorization, the SBR2C outperforms the SMD here and thus provides a highly useful trade-off between polynomial order and coding gain.

FIGURE 10. The averaged normalized coding gain in its dependence on the length (order plus one) of $\mathbf{Q}(z)$ for an ensemble of random MA(14) processes and the case of demultiplexing with $M = 4$.
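The coding gain measure and the zeroth-order (KLT) coder discussed in this section can be illustrated with a minimal NumPy sketch. It is not the SBR2C or SMD algorithm: it covers only the length-one case, where the PEVD reduces to an ordinary EVD of the lag-zero covariance. The AR(1) test signal, the function names `demultiplex` and `coding_gain`, and all parameter values are assumptions chosen for illustration and do not reproduce the AR(4) or MA(14) experiments above.

```python
import numpy as np

def demultiplex(x, M):
    """Serial-to-parallel conversion: column n is [x[nM], x[nM-1], ..., x[nM-M+1]]^T."""
    N = len(x) // M
    # Start at n = 1 so that the smallest index, nM - M + 1, is non-negative.
    idx = np.arange(1, N)[None, :] * M - np.arange(M)[:, None]
    return x[idx]

def coding_gain(y):
    """Ratio of the arithmetic to the geometric mean of the subband variances."""
    var = np.var(y, axis=1)
    return np.mean(var) / np.exp(np.mean(np.log(var)))

rng = np.random.default_rng(0)

# Hypothetical AR(1) test signal x[v] = 0.9 x[v-1] + w[v] (an assumption).
w = rng.standard_normal(2**14)
x = np.empty_like(w)
x[0] = w[0]
for v in range(1, len(w)):
    x[v] = 0.9 * x[v - 1] + w[v]

M = 4
X = demultiplex(x, M)                 # shape (M, number of vectors)

# Lag-zero covariance estimate and its EVD; for a zeroth-order coder this is the KLT.
R0 = X @ X.T / X.shape[1]
eigvals, Q = np.linalg.eigh(R0)
Q = Q[:, ::-1]                        # order eigenvectors by descending eigenvalue

Y = Q.T @ X                           # analysis: y[n] = Q^H x[n]

print("coding gain before KLT:", coding_gain(X))
print("coding gain after KLT: ", coding_gain(Y))
```

Because the demultiplexed channels of a stationary signal share the same variance, the coding gain before the transform stays close to one; the KLT redistributes the power unevenly across subbands without changing the total, which raises the ratio. A polynomial $\mathbf{Q}(z)$ of higher order would additionally exploit correlations across time lags, which this sketch ignores.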

Key Statement
Hitherto, algorithms for $M > 2$-channel paraunitary filter banks for subband coding were suboptimal. PEVD-designed $M$-channel filter banks now closely approximate the ideal system.

Polynomial subspace speech enhancement
Speech enhancement is important for applications involving human-to-human communications, such as hearing aids and telecommunications, and human-to-machine interactions, including robot audition, voice-controlled systems, and automatic speech recognition. These speech signals are often captured by multiple microphones, commonly found in many devices today, and provide opportunities for spatial processing. Moreover, speech signals captured by different microphones naturally exhibit temporal correlations, especially in reverberant acoustic environments. This section shows that it is advantageous to use PEVD algorithms to capture and process these spatiotemporal correlations, thus preserving spectral coherence. A more comprehensive treatment with listening examples and code is available in [28].

Multichannel reverberant signal model
Consider a scenario where there is a single speaker $s[n]$, an array of microphones, and uncorrelated background noise $\mathbf{v}[n]$. The speech propagates from the source to each microphone $m$ through the channels with acoustic impulse responses (AIRs), $a_{\ell,m}[n]$, that are assumed to be time invariant. The AIR models the direct path propagation of the speech signal from the speaker to the microphone as well as reverberation due to multipath reflections from objects and walls in enclosed rooms. Background noise is then added to each microphone. The signal model in the “Signal Model” section, with $\ell = 1$, can describe this situation. Across $M$ microphones, the signal vector $\mathbf{x}[n] \in \mathbb{C}^{M}$ is used to compute the space-time covariance matrix $\mathbf{R}_x[\tau]$ in (6) and its $z$ transform $\mathbf{R}_x(z)$ in (7).

Exploiting the reverberation model in [62], the early reflections in the AIR represent closely spaced distinct echoes that perceptually reinforce the direct path component and may improve speech intelligibility in certain conditions. On the other hand, the late reflections in the AIR consist of randomly distributed small amplitude components, and the associated late reverberant signal components are commonly assumed to be mutually uncorrelated with the direct path and early signal components [28], [62]. Thus, (7) can be written as

\begin{align}
\mathbf{R}_x(z) &= \tilde{\mathbf{a}}(z)\,\tilde{\mathbf{a}}^{\mathrm{P}}(z)\,r_s(z) + \mathbf{R}_l(z) + \mathbf{R}_v(z) \tag{19} \\
&= \mathbf{R}_{\tilde{s}}(z) + \mathbf{R}_{\tilde{v}}(z), \tag{20}
\end{align}

where $r_s(z) \bullet\!\!-\!\!\circ\; r_s[\tau]$ and $r_s[\tau]$ is the autocorrelation sequence of the source. The space-time covariance matrices of the late reverberation $\mathbf{R}_l(z)$ are modeled as a spatially diffuse field. This is combined with the space-time covariance of the ambient noise $\mathbf{R}_v(z)$ to form $\mathbf{R}_{\tilde{v}}(z)$. The channel polynomial vector is $\tilde{\mathbf{a}}(z) = [\tilde{a}_1(z), \ldots, \tilde{a}_M(z)] \in \mathbb{C}^{M}$, where $\tilde{a}_m(z)$ is a polynomial obtained by taking the $z$ transform of the direct path and early reflections in the AIR from the source to the $m$th microphone, i.e., $\tilde{a}_m(z) = \sum_{i=0}^{I} \tilde{a}_m[i]\,z^{-i}$, dropping $\ell$ for brevity.

PEVD-based speech enhancement
The PEVD of (20) decomposes the polynomial matrix into

\[
\mathbf{R}_x(z) = \begin{bmatrix} \mathbf{U}_{\tilde{s}}(z) & \mathbf{U}_{\tilde{v}}(z) \end{bmatrix}
\begin{bmatrix} \boldsymbol{\Lambda}_{\tilde{s}}(z) & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Lambda}_{\tilde{v}}(z) \end{bmatrix}
\begin{bmatrix} \mathbf{U}_{\tilde{s}}^{\mathrm{P}}(z) \\ \mathbf{U}_{\tilde{v}}^{\mathrm{P}}(z) \end{bmatrix}, \tag{21}
\]

where $\{\cdot\}_{\tilde{s}}$ and $\{\cdot\}_{\tilde{v}}$ are associated with the signal-plus-noise (or, simply, signal) and noise-only (or, simply, noise) subspaces, respectively. Unlike some speech enhancement approaches, the proposed method does not use any noise or relative transfer function (RTF) estimation algorithms since the strong decorrelation property of the PEVD implicitly orthogonalizes the subspaces across all time lags in the range of $\tau$. Consequently, speech enhancement can be achieved by combining components in the signal subspace while nulling components residing in the noise subspace.

The paraunitary $\mathbf{U}(z)$ is a lossless filter bank. This implies that $\mathbf{U}(z)$ can distribute spectral power only among channels and not change the total signal and noise power over all subspaces. The eigenvector filter bank is used to process the microphone signals, using

\[
\mathbf{y}[n] = \sum_{\nu} \mathbf{U}^{\mathrm{H}}[-\nu]\,\mathbf{x}[n-\nu], \tag{22}
\]

where $\mathbf{U}[n] \circ\!\!-\!\!\bullet\; \mathbf{U}(z) \in \mathbb{C}^{M \times M}$. Since the polynomial eigenvector matrix $\mathbf{U}(z)$ is constructed from a series of delay and unitary matrices, each vector $\mathbf{u}_m[n]$ has a filter-and-sum structure.

For a single source, the signal subspace has a dimension of one. Therefore, the enhanced signal can be extracted from the first channel of the processed outputs $\mathbf{y}[n]$. The enhanced output $y_1[n]$, associated with the signal subspace, includes mainly speech components originally distributed over all microphones but now summed coherently. In contrast, the noise subspace is dominated by ambient noise and the late reverberation in the acoustic channels. The orthogonality between subspaces is a result of strong decorrelation, expressed as $\mathbf{R}_y(z) = \boldsymbol{\Lambda}(z)$, where $\mathbf{R}_y(z) \bullet\!\!-\!\!\circ\; \mathbf{R}_y[\tau]$ is computed from $\mathbf{R}_y[\tau] = \mathcal{E}\{\mathbf{y}[n]\,\mathbf{y}^{\mathrm{H}}[n-\tau]\}$.

In practice, assuming quasi stationarity, the speech signals are processed frame by frame such that $\mathbf{R}_x[\tau]$ in (6) can be recursively estimated. Additionally, the two-sided $z$ transform of $\mathbf{R}_x[\tau]$ in (7) can be approximated by some truncation window $W$, which determines the extent of the supported temporal correlation of the speech signal. The time-domain PEVD algorithms, such as the SBR2 and SMD, are used to compute (21) because they preserve spectral coherence of the speech signals

and do not introduce audible artifacts. The proposed algorithm can also cope in noise-only and reverberation-only scenarios, as explored in [28]. Experimental results are next presented to show these general principles applied for a specific case.

Experimental setup
Anechoic speech signals, which were sampled at 16 kHz, were taken from the TIMIT corpus [63]. AIR measurements and babble noise recordings for the $M = 3$-channel “mobile” array were taken from the ACE corpus [64]. ACE lecture room 2 has a reverberation time $T_{60}$ of 1.22 s. For each Monte Carlo simulation, 50 trials were conducted. In each trial, sentences from a randomly selected speaker were concatenated to have an 8- to 10-s duration. The anechoic speech signals were then convolved with the AIRs for each microphone before being corrupted by additive noise. The SNRs ranged from −10 to 20 dB.

Comparative algorithms
PEVD-based enhancement can be compared against other algorithms, such as the oracle multichannel Wiener filter (OMWF), weighted power minimum distortionless response (WPD), and two subspace approaches, multichannel subspace (MCSUB) [65] and colored subspace (COLSUB) [66], which use an EVD and a generalized EVD (GEVD), respectively. Furthermore, unlike the PEVD approach, noise estimation is required for the GEVD. The OMWF is based on the concatenation of a minimum variance distortionless response beamformer followed by a single-channel Wiener filter. The OMWF provides an ideal performance upper bound since it uses complete prior knowledge of the clean speech signal, based on [67], where the filter length is 80. Practical multichannel Wiener filters, which rely on the RTF and noise estimation algorithms, do not perform as well as the OMWF, and comparative results can be found in [28]. The WPD is an integrated method for noise reduction and dereverberation [68]. The ground truth DOA is provided to compute the steering vector for the WPD to avoid signal direction mismatch errors. The PEVD does not use any knowledge of the speech, DOA, and array geometry.

Experiments presented here to illustrate comparative performance use PEVD parameters, chosen following [28], including $\delta = N_1/3 \times 10^{-2}$ denoting the threshold of the dominant off-diagonal column norm, where $N_1$ is the square of the trace norm of $\mathbf{R}_{xx}(0)$; trim factor $\mu = 10^{-3}$; and $L = 500$ iterations. In all experiments, the frame size $T$ and window $W$ are set to 1,600. With this parameter selection, correlations within 100 ms, which are assumed to include the direct path and early reflection components, are captured and used by the algorithm. The source corresponding to these experiments is available in [28].

Evaluation measures
The frequency-weighted segmental SNR (FwSegSNR) can be used to evaluate the noise reduction, while the normalized signal-to-reverberant ratio (NSRR) and Bark spectral distortion (BSD) can be used to evaluate the dereverberation [28]. To measure speech intelligibility and to account for processing artifacts, short-time objective intelligibility (STOI) can be used. These measures are computed for the signals before and after enhancement by using the proposed and benchmark algorithms. The improvement Δ is reported. Positive Δ values show improvements in all measures except ΔBSD, for which a negative value indicates a reduction in spectral distortions.

Results and discussions
An illustrative example based on clean speech $s[n]$ corrupted by 5-dB babble noise in the reverberant ACE lecture room 2 is presented in Figure 11. The spectrogram of the first microphone signal $x_1[n]$ shows temporal smearing due to reverberation and the addition of babble noise. Comparing the plots for $x_1[n]$ with the processed signals $y_1[n]$, the dotted cyan boxes in Figure 11 qualitatively show the attenuation and some suppression of the babble noise and reverberation for the PEVD and COLSUB. This is supported by Table 1, which shows that the PEVD significantly improves the STOI and NSRR while coming second in the FwSegSNR and BSD, after the COLSUB. Although the COLSUB makes the most significant improvement in the FwSegSNR, the solid white boxes highlight the speech structures in $s[n]$, which are lost after the processing of $x_1[n]$ to generate $y_1[n]$, as evident between 3 and 3.3 s and 4.2 and 4.7 s in Figure 11. This has resulted in artifacts in the listening examples and the lowest improvement in the STOI. The OMWF, which uses complete knowledge of the clean speech signal, is the second best in the STOI and slightly improves other metrics, similar to the WPD, which uses the ground truth steering vector. The MCSUB offers limited improvement. Listening examples also highlight that the PEVD does not introduce audible processing artifacts into the enhanced signal [28].

Results for the Monte Carlo simulation involving 50 speakers in lecture room 2 and corrupted by −10- to 20-dB babble noise are available in Figure 12. For SNR ≤ 10 dB, the COLSUB outperforms other algorithms in ΔFwSegSNR but gives the worst ΔSTOI. On the other hand, the OMWF, designed to minimize speech distortion by using knowledge of clean speech, performs the best in ΔSTOI but not in ΔFwSegSNR. This also reflects the fact that speech intelligibility may not necessarily be affected by noise levels, up to some limit. Despite not being given any information on the target speech, the PEVD performs comparably to the OMWF and ranks first in ΔNSRR and second in ΔFwSegSNR and ΔSTOI.

At a 20-dB SNR, algorithms targeting reverberation, such as the WPD, perform better than noise reduction approaches. Similar to generalized weighted prediction error in the reverberation-only case in [28], the WPD processes the reverberant signals aggressively by removing most early reflections but not the direct path and late reflections, as observed in the listening examples. Furthermore, the WPD uses the ground truth DOA to compute the ideal steering vector, leading to the best improvement in ΔBSD and ΔSTOI. Listening examples for the PEVD indicate that the direct path and early

[Figure 11 image: four spectrogram panels, (a) to (d), with Time (s) from 1 to 5 on the horizontal axes, Frequency (kHz) from 0 to 8 on the vertical axes, and a color scale in Power/Decade (dB) from –25 to 10.]
FIGURE 11. Spectrograms, with corresponding time-domain signals, for the processing of a noisy reverberant speech example in ACE lecture room 2 and 5-dB babble noise. Dotted cyan boxes highlight noise- and reverberation-suppressed regions as a result of processing. Solid white boxes highlight regions where speech structures are lost using the COLSUB but not PEVD processing. Listening examples are available in [28]. The (a) clean speech signal $s[n]$, (b) noisy speech signal $x_1[n]$, (c) COLSUB-enhanced signal $y_1[n]$, and (d) PEVD-enhanced signal $y_1[n]$.
reflections are retained in the enhanced signal in the first channel. The late reverberations, absent in the enhanced signal, are observed in the second and third channels because of orthogonality [28]. Even without additional information, the PEVD performs comparably to the WPD and ranks second in ∆NSRR and ∆STOI.
Despite not being given knowledge of the DOA, target speech, and array geometry, the PEVD consistently ranks first for ∆NSRR and second in ∆STOI and ∆FwSegSNR over the range of scenarios. Comprehensive results with listening examples and code for the noise-only, reverberation-only, and noisier reverberant scenarios are available in [28].

Table 1. The enhancement of a single reverberant speech sample in lecture room 2 and 5-dB ACE babble noise.
Algorithm   FwSegSNR    STOI    NSRR        BSD
Noisy       −10.9 dB    0.664   −7.57 dB    0.69 dB
OMWF        −11.1 dB    0.747   −7.42 dB    0.6 dB
MCSUB       −11.7 dB    0.711   −11.5 dB    0.93 dB
COLSUB      −6.6 dB     0.678   −7.9 dB     0.35 dB
PEVD        −8.21 dB    0.75    −6.13 dB    0.4 dB
WPD         −8.9 dB     0.723   −6.27 dB    0.45 dB
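
As a rough cross-check of Table 1, the following Python sketch computes each algorithm's change relative to the Noisy row and a per-metric ordering. It assumes (this is not stated in the table) that an improvement is the processed score minus the noisy score and that a lower BSD is better; the names `TABLE_1`, `improvements`, and `ranking` are illustrative only.

```python
# Values transcribed from Table 1 (single sample, lecture room 2, 5-dB babble noise).
TABLE_1 = {
    "Noisy":  {"FwSegSNR": -10.9, "STOI": 0.664, "NSRR": -7.57, "BSD": 0.69},
    "OMWF":   {"FwSegSNR": -11.1, "STOI": 0.747, "NSRR": -7.42, "BSD": 0.60},
    "MCSUB":  {"FwSegSNR": -11.7, "STOI": 0.711, "NSRR": -11.5, "BSD": 0.93},
    "COLSUB": {"FwSegSNR": -6.6,  "STOI": 0.678, "NSRR": -7.9,  "BSD": 0.35},
    "PEVD":   {"FwSegSNR": -8.21, "STOI": 0.75,  "NSRR": -6.13, "BSD": 0.40},
    "WPD":    {"FwSegSNR": -8.9,  "STOI": 0.723, "NSRR": -6.27, "BSD": 0.45},
}

# Assumption: higher is better for every measure except the distortion measure BSD.
HIGHER_IS_BETTER = {"FwSegSNR": True, "STOI": True, "NSRR": True, "BSD": False}


def improvements(table, reference="Noisy"):
    """Return processed-minus-reference differences for every algorithm and metric."""
    ref = table[reference]
    return {
        alg: {metric: round(value - ref[metric], 3) for metric, value in row.items()}
        for alg, row in table.items()
        if alg != reference
    }


def ranking(table, metric, reference="Noisy"):
    """Order the algorithms from best to worst for one metric."""
    algorithms = [alg for alg in table if alg != reference]
    return sorted(
        algorithms,
        key=lambda alg: table[alg][metric],
        reverse=HIGHER_IS_BETTER[metric],
    )


DELTAS = improvements(TABLE_1)
RANKS = {metric: ranking(TABLE_1, metric) for metric in HIGHER_IS_BETTER}
```

For this single sample the sketch places the PEVD first in NSRR and STOI and second in FwSegSNR; the rankings quoted in the text refer to the range of scenarios in Figure 12 rather than to this one sample.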

[Figure 12 image: three line plots, (a) to (c), of ∆FwSegSNR (dB), ∆STOI, and ∆NSRR (dB) against SNR (dB) at –10, –5, 0, 5, 10, and 20 dB, with one curve each for OMWF, MCSUB, COLSUB, PEVD, and WPD.]
FIGURE 12. A comparison of speech enhancement performance for recorded AIR and babble noise in ACE lecture room 2, with a 1.22-s reverberation time: the (a) ∆FwSegSNR (higher is better), (b) ∆STOI (higher is better), and (c) ∆NSRR (higher is better).
Key Statement
PEVD-based speech enhancement consistently improves noise reduction metrics, speech intelligibility scores, and dereverberation measures over a wide range of acoustic scenarios. This blind and unsupervised algorithm requires no knowledge of the array geometry and does not use any channel and noise estimation algorithms but performs comparably to an oracle algorithm. More notably, due to the preservation of the spectral coherence using time-domain PEVD algorithms, the proposed algorithm does not introduce noticeable processing artifacts into the enhanced signal. Code and listening examples are provided in [28].

Conclusions and future perspectives
This article has demonstrated the use of polynomial matrices to model broadband multichannel signals and the use of the PEVD to process them. Previous approaches using TDLs and STFTs do not lead to proper generalization of narrowband algorithms and are suboptimal. Instead of considering only the instantaneous covariance matrix, the space-time covariance matrix has been proposed to completely capture the second-order statistics of multichannel broadband signals. Motivated by the optimum processing of narrowband signals using the EVD, i.e., for a single lag, the PEVD has been proposed to process broadband signals across a range of time lags. In most cases, an analytic PEVD exists and can be approximated by polynomials using numerical algorithms, which tend to generate spectrally majorized eigenvalues and paraunitary eigenvectors.
PEVD-based processing for three example applications has been presented and is advantageous over state-of-the-art processing. The PEVD approach can implement the constraints more easily and precisely for adaptive broadband beamforming while achieving a lower complexity than the TDL-based approach. For multichannel subband coding, the PEVD design approximates the ideal optimal data encoding system and overcomes the previous issues with the more-than-two-channel case. The PEVD-based algorithm, which uses only microphone signals, can consistently enhance speech signals, without introducing any audible artifacts, and performs comparably to an oracle algorithm, as observed in the listening examples. In addition to the applications presented in this article, the PEVD is also successfully used for blind source separation [30], MIMO system design [26], source identification [31], and broadband DOA estimation [32].

Future work
Similar extensions from the EVD to an analytic or PEVD can be undertaken for other linear algebraic operations, e.g., the nonpara-Hermitian EVD, singular value decomposition (SVD), QR decomposition, and generalized SVD. Algorithms for SVD and QR decomposition have appeared but are without a theoretical foundation with respect to their existence. Powerful narrowband techniques, such as independent component analysis, may find their polynomial equivalents. While a number of low-cost implementations have already emerged, algorithmic scalability is an area of active investigation. We hope that these theoretical and algorithmic developments will motivate the signal processing community to experiment with polynomial techniques and take these beyond the successful application areas showcased in this article. Resources including code and demonstration pages are available in [28] and [69].

Acknowledgment
The work of Stephan Weiss was supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC), under grant EP/S000631/1, and the MoD University Defense Research Collaboration in Signal

Processing. The work of Patrick A. Naylor was funded through EPSRC grant EP/S035842/1 and the European Union’s Horizon 2020 research and innovation program, under Marie Skłodowska-Curie grant 956369.

Authors
Vincent W. Neo (vincent.neo09@imperial.ac.uk) received his Ph.D. degree in electrical and electronic engineering in 2022 from Imperial College London. He is currently a principal engineer in the Singapore Defence Science and Technology Agency, working on speech technology, and a visiting postdoctoral researcher with the Department of Electrical and Electronic Engineering, Imperial College London, SW7 2AZ London, U.K. His research interests include multichannel signal processing and polynomial matrix decomposition, with applications to speech, audio, and acoustics. He is a Member of IEEE.
Soydan Redif (soydan.redif@aum.edu.kw) received his Ph.D. degree in electronics and electrical engineering from the University of Southampton, U.K. He is currently an associate professor at the College of Engineering and Technology, American University of the Middle East, Dasman 15453, Kuwait. His research interests include adaptive and array signal processing applied to source separation, communications, power, biomedical, and wearable systems. He is a Senior Member of IEEE.
John G. McWhirter (mcwhirterjg@cardiff.ac.uk) received his Ph.D. degree in theoretical physics from Queen’s University of Belfast, U.K., in 1973. He is currently an emeritus professor at the University of Cardiff, CF24 3AA Cardiff, U.K. His research interests include independent component analysis for blind signal separation and polynomial matrix algorithms for broadband sensor array signal processing. He was elected as a fellow of the Royal Academy of Engineering in 1996 and the Royal Society in 1999. His work has attracted various awards, including the European Association for Signal Processing Group Technical Achievement Award in 2003.
Jennifer Pestana (jennifer.pestana@strath.ac.uk) received her D.Phil. degree in numerical analysis from the University of Oxford, U.K., in 2012. She is currently a lecturer in the Department of Mathematics and Statistics, University of Strathclyde, G1 1XH Glasgow, U.K. Her research interests include numerical linear algebra and matrix analysis and their application to problems in science and engineering.
Ian K. Proudler (ian.proudler@strath.ac.uk) received his Ph.D. degree in digital signal processing from the University of Cambridge, U.K., in 1984. He is currently a visiting professor at the University of Strathclyde, G1 1XW Glasgow, U.K. He was an honorary editor for IEE Proceedings: Radar, Sonar, and Navigation and the 2002 recipient of the Institution of Electrical Engineers J.J. Thomson Medal. His research interests include adaptive filtering, adaptive beamforming, multichannel signal processing, and blind signal separation.
Stephan Weiss (stephan.weiss@strath.ac.uk) received his Ph.D. degree in electronic and electrical engineering from the University of Strathclyde, G1 1XW Glasgow, U.K., in 1998, where he is currently a professor of signal processing. His research interests include adaptive, multirate, and array signal processing, with applications in acoustics, communications, audio, and biomedical signal processing. He is a Senior Member of IEEE.
Patrick A. Naylor (p.naylor@imperial.ac.uk) received his Ph.D. degree from Imperial College London, SW7 2AZ London, U.K., where he is currently a professor of speech and acoustic signal processing. He is a member of the Board of Governors of the IEEE Signal Processing Society and a past president of the European Association for Signal Processing. His research interests include microphone array signal processing, speaker diarization, and multichannel speech enhancement for applications, including binaural hearing aids and augmented reality. He is a Fellow of IEEE.

References
[1] G. Strang, Linear Algebra and Its Application, 2nd ed. New York, NY, USA: Academic, 1980.
[2] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 1996.
[3] S. Haykin and K. J. R. Liu, Eds., Handbook on Array Processing and Sensor Networks. Hoboken, NJ, USA: Wiley, 2010.
[4] N. S. Jayant and P. Noll, Digital Coding of Waveforms Principles and Applications to Speech and Video. Englewood Cliffs, NJ, USA: Prentice-Hall, 1984.
[5] R. O. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, Mar. 1986, doi: 10.1109/TAP.1986.1143830.
[6] M. Vetterli and J. Kovačević, Wavelets and Subband Coding. Upper Saddle River, NJ, USA: Prentice-Hall, 1995.
[7] M. Moonen and B. De Moor, SVD and Signal Processing, III: Algorithms, Architectures and Applications. New York, NY, USA: Elsevier, 1995.
[8] H. L. Van Trees, Optimal Array Processing. Part IV of Detection, Estimation, and Modulation Theory. New York, NY, USA: Wiley, 2002.
[9] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications, 1st ed. New York, NY, USA: Academic, 2010.
[10] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing, 1st ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2000.
[11] R. Klemm, Space-Time Adaptive Processing: Principles and Applications. London, U.K.: Inst. Elect. Eng., 1998.
[12] A. Rao and R. Kumaresan, “On decomposing speech into modulated components,” IEEE Trans. Speech Audio Process., vol. 8, no. 3, pp. 240–254, May 2000, doi: 10.1109/89.841207.
[13] S. Weiss and I. K. Proudler, “Comparing efficient broadband beamforming architectures and their performance trade-offs,” in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), Jul. 2002, pp. 417–424, doi: 10.1109/ICDSP.2002.1027910.
[14] W. Kellermann and H. Buchner, “Wideband algorithms versus narrowband algorithms for adaptive filtering in the DFT domain,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2003, pp. 1–5, doi: 10.1109/ACSSC.2003.1292194.
[15] Y. Avargel and I. Cohen, “System identification in the short-time Fourier transform domain with crossband filtering,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 4, pp. 1305–1319, May 2007, doi: 10.1109/TASL.2006.889720.
[16] B. Widrow, P. Mantey, L. Griffiths, and B. Goode, “Adaptive antenna systems,” Proc. IEEE, vol. 55, no. 12, pp. 2143–2159, Dec. 1967, doi: 10.1109/PROC.1967.6092.
[17] R. T. Compton Jr., “The relationship between tapped delay-line and FFT processing in adaptive arrays,” IEEE Trans. Antennas Propag., vol. 36, no. 1, pp. 15–26, Jan. 1988, doi: 10.1109/8.1070.
[18] T. I. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine, “Splitting the unit delay [FIR/All Pass Filters Design],” IEEE Signal Process. Mag., vol. 13, no. 1, pp. 30–60, Jan. 1996, doi: 10.1109/79.482137.
[19] T. Kailath, Linear Systems, 1st ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1980.

[20] P. P. Vaidyanathan, Multirate Systems and Filterbanks, 1st ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
[21] J. G. McWhirter, P. D. Baxter, T. Cooper, S. Redif, and J. Foster, “An EVD algorithm for para-Hermitian polynomial matrices,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2158–2169, May 2007, doi: 10.1109/TSP.2007.893222.
[22] S. Icart and P. Comon, “Some properties of Laurent polynomial matrices,” in Proc. IMA Int. Conf. Math. Signal Process., Dec. 2012, pp. 1–4.
[23] S. Weiss, J. Pestana, and I. K. Proudler, “On the existence and uniqueness of the eigenvalue decomposition of a para-Hermitian matrix,” IEEE Trans. Signal Process., vol. 66, no. 10, pp. 2659–2672, May 2018, doi: 10.1109/TSP.2018.2812747.
[24] S. Weiss, J. Pestana, I. K. Proudler, and F. K. Coutts, “Corrections to ‘On the existence and uniqueness of the eigenvalue decomposition of a para-Hermitian matrix’,” IEEE Trans. Signal Process., vol. 66, no. 23, pp. 6325–6327, Dec. 2018, doi: 10.1109/TSP.2018.2877142.
[25] S. Weiss, S. Bendoukha, A. Alzin, F. K. Coutts, I. K. Proudler, and J. Chambers, “MVDR broadband beamforming using polynomial matrix techniques,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2015, pp. 839–843, doi: 10.1109/EUSIPCO.2015.7362501.
[26] R. Brandt and M. Bengtsson, “Wideband MIMO channel diagonalization in the time domain,” in Proc. Int. Symp. Pers., Indoor Mobile Radio Commun., 2011, pp. 1958–1962, doi: 10.1109/PIMRC.2011.6139853.
[27] S. Redif, J. G. McWhirter, and S. Weiss, “Design of FIR paraunitary filter banks for subband coding using a polynomial eigenvalue decomposition,” IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5253–5264, Nov. 2011, doi: 10.1109/TSP.2011.2163065.
[28] V. W. Neo, C. Evers, and P. A. Naylor, “Enhancement of noisy reverberant speech using polynomial matrix eigenvalue decomposition,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 29, pp. 3255–3266, Oct. 2021, doi: 10.1109/TASLP.2021.3120630.
[29] J. Corr, J. Pestana, S. Weiss, I. K. Proudler, S. Redif, and M. Moonen, “Investigation of a polynomial matrix generalised EVD for multi-channel Wiener filtering,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2016, pp. 1354–1358, doi: 10.1109/ACSSC.2016.7869596.
[30] S. Redif, S. Weiss, and J. G. McWhirter, “Relevance of polynomial matrix decompositions to broadband blind signal separation,” Signal Process., vol. 134, pp. 76–86, May 2017, doi: 10.1016/j.sigpro.2016.11.019.
[31] S. Weiss, N. J. Goddard, S. Somasundaram, I. K. Proudler, and P. A. Naylor, “Identification of broadband source-array responses from sensor second order statistics,” in Proc. Sens. Signal Process. Defence Conf. (SSPD), 2017, pp. 1–5, doi: 10.1109/SSPD.2017.8233237.
[32] W. Coventry, C. Clemente, and J. Soraghan, “Enhancing polynomial MUSIC algorithm for coherent broadband sources through spatial smoothing,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2017, pp. 2448–2452.
[33] A. Oppenheim and R. W. Schafer, Digital Signal Processing, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
[34] B. Girod, R. Rabenstein, and A. Stenger, Signals and Systems. New York, NY, USA: Wiley, 2001.
[35] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York, NY, USA: McGraw-Hill, 1991.
[36] I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials, 2nd ed. Philadelphia, PA, USA: SIAM, 2009.
[37] V. Kučera, Analysis and Design of Discrete Linear Control Systems. Englewood Cliffs, NJ, USA: Prentice-Hall, 1991.
[38] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1983.
[39] M. Davis, “Factoring the spectral matrix,” IEEE Trans. Autom. Control, vol. 8, no. 4, pp. 296–305, Oct. 1963, doi: 10.1109/TAC.1963.1105614.
[40] “The polynomial toolbox.” Polyx. Accessed: Apr. 30, 2023. [Online]. Available: http://www.polyx.com
[41] J. J. Shynk, “Frequency-domain and multirate adaptive filtering,” IEEE Signal Process. Mag., vol. 9, no. 1, pp. 14–37, Jan. 1992, doi: 10.1109/79.109205.
[42] A. Gilloire and M. Vetterli, “Adaptive filtering in subbands with critical sampling: Analysis, experiments, and application to acoustic echo cancellation,” IEEE Trans. Signal Process., vol. 40, no. 8, pp. 1862–1875, Aug. 1992, doi: 10.1109/78.149989.
[43] F. Rellich, Perturbation Theory of Eigenvalue Problems. New York, NY, USA: Gordon & Breach, 1969.
[44] T. Kato, Perturbation Theory for Linear Operators. Singapore: Springer, 1980.
[45] G. Barbarino and V. Noferini, “On the Rellich eigendecomposition of para-Hermitian matrices and the sign characteristics of *-palindromic matrix polynomials,” Linear Algebra Appl., vol. 672, pp. 1–27, Sep. 2023, doi: 10.1016/j.laa.2023.04.022.
[46] S. Redif, S. Weiss, and J. G. McWhirter, “Sequential matrix diagonalisation algorithms for polynomial EVD of para-Hermitian matrices,” IEEE Trans. Signal Process., vol. 63, no. 1, pp. 81–89, Jan. 2015, doi: 10.1109/TSP.2014.2367460.
[47] F. K. Coutts, J. Corr, K. Thompson, S. Weiss, I. K. Proudler, and J. G. McWhirter, “Memory and complexity reduction in parahermitian matrix manipulations of PEVD algorithms,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2016, pp. 1633–1637, doi: 10.1109/EUSIPCO.2016.7760525.
[48] S. Kasap and S. Redif, “Novel field-programmable gate array architecture for computing the eigenvalue decomposition of para-Hermitian polynomial matrices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 522–536, Mar. 2014, doi: 10.1109/TVLSI.2013.2248069.
[49] F. K. Coutts, I. K. Proudler, and S. Weiss, “Efficient implementation of iterative polynomial matrix EVD algorithms exploiting structural redundancy and parallelisation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 12, pp. 4753–4766, Dec. 2019, doi: 10.1109/TCSI.2019.2937006.
[50] S. Weiss, I. K. Proudler, and F. K. Coutts, “Eigenvalue decomposition of a parahermitian matrix: Extraction of analytic eigenvalues,” IEEE Trans. Signal Process., vol. 69, pp. 722–737, Jan. 2021, doi: 10.1109/TSP.2021.3049962.
[51] A. Tkacenko, “Approximate eigenvalue decomposition of para-Hermitian systems through successive FIR paraunitary transformations,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2010, pp. 4074–4077, doi: 10.1109/ICASSP.2010.5495751.
[52] M. Tohidian, H. Amindavar, and A. M. Reza, “A DFT-based approximate eigenvalue and singular value decomposition of polynomial matrices,” EURASIP J. Appl. Signal Process., vol. 1, no. 93, pp. 1–16, Dec. 2013, doi: 10.1186/1687-6180-2013-93.
[53] S. Haykin, Adaptive Filter Theory, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1991.
[54] K. M. Buckley, “Spatial/spectral filtering with linear constrained minimum variance beamformers,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 3, pp. 249–266, Mar. 1987, doi: 10.1109/TASSP.1987.1165142.
[55] W. Liu and S. Weiss, Wideband Beamforming: Concepts and Techniques. Hoboken, NJ, USA: Wiley, 2010.
[56] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming,” IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684–1696, May 2005, doi: 10.1109/TSP.2005.845436.
[57] K. M. Buckley, “Broad-band beamforming and the generalized sidelobe canceller,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 1322–1323, Oct. 1986, doi: 10.1109/TASSP.1986.1164927.
[58] S. Weiss, A. P. Millar, and R. W. Stewart, “Inversion of parahermitian matrices,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), Aug. 2010, pp. 447–451.
[59] P. P. Vaidyanathan, “Theory of optimal orthonormal subband coders,” IEEE Trans. Signal Process., vol. 46, no. 6, pp. 1528–1543, Jun. 1998, doi: 10.1109/78.678466.
[60] B. Xuan and R. Bamberger, “FIR principal component filter banks,” IEEE Trans. Signal Process., vol. 46, no. 4, pp. 930–940, Apr. 1998, doi: 10.1109/78.668547.
[61] A. Kirac and P. P. Vaidyanathan, “Theory and design of optimum FIR compaction filters,” IEEE Trans. Signal Process., vol. 46, no. 4, pp. 903–919, Apr. 1998, doi: 10.1109/78.668545.
[62] P. A. Naylor and N. D. Gaubitch, Eds., Speech Dereverberation. London, U.K.: Springer-Verlag, 2010.
[63] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, “TIMIT acoustic-phonetic continuous speech corpus,” Linguistic Data Consortium, Philadelphia, PA, USA, Corpus LDC93S1, 1993.
[64] J. Eaton, N. D. Gaubitch, A. H. Moore, and P. A. Naylor, “Estimation of room acoustic parameters: The ACE challenge,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 10, pp. 1681–1693, Oct. 2016, doi: 10.1109/TASLP.2016.2577502.
[65] F. Jabloun and B. Champagne, “A multi-microphone signal subspace approach for speech enhancement,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2001, pp. 205–208, doi: 10.1109/ICASSP.2001.940803.
[66] Y. Hu and P. C. Loizou, “A subspace approach for enhancing speech corrupted by colored noise,” IEEE Signal Process. Lett., vol. 9, no. 7, pp. 204–206, Jul. 2002, doi: 10.1109/LSP.2002.801721.
[67] S. Doclo and M. Moonen, “GSVD-based optimal filtering for single and multi-microphone speech enhancement,” IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230–2244, Sep. 2002, doi: 10.1109/TSP.2002.801937.
[68] T. Nakatani and K. Kinoshita, “A unified convolutional beamformer for simultaneous denoising and dereverberation,” IEEE Signal Process. Lett., vol. 26, no. 6, pp. 903–907, Jun. 2019, doi: 10.1109/LSP.2019.2911179.
[69] S. Weiss, J. Corr, K. Thompson, J. G. McWhirter, and I. K. Proudler. “Polynomial EVD toolbox.” Polynomial Eigenvalue Decomposition. Accessed: Apr. 30, 2023. [Online]. Available: http://pevd-toolbox.eee.strath.ac.uk


Luis Albert Zavala-Mondragón, Peter H.N. de With, and Fons van der Sommen
A Signal Processing Interpretation of Noise-Reduction Convolutional Neural Networks
Exploring the mathematical formulation of encoding-decoding CNNs.

Encoding-decoding convolutional neural networks (CNNs) play a central role in data-driven noise reduction and can be found within numerous deep learning algorithms. However, the development of these CNN architectures is often done in an ad hoc fashion, and theoretical underpinnings for important design choices are generally lacking. Up to now, several relevant works have striven to explain the internal operation of these CNNs. Still, these ideas are scattered and/or may require significant expertise, which makes them hard to access for a broader audience. To open up this exciting field, this article builds intuition on the theory of deep convolutional framelets (TDCFs) and explains diverse encoding-decoding (ED) CNN architectures in a unified theoretical framework. By connecting basic principles from signal processing to the field of deep learning, this self-contained material offers significant guidance for designing robust and efficient novel CNN architectures.

Date of current version: 3 November 2023

Introduction
A well-known image processing application is noise/artifact reduction of images, which consists of estimating a noise/artifact-free signal out of a noisy observation. To achieve this, conventional signal processing algorithms often employ explicit assumptions on the signal and noise characteristics, which has resulted in well-known algorithms such as wavelet shrinkage [1], sparse dictionaries [2], total-variation minimization [3], and low-rank approximation [4]. With the advent of deep learning techniques, signal processing algorithms applied to image denoising have been regularly outperformed and increasingly replaced by encoding-decoding CNNs.
In this article, rather than conventional signal processing algorithms, we focus on so-called encoding-decoding CNNs. These models contain an encoder that maps the input to multichannel/redundant representations and a decoder that maps the encoded signal back to the original domain. In both the encoder and the decoder, sparsifying nonlinearities, which suppress parts of the signal, are applied. In contrast to conventional signal processing algorithms, encoding-decoding CNNs are often presented as a solution that does not make explicit assumptions on the signal and noise. For example, in supervised algorithms, an encoding-decoding CNN learns the optimal parameters to filter the signal from a set of paired examples of noise/artifact-free images and images contaminated with noise/artifacts [5], [6], [7], which greatly simplifies the solution of noise-reduction problems because it circumvents explicit modeling of the signal and noise. Furthermore, the good performance and simple use of encoder-decoder CNNs have enabled additional data-driven noise-reduction algorithms, where CNNs are embedded as part of a larger system. Examples of such approaches are unsupervised noise reduction [8] and denoising based on generative adversarial networks [9]. Besides this, smoothness in signals can also be obtained by advanced regularization using CNNs, e.g., by exploiting data-driven model-based iterative reconstruction [10].
Despite the impressive noise-reduction performance and flexibility of encoding-decoding CNNs, these models also have downsides that should be considered. First, the complexity and heuristic nature of such designs often offer only a restricted understanding of the internal operation of such architectures [11]. Second, training and deployment of CNNs require specialized hardware and significant computational resources. Third and finally, restricted understanding of signal modeling in encoding-decoding CNNs does not clearly reveal the limitations of such models and, consequently, it is not obvious how to overcome these problems.
To overcome the limitations of encoding-decoding CNNs, new research has tackled the lack of explainability of these models by acknowledging the similarity between the building blocks of encoding-decoding CNNs applied to image noise reduction and the elements of well-known signal processing algorithms, such as wavelet decomposition, low-rank approximation [12], [13], [14], variational methods [15], lower-dimensional manifolds [8], and convolutional sparse coding [16]. Furthermore, practical work on shrinkage-based CNNs, inspired by well-established wavelet shrinkage algorithms, has further deepened the connections between signal processing and CNNs [17], [18]. This unified treatment of signal processing-inspired CNNs has resulted in more explainable [6], [8], better performing [6], and more memory-efficient designs [19].
This article has three main objectives. The first is to summarize the diverse explanations of the components of encoding-decoding CNNs applied to image noise reduction based on the concept of deep convolutional framelets [12], and on elementary signal processing concepts. Both aspects are considered with the aim of achieving an in-depth understanding of the internal operation of encoding-decoding CNNs, and to show that the design choices have implicit assumptions about the signal behavior inside the CNN. A second objective is to offer practitioners tools for optimizing their CNN designs with signal processing concepts. Third and finally, the aim is to show practical use cases where existing CNNs are analyzed in a unified framework, thereby enabling a better comparison of different designs by making their internal operation explicitly visible. Our analysis builds on existing works [6], [12], [20], whose authors analyzed CNNs while ignoring the nonlinearities. In this article, we overcome this limitation and present a complete analysis including the nonlinear activations, which reveals important assumptions implicit in the analyzed models.
The structure of this article is as follows. The “Notation” section introduces the notation used in this text. The “Encoding-Decoding CNNs” section describes the signal model and the architecture of encoding-decoding networks. Afterward, the “Signal Processing Fundamentals” section addresses fundamental aspects of signal processing, such as singular value decomposition (SVD), low-rank approximation, and framelets as well as estimation of signals in the framelet domain. All the concepts of the “Encoding-Decoding CNNs” and “Signal Processing Fundamentals” sections converge in the “Bridging the Gap Between Signal Processing and CNNs: Deep Convolutional Framelets and Shrinkage-Based CNNs” section, where the encoding-decoding CNNs are interpreted in terms of a data-driven low-rank approximation, and of wavelet shrinkage. Afterward, based on the learnings from the “Bridging the Gap Between Signal Processing and CNNs: Deep Convolutional Framelets and Shrinkage-Based CNNs” section, the “Mathematical Analysis of Relevant Designs” section shows the analysis of diverse architectures from a signal processing perspective and under a set of explicit assumptions. Afterward, the “What Happens in Trained Models?” section explores whether some of the theoretical properties exposed here are related to trained models. Based on the diverse described models and theoretical operation of CNNs, the “Which Network Fits My Problem?” section addresses a design criterion that can be used to design or choose new models and briefly describes the state of the art for noise reduction with CNNs. Finally, the “Conclusions and Future Outlook” section elaborates on concluding remarks and discusses the diverse elements that have not yet been (widely) explored by current CNN designs.

Notation
CNNs are composed of basic elements such as convolution, activation, and down-/upsampling layers. To achieve better clarity

in the explanations given in this article, we define the mathematical notation to represent the basic operations of CNNs. Some of the definitions presented here are based on the work of Zavala-Mondragón et al. [19].
In the following, a scalar is represented by a lower-case letter (e.g., $a$), while a vector is represented by an underlined lower-case letter (e.g., $\underline{a}$). Furthermore, a matrix, such as an image or convolution mask, is represented by a boldface lowercase letter (e.g., variables $\mathbf{x}$ and $\mathbf{y}$). Finally, a tensor is defined by a boldface uppercase letter. For example, the two arbitrary tensors $\mathbf{A}$ and $\mathbf{Q}$ are defined by
$$\mathbf{A} = \begin{pmatrix} \mathbf{a}_0^0 & \cdots & \mathbf{a}_0^{N_C-1} \\ \vdots & \ddots & \vdots \\ \mathbf{a}_{N_R-1}^0 & \cdots & \mathbf{a}_{N_R-1}^{N_C-1} \end{pmatrix}, \quad \mathbf{Q} = \begin{pmatrix} \mathbf{q}_0 \\ \vdots \\ \mathbf{q}_{N_R-1} \end{pmatrix}. \tag{1}$$
Here, entries $\mathbf{a}_r^c$ and $\mathbf{q}_r$ represent 2D arrays (matrices). As the defined tensors are used in the context of CNNs, matrices $\mathbf{a}_r^c$ and $\mathbf{q}_r$ are learned filters, which have dimensions of $(N_V \times N_H)$, where $N_V$ and $N_H$ denote the filter dimensions in the vertical and horizontal directions, respectively. Finally, we define the total tensor dimension of $\mathbf{A}$ and $\mathbf{Q}$ by $(N_C \times N_R \times N_V \times N_H)$ and $(N_R \times 1 \times N_V \times N_H)$, where $N_R$ and $N_C$ are the number of row and column entries, respectively. If the tensor $\mathbf{A}$ contains the convolution weights in a CNN, the row-entry dimensions represent the input number of channels to a layer, while the number of column elements denotes the number of output channels.
Having defined the notation for the variables, we focus on a few relevant operators. First, the transpose of a tensor, $(\cdot)^\top$, is expressed by
$$\mathbf{Q}^\top = \begin{pmatrix} \mathbf{q}_0 & \cdots & \mathbf{q}_{N_R-1} \end{pmatrix}. \tag{2}$$
Furthermore, the convolution of two tensors is written as $\mathbf{A}\mathbf{Q}$ and specified by
$$\mathbf{A}\mathbf{Q} = \begin{pmatrix} \sum_{r=0}^{N_R-1} \mathbf{a}_r^{0} * \mathbf{q}_r \\ \vdots \\ \sum_{r=0}^{N_R-1} \mathbf{a}_r^{N_C-1} * \mathbf{q}_r \end{pmatrix}. \tag{3}$$
Here, the symbol $*$ defines the convolution between two matrices (images).
In this article, images that are 2D arrays (matrices) are often convolved with 4D tensors. When this operation is performed, images are considered to have dimensions of $(1 \times 1 \times N_V \times N_H)$. In addition, in this article, matrix $\mathbf{I}$ is the identity signal for the convolution operator, which, for a 2D image, is the Kronecker delta/discrete impulse (an image with a single nonzero pixel with unity amplitude at the center of the image). Furthermore, we indicate that variables in the decoding path of a CNN are distinguished with a tilde (e.g., $\tilde{\mathbf{K}}$, $\tilde{\underline{b}}$).
Additional symbols that are used throughout the article are the down- and upsampling operations by a factor $s$, which are denoted by $f_{(s\downarrow)}(\cdot)$ and $f_{(s\uparrow)}(\cdot)$ for downsampling and upsampling, respectively. In this article, both operations are defined in the same way as in multirate filter banks. For example, consider the signal

$$\underline{x} = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10). \tag{4}$$

If we apply the downsampling operator to $\underline{x}$ by a factor of two, it results in

$$\underline{z} = f_{(2\downarrow)}(\underline{x}) = (1, 3, 5, 7, 9) \tag{5}$$

where $\underline{z}$ is the downsampled version of $\underline{x}$. Conversely, applying the upsampling operator $f_{(2\uparrow)}(\cdot)$ gives the result

$$f_{(2\uparrow)}(\underline{z}) = (1, 0, 3, 0, 5, 0, 7, 0, 9, 0). \tag{6}$$

Additional operators used in the article are rectified linear units (ReLUs), shrinkage/thresholding, and clipping, which are represented by $(\cdot)_+$, $\tau_{(\cdot)}(\cdot)$, and $C_{(\cdot)}(\cdot)$, respectively.
For better clarity, the most important symbols used in this article are listed in Table 1. In addition, graphical representations of some of the symbols that are used to graphically describe CNNs are shown in Figure 1.

Table 1. Relevant symbols used in this article.
Symbol                          Meaning
$f_{(2\downarrow)}(\cdot)$      Downsampling operation
$f_{(2\uparrow)}(\cdot)$        Upsampling operation
$\mathbf{I}$                    Convolution identity
$\mathbf{K}$                    Encoding convolution kernel
$\tilde{\mathbf{K}}$            Decoding convolution kernel
$\mathbf{W}$                    Filters for the forward discrete wavelet transform
$\tilde{\mathbf{W}}$            Filters for the inverse discrete wavelet transform
$\mathbf{W}_H$                  High-pass filters of the forward discrete wavelet transform
$\mathbf{W}_L$                  Low-pass filter of the forward discrete wavelet transform
$\mathbf{x}$                    Noiseless image
$\mathbf{y}$                    Noisy image
$\eta$                          Additive noise
$\underline{b}$                 Bias vector
$t$                             Threshold level
$*$                             Image convolution
$\mathbf{K}\mathbf{x}$          Tensor convolution between tensor $\mathbf{K}$ and signal $\mathbf{x}$
$(\cdot)^\top$                  Transpose of a tensor
$(\cdot)_+$                     ReLU activation
$\tau_{(\cdot)}(\cdot)$         Generic thresholding/shrinkage operation
$C_{(\cdot)}(\cdot)$            Generic clipping operation

FIGURE 1. Symbols used for the schematic representations of the CNNs addressed in this article. Symbols shown: tensor convolution, sum, ReLU, shrinkage, clipping, downsampling, and upsampling.
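The multirate down- and upsampling operators of (4)-(6) can be illustrated with a minimal sketch. The function names `downsample` and `upsample` are hypothetical and are not taken from the article; the sketch assumes 1D signals stored as Python lists.

```python
def downsample(x, s=2):
    """Keep every s-th sample, starting from the first one."""
    return list(x[::s])


def upsample(z, s=2):
    """Insert s - 1 zeros after every sample."""
    out = [0] * (len(z) * s)
    out[::s] = z
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # signal of (4)
z = downsample(x, 2)                 # downsampled signal, cf. (5)
x_up = upsample(z, 2)                # zero-filled signal, cf. (6)
print(z)
print(x_up)
```

With a factor of one, both functions return the input unchanged, which matches the case in which a CNN has no down-/upsampling layers.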

Encoding-decoding CNNs

Signal model and noise-reduction configurations
In noise-reduction applications, the common additive signal model is defined by

$$\mathbf{y} = \mathbf{x} + \eta \tag{7}$$

where the observed signal $\mathbf{y}$ is the result of contaminating a noiseless image $\mathbf{x}$ with additive noise $\eta$. Assume that the noiseless signal $\mathbf{x}$ is to be estimated from the noisy observation $\mathbf{y}$. In deep learning applications, this is often achieved by models with the form

$$\hat{\mathbf{x}} = G(\mathbf{y}). \tag{8}$$

Here, $G(\cdot)$ is a generic encoding-decoding CNN. We refer to this form of noise reduction as nonresidual. Alternatively, it is possible to find $\hat{\mathbf{x}}$ by training $G(\cdot)$ to estimate the noise component $\hat{\eta}$ and subtracting it from the noisy image $\mathbf{y}$ to estimate the noiseless image $\hat{\mathbf{x}}$, or equivalently,

$$\hat{\mathbf{x}} = \mathbf{y} - G(\mathbf{y}). \tag{9}$$

This model is referred to as residual [5], [7], [21] because the output of the network is subtracted from its input. For reference, Figure 2 portrays the difference in the placement of the encoding-decoding structure in residual and nonresidual configurations.

Encoding-decoding CNNs
Encoding-decoding (convolutional) neural networks are rooted in techniques for data-dimensionality reduction and unsupervised feature extraction, where a given signal is mapped to an alternative space via a nonlinear transformation. This space should have properties that are somehow attractive for the considered task. For example, for dimensionality reduction, the alternative space should be lower dimensional than the original input. In this article, we are interested in models that are useful for noise-reduction applications. Specifically, this manuscript addresses models that are referred to as encoding-decoding CNNs, such as the model by Ranzato et al. [22], in which the encoder uses convolution filters to produce multichannel/redundant representations, in which sparsifying nonlinearities are applied. The sparsified signal is later mapped back to the original representation. It should be noted that despite the fact that the origins of the encoding-decoding CNNs are linked to feature extraction, this type of architecture was quickly shown to be useful for other applications such as noise reduction, which is the topic of this article. For the rest of this manuscript, whenever we mention an encoding-decoding CNN, we are referring to a design that follows the same basic principles as Ranzato’s design.
It can be observed that encoding-decoding CNNs consist of three main parts. The first is the encoder, which maps the incoming image to a representation with more image channels with a convolution layer. Every channel of the resulting redundant representation contains a fraction of the content of the original signal. It should be noted that the encoder often (but not necessarily) decreases the resolution of the higher-dimensional representations to enable multiresolution processing, and to decrease the memory requirements of the design. The second main part is the decoder, which maps the multichannel representation back to the original space. The third main part is the nonlinearities, which suppress specific parts of the signal. In summary, the most basic encoding-decoding step in a CNN $G(\cdot)$ is expressed by

$$G(\mathbf{y}) = G_{\mathrm{dec}}(G_{\mathrm{enc}}(\mathbf{y})) \tag{10}$$

where $G_{\mathrm{enc}}(\cdot)$ is the encoder, which is generally defined by

$$\begin{aligned} \mathbf{C}_0 &= E_0(\mathbf{y}), \\ \mathbf{C}_1 &= E_1(\mathbf{C}_0), \\ \mathbf{C}_2 &= E_2(\mathbf{C}_1), \\ &\;\;\vdots \\ \mathbf{C}_{N-1} &= E_{N-1}(\mathbf{C}_{N-2}), \\ G_{\mathrm{enc}}(\mathbf{y}) &= \mathbf{C}_{N-1}. \end{aligned} \tag{11}$$

Here, $\mathbf{C}_n$ represents the code generated by the $n$th encoding $E_n(\cdot)$, which can be expressed by

$$\mathbf{C}_n = E_n(\mathbf{C}_{n-1}) = f_{(s\downarrow)}\big(A_{(\underline{b}_{n-1})}(\mathbf{K}_{n-1}\mathbf{C}_{n-1})\big). \tag{12}$$

Here, the function $A_{(\cdot)}(\cdot)$ is a generic activation used in the encoder, and $f_{(s\downarrow)}(\cdot)$ is a downsampling function by factor $s$. Complementary to the encoder, the decoder network maps the multichannel sparse signal back to the original domain. Here, we define the decoder by

$$\begin{aligned} \tilde{\mathbf{C}}_{N-2} &= D_{N-1}(\mathbf{C}_{N-1}), \\ &\;\;\vdots \\ \tilde{\mathbf{C}}_1 &= D_2(\tilde{\mathbf{C}}_2), \\ \tilde{\mathbf{C}}_0 &= D_1(\tilde{\mathbf{C}}_1), \\ G(\mathbf{y}) &= D_0(\tilde{\mathbf{C}}_0) \end{aligned} \tag{13}$$

where $\tilde{\mathbf{C}}_n$ is the $n$th decoded signal, which is produced by the $n$th decoder layer, yielding the general expression

$$\tilde{\mathbf{C}}_{n-1} = D_n(\tilde{\mathbf{C}}_n) = \tilde{A}_{(\tilde{\underline{b}})}\big(\tilde{\mathbf{K}}_n^\top f_{(s\uparrow)}(\tilde{\mathbf{C}}_n)\big). \tag{14}$$

In (14), $\tilde{A}_{(\cdot)}(\cdot)$ is the activation function used in the decoder, and $f_{(s\uparrow)}(\cdot)$ is an upsampling function of factor $s$.

FIGURE 2. Residual and nonresidual network configurations. In the nonresidual configuration, the noisy input passes through the encoding-decoding network $G(\cdot)$, which outputs the noiseless estimate. In the residual configuration, the encoding-decoding network $G(\cdot)$ outputs the estimated noise, which is subtracted from the noisy input to obtain the noiseless estimate. Note that the main difference between both designs is the global skip connection occurring in the residual structure. Still, it can be observed that the network $G(\cdot)$ may contain skip connections internally.

An important remark is that the encoder-decoder CNN does not always contain down-/upsampling layers, in which case, the decimation factor $s$ is unity, which causes $f_{(1\uparrow)}(\mathbf{x}) = f_{(1\downarrow)}(\mathbf{x}) = \mathbf{x}$ for any matrix $\mathbf{x}$. Furthermore, it should also be noted that we assume that the number of channels of the code $\mathbf{C}_N$ is always larger than the previous one: $\mathbf{C}_{N-1}$. Furthermore, it should be noted that a single encoder layer $E_n(\cdot)$ and its corresponding decoder layer $D_n(\cdot)$ can be considered a single-layer encoder-decoder network/pair.
For this article, the encoding convolution filter for a given layer $\mathbf{K}$ has dimensions of $(N_o \times N_i \times N_h \times N_v)$, where $N_i$ and $N_o$ are the number of input and output channels for a convolution layer, respectively. Similarly, $N_h$ and $N_v$ are the number of elements in the horizontal and vertical directions, respectively. Note that the encoder increases the number of channels of the signal (e.g., $N_o > N_i$), akin to Ranzato’s design [22]. Furthermore, it is assumed that the decoder is symmetric in the number of channels to the encoder; therefore, the dimensions of the decoding convolution kernel $\tilde{\mathbf{K}}^\top$ are $(N_i \times N_o \times N_h \times N_v)$. The motivation of this symmetry is to emphasize the similarity between the signal processing and the CNN elements.

Signal processing fundamentals
As shown by Ye et al. [12], within encoding-decoding CNNs, the signal is treated akin to well-known sparse representations, where the coefficients used for the transformation are directly learned from the training data. Prior to addressing this important concept in more detail, relevant supporting concepts such as sparsity, sparse transformations, and nonlinear signal estimation in the wavelet domain are explained.

Sparsity
A sparse image is a signal where most of the coefficients are small and the relatively few large coefficients capture most of the information [23]. This characteristic allows low-amplitude components to be discarded with relatively small perceptual changes. Hence, the use of sparse signals is attractive for applications such as image compression, denoising, and suppression of artifacts.
Despite the convenient characteristics of sparse signals, natural images are often nonsparse. Still, there are numerous transformations that allow for mapping the signal to a sparse domain and that are analogous to the internal operations of CNNs. For example, SVD factorizes the image in terms of two sets of orthogonal bases, of which few basis pairs contain most of the energy of the image. An alternative transformation is based on framelets, where an image is decomposed in a multichannel representation, whereby each resulting channel contains a fragment of the Fourier spectrum. In the following sections, we address all of these representations in more detail.

Sparse signal representations

SVD and low-rank approximation
Assume that an image (patch) is represented by a matrix $\mathbf{y}$ with dimensions of $(N_r \times N_c)$, where $N_r$ and $N_c$ are the number of rows and columns, respectively. Then, the SVD factorizes $\mathbf{y}$ as

$$\mathbf{y} = \sum_{n=0}^{N_{SV}-1} (\underline{u}_n \underline{v}_n^\top) \cdot \underline{\sigma}[n] \tag{15}$$

in which $N_{SV}$ is the number of singular values, $n$ is a scalar index, while $\underline{u}_n$ and $\underline{v}_n$ are the $n$th left and right singular vectors, respectively. Furthermore, vector $\underline{\sigma}$ contains the singular values, and each of its entries $\underline{\sigma}[n]$ is the weight assigned to every basis pair $\underline{u}_n$, $\underline{v}_n$. This means that the product $(\underline{u}_n \underline{v}_n^\top)$ contributes more to the image content for higher values of $\underline{\sigma}[n]$. It is customary for the singular values to be ranked in descending order and for the amplitudes of the singular values $\underline{\sigma}$ to be sparse; therefore, $\underline{\sigma}[0] \gg \underline{\sigma}[N_{SV}-1]$. The reason for this sparsity is that images (patches) intrinsically have high correlation. For example, many images contain repetitive patterns (e.g., a wall with bricks, a fence, rooftop tiles, or a zebra’s stripes) or uniform regions (for example, the sky or a person’s skin). This means that an image patch may contain only a few linearly independent vectors that describe most of the image’s content. Consequently, a higher weight is assigned to such image bases.
Given that the amplitudes of the singular values of $\mathbf{y}$ in SVD are sparse, it is possible to approximate $\mathbf{y}$ by $\hat{\mathbf{y}}$ using only a few bases $(\underline{u}_n \underline{v}_n^\top)$. Note that this procedure reduces the rank of signal $\mathbf{y}$, and hence it is known as low-rank approximation. This process is equivalent to

$$\hat{\mathbf{y}} = \sum_{n=0}^{N_{LR}-1} (\underline{u}_n \underline{v}_n^\top) \cdot \underline{\sigma}[n] \tag{16}$$

where $N_{SV} > N_{LR}$. Note that this effectively cancels the product $(\underline{u}_n \underline{v}_n^\top)$, where the weight given by $\underline{\sigma}[n]$ is low. Alternatively, it is possible to assign a weight of zero to the product $(\underline{u}_n \underline{v}_n^\top)$ for $n \ge N_{LR}$.
The low-rank representation of a matrix is desirable for diverse applications, among which we can find image denoising. The motivation for using low-rank approximation for this application results from the fact that, as mentioned earlier, natural images are considered low rank due to the strong spatial correlation between pixels, whereas noise is high rank (it is spatially uncorrelated). As a consequence, reducing the rank/number of singular values decreases the presence of noise while still providing a good approximation of the noise-free signal, as exemplified in Figure 3.

Framelets
Just as with SVD, framelets are also commonly used for image processing. In a nutshell, a framelet transform is a signal representation that factorizes/decomposes an arbitrary signal into multiple bands/channels. Each of these channels contains a segment of the energy of the original signal. In image and signal processing, the framelet bands are the result of convolving the analyzed signal with a group of discrete filters that have finite length/support. In this article, the most important characteristic that the filters of the framelet transform should comply with is that the bands they generate capture all the energy contained in the input to the decomposition. This is important to avoid the loss of information of the decomposed signal. In this text, we refer to framelets that comply with the previous characteristics as tight

framelets, and the following paragraphs describe this property in more detail.
In its decimated version, the framelet decomposition for tight frames is represented by

$$\mathbf{Y}_{\mathrm{fram}} = f_{(2\downarrow)}(\mathbf{F}\mathbf{y}) \tag{17}$$

in which $\mathbf{Y}_{\mathrm{fram}}$ is the decomposed signal, and $\mathbf{F}$ is the framelet basis (tensor). Note that the signal $\mathbf{Y}_{\mathrm{fram}}$ has more channels than $\mathbf{y}$. Furthermore, the original signal $\mathbf{y}$ is recovered from $\mathbf{Y}_{\mathrm{fram}}$ by

$$\mathbf{y} = \tilde{\mathbf{F}}^\top f_{(2\uparrow)}(\mathbf{Y}_{\mathrm{fram}}) \cdot c. \tag{18}$$

Here, $\tilde{\mathbf{F}}$ is the filter of the inverse framelet transform and $c$ denotes an arbitrary constant. If $c = 1$, the framelet is normalized. Finally, note that the framelet transform can also be undecimated. This means that in undecimated representations, the downsampling and upsampling layers, $f_{(2\downarrow)}(\cdot)$ and $f_{(2\uparrow)}(\cdot)$, are not used. An important property of the undecimated representation is that it is less prone to aliasing than its decimated counterpart, but more computationally expensive. Therefore, for efficiency reasons, the decimated framelet decomposition is often preferred over the undecimated representation. In summary, the decomposition and synthesis of the decimated framelet decomposition is represented by

$$\mathbf{y} = \tilde{\mathbf{F}}^\top f_{(2\uparrow)}(f_{(2\downarrow)}(\mathbf{F}\mathbf{y})) \cdot c \tag{19}$$

while for the undecimated framelet it holds that

$$\mathbf{y} = \tilde{\mathbf{F}}^\top (\mathbf{F}\mathbf{y}) \cdot c. \tag{20}$$

A notable normalized framelet is the discrete wavelet transform (DWT), where variables $\mathbf{F}$ and $\tilde{\mathbf{F}}$ are replaced by tensors $\mathbf{W} = (\mathbf{w}_{LL}, \mathbf{w}_{LH}, \mathbf{w}_{HL}, \mathbf{w}_{HH})$ and $\tilde{\mathbf{W}} = (\tilde{\mathbf{w}}_{LL}, \tilde{\mathbf{w}}_{LH}, \tilde{\mathbf{w}}_{HL}, \tilde{\mathbf{w}}_{HH})$, respectively. Here, $\mathbf{w}_{LL}$ is the filter for the low-frequency band, while $\mathbf{w}_{LH}$, $\mathbf{w}_{HL}$, $\mathbf{w}_{HH}$ are the filters used to extract the detail in the horizontal, vertical, and diagonal directions, respectively. Finally, $\tilde{\mathbf{w}}_{LL}$, $\tilde{\mathbf{w}}_{LH}$, $\tilde{\mathbf{w}}_{HL}$, $\tilde{\mathbf{w}}_{HH}$ are the filters of the inverse decimated DWT.
To understand the DWT more intuitively, Figure 4 shows the decimated framelet decomposition using the filters of the DWT. Note that the convolution $\mathbf{W}\mathbf{y}$ results in a four-channel signal, where each channel contains only a fraction of the spectrum of image $\mathbf{y}$. This allows for downsampling of each channel with minimal aliasing. Furthermore, to recover the original signal, each individual channel is upsampled, thereby introducing aliasing, which is then removed by the filters of the inverse transform. Finally, all the channels are added and the original signal is recovered.
Analogous to the low-rank approximation, in framelets, the reduction of noise is achieved by setting the noisy components to zero. These components are typically assumed to have low amplitude when compared to the amplitude of the sparse signal, as expressed by

$$\tilde{\mathbf{y}} = \tilde{\mathbf{F}}^\top f_{(2\uparrow)}\big(\tau_{(t)}(f_{(2\downarrow)}(\mathbf{F}\mathbf{y}))\big) \cdot c \tag{21}$$

where $\tau_{(t)}(\cdot)$ is a generic thresholding/shrinkage function, which sets each of the pixels in $f_{(2\downarrow)}(\mathbf{F}\mathbf{y})$ to zero when values are lower than the threshold level $t$.

FIGURE 3. SVD reconstruction of clean and corrupted images with a different number of singular values. Panels: clean and noisy originals, and their reconstructions with $N_{SV} = 8$ and $N_{SV} = 32$. Note that reconstruction of the clean image with eight or 32 singular values ($N_{SV} = 8$ or $N_{SV} = 32$, respectively) yields reconstructions indistinguishable from the original image. This contrasts with their noisy counterparts, where $N_{SV} = 8$ reconstructs a smoother image in which the noise is attenuated, while $N_{SV} = 32$ reconstructs the noise texture perfectly.

Nonlinear signal estimation in the framelet domain
As mentioned in the “Framelets” section, framelets decompose a given image $\mathbf{y}$ by convolving it with a tensor $\mathbf{F}$. Note that many of the filters that compose $\mathbf{F}$ have a high-pass nature. Images often contain approximately uniform regions in which the variation is low; therefore, convolving a signal $\mathbf{y}$ with a high-pass filter $\mathbf{f}_h$, where $\mathbf{f}_h \in \mathbf{F}$, produces the sparse detail band $\mathbf{d} = \mathbf{f}_h * \mathbf{y}$ in which uniform regions have low amplitudes, while transitions, i.e., edges, contain most of the energy of the bands.
Assume a model in which a single pixel $d \in \mathbf{d}$ is observed, which is contaminated with additive noise $\eta$. Then, the resulting observed pixel $z$ is defined by

$$z = d + \eta. \tag{22}$$

To recover the noiseless pixel $d$ from observation $z$, it is possible to use the point-maximum a posteriori (MAP) estimate [1], [24], defined by the maximization problem

$$\hat{d} = \underset{d}{\operatorname{argmax}}\, [\ln(P(d \mid z))]. \tag{23}$$

Here, the log-posterior $\ln(P(d \mid z))$ is defined by

$$\ln(P(d \mid z)) = \ln(P(z \mid d)) + \ln(P(d)) \tag{24}$$

where the conditional probability density function (PDF) $P(z \mid d)$ expresses the noise distribution, which is often assumed Gaussian and defined by

$$P(z \mid d) \propto \exp\left(-\frac{(z - d)^2}{2\sigma_\eta^2}\right). \tag{25}$$

Here, $\sigma_\eta^2$ is the noise variance. Furthermore, as prior probability, it is assumed that the distribution of $P(d)$ corresponds to a Laplacian distribution, which has been used in wavelet-based denoising [1]. Therefore, $P(d)$ is mathematically described by

FIGURE 4. 2D spectrum analysis of the decimated discrete framelet decomposition and reconstruction. The forward transform comprises convolution with the framelet basis and downsampling; the inverse transform comprises upsampling and convolution with the inverse transformation framelet. One row is shown per band ($\mathbf{w}_{LL}$, $\mathbf{w}_{LH}$, $\mathbf{w}_{HL}$, $\mathbf{w}_{HH}$). In the figure, function $\mathcal{F}\{\cdot\}$ stands for the amplitude Fourier spectrum of the input argument. The yellow squares indicate a region in the low-frequency area of the Fourier spectrum, while the orange, purple, and blue squares indicate the high-pass/detail bands. For these images, ideal orthogonal bases are assumed. Note that the forward transform is composed of two steps. First, the signal is convolved with the wavelet basis $(\mathbf{W}\mathbf{y})$. Afterward, downsampling is applied to the signal $(f_{(2\downarrow)}(\mathbf{W}\mathbf{y}))$. During the inverse transformation, the signal is upsampled by inserting zeros between each sample $(f_{(2\uparrow)}(f_{(2\downarrow)}(\mathbf{W}\mathbf{y})))$, which causes spatial aliasing (dashed blocks). Finally, the spatial aliasing is removed by the inverse transform filter $\tilde{\mathbf{W}}$ and all the channels are added $(\tilde{\mathbf{W}}^\top f_{(2\uparrow)}(f_{(2\downarrow)}(\mathbf{W}\mathbf{y})))$.
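The decimated decomposition and synthesis that Figure 4 depicts, cf. (19), and the thresholded variant of (21) can be sketched in 1D. This is a minimal illustration under assumptions that are not taken from the article: the framelet is the 1D Haar pair (a normalized framelet, so $c = 1$), boundaries are periodic, the signal length is even, and only the high-pass band is shrunk with a soft threshold. The names `analyze`, `synthesize`, and `framelet_roundtrip` are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)
# Assumed example framelet: 1D Haar low-pass and high-pass filters.
W = {"low": [R, R], "high": [R, -R]}


def analyze(y, h):
    """Forward filtering (periodic correlation): out[i] = sum_k h[k] * y[i + k]."""
    n = len(y)
    return [sum(h[k] * y[(i + k) % n] for k in range(len(h))) for i in range(n)]


def synthesize(u, h):
    """Inverse filtering (periodic convolution): out[i] = sum_k h[k] * u[i - k]."""
    n = len(u)
    return [sum(h[k] * u[(i - k) % n] for k in range(len(h))) for i in range(n)]


def downsample(x, s=2):
    return x[::s]


def upsample(z, s=2):
    out = [0.0] * (len(z) * s)
    out[::s] = z
    return out


def soft(z, t):
    """Soft shrinkage written as the difference of two ReLUs."""
    return max(z - t, 0.0) - max(-z - t, 0.0)


def framelet_roundtrip(y, t=0.0):
    """Filter, downsample, (optionally) shrink the detail band, upsample,
    filter with the inverse filters, and add the channels."""
    out = [0.0] * len(y)
    for band, h in W.items():
        coeffs = downsample(analyze(y, h))
        if band == "high":
            coeffs = [soft(c, t) for c in coeffs]
        for i, v in enumerate(synthesize(upsample(coeffs), h)):
            out[i] += v
    return out


y = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0]
reconstructed = framelet_roundtrip(y)        # no shrinkage: signal is recovered
smoothed = framelet_roundtrip(y, t=100.0)    # detail band removed completely
print(reconstructed)
print(smoothed)
```

Without shrinkage the input is recovered up to floating-point rounding. With a very large threshold only the low-pass branch survives, so each pair of samples is replaced by its mean, which mirrors the blurring that results when only the low-frequency band is kept.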
$$P(d) \propto \exp\left(-\frac{|d|}{\sigma_d}\right), \tag{26}$$

where $\sigma_d$ is the dispersion measure of the Laplace distribution. For reference, Figure 5 portrays an example of both a Gaussian and a Laplacian PDF. Note that the Laplacian distribution has a higher probability of zero elements occurring than the Gaussian distribution for the same standard deviation. Finally, substituting (25) and (26) in (24) results in

$$\ln(P(d \mid z)) \propto -\frac{(z - d)^2}{2\sigma_\eta^2} - \frac{|d|}{\sigma_d}. \tag{27}$$

In (27), maximizing $\ln(P(d \mid z))$ with respect to $d$ with the first derivative criterion, in an (un)constrained way, leads to two common activations in noise-reduction CNNs: the ReLU and the soft-shrinkage function. Furthermore, the solution also can be used to derive the so-called clipping function, which is useful in residual networks.

FIGURE 5. The probability density function for (a) Gaussian and (b) Laplacian distributions.

For reference and further understanding, Figure 6 portrays the elements composing the noise model of (22), the signal-transfer characteristics of the ReLU, soft-shrinkage, and clipping functions, and the effect that these functions have on the signal of the observed noisy detail band $z$.

FIGURE 6. Signals involved in the additive noise model, input/output transfer characteristics of activation layers, and estimates produced by the activation layers when applied to the noise-contaminated signal. (a) The signals involved in the additive noise model: the noiseless signal $d_n$, the noise $\eta_n$, and the contaminated signal $z_n$. (b) The output amplitude of the activation functions $(\cdot - t)_+$, $\tau^{\mathrm{Soft}}_{(t)}(\cdot)$, and $C^{\mathrm{Soft}}_{(t)}(\cdot)$ with respect to the input amplitude $z_n$. (c) The application of the activation functions to the noisy observation $z$.

ReLU
If (27) is solved for $d$ while constraining the estimator to be positive, the noiseless estimate $\hat{d}$ becomes

$$\hat{d} = (z - t)_+ \tag{28}$$

which is also expressed by

$$(z - t)_+ = \begin{cases} z - t, & \text{if } z \ge t, \\ 0, & \text{if } t > z. \end{cases} \tag{29}$$

Here, the threshold level is defined by

$$t = \frac{\sigma_\eta^2}{\sigma_d}. \tag{30}$$

This value follows from setting the derivative of (27) with respect to $d$ to zero for $d > 0$, which gives $d = z - \sigma_\eta^2/\sigma_d$. Note that this estimator cancels the negative elements of $\mathbf{d}$ and the low-amplitude elements that are lower than the threshold level $t$. For example, if the signal content on the feature map is low, then $\sigma_d \to 0$. In such case, $t \to +\infty$ and, consequently, $\hat{d} \to 0$. This means that the channel is suppressed. Alternatively, if the feature map has strong signal presence, i.e., $\sigma_d \to \infty$, consequently, $t \to 0$, and then $\hat{d} \to (z)_+$.
A final remark is made on the modeling of functions of a CNN. It should be noted that the estimator of (28) is analogous to the activation function of a CNN, known as a ReLU. However, in a CNN, the value of $t$ would be the bias $b$ learned from the training data.

Soft shrinkage/thresholding
If (27) is maximized in an unconstrained way, the estimate $\hat{d}$ is

$$\hat{d} = \tau^{\mathrm{Soft}}_{(t)}(z) = (z - t)_+ - (-z - t)_+. \tag{31}$$

Here, $\tau^{\mathrm{Soft}}_{(t)}(\cdot)$ denotes the soft-shrinkage/-thresholding function, which is often also written in the form given by

$$\tau^{\mathrm{Soft}}_{(t)}(z) = \begin{cases} z - t, & \text{if } z \ge t, \\ 0, & \text{if } t > z \ge -t, \\ z + t, & \text{if } -t > z. \end{cases} \tag{32}$$

It can be observed that the soft threshold sets the low-amplitude components, whose magnitude is lower than the threshold level $t$, to zero. In this case, $t$ is also defined by (30). It should be noted that the soft-shrinkage estimator can also be obtained from a variational perspective [25]. Finally, it can be observed that soft shrinkage is the superposition of two ReLU functions, which has been pointed out by Fan et al. [18].

Soft clipping
In the “ReLU” and “Soft Shrinkage/Thresholding” sections, the estimate $\hat{d}$ is obtained directly from the noisy observation $z$. Alternatively, it is possible to estimate the noise $\eta$ and subtract it from $z$, akin to the residual CNNs represented by (9). This can be achieved by solving the model

$$\hat{\eta} = z - \hat{d} = z - \tau^{\mathrm{Soft}}_{(t)}(z) \tag{33}$$

which is equivalent to

$$\hat{\eta} = C^{\mathrm{Soft}}_{(t)}(z) = z - \big((z - t)_+ - (-z - t)_+\big) \tag{34}$$

where $C^{\mathrm{Soft}}_{(t)}(\cdot)$ is the soft-clipping function. Note that this function also can be expressed by

$$C^{\mathrm{Soft}}_{(t)}(z) = \begin{cases} t, & \text{if } z \ge t, \\ z, & \text{if } t \ge z > -t, \\ -t, & \text{if } -t \ge z. \end{cases} \tag{35}$$

Other thresholding layers
One of the main drawbacks of the soft-threshold activation is that it is a biased estimator. This limitation has been addressed by the hard and semihard thresholds, which are (asymptotically) unbiased estimators for large input values. In this section, we focus solely on the semihard threshold and avoid the hard variant because it is discontinuous and therefore not suited to models that rely on gradient-based optimization, such as CNNs.
Among the semihard thresholds, two notable examples are the garrote shrink and the shrinkage functions generated by derivatives of Gaussians (DoGs) [19], [26]. The garrote shrink function $\tau^{\mathrm{Gar}}_{(\cdot)}(\cdot)$ is defined by

$$\tau^{\mathrm{Gar}}_{(t)}(z) = \frac{(z^2 - t^2)_+}{z}. \tag{36}$$

Furthermore, an example of a shrinkage function based on the DoG is given by

$$\tau^{\mathrm{DoG}}_{(t)}(z) = z - C^{\mathrm{DoG}}_{(t)}(z) \tag{37}$$

where the semihard clipping function with the DoG $C^{\mathrm{DoG}}_{(\cdot)}(\cdot)$ is

$$C^{\mathrm{DoG}}_{(t)}(z) = z \cdot \exp\left(-\frac{z^p}{t^p}\right) \tag{38}$$

in which $p$ is an even number.
The garrote and semihard DoG shrinkage functions are shown in Figure 7 as well as their clipping counterparts. Note that the shrinkage functions approach unity gain for $|z| \gg t$; therefore, they are asymptotically unbiased for large signal values.
The final thresholding function addressed in this section is the linear expansion of thresholds (LETs) proposed by Blu and Luisier [26]. This technique combines multiple thresholding functions to improve performance and is defined by

$$\tau^{\mathrm{LET}}_{(\underline{t})}(z) = \sum_{n=0}^{N_T-1} a_n \cdot \tau_{(t_n)}(z) \tag{39}$$

where $a_n$ is the weighting factor assigned to each threshold, and all weighting factors should add up to unity.

Bridging the gap between signal processing and CNNs: Deep convolutional framelets and shrinkage-based CNNs
The next sections address the theoretical operation of noise-reduction CNNs based on ReLUs and shrinkage/thresholding functions. The first part describes the TDCFs [12] and is the most extensive study on the operation of encoding-decoding ReLU-based CNNs up to now. Afterward, we focus on the

operation of networks that use shrinkage functions instead of ReLUs [17], [18], [19], with the aim of mimicking well-established denoising algorithms [1]. Finally, the last part addresses the connections between both methods and additional links between CNNs and signal processing.

TDCFs
The TDCFs [12] describes the operation of encoding-decoding ReLU-based CNNs. Its most relevant contributions are 1) to establish the equivalence of framelets and the convolutional layers of CNNs, 2) to provide conditions that preserve the signal integrity within a ReLU CNN, and 3) to explain how ReLUs and convolution layers reduce noise within an encoding-decoding CNN.
The similarity between framelets and the encoding and decoding convolutional filters can be observed when comparing (12) and (14) with (17) and (18), where it becomes visible that the convolution structure of encoding-decoding CNNs is analogous to the forward and inverse framelet decomposition.
Regarding the signal reconstruction characteristics, the TDCFs [12] states the following. First, to be able to recover an arbitrary signal $\mathbf{y} \in \mathbb{R}^N$, the number of output channels of a convolution layer with ReLU activation should be at least double the number of input channels. Second, the encoding convolution kernel $\mathbf{K}$ should be composed of pairs of filters with opposite phases. These two requirements ensure that any negative and positive values propagate through the network. Under these conditions, the encoding and decoding convolution filters, $\mathbf{K}$ and $\tilde{\mathbf{K}}$, respectively, should comply with

$$\mathbf{y} = \tilde{\mathbf{K}}^\top (\mathbf{K}\mathbf{y})_+ \cdot c. \tag{40}$$

It can be noted that (40) is an extension of (20), which describes the reconstruction characteristics of tight framelets. From this point on, we refer to convolutional kernels compliant with (40) as phase-complementary tight framelets. As a final remark, it should be noted that a common practice in CNN designs is to also use ReLU nonlinearities in the decoder. In such a case, the phase-complementary tight-framelet condition can still be met as long as the pixels $y \in \mathbf{y}$ comply with $y \ge 0$, which is equivalent to

$$\mathbf{y} = (\mathbf{y})_+ = \big(\tilde{\mathbf{K}}^\top (\mathbf{K}\mathbf{y})_+ \cdot c\big)_+. \tag{41}$$

It can be observed that the relevance of the properties defined in (40) and (41) is that they ensure that a CNN can propagate any arbitrary signal, which is important to avoid any distortions (such as image blur) in the processed images.
An additional element of the TDCFs regarding reconstruction of the signal is to show that conventional pooling layers (e.g., average pooling) can discard high-frequency information of the signal, which effectively blurs the processed signals. Furthermore, Ye et al. [12] have demonstrated that this can be fixed by

FIGURE 7. Transfer characteristics (output amplitude versus input amplitude) of the semihard thresholds based on the difference of Gaussians, $\tau^{\mathrm{DoG}}_{(t)}(\cdot)$, and of the garrote shrink, $\tau^{\mathrm{Gar}}_{(t)}(\cdot)$, as well as their clipping counterparts, $C^{\mathrm{DoG}}_{(t)}(\cdot)$ and $C^{\mathrm{Gar}}_{(t)}(\cdot)$. Note that in contrast with the soft-shrinkage and clipping functions shown in Figure 6, the semihard thresholds tend to unity for large values, while the semihard clipping functions tend to zero for large signal intensities.
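The scalar shrinkage and clipping functions of (31), (34), and (36)-(38), whose transfer characteristics appear in Figures 6 and 7, can be sketched as follows. This is a minimal illustration with hypothetical function names; returning zero for the garrote shrink at $z = 0$ is an assumption made here to avoid a division by zero, and the default $p = 2$ is only an example of an even exponent.

```python
import math


def relu(z):
    return max(z, 0.0)


def soft_shrink(z, t):
    """Soft shrinkage, cf. (31): superposition of two ReLUs."""
    return relu(z - t) - relu(-z - t)


def soft_clip(z, t):
    """Soft clipping, cf. (34): what soft shrinkage removes from z."""
    return z - soft_shrink(z, t)


def garrote_shrink(z, t):
    """Garrote shrink, cf. (36); defined as 0 at z = 0 (assumption)."""
    return 0.0 if z == 0.0 else relu(z * z - t * t) / z


def dog_clip(z, t, p=2):
    """Semihard clipping based on the DoG, cf. (38); p is assumed even."""
    return z * math.exp(-(z ** p) / (t ** p))


def dog_shrink(z, t, p=2):
    """Semihard shrinkage based on the DoG, cf. (37)."""
    return z - dog_clip(z, t, p)


for z in (-10.0, -0.5, 0.0, 0.5, 10.0):
    print(z, soft_shrink(z, 1.0), garrote_shrink(z, 1.0), dog_shrink(z, 1.0))
```

For a large input such as $z = 10$ with $t = 1$, the soft threshold returns 9 (a constant bias of $t$), whereas the garrote and DoG variants return values close to 10, which illustrates why the semihard thresholds are described as asymptotically unbiased.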

replacing the conventional up-/downsampling layers by reversible operations, such as the DWT. To exemplify this property, we refer to Figure 4. If only an average pooling layer followed by an upsampling stage were to be applied, the treatment of the signal would be equivalent to the low-frequency branch of the DWT. Consequently, only the low-frequency spectrum of the signal would be recovered and the images processed with that structure would become blurred. In contrast, if the full forward and inverse wavelet transform of Figure 4 is used for up- and downsampling, it is possible to reconstruct any signal, irrespective of its frequency content.
The ultimate key contribution of the TDCFs is its explanation of the operation of ReLU-based noise-reduction CNNs. For a nonresidual configuration, ReLU CNNs perform the following operations. 1) The convolution filters decompose the incoming signal into a sparse multichannel representation. 2) The feature maps, which are uncorrelated to the signal, contain mainly noise. In this case, the bias and the ReLU activation cancel the noisy feature maps in a process analogous to the MAP estimate shown in the “ReLU” section. 3) The decoder reconstructs the filtered image. Note that this process is analogous to the low-rank decomposition described in the “SVD and Low-Rank Approximation” section. In the case of residual networks, the CNN learns to estimate the noise, which means that in that configuration the ReLU nonlinearities suppress the channels with high activation.
A visual example of low-rank approximation in ReLU CNNs is shown in Figure 8, illustrating the operation of an idealized single-layer encoding-decoding ReLU CNN operating in both a residual and nonresidual way. It can be noted that the ReLU activation suppresses specific channels in the sparse decomposition provided by the encoder, thereby preserving the low-rank structures in the nonresidual network. Alternatively, in the residual example, the ReLUs eliminate the feature maps with high activation, which results in a noise estimate that is subtracted from the input to estimate the noiseless signal.

Shrinkage and clipping-based CNNs
As in ReLU networks, the encoder of shrinkage networks [17], [18], [19] separates the input signal in a multichannel representation. As a second processing stage, shrinkage networks estimate

values of the feature maps by clipping the signal. Therefore, in residual networks, the shrinkage functions can be explicitly replaced by clipping activations.
Visual examples of the operation of a single-layer shrinkage and of clipping networks are presented in Figure 9, where it can be noted that the operation of shrinkage and clipping networks is analogous to their ReLU counterparts, with the main difference being that shrinkage and clipping networks do not require phase complements in the encoding and decoding layers as ReLU-based CNNs do.

Shrinkage and clipping in ReLU networks
As addressed in the “Nonlinear Signal Estimation in the Framelet Domain” section, the soft-threshold function is the superposition of two ReLU activations. As a consequence, it is feasible that in ReLU CNNs, shrinkage behavior could arise in addition to the low-rankness enforcement mentioned in the “TDCFs” section. It should be noted that this can only happen if the number of channels of the encoder and decoder complies with the redundancy constraints of the TDCFs, and if the decoder is linear. To prove this, (31) is reparameterized as

$$\hat{\mathbf{d}} = \tilde{\mathbf{K}}^\top (\mathbf{K}\mathbf{y} + \underline{b})_+ \tag{42}$$

where convolution filters $\mathbf{K}$ and $\tilde{\mathbf{K}}^\top$ are defined by $\mathbf{K} = (\mathbf{I} \;\; {-\mathbf{I}})^\top$ and $\tilde{\mathbf{K}}^\top = (\mathbf{I} \;\; {-\mathbf{I}})$, respectively, and $\underline{b} = (-t \;\; {-t})^\top$ represents the threshold value.
In addition to the soft-shrinkage function, note that the clipping function described by (34) also can be expressed by (42) if $\mathbf{K} = (\mathbf{I} \;\; {-\mathbf{I}} \;\; \mathbf{I} \;\; {-\mathbf{I}})^\top$, $\tilde{\mathbf{K}}^\top = (\mathbf{I} \;\; {-\mathbf{I}} \;\; {-\mathbf{I}} \;\; \mathbf{I})$, and $\underline{b} = (0 \;\; 0 \;\; {-t} \;\; {-t})^\top$. It can be noted that representing the clipping function in convolutional form requires four times more channels than the original input signal.
It should be noted that the ability of ReLUs to approximate other signals has also been observed by Daubechies et al. [29], who have proven that deep ReLU CNNs are universal function approximators. In addition, Ye and Sung [13] have demonstrated that the ReLU function is the main source of the high-approximation power of CNNs.

Additional links between encoding-decoding CNNs and
the noiseless encoded signal by canceling the low-amplitude pix- existing signal processing techniques
els in the feature maps in a process akin to the MAP estimate Up to now, it has been assumed that operation of the encoding
of the “Soft Shrinkage/Thresholding” section. As final step, the and decoding convolution filters is limited to mapping the input
encoder reconstructs the estimated noiseless image. Note that image to a multichannel representation and to reconstructing it
the use of shrinkage functions reduces the number of channels (i.e., K and K u R comply with K u R (K) + = I $ c) . Still, it is pos-
required by ReLU counterparts to achieve perfect signal recon- sible that, in addition to performing decomposition and synthesis
struction because the shrinkage activation preserves positive and tasks, the encoding-decoding structure also filters/colors the sig-
negative values, while ReLUs preserve only the positive part of nal in a way that improves image estimates. It should be noted
the signal. that this implies that the perfect reconstruction encoding-decod-
As shown in the “Signal Model and Noise-Reduction Con- ing structure is no longer preserved. For example, consider the
figurations” section, in residual learning, a given encoding- following linear encoding-decoding structure
decoding network estimates the noise signal h so that it can be u R (Ky) (43)
xt = K
subtracted from the noisy observation y to generate the noiseless
estimate xt . As shown in the “Soft Clipping” section, in the fra- which can be reduced to
melet domain, this is achieved by preserving the low-amplitude xt = k ) y.(44)
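The reparameterization in (42) can be checked numerically. The following is a minimal sketch, assuming a single scalar sample per channel so that the identity filter $I$ reduces to the number 1; the helper names (`encode_decode`, `relu_soft_threshold`, `relu_clip`) are hypothetical and are not taken from the article or from any library.

```python
import math


def relu(values):
    """Element-wise ReLU, written (.)_+ in the text."""
    return [max(0.0, v) for v in values]


def encode_decode(y, K, K_tilde_T, b):
    """Evaluate K_tilde^T (K y + b)_+ for one scalar sample y.

    K, K_tilde_T and b hold one scalar per channel (an assumption of this
    sketch), so the identity filter I of (42) is represented by 1.0.
    """
    features = relu([k * y + bias for k, bias in zip(K, b)])
    return sum(w * f for w, f in zip(K_tilde_T, features))


def relu_soft_threshold(y, t):
    # Two channels: K = (I -I)^T, K_tilde^T = (I -I), b = (-t -t)^T.
    return encode_decode(y, K=[1.0, -1.0], K_tilde_T=[1.0, -1.0], b=[-t, -t])


def relu_clip(y, t):
    # Four channels: K = (I -I I -I)^T, K_tilde^T = (I -I -I I),
    # b = (0 0 -t -t)^T.
    return encode_decode(
        y,
        K=[1.0, -1.0, 1.0, -1.0],
        K_tilde_T=[1.0, -1.0, -1.0, 1.0],
        b=[0.0, 0.0, -t, -t],
    )


def soft_threshold(y, t):
    """Reference soft threshold: shrink the magnitude by t, keep the sign."""
    return math.copysign(max(abs(y) - t, 0.0), y)


def clip(y, t):
    """Reference clipping function: limit y to the interval [-t, t]."""
    return max(-t, min(t, y))


if __name__ == "__main__":
    t = 1.0
    for y in [-3.0, -1.0, -0.25, 0.0, 0.5, 1.0, 2.5]:
        print(y, relu_soft_threshold(y, t), relu_clip(y, t))
```

In this sketch the clipping output equals the input minus its soft-thresholded version, which is why the clipping form needs the two extra zero-bias channels.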

[Figure 8: two panels, “Nonresidual Network” and “Residual Network.” Each panel shows the encoder (encoding convolution $Ky$ with filters $\pm w_{LL}$, $\pm w_{LH}$, $\pm w_{HL}$, $\pm w_{HH}$), the ReLU activation $(Ky + b)_{+}$, and the decoder (decoding convolution $\tilde{K}^{\top}(Ky + b)_{+}$), together with a plot of the bias $b[n]$ versus channel index $n$.]

FIGURE 8. Operation of a simplified denoising (non)residual ReLU CNN according to the TDCFs. In the figure, the noisy observation $y$ is composed of two vertical bars plus uncorrelated Gaussian noise. Furthermore, for this example, the encoding and decoding convolution filters ($K$ and $\tilde{K}$, respectively) are the Haar basis of the 2D DWT and its phase-inverted counterparts. Given the content of the image, the image in the decomposed domain $Ky$ produces only a weak activation for the vertical and diagonal filters ($w_{LH}$ and $w_{HH}$, respectively), and those feature maps contain mainly noise. In the case of the nonresidual network, the ReLUs and biases suppress the channels with low activation [see the $(Ky + b)_{+}$ column], which is akin to the low-rank approximation. In contrast, in the residual example, the channels with image content are suppressed while preserving the uncorrelated noise. Finally, the decoding section reconstructs the noise-free estimate $\tilde{x}$ for the nonresidual network or the noise estimate $\hat{\eta}$ for the residual example, where it is subtracted from $y$ to compute the noiseless estimate $\hat{x}$.
[Figure 9: two panels, “Nonresidual Network” and “Residual Network.” Each panel shows the encoder (encoding convolution $Ky$ with filters $w_{LL}$, $w_{LH}$, $w_{HL}$, $w_{HH}$), the activation (shrinkage in the nonresidual panel, clipping in the residual panel), and the decoder (decoding convolution), together with a plot of the bias $b$ versus channel index.]

FIGURE 9. Operation of denoising in shrinkage and clipping networks. In the nonresidual configuration, the noisy signal $y$ is decomposed by a set of convolution filters, which, for this example, are the 2D Haar basis functions of the DWT ($Ky$). As a second step, the semihard shrinkage produces an MAP estimate of the noiseless detail bands/feature maps [$\tau^{\mathrm{DoG}}_{(b)}(Ky)$]. As a third and final step, the decoder maps the estimated noiseless encoded signal to the original image domain. In the residual network, the behavior is similar, but the activation layer is a clipping function that performs an MAP estimate of the noise in the feature maps, which is reconstructed by the decoder to generate the noise estimate $\hat{\eta}$. After reconstruction, the noise estimate is subtracted from the noisy observation $y$ to generate the noise-free estimate $\tilde{x}$.
Here, $k = \tilde{K}^{\top}K$ is optimized to reduce the distance between the filtered observation $k * y$ and the ground truth $x$. Consequently, the equivalent filter $k$ can be considered a Wiener filter. It should be noted that this article is not the first to address the potential Wiener-like behavior of a CNN. For example, Mohan et al. [14] suggested that by eliminating the bias of the convolution layers, the CNN could behave more like the Wiener filter and generalize better to unseen noise levels. It should be noted that by doing so, the CNN can also exhibit the switching behavior described by the TDCFs, which can be described by the equation

$$(z)_{+} = \begin{cases} z, & \text{if } z \geq 0, \\ 0, & \text{if } z < 0 \end{cases} \tag{45}$$

where $z$ is a pixel that belongs to the signal $z = k * x$. It can be observed that, in contrast with the low-rank behavior described in the “TDCFs” section, in this case the switching behavior depends only on the correlation between signal $x$ and filter $k$. Consequently, if the value of $z$ is positive, its value is preserved. On the contrary, if the correlation between $x$ and $k$ is negative, then the value of $z$ is canceled. Consequently, the noise reduction becomes independent of the noise level. It can be observed that this effect can be considered a nonlinear extension of signal annihilation filters [30].

It should be noted that aside from the low-rank approximation interpretation of ReLU-based CNNs, additional links to other techniques can be derived. For example, the decomposition and synthesis provided by the encoding-decoding structure is also akin to nonnegative matrix factorization [31], in which a signal is factorized as a weighted sum of positive bases. In this conception, feature maps are the bases, which are constrained to be positive by the ReLU function. Furthermore, an additional interpretation of encoding-decoding CNNs can be obtained by analyzing them from a low-dimensional manifold representation perspective [8]. Here, the convolution layers of CNNs are interpreted as two operations. On one hand they can provide a Hankel representation, and on the other they can provide a bottleneck that reduces the dimensionality of the manifold of image patches. It should be noted that the Hankel-like structure attributed to the convolution layers of CNNs has also been noted by the TDCFs [12]. Two final connections between signal processing and CNNs are the variational formulation combined with kernel-based methods [15] and the convolutional sparse coding interpretation of CNNs [16].

Mathematical analysis of relevant designs

To demonstrate an application of the principles summarized in the “Signal Processing Fundamentals” and “Bridging the Gap Between Signal Processing and CNNs: Deep Convolutional Framelets and Shrinkage-Based CNNs” sections, this section analyzes relevant designs of ReLU and shrinkage CNNs. The analyses focus on three main aspects: 1) overall descriptions of the network architecture, 2) the signal reconstruction characteristics provided by the convolutional layers of the encoder and decoder subnetworks, and 3) the number of operations $O(\cdot)$ executed by the trainable parts of the network, as this will give insight into the computational requirements needed to execute each network, and its overall complexity.

Signal reconstruction analysis provides a theoretical indication that a given CNN design can propagate any arbitrary signal when considering the use of ideal filters (i.e., they provide perfect reconstruction and are maximally sparse). In other words, for a fixed network architecture, there exists a selection of parameters (weights and biases) that makes the neural network equal to the identity function. This result is important because a design that cannot propagate arbitrary signals under ideal conditions will potentially distort the signals that propagate through it by design. Consequently, this cannot be fixed by training with large datasets and/or with the application of any special loss term. To better understand the signal reconstruction analysis, we provide a brief example of a nonresidual CNN $G(\cdot)$, through which we propagate a noiseless signal $x$ contaminated with noise $\eta$ so that

$$x \approx G(x + \eta). \tag{46}$$

Here, an ideal CNN allows us to propagate any $x$ while canceling the noise component $\eta$, irrespective of the content of $x$. If we switch our focus to an ideal residual CNN $R(\cdot)$, it is possible to observe that

$$\hat{x} \approx R(y) = y - G(y). \tag{47}$$

Here, $G(\cdot)$ is the encoding-decoding section of the residual network $R(\cdot)$. Consequently, it is desirable that the network $G(\cdot)$ is able to propagate the noise $\eta$, while suppressing the noiseless signal $x$, which is equivalent to

$$\eta \approx G(x + \eta). \tag{48}$$

It should be noted that in both residual and nonresidual cases, there are two behaviors. On one hand, there is a signal that the network decomposes and reconstructs (almost) perfectly, and on the other a signal is suppressed. Signal reconstruction analysis focuses on the signals that the network can propagate or reconstruct, rather than the signal cancelation behavior. Consequently, we focus on the linear part of $G(\cdot)$ (i.e., its convolution structure), which, according to the “TDCFs” section, we assume handles the decomposition and reconstruction of the signal within the CNN. It should be noted that the idealized model assumed here is only considered for analysis purposes, as practical implementations do not guarantee that this exact behavior is actually obtained. For more information, see the “Additional Links Between Encoding-Decoding CNNs and Existing Signal Processing Techniques” section and “Fitting Low-Rank Approximation in Rectified Linear Unit Convolutional Neural Networks” and “Network Depth.”

To test the perfect reconstruction in nonresidual CNNs, we propose the following procedure. 1) We assume an idealized model $G(\cdot)$, where its convolution filters, $K_n$ and $\tilde{K}_n$, comply with the phase-complementary tight-framelet condition, and where the biases and nonlinearities suppress low-amplitude (and negative for ReLU activations) samples from the feature maps. 2) The biases/thresholds of ReLU/shrinkage CNNs are set to zero (or to infinity for clipping activations). It can be observed that this condition prevents low-rank (or high-rank for residual models) approximation behavior of the idealized CNN. Under this circumstance, it should be possible to prove that the analyzed CNN can perfectly reconstruct any signal. 3) The last step involves simplifying the mathematical description of the resulting model of the previous point. The mathematical simplification of the model should lead to the identity function if the model complies with the perfect reconstruction property.

To conclude the explanation of the perfect reconstruction analysis, we provide two relevant considerations. First, it can be claimed that a residual network, such as the model $R(y) = y - G(y)$ discussed in (47), is able to reconstruct any
Fitting Low-Rank Approximation in Rectified Linear Unit Convolutional Neural Networks

To further understand the analogy between convolutional neural networks (CNNs) and low-rank approximation established by the theory of deep convolutional framelets, we can use as a starting point the definition of the singular value decomposition, which is expressed in (15) by

$$y = \sum_{n=0}^{N_{SV}-1} (u_n v_n^{\top}) \cdot \sigma[n]. \tag{15}$$

Given that the left and right singular vector pairs $u_n v_n^{\top}$ generate an image $D[n]$, (15) can be rewritten as

$$y = \sum_{n=0}^{N_{SV}-1} D[n] \cdot \sigma[n] \tag{S1}$$

where tensor $D = \big((u_0 v_0^{\top}) \;\dots\; (u_{N_{SV}-1} v_{N_{SV}-1}^{\top})\big)^{\top}$ contains the products of the left and right singular vectors and has dimensions of $(N_{SV} \times 1 \times M \times N)$. Furthermore, the equation can be further reformulated to

$$y = \tilde{K}^{\top} D \tag{S2}$$

in which $\tilde{K}^{\top} = \big((\sigma[0]) \;\dots\; (\sigma[N_{SV}-1])\big)$, where the brackets of the $(1 \times 1)$ filters have been excluded for simplicity. In addition, it is now assumed that it is desirable to perform a low-rank approximation of signal $y$ based on the reformulation of (S2). If we assume that $D \in \mathbb{R}^{N}_{\geq 0}$, then the low-rank approximation can be expressed by

$$\hat{y} = \tilde{K}^{\top}(D + b)_{+} \tag{S3}$$

in which the values of $b$ are set to zero for the channels of $D$ that have high contributions to the image content. Conversely, the channels of $D[n]$ with less perceptual relevance are canceled by assigning large negative values to the corresponding entries of $b$. As a final reformulation, we can assume that the basis images $D$ are the result of decomposing the input image $y$ with a set of convolution filters, i.e., $D = Ky$; this transforms (S3) into

$$\hat{x} = \tilde{K}^{\top}(Ky + b)_{+}. \tag{S4}$$

Here, it is visible that (S4) is analogous to the encoding-decoding architecture defined in (10)–(14), and the encoder and decoder filters are akin to the framelet formulation presented in the “Framelets” section. Note that (S4) assumes that the entries of $D = Ky$ are positive, which may not always be true. In this situation, tensor $D$ requires redundant channels in which the respective phases are inverted to avoid signal loss. Furthermore, it should also be noted that in a CNN, the bias/threshold level is not inferred from the statistics of the feature maps but learned from the data presented to the network during training.

Multilayer designs

It should be noted that CNNs contain multiple layers, which recursively decompose/reconstruct the signal. This may pose an advantage with respect to conventional low-rank approximation algorithms for a few reasons. First, the data-driven nature of CNNs allows us to learn the basis functions, which optimally decompose and suppress noise in the signal. Second, as networks are deep, the incoming signal is recursively decomposed and sparsified. This multidecomposition scheme is very similar to the designs used in noise-reduction algorithms based on framelets. It can be noted that, in the past, recursive sparsifying principles have been observed in methods such as the (learned) iterative soft-thresholding algorithm [27], [28] as well as convolutional sparse coding. In fact, the convolutional sparse-coding approach has been used for interpreting the operation of CNNs [16].

What about practical implementations?

When training a CNN, the parameters of the model (i.e., $K$, $\tilde{K}^{\top}$, and $b$) are updated to reduce the loss between the processed noisy signal and the ground truth, which does not guarantee that the numerical values of the convolution filters and biases of the trained model comply with the assumptions made here. This is because CNNs do not have mechanisms to enforce that filters have properties such as sparsity or perfect reconstruction, or that biases take negative values. Consequently, CNNs may not necessarily perform a low-rank approximation of the signal, although the mathematical formulations of the low-rank approximation and of the single-layer encoding-decoding network are similar. Hence, the analysis presented here should be treated as insight into the mathematical formulation and/or potential properties that can be enforced for specific applications, and not as a literal description of what trained models do.

signal when $G(y) = 0$ for any $y = x + \eta$. Still, this does not convey information about the behavior of the encoding-decoding network $G(\cdot)$, which should be able to perform a perfect decomposition and reconstruction of the noise signal $\eta$, as discussed in (48). To avoid this trivial solution, instead of analyzing the network $R(\cdot)$, the analysis described for nonresidual models is applied to the encoding-decoding structure $G(\cdot)$, which means that the residual connection is excluded from the analysis.

The second concluding remark is that, to distinguish the equations of the perfect signal reconstruction analysis from other models, we specify the analyzed designs of the perfect reconstruction models, in which the low-rank approximation behavior is avoided by setting the bias values to zero, using a special operator $P\{\cdot\}$.

For the analyses regarding the total number of operations of the trainable parameters, it is assumed that the tensors $K_0$, $(\tilde{K}_0^{u})^{\top}$, $(\tilde{K}_0^{d})^{\top}$, $\tilde{K}_0^{\top}$, $K_1$, and $\tilde{K}_1^{\top}$, shown in Figure 10, have dimensions of $(C_0 \times 1 \times N_f \times N_f)$, $(1 \times C_0 \times N_f \times N_f)$, $(1 \times C_0 \times N_f \times N_f)$, $(1 \times C_0 \times N_f \times N_f)$, $(C_1 \times C_0 \times N_f \times N_f)$, and $(C_0 \times C_1 \times N_f \times N_f)$, respectively. Here, $C_0$ and $C_1$
Network Depth

It should be noted that one of the key elements of convolutional neural networks (CNNs) is their network depth, which we address in this section. To illustrate the effect of network depth, assume an arbitrary $N$-layer encoding-decoding CNN, in which the encoding layers are defined by

$$\begin{aligned} E_0 &= (K_0 y + b_0)_{+}, \\ E_1 &= (K_1 E_0 + b_1)_{+}, \\ E_2 &= (K_2 E_1 + b_2)_{+}, \\ &\;\;\vdots \\ E_{N-1} &= (K_{N-1} E_{N-2} + b_{N-1})_{+} \end{aligned} \tag{S5}$$

$$E_n = (K_n E_{n-1} + b_n)_{+}. \tag{S6}$$

Here, $E_n$ represents the encoded signal at the $n$th decomposition level, while $K_n$, $b_n$ are the convolution weights and biases for the $n$th encoding layer, respectively. As addressed in the “ReLU” and “TDCFs” sections, the role of the rectified linear unit activations is to enforce sparsity and nonnegativity, which can be interpreted as the process of suppressing noninformative bases in the low-rank approximation algorithm. Consequently, every encoded signal $E_n$ is an encoded sparsified version of the signal $E_{n-1}$. To recover the signal, we apply the decoder part of the CNN, given by

$$\begin{aligned} \tilde{E}_{N-2} &= (\tilde{K}_{N-1}^{\top} E_{N-1} + \tilde{b}_{N-1})_{+}, \\ &\;\;\vdots \\ \tilde{E}_1 &= (\tilde{K}_2^{\top} \tilde{E}_2 + \tilde{b}_2)_{+}, \\ \tilde{E}_0 &= (\tilde{K}_1^{\top} \tilde{E}_1 + \tilde{b}_1)_{+}, \\ \hat{x} &= (\tilde{K}_0^{\top} \tilde{E}_0 + \tilde{b}_0)_{+} \end{aligned} \tag{S7}$$

$$\tilde{E}_{n-1} = (\tilde{K}_n^{\top} \tilde{E}_n + \tilde{b}_n)_{+}. \tag{S8}$$

Here, $\hat{x}$ is the low-rank estimate/denoised version of the input signal $y$, while $\tilde{E}_n$, $\tilde{K}_n^{\top}$, $\tilde{b}_n$ are the decoded signal components at the $n$th composition level and the decoder convolution weights and biases for the $n$th layer, respectively. In (S8), every decoded signal $\tilde{E}_{n-1}$ is the low-rank estimate of the encoded layer $E_{n-1}$. It should be noted that the activation of each of the decoder layers $(\cdot + \tilde{b}_n)_{+}$ can further enforce sparsity on the low-rank estimates $\tilde{E}_{n-1}$.

Summary

In conclusion, the mathematical formulation of deep networks is analogous to a recursive data-driven low-rank approximation, where the input to the successive encoding-decoding pairs is the low-rank approximated encoded signal generated by the encoder of the previous level. Still, as mentioned in “Fitting Low-Rank Approximation in Rectified Linear Unit Convolutional Neural Networks,” low-rank approximation algorithms and CNNs are similar in terms of mathematical formulation, but we cannot ensure that the values obtained during training for the encoding and decoding filters and their biases have the properties needed to ensure that a CNN is an exact recursive data-driven low-rank approximation. For example, it is possible that the filters of the encoder and decoder do not reconstruct the signal perfectly because this may not be necessary to reduce the loss function used to optimize the network.

Is it possible to impose a tighter relationship between low-rank approximation and CNNs?

In specific applications where signal preservation and interpretability are required (e.g., medical imaging), it is desirable that the operation of CNNs is closer to the low-rank approximation description. To achieve this, the CNNs embedded in frameworks such as the convolutional analysis operator [S1] and the Fast Iterative Soft Thresholding Algorithm Network [S2] explicitly train the filters $K_n$ and $\tilde{K}_n$ to have properties such as perfect signal reconstruction and sparsity. By enforcing these characteristics, the mathematical descriptions of low-rank behavior and of CNNs become more similar, and the models become inherently more interpretable and predictable in their operation.

References

[S1] I. Y. Chun, Z. Huang, H. Lim, and J. Fessler, “Momentum-net: Fast and convergent iterative neural network for inverse problems,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4915–4931, Apr. 2023, doi: 10.1109/TPAMI.2020.3012955.
[S2] J. Xiang, Y. Dong, and Y. Yang, “FISTA-net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging,” IEEE Trans. Med. Imag., vol. 40, no. 5, pp. 1329–1339, May 2021, doi: 10.1109/TMI.2021.3054167.

represent the number of channels after the first and second convolution layers, respectively, and all the convolution filters are assumed to be square with $N_f \times N_f$ pixels. Furthermore, the input signal $x$ has dimensions of $(1 \times 1 \times N_r \times N_c)$, where $N_r$ and $N_c$ denote the number of rows and columns, respectively.

The analyses shown for the different networks in this article have the following limitations. 1) The analyzed networks have only enough decomposition levels and convolution layers to understand their basic operation. The motivation for this simplification is to keep the analyses short and clear. Moreover, the same principles can be extended to deeper networks because the same single-decomposition CNNs would be recursively embedded within the given architectures. 2) The normalization layers are not considered because they are linear operators that provide mean shifts and amplitude scaling. Consequently, for analysis purposes, it can be assumed that they are embedded in the convolution weights. 3) For every encoder convolution kernel, it is assumed that there is at least one decoder filter. 4) No coadaptation between the filters of the encoder and decoder layers is considered.

The remainder of this section shows analyses of a few representative designs. Specifically, the chosen designs are the U-Net [32] and its residual counterpart, the filtered backprojection network [21]. (MATLAB implementation by their authors available at https://github.com/panakino/FBPConvNet.) Additional designs analyzed here are the residual encoder-decoder CNN [5] (PyTorch implementation by their authors available at https://github.com/SSinyu/RED-CNN) as well as the learned wavelet frame-shrinkage network (LWFSN) (PyTorch implementation available at https://github.com/LuisAlbertZM/demo LWFSN TMI and interactive demo available at IEEE’s Code Ocean, https://codeocean.com/capsule/9027829/tree/v1. The demo also includes, as reference, PyTorch implementations of FBPConvNet and the tight-frame U-Net.). For reference, all the designs are portrayed in Figure 10.
[Figure 10: three block diagrams. “U-Net/Filtered Backprojection Convolutional Network”: encoding and decoding convolutions with an undecimated path (concatenation) and a decimated path (downsampling and upsampling with $W_L$), forming $U(y)$. “Residual Encoder-Decoder CNN (RED)”: nested blocks $R_0^{\mathrm{enc}}(y)$, $R_1(Q)$, and $R_0^{\mathrm{dec}}(\hat{Q})$, forming $R(y)$. “LWFSN”: encoding, shrinkage, and decoding stages with forward and inverse DWT ($W_L$, $W_H$) along the decimated path, forming $L(y)$.]

FIGURE 10. Simplified structure of encoding-decoding ReLU CNNs. The displayed networks are the U-Net/filtered backprojection network, the encoder-decoder residual CNN (RED), and finally, the learned wavelet-frame shrinkage network (LWFSN). Note that for all the designs, the encoding-decoding structures are indicated by dashed blocks. It should be kept in mind that the drawings are simplified: they do not contain normalization layers, they are shallow, and commonly appearing dual convolutions are drawn as one layer.

U-Net/filtered backprojection network

U-Net: overview of the design

The first networks analyzed are the U-Net and filtered backprojection networks, both of which share the encoding-decoding structure $U(\cdot)$. However, they differ in the fact that the U-Net is nonresidual, while the filtered backprojection network operates in a residual configuration. Therefore, an estimate of the noiseless signal $\hat{x}$ from the noisy observation $y$ in the conventional U-Net is achieved by

$$\hat{x} = U(y) \tag{49}$$

whereas in the filtered backprojection network, $U(\cdot)$ is used in a residual configuration, which is equivalent to

$$\hat{x} = y - U(y). \tag{50}$$

If we now switch our focus to the encoding-decoding structure of the U-Net $U(y)$, it can be shown that it is described by

$$U(y) = U^{u}(y) + U^{d}(y) \tag{51}$$

where $U^{u}(y)$ corresponds to the undecimated path and is defined by

$$U^{u}(y) = (\tilde{K}_0^{u})^{\top}(K_0 y + b_0)_{+} \tag{52}$$

while the decimated path is

$$U^{d}(y) = (\tilde{K}_0^{d})^{\top}\tilde{W}_L^{\top} f_{(2\uparrow)}\Big(\big(\tilde{K}_1^{\top}(K_1 Z + b_1)_{+} + \tilde{b}_1\big)_{+}\Big). \tag{53}$$

Here, signal $Z$ is defined by

$$Z = f_{(2\downarrow)}\big(W_L (K_0 y + b_0)_{+}\big). \tag{54}$$

Note that the decimated path contains two nested encoding-decoding architectures, as observed by Jin et al. [21], who have acknowledged that the nested filtering structure is akin to the (learned) iterative soft-thresholding algorithm [27], [28].

U-Net: signal reconstruction analysis

To prove whether the U-Net can perfectly reconstruct any signal, we assume that the biases are equal to zero; under this condition, the network $P\{U\}(y)$ is defined by

$$P\{U\}(y) = P\{U^{u}\}(y) + P\{U^{d}\}(y) \tag{55}$$

where subnetwork $P\{U^{u}\}(\cdot)$ is defined by

$$P\{U^{u}\}(y) = (\tilde{K}_0^{u})^{\top}(K_0 y)_{+}. \tag{56}$$

Assuming that $(K_0, \tilde{K}_0^{u})$ is a complementary-phase tight-framelet pair, $P\{U^{u}\}(y)$ is simplified to

$$P\{U^{u}\}(y) = y \cdot c_0. \tag{57}$$

Furthermore, the low-frequency path is

$$P\{U^{d}\}(y) = (\tilde{K}_0^{d})^{\top}\tilde{W}_L^{\top} f_{(2\uparrow)}\Big(\big(\tilde{K}_1^{\top}(K_1 P\{Z\})_{+}\big)_{+}\Big) \tag{58}$$

where $P\{Z\}$ is defined by

$$P\{Z\} = f_{(2\downarrow)}\big(W_L (K_0 y)_{+}\big). \tag{59}$$

If $K_1$ is a phase-complementary tight frame, we know that $\tilde{K}_1^{\top}(K_1 Z)_{+} = Z \cdot c_1$. Consequently, (58) becomes

$$P\{U^{d}\}(y) = (\tilde{K}_0^{d})^{\top}\tilde{W}_L^{\top} f_{(2\uparrow)}\Big(f_{(2\downarrow)}\big((W_L K_0 y)_{+}\big)\Big) \cdot c_1. \tag{60}$$

Here, it can be noted that if $K_0$ is a phase-complementary tight framelet, then $P\{U^{d}\}(y)$ approximates a low-pass version of $y$, or equivalently,

$$P\{U^{d}\}(y) \approx W_L\, y \cdot c_1 \tag{61}$$

where $W_L$ is a low-pass filter. Finally, substituting (57) and (61) in (55) results in

$$P\{U\}(y) \approx (I \cdot c_0 + W_L \cdot c_1)\, y. \tag{62}$$

This result proves that the design of the U-Net cannot evenly reconstruct all the frequencies of $y$ unless $c_1 = 0$, in which case the whole low-frequency branch of the network is ignored. Note that this limitation is inherent to its design and cannot be circumvented by training with large datasets and/or with any loss function.

U-Net: number of operations

It can be noted that encoding filter $K_0$ convolves $x$ at its original resolution and maps it to a tensor with $C_0$ channels. Therefore, the number of operations $O(\cdot)$ for kernel $K_0$ is $O(K_0) = C_0 \cdot N_r \cdot N_c \cdot N_f^2$ floating-point operations (FLOPs). Likewise, due to the symmetry between encoder and decoder filters, $O(\tilde{K}_0^{u}) = O(\tilde{K}_0^{d}) = O(K_0)$. Furthermore, for this design, filter $K_1$ processes the signal encoded by $K_0$, which is downsampled by a factor of two along each spatial dimension, and maps it from $C_0$ to $C_1$ channels. This results in the estimated operation cost $O(K_1) = O(\tilde{K}_1) = C_0 \cdot C_1 \cdot N_r \cdot N_c \cdot N_f^2 \cdot (2)^{-2}$ [FLOPs]. Finally, adding the contributions of filters $K_0$, $\tilde{K}_0^{u}$, $\tilde{K}_0^{d}$, $K_1$, and $\tilde{K}_1$ results in

$$O(U) = (3 + 2^{-1} \cdot C_1) \cdot C_0 \cdot N_r \cdot N_c \cdot N_f^2 \;\; \text{[FLOPs]}. \tag{63}$$

U-Net: concluding remarks

The U-Net/FBPConvNet is a flexible multiresolution architecture. Still, as has been shown, the pooling structure of this CNN may be suboptimal for noise-reduction applications because its configuration does not allow for even recovery of the signal’s frequency information. This has been noted and fixed by Han and Ye [6], who introduced the tight-frame U-Net, in which the down-/upsampling structure is replaced by the DWT and its inverse. This simple modification overcomes limitations of the U-Net and improves its performance for artifact removal in compressed sensing imaging.
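The two U-Net results above, the uneven frequency response in (62) and the operation count in (63), can be sanity-checked with a short script. This is only a sketch under simplifying assumptions: the signal is 1D, the low-pass branch $W_L$ with down- and upsampling is modeled as averaging neighboring pairs followed by sample repetition, and the function names are hypothetical.

```python
def low_pass_branch(y):
    """Assumed stand-in for the decimated path: average each pair of samples
    (low-pass plus downsampling), then repeat each mean twice (upsampling)."""
    out = []
    for i in range(0, len(y), 2):
        mean = (y[i] + y[i + 1]) / 2.0
        out += [mean, mean]
    return out


def unet_zero_bias(y, c0=1.0, c1=1.0):
    """Idealized zero-bias U-Net of (62): (I * c0 + W_L * c1) y."""
    return [c0 * a + c1 * b for a, b in zip(y, low_pass_branch(y))]


def conv_flops(c_in, c_out, rows, cols, n_f):
    """Cost of one convolution layer as counted in the text."""
    return c_in * c_out * rows * cols * n_f ** 2


def unet_flops_by_kernel(c0, c1, rows, cols, n_f):
    """Sum the costs of K_0, K~u_0, K~d_0, K_1 and K~_1 one by one."""
    k0 = conv_flops(1, c0, rows, cols, n_f)             # also K~u_0 and K~d_0
    k1 = conv_flops(c0, c1, rows // 2, cols // 2, n_f)  # also K~_1
    return 3 * k0 + 2 * k1


def unet_flops_closed_form(c0, c1, rows, cols, n_f):
    """Closed form (63), written as (6 + C1) / 2 to stay in integers."""
    return (6 + c1) * c0 * rows * cols * n_f ** 2 // 2


if __name__ == "__main__":
    print(unet_zero_bias([1.0, 1.0, 1.0, 1.0]))    # constant (low-frequency) input
    print(unet_zero_bias([1.0, -1.0, 1.0, -1.0]))  # alternating (high-frequency) input
    print(unet_flops_by_kernel(64, 128, 512, 512, 3),
          unet_flops_closed_form(64, 128, 512, 512, 3))
```

In this toy model the constant input is amplified by $c_0 + c_1$ while the alternating input only sees the gain $c_0$, which is the uneven treatment of frequencies that (62) describes. The channel counts and image size in the usage lines are arbitrary illustration values and assume even $N_r$ and $N_c$.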

Residual encoder-decoder CNN

Residual encoder-decoder CNN—overview of the design
The residual encoder-decoder CNN shown in Figure 10 consists of nested single-layer residual encoding-decoding networks. For example, in the network showcased in Figure 10, we can see that network $R_1(\cdot)$ is nested into $R_0(\cdot)$. Furthermore, for this case, the image estimate is given by

$$\hat{x} = (y + R_0(y) + \tilde{b}_0)_+ \tag{64}$$

in which $R_0(\cdot)$ is the outer residual network and $\tilde{b}_0$ is the bias for the output layer. Note that the ReLU placed at the output layer intrinsically assumes that the estimated signal $\hat{x}$ is positive.
From (64), the output of the subnetwork $R_0(\cdot)$ is defined by

$$Z = R_0^{\mathrm{dec}}(\hat{Q}) . \tag{65}$$

Here, the decoder $R_0^{\mathrm{dec}}(\cdot)$ is defined by

$$R_0^{\mathrm{dec}}(\hat{Q}) = \tilde{K}_0^{\top} \hat{Q} . \tag{66}$$

In (66), $\hat{Q}$ is the noiseless estimate of the intermediate signal $Q$ and is defined by

$$\hat{Q} = (Q + R_1(Q) + \tilde{b}_1)_+ \tag{67}$$

where the network $R_1(\cdot)$ is

$$R_1(Q) = \tilde{K}_1^{\top} (K_1 Q + b_1)_+ . \tag{68}$$

Furthermore, $Q$ represents the signal encoded by $R_0(\cdot)$, or equivalently,

$$Q = R_0^{\mathrm{enc}}(y) \tag{69}$$

where $R_0^{\mathrm{enc}}(\cdot)$ is defined by

$$R_0^{\mathrm{enc}}(y) = K_0 y . \tag{70}$$

Residual encoder-decoder CNN—signal reconstruction analysis
As mentioned earlier, the residual encoder-decoder CNN is composed of nested residual blocks, which are independently analyzed to study the reconstruction characteristics of this network. First, block $R_1(\cdot)$ is given by

$$\mathcal{P}\{R_1\}(Q) = \tilde{K}_1^{\top} (K_1 Q)_+ . \tag{71}$$

Under complementary-phase tight-frame assumptions for the pair $(K_1, \tilde{K}_1)$, (71) reduces to

$$\mathcal{P}\{R_1\}(Q) = Q \tag{72}$$

which shows that the encoder and decoder $R_1(\cdot)$ can approximately reconstruct any signal. Now, switching to $R_0$, it can be observed that the linear part is

$$\mathcal{P}\{R_0\}(y) = \tilde{K}_0^{\top} (K_0 y)_+ . \tag{73}$$

Just as with $R_1(\cdot)$, it is assumed that the convolution kernels are tight framelets. Therefore, (73) becomes

$$\mathcal{P}\{R_0\}(y) = y . \tag{74}$$

Consequently, $R_0(\cdot)$ and $R_1(\cdot)$ can reconstruct any arbitrary signal under complementary-phase tight-frame assumptions.

Residual encoder-decoder CNN—number of operations
In this case, all the convolution layers operate at the original resolution of image $x$. Therefore, the number of operations $\mathcal{O}(\cdot)$ for kernels $K_0$ and $\tilde{K}_0$ is $\mathcal{O}(K_0) = \mathcal{O}(\tilde{K}_0) = C_0 \cdot N_r \cdot N_c \cdot N_f^2$ [FLOPs], while $K_1$ and $\tilde{K}_1$ require $\mathcal{O}(K_1) = \mathcal{O}(\tilde{K}_1) = C_0 \cdot C_1 \cdot N_r \cdot N_c \cdot N_f^2$ [FLOPs]. By adding the contributions of both encoding-decoding pairs, the total number of operations for the residual encoder-decoder becomes

$$\mathcal{O}(R) = 2 \cdot (1 + C_1) \cdot C_0 \cdot N_r \cdot N_c \cdot N_f^2 \ \text{[FLOPs]} . \tag{75}$$

Residual encoder-decoder CNN—concluding remarks
The residual encoder-decoder network consists of a set of nested single-resolution residual encoding-decoding CNNs. The single-resolution design increases its computation cost with respect to multiresolution designs, such as the U-Net. In addition, it should be noted that the use of an ReLU as the output layer of the encoder-decoder residual network forces the signal estimates to be positive, but this is not always convenient. For example, in computerized tomography imaging, it is common that images contain positive and negative values.

LWFSN

LWFSN—description of the architecture
The LWFSN network is a multiresolution architecture in which the DWT is used for down-/upsampling and also as part of the decomposition where shrinkage is applied. In this CNN, the noiseless estimates are produced by

$$\hat{x} = L(y) \tag{76}$$

where $L(\cdot)$ represents the encoding-decoding structure of the LWFSN, and the encoding-decoding network $L(\cdot)$ is

$$L(y) = L_L(y) + L_H(y) . \tag{77}$$

Here, the high-frequency path is given by

$$L_H(y) = \tilde{K}_0^{\top} \tilde{W}_H^{\top} f_{(2\uparrow)}\big(\tau^{\mathrm{LET}}_{(t_0)}\big(f_{(2\downarrow)}(W_H K_0 y)\big)\big) . \tag{78}$$

Note that in this design, the encoder leverages the filter $W_H$ to generate a sparse signal prior to the shrinkage stage, i.e., $\tau^{\mathrm{LET}}_{(t_0)}\big(f_{(2\downarrow)}(W_H K_0 y)\big)$. Meanwhile, the low-frequency path $L_L(\cdot)$ is

$$L_L(y) = \tilde{K}_0^{\top} \tilde{W}_L^{\top} f_{(2\uparrow)}\big(f_{(2\downarrow)}(W_L K_0 y)\big) . \tag{79}$$
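The two paths in (77)–(79) can be made concrete with a heavily simplified sketch. The code below is an illustration under several assumptions that are not part of the original design: the signal is 1D, the learned kernels $K_0$ and $\tilde{K}_0$ are replaced by the identity, the DWT is a single-level orthonormal Haar transform, and plain soft shrinkage stands in for the LET shrinkage. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft shrinkage; a simple stand-in for the LET shrinkage of (78)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lwfsn_single_level(y, t0):
    """Single-level LWFSN-like estimate with K_0 = identity and Haar filters."""
    y = np.asarray(y, dtype=float)
    s = np.sqrt(2.0)
    even, odd = y[0::2], y[1::2]

    # Encoder: filter with W_L / W_H and downsample by two.
    low = (even + odd) / s
    high = soft_threshold((even - odd) / s, t0)  # shrinkage on the detail band

    # Decoder, low-frequency path L_L(y): upsample and apply the synthesis filter.
    low_path = np.empty_like(y)
    low_path[0::2] = low / s
    low_path[1::2] = low / s

    # Decoder, high-frequency path L_H(y).
    high_path = np.empty_like(y)
    high_path[0::2] = high / s
    high_path[1::2] = -high / s

    return low_path + high_path  # L(y) = L_L(y) + L_H(y), as in (77)

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0, 0.25, 0.75], 8)   # piecewise-constant test signal
noisy = clean + 0.1 * rng.standard_normal(clean.size)

passthrough = lwfsn_single_level(noisy, t0=0.0)  # no shrinkage
denoised = lwfsn_single_level(noisy, t0=0.3)     # shrinkage on the detail band

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE noisy: {mse_noisy:.4f}, MSE denoised: {mse_denoised:.4f}")
```

With the threshold set to zero the two paths add up to the input, and with a positive threshold only the detail band is attenuated, which is the behavior the two-path structure is meant to provide.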

LWFSN—signal reconstruction analysis
When analyzing signal propagation of the LWFSN, we set the threshold level $t_0 = 0$. This turns (77) into

$$\mathcal{P}\{L\}(y) = \mathcal{P}\{L_L\}(y) + \mathcal{P}\{L_H\}(y) . \tag{80}$$

Here, $\mathcal{P}\{L_H\}(\cdot)$ is defined by

$$\mathcal{P}\{L_H\}(y) = \tilde{K}_0^{\top} \tilde{W}_H^{\top} f_{(2\uparrow)}\big(f_{(2\downarrow)}(W_H K_0 y)\big) \tag{81}$$

while the low-frequency path $\mathcal{P}\{L_L\}(\cdot)$ is mathematically described by

$$\mathcal{P}\{L_L\}(y) = \tilde{K}_0^{\top} \tilde{W}_L^{\top} f_{(2\uparrow)}\big(f_{(2\downarrow)}(W_L K_0 y)\big) . \tag{82}$$

Substituting (81) and (82) in (80) results in

$$\mathcal{P}\{L\}(y) = \tilde{K}_0^{\top} \tilde{W}^{\top} f_{(2\uparrow)}\big(f_{(2\downarrow)}(W K_0 y)\big) . \tag{83}$$

For the DWT, it holds that $Q = \tilde{W}^{\top} f_{(2\uparrow)}\big(f_{(2\downarrow)}(W Q)\big)$. Consequently, (83) is simplified to

$$\mathcal{P}\{L\}(y) = \tilde{K}_0^{\top} K_0 y . \tag{84}$$

Assuming that $K_0$ is a tight framelet, i.e., $\tilde{K}_0^{\top} K_0 = I \cdot c$, with $c = 1$, then

$$\mathcal{P}\{L\}(y) = y . \tag{85}$$

This proves that the encoding-decoding section of the LWFSN allows for perfect signal reconstruction.

LWFSN—number of operations
The LWFSN contains a simpler convolution structure than the networks reviewed up to now. Therefore, for a single-level decomposition architecture, the total number of operations is

$$\mathcal{O}(L) = 2 \cdot C_0 \cdot N_r \cdot N_c \cdot N_f^2 \ \text{[FLOPs]} . \tag{86}$$

LWFSN—residual variant
To illustrate the use of clipping activations in residual noise reduction, the residual version of the LWFSN is included. Note that there are two main differences with the conventional LWFSN. First, the shrinkage functions are replaced by clipping activations. Second, the low-frequency signal is suppressed. This is performed because the original design of the LWFSN does not have any nonlinearities in that section. This is akin to the low-frequency nulling proposed by Kwon and Ye [33]. The modified LWFSN is shown in Figure 11. It can be observed that by setting to zero the low-frequency branch of the design, the model inherently assumes that the noise is high pass.

[Figure 11 diagram: residual LWFSN. Block labels: Encoding ($K_0$), Forward DWT ($W_H$, $W_L$), Clipping ($t_0$), Decimated Path, Inverse DWT ($\tilde{W}_H$, $\tilde{W}_L$), Decoding ($\tilde{K}_0$); signals $y$, $L_{\mathrm{res}}(y)$, and $\hat{x}$.]
FIGURE 11. The residual version of the LWFSN. It can be noticed that the low-frequency branch of the network is nulled. In deeper networks, it would be further decomposed and the nulling would be activated at the deepest level (lowest resolution).

(Residual) LWFSN—concluding remarks
The (residual) LWFSN, or (r)LWFSN, is a design that explicitly mimics wavelet-shrinkage algorithms. It can be observed that the (r)LWFSN inherently assumes that noise is high frequency and explicitly avoids nonlinear processing in the low-frequency band. Follow-up experiments also included nonlinearities in the low-frequency band of the LWFSN [34] and obtained results similar to the original design.

What happens in trained models?

Properties of convolution kernels and low-rank approximation
The assumption that the convolution filters of a CNN behave as (complementary-phase) tight framelets is useful for analyzing the theoretical ability of a CNN to propagate signals. However, it is difficult to prove that trained models comply with this assumption because there are diverse elements affecting the optimization of the model, e.g., initialization of the network, the data presented to the model, and the optimization algorithm as well as its parameters. In addition, in real CNNs, there may be coadaptation between diverse CNN layers, which may prevent the individual filters of the CNN from behaving as tight framelets as the decomposition and filtering performed by one layer is not independent from the rest [35].
To test whether the behavior of the filters of trained CNNs can converge to complementary-phase tight framelets, at least in a simplified environment, we propose training a toy model, as displayed in Figure 12. If the trained filters of an encoder-decoder pair of the toy model $(K_l, \tilde{K}_l)$ (where $l$ denotes one of the decomposition levels) behave as a complementary-phase tight framelet, then the pair $(K_l, \tilde{K}_l)$ approximately complies with the condition presented in (40), which, for identity input $I$, simplifies to

$$\tilde{K}_n^{\top} (K_n)_+ = I \cdot c_n \tag{87}$$

in which $c_n$ is an arbitrary constant.

[Figure 12 diagram: toy model. Encoding: $y$, $(K_0, b_0)$, $(K_1, b_1)$, $(K_2, b_2)$; Decoding: $(\tilde{K}_2, \tilde{b}_2)$, $(\tilde{K}_1, \tilde{b}_1)$, $\tilde{K}_0$; output $M(y)$.]
FIGURE 12. The toy model used for the experiment on the properties of the filters of a trained CNN. The dimensions for tensors $K_0$, $K_1$, and $K_2$ are (6 × 1 × 3 × 3), (12 × 6 × 3 × 3), and (24 × 12 × 3 × 3), respectively. The network is symmetric and the filter dimensions for the decoder convolution kernels $\tilde{K}_n$ are the same as their corresponding encoding kernel $K_n$.

The toy model is trained on images that contain multiple randomly generated overlapping triangles. All the images were scaled to the range of [0,1]. For this experiment, the input to the model is the noise-contaminated image, and the objective/desired output is the noiseless image. For training the CNNs, normally distributed noise with a standard deviation of 0.1 was added to the ground truth. For every epoch, a batch of 192 training images was generated. For validation and test images, we used the “astronaut” and “cameraman” images included in the Scipy software package. The model was optimized with Adam for 25 epochs with a linearly decreasing learning rate. The initial learning rate for the optimizer was set to $10^{-3}$, and the batch size was set to one sample. The convolution kernels were initialized with Xavier initialization using a uniform distribution (see Glorot and Bengio [36]). The code is available at IEEE’s Code Ocean at https://codeocean.com/capsule/7845737/tree.
Using the described settings, we trained the toy model and tested whether the phase-complementary tight-framelet property holds for filters of the deepest level: $l = 2$. The results for the operation $\tilde{K}_2^{\top} (K_2)_+$ are displayed in Figure 13(a), which shows that when the weights of the encoder and decoder have different initial values, the kernel pair $(K_2, \tilde{K}_2)$ is not a complementary-phase tight framelet. We have observed that the forward and inverse filters of wavelets/framelets are often the same or at least very similar. Based on this reasoning, we initialized the toy model with the same initial values for the kernel pair $(K_n, \tilde{K}_n)$. As shown in Figure 13(b), with the proposed initialization, the filters of the CNN converge to tensors with properties reminiscent of complementary-phase tight framelets. This suggests that the initialization of the CNN has an important influence on the convergence of the model to a specific solution.
Figure 14 displays test images processed with two toy models, one trained with different and one trained with the same initial values for the encoding-decoding pairs. It can be observed that there are no significant differences between the images produced by both models. In Figure 14(e) and (f), we set the bias of both networks to zero. In this case, it is expected that the networks will reconstruct the noisy input, as confirmed by the figure, where both CNNs partly reconstruct the original noisy signal. This result suggests that the ReLU plus bias pairs operate akin to the low-rank approximation mechanism proposed by the TDCFs.
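The criterion in (87) can be checked numerically. The sketch below is a simplified illustration rather than the experiment itself: it uses dense matrices instead of convolution kernels, builds a complementary-phase pair by stacking an orthonormal basis with its negated copy, and compares a matched decoder with an unrelated random decoder. All names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def framelet_test(k_enc, k_dec):
    """Return k_dec^T ReLU(k_enc I), the product examined in (87)."""
    identity = np.eye(k_enc.shape[1])
    return k_dec.T @ relu(k_enc @ identity)

rng = np.random.default_rng(0)
n = 8

# Orthonormal basis w (via QR), stacked with -w so that every filter
# has a phase-complementary partner: ReLU(w x) - ReLU(-w x) = w x.
w, _ = np.linalg.qr(rng.standard_normal((n, n)))
k = np.vstack([w, -w])

matched = framelet_test(k, k)                                # decoder equals encoder
mismatched = framelet_test(k, rng.standard_normal(k.shape))  # unrelated decoder

print("matched pair satisfies (87) with c = 1:", np.allclose(matched, np.eye(n)))
print("mismatched pair satisfies (87):", np.allclose(mismatched, np.eye(n)))
```

In this simplified setting, a decoder that shares its values with a complementary-phase encoder reproduces the identity, while an unrelated decoder does not, which mirrors the role that shared initialization plays in the toy-model experiment.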
[Figure 13 panels: (a) initial $K_2 \neq \tilde{K}_2$; (b) initial $K_2 = \tilde{K}_2$. Color scale from –1 to 1.]
FIGURE 13. The phase-complementary tight-framelet test for the trained toy network, initialized with random weights. (a) The product $\tilde{K}_2^{\top} (K_2)_+$, where the initialization of $K_2$ and $\tilde{K}_2$ is different. It can be seen that the pair $(K_2, \tilde{K}_2)$ does not comply with the complementary-phase framelet criterion of (87). This contrasts with (b), which displays the result of the product $\tilde{K}_2^{\top} (K_2)_+$ for the same CNN, but where the initial values of $\tilde{K}_2$ and $K_2$ are identical. For this initialization, the filters approximate the complementary-phase tight-framelet criterion.

The following conclusions can be drawn from this experiment. First, the filters of the CNN may not necessarily converge to complementary-phase tight framelets. This is possibly caused by initialization of the network and/or the interaction/coadaptation between the multiple encoder/decoder layers. Second, we confirm that for our experimental setting, the low-rank approximation behavior in the CNN can be observed. For example, when setting the biases and thresholds to zero, part of the noise texture (high-rank information) is recovered. Third, it is possible that linear filtering happens in the network as well, which may explain why noise texture is not fully recovered when setting the biases to zero. Fourth and finally, we observed that the behavior of the trained models changes drastically depending on factors such as the learning rate and initialization values of the model. For this reason, we consider this experiment and its outcome more as a proof of concept, where further investigation is needed.

[Figure 14 panels: (a) Noisy input, SNR = 15.33 dB; (b) Init $K_n \neq \tilde{K}_n$, SNR = 23.04 dB; (c) Init $K_n = \tilde{K}_n$, SNR = 23.09 dB; (d) Ground truth; (e) Init $K_n \neq \tilde{K}_n$, $b_n = 0$, SNR = 19.43 dB; (f) Init $K_n = \tilde{K}_n$, $b_n = 0$, SNR = 19.02 dB.]
FIGURE 14. The processed “cameraman” image for (in)dependently sampled initialization for the encoding and decoding filters. (a) The noise-contaminated input ($\sigma_\eta = 0.1$) and (d) the noiseless reference. (b) and (e) The processed noisy image with the toy model trained with different initialization for its convolution filters, while (c) and (f) are images processed with the model where the same initial values are used for the encoding and decoding filters. (b) and (c) Nearly identical images in terms of quality and signal-to-noise ratio (SNR), which indicates that the initialization has little effect on the output. (e) and (f) The same models that produced (b) and (c) but with their biases set to zero. As expected, the noise is partly reconstructed.

Generalization
From the explanations in the “Nonlinear Signal Estimation in the Framelet Domain” section, it can be noted that the bias/threshold used in CNNs can modulate how much of the signal is suppressed by the nonlinearities. In addition, the “Additional Links Between Encoding-Decoding CNNs and Existing Signal Processing Techniques” section established that there are additional mechanisms for noise reduction within the CNN, such as the Wiener-like behavior observed by Mohan et al. [14]. This raises the question of how robust conventional CNNs are to noise levels that differ from the level at which the model has been trained. To perform such an experiment, we trained two variants of the toy model. The first variant was inspired by the multiscale sparse coding network of Mentl et al. [17], where the biases of each of the nonlinearities (in this case, an ReLU) are multiplied by an estimate of the standard deviation of the noise. In the design of this example, the noise estimate $\hat{\sigma}_\eta$, in accordance with Chang et al. [1], is defined by

$$\hat{\sigma}_\eta = 1.4826 \cdot \mathrm{median}\big(|f_{HH} * x|\big) . \tag{88}$$

Here, variable $f_{HH}$ is the diagonal convolution filter of the DWT with Haar basis. For comparison purposes, we refer to this model as an adaptive toy model. The second variant of the toy model that was tested examines the case where the convolution layers of the model do not add bias to the signal. This model is based on the bias-free CNNs proposed by Mohan et al. [14], in which the bias of every convolution filter is set to zero during training. The purpose of this setting is to achieve better generalization of the model as it is claimed that this modification causes the model to behave independently of the noise level.
We trained the described variants of the toy models with the same settings as the experiment in the “Properties of Convolution Kernels and Low-Rank Approximation” section. The three models are evaluated on the test image with varying noise levels: $\sigma_\eta \in [0.1, 0.15, 0.175, 0.2, 0.225]$. The result of this evaluation is displayed in Figure 15. These results confirm that the performance of the original toy model degrades for higher noise levels. In contrast, the adaptive and bias-free toy models perform better than the original toy model for most of the noise levels.
The results of this experiment confirm the diverse noise-reduction mechanisms within a CNN and show that CNNs have certain modeling limitations. One example is noise invariance, which can be addressed by further incorporating prior knowledge into the model, such as with the case of the adaptive model, or by forcing the model to have a more Wiener-like behavior, such as with the case of the bias-free model. In the case of the bias-free model, note that, theoretically, it should be possible to obtain exactly the same behavior with the original toy model if the biases of the model had converged to zero. This reasoning suggests that the large number of free parameters and nonlinear behavior of the model can potentially prevent finding the optimal/robust solution, in which case, the incorporation of prior knowledge can help improve the model.

Which network fits my problem?

Design elements
When choosing or designing a CNN for a specific noise-reduction application, multiple choices and design elements should be considered, for example, the required performance, memory required to train/deploy models, whether certain signal preservation characteristics are required, target execution time for the model, characteristics of the images being processed, and so on. Based on these requirements, diverse design elements of CNNs can be more or less desirable, for example, the activation functions, use of single/multiresolution models, need for skip connections, and so forth. The following sections briefly discuss such elements by focusing on the impact that such elements have in terms of performance and potential computational cost. A summary of the main conclusions of these elements is included in Table 2.
[Figure 15 panels: columns correspond to $\sigma_\eta$ = 0.1, 0.15, 0.175, 0.2, and 0.225. Row (a) Input, with SNR = 15.31, 11.79, 10.45, 9.27, and 8.25 dB, respectively; row (b) Original; row (c) Adaptive; row (d) Bias Free.]
FIGURE 15. A comparison of the baseline (original) toy model against its adaptive and bias-free variants. The models are evaluated on the cameraman picture with increasing noise levels. (a) The noisy input. (b) The images processed with the original toy model. (c) Results of the adaptive toy model. (d) Finally, results corresponding to the bias-free model. It can be observed that the performance of the original toy model degrades as the noise level increases, while the performance of the adaptive and bias-free models degrades less with increased noise levels, resulting in pictures with lower noise levels.
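The noise-level estimate of (88), which scales the biases of the adaptive toy model compared in Figure 15, can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes a 2D image with additive white Gaussian noise, computes the decimated Haar diagonal (HH) band directly from 2 × 2 blocks, and uses a hypothetical function name and a synthetic test image.

```python
import numpy as np

def estimate_noise_std(image):
    """Median-based noise estimate from the Haar diagonal (HH) band, as in (88)."""
    x = np.asarray(image, dtype=float)
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    # Decimated Haar HH coefficients: 0.5 * [[1, -1], [-1, 1]] on each 2x2 block.
    hh = 0.5 * (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2])
    return 1.4826 * np.median(np.abs(hh))

rng = np.random.default_rng(1)
rows, cols = np.meshgrid(np.linspace(0, 1, 256), np.linspace(0, 1, 256), indexing="ij")
clean = 0.5 * (rows + cols)   # smooth synthetic image in [0, 1]
true_sigma = 0.1
noisy = clean + true_sigma * rng.standard_normal(clean.shape)

estimated_sigma = estimate_noise_std(noisy)
print(f"true sigma: {true_sigma:.3f}, estimated sigma: {estimated_sigma:.3f}")
```

The factor 1.4826 converts the median absolute value of zero-mean Gaussian samples into a standard deviation. The estimate is most reliable when the HH band is dominated by noise, which holds for the smooth synthetic image assumed here but only approximately for images with strong diagonal texture.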

Nonlinearity
In the literature, the most common activation function in CNNs is the ReLU. There are two main advantages of an ReLU with respect to other activations. First, ReLUs potentially enforce more sparsity in the feature maps than, for example, soft shrinkage, because ReLUs not only cancel the small values of feature maps like shrinkage functions do but also all the negative values. The second advantage of an ReLU is its capacity to approximate other functions (see the “Shrinkage and Clipping in ReLU Networks” section). Note that the high capacity of the ReLU to represent other functions [13], [29] (often referred to as expressivity) may also be one of the reasons why these models are prone to overfitting.
Better expressivity of ReLU CNNs may be the reason why, at the time of this writing, ReLU-based CNNs perform marginally better than shrinkage-based models in terms of metrics such as signal-to-noise ratio or the structural similarity index metric [19], [37], [38]. Despite this small benefit, the visual characteristics of estimates produced by ReLUs and shrinkage-based networks are very similar. Furthermore, the computational cost of ReLU-based designs is potentially higher than those with shrinkage functions because ReLUs require more feature maps to preserve signal integrity. For example, the LWFSN shown in the “LWFSN” section achieves a performance very close to the FBPConvNet and the tight-frame U-Net for noise reduction in computerized tomography, but only with a small fraction of the total trainable parameters, which allows for a faster and less computation-expensive model [19].
As a concluding remark, it can be noted that regardless of the expressivity of the ReLU activation, it is not entirely clear whether this means that ReLU activations outperform other functions, such as the soft threshold, in general. We were unable to find articles that specifically focus on comparing the performance of ReLU/shrinkage-based models. In spite of this, there are some works that compare shrinkage-based CNNs with other (architecturally different) models based on ReLUs that indicate that the compared ReLU-based designs slightly outperform shrinkage-based ones. For example, Herbreteau and Kervrann [38] proposed the DCT2-Net, a shrinkage-based CNN, which, despite its good performance, is still outperformed by the ReLU-based denoising CNN (DnCNN) [7]. Similar behavior was observed by Zavala-Mondragón et al. [19], where their shrinkage-based LWFSN could not outperform the ReLU-based FBPConvNet [21] nor the tight-frame U-Net [6]. Another similar case is the deep K-singular-value-decomposition network [37], which achieves a performance close to (but slightly worse than) the ReLU-based DnCNN. One of the few examples we found in which a shrinkage-based model performed better than an ReLU CNN is Fan et al. [18], where they compared variants of the soft autoencoder and found that the shrinkage-based model outperformed the ReLU variant.

Single/multiscale designs
The advantage of single-scale models is that they avoid aliasing because no down-/upsampling layers are used. Still, this comes at the expense of more computations and more memory. Furthermore, this approach may lead to models with larger filters and/or deeper networks that achieve the same receptive field as multiscale models, which may further increase the computation costs of single-scale models.
In the case of multiscale models, the main consideration should be that the down-/upsampling structure should allow perfect signal reconstruction to avoid introducing aliasing and/or distortion to the image estimates (e.g., the DWT in the tight-frame U-Net and in the LWFSN).

(Non)residual models
Residual noise-reduction CNNs often perform better than their nonresidual counterparts (e.g., the U-Net versus FBPConvNet and the LWFSN versus the rLWFSN). This may be because the trained models have more freedom to learn the filters: the design does not need to learn to reconstruct the noiseless signal; it need only estimate the noise [12]. Also, it can be observed that nonresidual models potentially need more parameters than residual networks because the propagation/reconstruction of the noiseless signal is also dependent on the number of channels of the network.

Table 2. Design elements and their impact on performance and computation cost.
Design Elements             Expressivity  Performance  Number of Parameters  Receptive Field Per Layer
Activation  ReLU            High          Best         High                  Not applicable (N/A)
            Shrinkage       Low           Good         Medium                N/A
            Clipping        Low           Good         Medium                N/A
Scale       Single scale    High          Good         High                  Big
            Multiscale      High          Good         Medium/high           Small
Topology    Nonresidual     High          Good         Higher                N/A
            Residual        High          Best         Lower                 N/A

State of the art
Defining the state of the art in image denoising with CNNs is challenging for diverse reasons. First, there is a wide variety of available CNNs, which are not often compared to each other. Second, the suitability of a CNN for a given task may depend on image and noise characteristics, such as noise distribution and (non)stationarity. Third, the large number of variables in terms of, e.g., optimization, data, and data augmentation, adds reproducibility issues, which further complicate making a fair comparison among all the available models [11]. In addition, it
should be noted that for many of the existing models, the performance gap between state-of-the-art models and other CNNs is often small.
Despite the aforementioned challenges, we have found some models that could be regarded as the state of the art. The first is the denoising residual U-Net (DRU-Net) [39], which is a bias-free model [14] that incorporates a U-Net architecture with residual blocks. In addition, the DRU-Net uses an additional input to indicate to the network the noise intensity, which increases its generalization to different noise levels. An additional state-of-the-art model is DnCNN [7]. This network is residual and single-scale while also using ReLU activations. Another state-of-the-art model is the multilevel-wavelet CNN [40], which has a design very similar to that of the tight-frame U-Net [6]. Both of these models are based on the original U-Net design [32] but are deployed in a residual configuration, and the down-/upsampling structure is based on the DWT. Furthermore, in addition to using standalone encoding-decoding CNNs, CNNs have been used as proximal operators within model-based methods [39], [41], which further improves the denoising power of nonmodel-based encoding-decoding CNNs.

Conclusions and future outlook
In this article, the widely used encoding-decoding CNN architecture was analyzed from several signal processing principles. This analysis revealed the following conclusions. 1) Multiple signal processing concepts converge in the mathematical formulation of encoding-decoding CNN models. For example, the convolution and down-/upsampling structure of the encoder-decoder structure is akin to the framelet decomposition, and the activation functions are rooted in classical signal estimators. In addition, linear filtering may also happen within the model. 2) The activations implicitly assume noise and signal characteristics of the feature maps. 3) There are still many signal processing developments that can be integrated with current CNNs, further improving their performance in terms of accuracy, efficiency, or robustness.
Despite the signal processing nature of encoding-decoding CNNs, at the time of this writing, the integration of CNNs and existing signal processing algorithms is at an early stage. A clear example of the signal modeling limitations of current CNN denoisers is the activation functions, where the estimators provided by current activation layers neglect spatial correlation of the feature maps. Possible alternatives to solving this limitation could be to perform an activation function inspired by denoisers working on principles such as Markov random fields [42], locally spatial indicators [43], and multiscale shrinkage [24]. Further ideas are provided by the extensive survey on denoising algorithms by Pižurica and Philips [44]. Additional approaches that can be further explored are nonlocal [45] and collaborative filtering [46]. Both techniques exploit the redundancy in natural images and only a few models are exploring these properties [47], [48].
Finally, we encourage the reader to actively consider the properties of the signals processed, design requirements, and existing signal processing algorithms when designing new CNNs. By doing so, we expect that the next generation of CNN denoisers will not only be better performing but also more interpretable and reliable.

Acknowledgment
We thank Dr. Ulugbek Kamilov and the anonymous reviewers for their valuable feedback and suggestions for this article.

Authors
Luis Albert Zavala-Mondragón (lzavala905@gmail.com) received his M.Sc. degree in electrical engineering from Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands, where he is currently a Ph.D. candidate in signal processing. He has experience in the field of hardware emulation (Intel, Mexico) and computer vision (Thirona, The Netherlands). His research interests include the development of efficient and explainable computer vision pipelines for health-care applications. He is a Student Member of IEEE.
Peter H.N. de With (p.h.n.de.with@tue.nl) received his Ph.D. degree in computer vision from Delft University of Technology. He is a full professor and leads the Video Coding and Architectures research group at Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands. He was a researcher at Philips Research Labs, full professor at the University of Mannheim, and VP of video technology at CycloMedia. He is the coauthor of more than 70 refereed book chapters and journal articles, 500 conference publications, and 40 international patents. He was a Technical Committee member of the IEEE Consumer Electronics Society, ICIP, and the Society of Photo-Optical Instrumentation Engineers, and he is a corecipient of multiple paper awards. He is a Fellow of IEEE and a member of the Royal Holland Society of Sciences and Humanities.
Fons van der Sommen (fvdsommen@tue.nl) received his Ph.D. degree in computer vision. He is an associate professor at Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands. As head of the health-care cluster at the Video Coding and Architectures research group, he has worked on a variety of image processing and computer vision applications, mainly in the medical domain. His research interests include signal processing and information theory, and he strives to exploit methods from these fields to improve the robustness, efficiency, and interpretability of modern-day artificial intelligence architectures, such as convolutional neural networks. He is a Member of IEEE.

References
[1] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Trans. Image Process., vol. 9, no. 9, pp. 1532–1546, Sep. 2000, doi: 10.1109/83.862633.
[2] M. Elad and M. Aharon, “Image denoising via learned dictionaries and sparse representation,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), Piscataway, NJ, USA: IEEE Press, 2006, vol. 1, pp. 895–900, doi: 10.1109/CVPR.2006.142.
[3] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, Nov. 1992, doi: 10.1016/0167-2789(92)90242-F.

[4] K. H. Jin and J. C. Ye, “Annihilating filter-based low-rank Hankel matrix approach for image inpainting,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3498–3511, Nov. 2015, doi: 10.1109/TIP.2015.2446943.
[5] H. Chen et al., “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2524–2535, Dec. 2017, doi: 10.1109/TMI.2017.2715284.
[6] Y. Han and J. C. Ye, “Framing u-net via deep convolutional framelets: Application to sparse-view CT,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1418–1429, Jun. 2018, doi: 10.1109/TMI.2018.2823768.
[7] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017, doi: 10.1109/TIP.2017.2662206.
[8] T. Yokota, H. Hontani, Q. Zhao, and A. Cichocki, “Manifold modeling in embedded space: An interpretable alternative to deep image prior,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 3, pp. 1022–1036, Mar. 2022, doi: 10.1109/TNNLS.2020.3037923.
[9] K. C. Kusters, L. A. Zavala-Mondragón, J. O. Bescós, P. Rongen, P. H. de With, and F. van der Sommen, “Conditional generative adversarial networks for low-dose CT image denoising aiming at preservation of critical image content,” in Proc. 43rd Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Piscataway, NJ, USA: IEEE Press, 2021, pp. 2682–2687, doi: 10.1109/EMBC46164.2021.9629600.
[10] H. Gupta, K. H. Jin, H. Q. Nguyen, M. T. McCann, and M. Unser, “CNN-based projected gradient descent for consistent CT image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1440–1453, Jun. 2018, doi: 10.1109/TMI.2018.2832656.
[11] M. T. McCann, K. H. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: A review,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 85–95, Nov. 2017, doi: 10.1109/MSP.2017.2739299.
[12] J. C. Ye, Y. Han, and E. Cha, “Deep convolutional framelets: A general deep learning framework for inverse problems,” SIAM J. Imag. Sci., vol. 11, no. 2, pp. 991–1048, 2018, doi: 10.1137/17M1141771.
[26] T. Blu and F. Luisier, “The sure-let approach to image denoising,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2778–2786, Nov. 2007, doi: 10.1109/TIP.2007.906002.
[27] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, Nov. 2004, doi: 10.1002/cpa.20042.
[28] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 399–406.
[29] I. Daubechies, R. DeVore, S. Foucart, B. Hanin, and G. Petrova, “Nonlinear approximation and (deep) ReLU networks,” Constructive Approximation, vol. 55, no. 1, pp. 127–172, 2022, doi: 10.1007/s00365-021-09548-z.
[30] J. C. Ye, J. M. Kim, K. H. Jin, and K. Lee, “Compressive sampling using annihilating filter-based low-rank interpolation,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 777–801, Feb. 2017, doi: 10.1109/TIT.2016.2629078.
[31] A. Cichocki, R. Zdunek, and S.-i. Amari, “Nonnegative matrix and tensor factorization [Lecture Notes],” IEEE Signal Process. Mag., vol. 25, no. 1, pp. 142–145, 2008, doi: 10.1109/MSP.2008.4408452.
[32] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds. Cham, Switzerland: Springer International Publishing, 2015, pp. 234–241.
[33] T. Kwon and J. C. Ye, “Cycle-free cyclegan using invertible generator for unsupervised low-dose CT denoising,” IEEE Trans. Comput. Imag., vol. 7, pp. 1354–1368, 2021, doi: 10.1109/TCI.2021.3129369.
[34] L. A. Zavala-Mondragón et al., “On the performance of learned and fixed-framelet shrinkage networks for low-dose CT denoising,” Med. Imag. Deep Learn., 2022. [Online]. Available: https://openreview.net/pdf?id=WGLqD0zHXy9
[35] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” 2012, arXiv:1207.0580.
[36] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. 13th Int. Conf. Artif. Intell. Statist., JMLR Workshop Conf. Proc., 2010, pp. 249–256.
[13] J. C. Ye and W. K. Sung, “Understanding geometry of encoder-decoder CNNs,” in
Proc. 36th Int. Conf. Mach. Learn., PMLR, Cambridge, MA, USA, 2019, pp. 7064–
7073. [37] M. Scetbon, M. Elad, and P. Milanfar, “Deep K-SVD denoising,” IEEE
Trans. Image Process., vol. 30, pp. 5944–5955, Jun. 2021, doi: 10.1109/TIP.
[14] S. Mohan, Z. Kadkhodaie, E. P. Simoncelli, and C. Fernandez-Granda, “Robust 2021.3090531.
and interpretable blind image denoising via bias-free convolutional neural networks,”
in Proc. Int. Conf. Learn. Representations, 2020. [Online]. Available: https://iclr.cc/ [38] S. Herbreteau and C. Kervrann, “DCT2net: An interpretable shallow CNN for
virtual_2020/poster_HJlSmC4FPS.html image denoising,” IEEE Trans. Image Process., vol. 31, pp. 4292–4305, Jun. 2022,
doi: 10.1109/TIP.2022.3181488.
[15] M. Unser, “From kernel methods to neural networks: A unifying variational for-
mulation,” 2022, arXiv:2206.14625. [39] K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte, “Plug-and-play
image restoration with deep denoiser prior,” IEEE Trans. Pattern Anal. Mach. Intell.,
[16] V. Papyan, Y. Romano, and M. Elad, “Convolutional neural networks analyzed via vol. 44, no. 10, pp. 6360–6376, Oct. 2022, doi: 10.1109/TPAMI.2021.3088914.
convolutional sparse coding,” J. Mach. Learn. Res., vol. 18, no. 1, pp. 2887–2938,
2017. [40] P. Liu, H. Zhang, K. Zhang, L. Lin, and W. Zuo, “Multi-level wavelet-CNN for
image restoration,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops,
[17] K. Mentl et al., “Noise reduction in low-dose CT using a 3D multiscale sparse 2018, pp. 773–782.
denoising autoencoder,” in Proc. IEEE 27th Int. Workshop Mach. Learn. Signal
Process. (MLSP), Piscataway, NJ, USA: IEEE Press, 2017, pp. 1–6, doi: 10.1109/ [41] V. Monga, Y. Li, and Y. C. Eldar, “Algorithm unrolling: Interpretable, efficient
MLSP.2017.8168176. deep learning for signal and image processing,” IEEE Signal Process. Mag., vol. 38,
no. 2, pp. 18–44, Mar. 2021, doi: 10.1109/MSP.2020.3016905.
[18] F. Fan, M. Li, Y. Teng, and G. Wang, “Soft autoencoder and its wavelet adaptation
interpretation,” IEEE Trans. Comput. Imag., vol. 6, pp. 1245–1257, Aug. 2020, doi: [42] M. Malfait and D. Roose, “Wavelet-based image denoising using a Markov ran-
10.1109/TCI.2020.3013796. dom field a priori model,” IEEE Trans. Image Process., vol. 6, no. 4, pp. 549–565, Apr.
1997, doi: 10.1109/83.563320.
[19] L. A. Zavala-Mondragón, P. Rongen, J. O. Bescos, P. H. De With, and F. Van der
Sommen, “Noise reduction in CT using learned wavelet-frame shrinkage networks,” [43] A. Pižurica and W. Philips, “Estimating the probability of the presence of a signal
IEEE Trans. Med. Imag., vol. 41, no. 8, pp. 2048–2066, Aug. 2022, doi: 10.1109/ of interest in multiresolution single- and multiband image denoising,” IEEE Trans.
TMI.2022.3154011. Image Process., vol. 15, no. 3, pp. 654–665, Mar. 2006, doi: 10.1109/TIP.2005.863698.
[20] L. A. Zavala-Mondragón, P. H. de With, and F. van der Sommen, “Image noise [44] A. Pižurica, “Image denoising algorithms: From wavelet shrinkage to nonlocal
reduction based on a fixed wavelet frame and CNNs applied to CT,” IEEE Trans. collaborative filtering,” in Wiley Encyclopedia of Electrical and Electronics
Image Process., vol. 30, pp. 9386–9401, 2021, doi: 10.1109/TIP.2021.3125489. Engineering. Hoboken, NJ, USA: Wiley, 1999, pp. 1–17.
[21] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural [45] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image restoration by sparse
network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, 3D transform-domain collaborative filtering,” in Proc. SPIE Image Process.,
pp. 4509–4522, Sep. 2017, doi: 10.1109/TIP.2017.2713099. Algorithms Syst. VI, International Society for Optics and Photonics, Bellingham, WA,
USA, 2008, vol. 6812, pp. 62–73.
[22] M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, “Unsupervised learning of
invariant feature hierarchies with applications to object recognition,” in Proc. IEEE [46] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denois-
Conf. Comput. Vision Pattern Recognit., Piscataway, NJ, USA: IEEE Press, 2007, pp. ing,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR),
1–8, doi: 10.1109/CVPR.2007.383157. Piscataway, NJ, USA: IEEE Press, 2005, vol. 2, pp. 60–65, doi: 10.1109/
CVPR.2005.38.
[23] E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE
Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008, doi: 10.1109/ [47] D. Yang and J. Sun, “BM3D-net: A convolutional neural network for transform-
MSP.2007.914731. domain collaborative filtering,” IEEE Signal Process. Lett., vol. 25, no. 1, pp. 55–59,
2018, doi: 10.1109/LSP.2017.2768660.
[24] L. Sendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based
denoising exploiting interscale dependency,” IEEE Trans. Signal Process., vol. 50, no. [48] H. Lee, H. Choi, K. Sohn, and D. Min, “KNN local attention for image resto-
11, pp. 2744–2756, Nov. 2002, doi: 10.1109/TSP.2002.804091. ration,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2022, pp.
2139–2149.
[25] G. Steidl and J. Weickert, “Relations between soft wavelet shrinkage and total
variation denoising,” in Proc. Joint Pattern Recognit. Symp., Berlin, Germany:
Springer-Verlag, 2002, pp. 198–205, doi: 10.1007/3-540-45783-6_25. SP

TIPS & TRICKS
David Shiung, Jeng-Ji Huang, and Ya-Yin Yang

Tricks for Designing a Cascade of Infinite Impulse Response Filters With an Almost Linear Phase Response

Designing filters with perfect frequency responses (i.e., flat passbands, sharp transition bands, highly suppressed stopbands, and linear phase responses) is always the ultimate goal of any digital signal processing (DSP) practitioner. High-order finite impulse response (FIR) filters may meet these requirements when we put no constraint on implementation complexity. In contrast to FIR filters, infinite impulse response (IIR) filters, owing to their recursive structures, provide an efficient way for high-performance filtering at reduced complexity. However, also due to their recursive structure, IIR filters inherently have nonlinear phase responses, and this does restrain their applicability. In this article, we propose two tricks regarding cascading a prototype IIR filter with a few shaping all-pass filters (APFs) for an almost linear phase response over its passband. After performing a delicate design on the prototype and shaping filters, we approach perfect filtering with reduced complexity.

Date of current version: 3 November 2023

Preliminaries
Over the past decades, we have witnessed the power of DSP in various fields of applications, e.g., wireless communication [1], [2], seismology [3], and biomedical sciences [4]. Digital filtering undoubtedly plays an important role in realizing these fancy applications. When filtering with a linear phase response, the intended signal merely experiences a constant group delay and preserves its waveform. This is a vital feature for many applications, e.g., the denoising of electrocardiography (ECG) records [4] and seismologic signals [3].

Traditionally, high-performance filtering with linear phase responses is achievable by high-order FIR filters. This, in turn, increases the system complexity (the number of adders and multipliers), although there are techniques to cut the system complexity in half by folding the symmetric filter coefficients [5]. The filter cascade technique can be used for designing filters with reduced complexity, e.g., the composite filter in [6] and interpolated FIR filters [5], [7]. A comprehensive survey regarding the design techniques of FIR filters is presented in [8]. Among these design techniques, the works in [9] and [10] extend the idea of interpolated FIR filters by first designing a bandpass filter and then modifying it by shaping the model filter by using some masking filters. This technique is an attractive candidate for obtaining a filter with a sharp transition band. The results show supremacy over other designs in terms of filtering performance and implementation complexity. However, there still exists room for further improvement. In contrast to FIR filters, IIR filters do achieve high-performance filtering with low system complexity due to their recursive structures. But also due to their recursive structures, IIR filters are unable to provide linear phase responses. In this article, we provide our solution to the problem of perfect filtering with reduced complexity. Our solution is realized through cascading a prototype IIR filter with a few shaping APFs for an almost linear phase response over the passband. The proposed composite filter can be used to replace any ordinary FIR filter with fixed filter coefficients. We then approach perfect filtering with reduced complexity.

APFs have wide applications in the fields of DSP and communication [11], [12], [13]. Ideally, an APF has a constant magnitude response over the whole frequency bands. The novelty of this article is that a cascade of some IIR filters can produce a composite filter with an almost linear phase response over the filter passband. In particular, the filtering performance of this composite filter can perform quite close to a high-order (thus, high-complexity) FIR filter. This composite filter provides a way to perfect filtering by using limited complexity. We first introduce the design of a composite low-pass filter (LPF) through first designing a prototype filter that meets the design specifications for the magnitude response. Then, the prototype filter is cascaded with a few shaping APFs to rebuild an almost linear phase response over the filter passband. This idea is then extended to designing a high-pass filter (HPF). The example filter shows that the intended

signal waveform is preserved, while the unwanted signal is highly suppressed.

The transfer function of a feasible APF suitable for our composite filter is of the form [14]

$$H_{ap}(z) = \frac{z^{-1} - a^{*}}{1 - a z^{-1}} \tag{1}$$

where $a = r e^{j\theta}$ is the complex pole of $H_{ap}(z)$ and $0 \le \theta < 2\pi$. For stability, we need $0 \le r < 1$. By substituting $z = e^{j\omega}$ into (1), the frequency response of the APF can be written as

$$H_{ap}(\omega) = \frac{e^{-j\omega} - a^{*}}{1 - a e^{-j\omega}} = e^{-j\omega}\,\frac{1 - a^{*} e^{j\omega}}{1 - a e^{-j\omega}}. \tag{2}$$

We can confirm that $H_{ap}(\omega)$ has a unity magnitude that is independent of $\omega$. Reorganizing (2), the phase function of $H_{ap}(\omega)$ is

$$\angle H_{ap}(\omega) = -\omega - 2\tan^{-1}\!\left(\frac{r\sin(\omega - \theta)}{1 - r\cos(\omega - \theta)}\right). \tag{3}$$

The group delay and phase delay associated with a system with frequency response $H(\omega)$ are defined as

$$\tau_{gr}(\omega) \triangleq -\frac{d\angle H(\omega)}{d\omega} \tag{4}$$

and

$$\tau_{ph}(\omega) \triangleq -\frac{\angle H(\omega)}{\omega} \tag{5}$$

respectively. The group delay and phase delay have an important implication for an APF. If a narrow-band sequence $x(n) = s(n)\cos(\omega_0 n)$ is passed through an APF, the filter output $y(n)$ becomes [14]

$$y(n) = s\big(n - \tau_{gr}(\omega_0)\big)\cos\big(\omega_0 (n - \tau_{ph}(\omega_0))\big). \tag{6}$$

By definition, the group delay of $H_{ap}(\omega)$ can be presented as

$$\mathrm{grd}[H_{ap}(\omega)] = \frac{1 - r^2}{1 + r^2 - 2r\cos(\omega - \theta)} = \frac{1 - r^2}{\left|1 - r e^{j\theta} e^{-j\omega}\right|^2}. \tag{7}$$

Since $0 \le r < 1$ for a stable APF, by (7), we confirm that the group delay is always positive for all frequency bands.

Consider the design of a real-coefficient APF. If the filter order is one, we can set the angle of pole $\theta$ to zero or $\pi$. If the filter order is two, its transfer function can be the multiplication of that of two order 1 APFs with complex conjugate poles $a$ and $a^{*}$, respectively. This is because a real-coefficient equation has paired conjugate roots. Thus, the transfer function of a real-coefficient shaping filter can be defined as

$$H_S(z) \triangleq \begin{cases} \dfrac{z^{-1} - r}{1 - r z^{-1}}, & \theta = 0 \\[2ex] \dfrac{z^{-1} - a^{*}}{1 - a z^{-1}} \cdot \dfrac{z^{-1} - a}{1 - a^{*} z^{-1}}, & 0 < \theta < \pi \\[2ex] \dfrac{z^{-1} + r}{1 + r z^{-1}}, & \theta = \pi. \end{cases} \tag{8}$$

Note that the shaping filter defined in (8) still has a frequency magnitude response independent of $\omega$. Extending the result in (7) for the transfer function defined in (8), we have

$$\mathrm{grd}[H_S(\omega)] = \begin{cases} \dfrac{1 - r^2}{1 + r^2 - 2r\cos(\omega)}, & \theta = 0 \\[2ex] \dfrac{1 - r^2}{1 + r^2 - 2r\cos(\omega - \theta)} + \dfrac{1 - r^2}{1 + r^2 - 2r\cos(\omega + \theta)}, & 0 < \theta < \pi \\[2ex] \dfrac{1 - r^2}{1 + r^2 + 2r\cos(\omega)}, & \theta = \pi. \end{cases} \tag{9}$$

By (7) and (9), we know that the group delay contributed by a real-coefficient APF is always positive. From (4), we also know that rebuilding the frequency phase response of a composite filter is equivalent to rebuilding its group delay. Four shaping filters with parameters $(r, \theta) = (0.9, 0)$, $(0.9, 0.2\pi)$, $(0.8, 0.3\pi)$, and $(0.9, \pi)$ are shown in Figure 1, where the frequency is normalized to $\omega = \pi$ (radians/sample). Here, $\theta$ corresponds to the peak of the group delay curve, while $r$ controls its shape. The choices of $r$ and $\theta$ provide degrees of freedom in shaping the group delay of the composite filter. Obviously, our design freedom increases when more shaping APFs are used.

FIGURE 1. The group delays of the shaping APFs for $(r, \theta) = (0.9, 0)$, $(0.9, 0.2\pi)$, $(0.8, 0.3\pi)$, and $(0.9, \pi)$, respectively. [Axes: Group Delay (Samples) versus Normalized Frequency.]

Properties of cascaded filters
The central idea of the cascade technique for filter design is to build a high-performance filter by cascading a number of low-performance filters [5]. This technique can be used to sharpen the transition band or to suppress the stopband of the prototype filter [6], [15], [16], [17]. In this article, we focus on cascading a prototype filter with $M$ shaping filters for an almost linear phase response. The relationship of the input and output sequences for the composite filter is presented in Figure 2. The z-domain input and output sequences are denoted by $X(z)$ and $Y(z)$, respectively. In particular, the prototype filter with a transfer function $H_P(z)$ is an IIR filter

for meeting the specifications regarding the magnitude response. A family of $M$ shaping APFs with transfer functions $H_{S,m}(z)$, $m = 1, \ldots, M$ is for remodeling the phase response of the composite filter. The aggregate transfer function of the composite filter is

$$H_C(z) = \frac{Y(z)}{X(z)} = H_P(z) \prod_{m=1}^{M} H_{S,m}(z). \tag{10}$$

FIGURE 2. The input and output relationship when cascading the prototype filter with $M$ shaping APFs. [Block diagram: $X(z)$ enters the prototype filter $H_P(z)$, whose output $X(z)H_P(z)$ passes through the shaping APFs $H_{S,1}(z), \ldots, H_{S,M}(z)$ to give $Y(z)$.]

Clearly, the frequency magnitude response of the composite filter obtained by substituting $z = e^{j\omega}$ into (10) is identical to that of the prototype filter; i.e., $|H_C(\omega)| = |H_P(\omega)|$. The frequency phase response of the composite filter can be related with that of the prototype filter and the $M$ shaping APFs by

$$\angle H_C(\omega) = \angle H_P(\omega) + \sum_{m=1}^{M} \angle H_{S,m}(\omega). \tag{11}$$

Taking the negative derivative with respect to $\omega$ at both sides of (11), we relate the group delay of the composite filter to those of its component filters as follows:

$$\mathrm{grd}[H_C(\omega)] = \mathrm{grd}[H_P(\omega)] + \sum_{m=1}^{M} \mathrm{grd}[H_{S,m}(\omega)]. \tag{12}$$

One trick for designing a filter with a near perfect frequency response is cascading a number of simple APFs with a prototype filter. The other trick is using a Chebyshev type 2 filter as the prototype filter for efficient compensation of the group delay. We elaborate on these two points in the following.

Compensate the prototype filter
In this context, we use the terms constant group delay and linear phase interchangeably for ease of explanation. Although an FIR filter can have a linear frequency phase response over the whole frequency bands, it is unnecessary for a band-limited signal. Actually, the filter phase response can be relaxed to be linear only within the passband. This is because the other bands are already suppressed by the prototype filter.

Consider the problem of designing a low-pass composite filter where its phase response is linear within the passband edge frequency $\omega_p$. Figure 3 explains the idea of compensating the group delay of the prototype filter within the frequency $\omega_p$. The group delay margin for compensation is in green. In essence, this is an optimization problem for finding the best parameters $r$ and $\theta$ for the $M$ shaping filters. Clearly, as the number of shaping filters increases, we expect a better fill for the margin. In addition, the geometry of the compensated margin also impacts the error caused by compensation. This idea is further verified in the design examples, and the results are promising when $M \ge 3$ for the considered design specifications. The mean of the synthesized group delay of the composite filter over the passband $[0, \omega_p]$ can be written as

$$m_{GD} = \frac{1}{\omega_p} \int_{0}^{\omega_p} \mathrm{grd}[H_C(\omega)]\, d\omega. \tag{13}$$

The flatness of the synthesized group delay over the passband can be defined as the root mean square (RMS) group delay error of $H_C(\omega)$:

$$\Delta_{GD} \triangleq \sqrt{\frac{1}{\omega_p} \int_{0}^{\omega_p} \big\{\mathrm{grd}[H_C(\omega)] - m_{GD}\big\}^2\, d\omega}. \tag{14}$$

Here, $\Delta_{GD}$ can be used in the filter design specifications to quantify how well the composite filter performs.

FIGURE 3. Compensating the group delay of a prototype filter within $\omega_p$ by using a number of shaping APFs. [Curves: Prototype Filter, Composite Filter, and Margin for Compensation, plotted against Frequency, $\omega$.]

Design a low-pass composite filter
The design specifications include the following:
1) Passband edge frequency $\omega_p = 0.1\pi$ (radians/sample).
2) Stopband edge frequency $\omega_s = 0.15\pi$ (radians/sample).
3) The passband peak-to-peak ripple is less than 0.05 dB.
4) The suppression in the stopband is no less than 73 dB.
5) The RMS group delay error over $[0, \omega_p]$ is less than 0.2; i.e., $\Delta_{GD} \le 0.2$.
6) The total filter order number is no greater than 20.
7) The number of shaping APFs is no greater than three; i.e., $M \le 3$.

The prototype filter is for meeting the frequency magnitude response

of the design specifications, while the shaping APFs are for the linear phase response. To facilitate the design, we divide the design specifications into two groups. The specifications in the first group, including constraints 1–4, are for designing the prototype filter; the specifications in the second group, including constraints 5–7, are for designing the shaping filters. Some popular IIR filters, e.g., Butterworth filters, elliptic filters, least-pth-norm filters, and Chebyshev filters, all are candidates for the prototype filter. We just arbitrarily choose the least-pth-norm filter and Chebyshev type 2 filter as the candidate prototype filters. We can use some software package, e.g., MATLAB, to design a prototype filter satisfying specifications 1–4. The frequency magnitude responses are given in Figure 4. The filter orders are eight and 12 for the candidate least-pth-norm IIR filter and Chebyshev type 2 filter, respectively. We also arbitrarily choose a least-squares FIR filter of filter order 200 as a baseline for comparison. Note that all three filters meet specifications 1–4, and the baseline FIR filter is far more complex than the other two IIR filters, considering its high filter order. We can easily improve the stopband suppression for the two IIR filters by cascading them with a complementary comb filter [6], [15], [16], [17]. The frequency magnitude responses of the prototype filters are identical to those of the corresponding composite filters.

FIGURE 4. A comparison of the frequency magnitude responses for a least-squares FIR filter (order 200), least-pth-norm IIR filter (order 8), and Chebyshev type 2 IIR filter (order 12). [Axes: Magnitude Response (dB) versus Normalized Frequency.]

The group delays of the baseline FIR filter and the other two prototype filters are provided in Figure 5. We can see that the group delays of the two prototype filters are monotonically increasing over the passband $[0, \omega_p]$. In particular, the least-pth-norm IIR filter has a large group delay margin to be filled by the shaping APFs as compared with that of the Chebyshev type 2 filter. The baseline FIR filter undoubtedly has a constant group delay throughout the whole frequency bands, due to its symmetric filter coefficients. But we show in the example that the linear phase over the filter passband is enough to preserve the intended signal waveform.

FIGURE 5. A comparison of the group delays for a least-squares FIR filter, least-pth-norm IIR filter, and Chebyshev type 2 IIR filter.

Provided that we know the filter orders of the two candidate prototype filters, the problem of designing a composite filter is then reduced to finding the optimal parameters $(r_1, \ldots, r_M, \theta_1, \ldots, \theta_M)$ for a cascade of $M$ shaping APFs satisfying specifications 5–7. The objective function of the optimization problem is

$$\min_{r_1, \ldots, r_M,\; \theta_1, \ldots, \theta_M} \Delta_{GD}(r_1, \ldots, r_M, \theta_1, \ldots, \theta_M) \tag{15}$$

subject to

$$1 \le M \le 3 \tag{16}$$
$$0 \le r_1, \ldots, r_M \le 1 \tag{17}$$
$$0 \le \theta_1, \ldots, \theta_M \le \pi. \tag{18}$$

Equation (15) ensures that the RMS group delay error is minimized over the solution space, that is, (16)–(18). Note that $M$ is a positive integer, and the variables $(r_1, \ldots, r_M, \theta_1, \ldots, \theta_M)$ are real numbers. In addition, one shaping filter contributes an additional filter order of two to the composite filter. Thus, specification 6 is satisfied if the composite filter is constrained by (16). Note that (15) is not a linear function, and we cannot use linear programming to solve the problem. To simplify the problem, we set $M$ to one, two, and three, respectively, and search

through the reduced solution space constrained by (17) and (18). The solution space can be first sliced using a coarse grid for finding candidate solutions. Then, the solution space around the candidate solutions is sliced using a fine grid. The process is repeated a few rounds, as in the work in [18]. This method is especially useful for well-behaved functions, and the optimal solution can be obtained in a few rounds.

Figure 6 shows the RMS group delay errors of using the two candidate prototype filters when $M = 1, \ldots, 4$. We find that choosing the Chebyshev type 2 filter as the prototype filter always achieves a lower RMS group delay error than that of the least-pth-norm filter. This is because the margin of the group delay to be compensated is smaller for the Chebyshev type 2 filter than that of the least-pth-norm filter. Thus, we get $M = 3$, and $(r_1, r_2, r_3, \theta_1, \theta_2, \theta_3) = (0.8820, 0.8869, 0.8892, 0.0514, 0.1553, 0.2625)$. The corresponding RMS group delay error is 0.1196. Using these numerical results, we accomplish the design of the composite filter and obtain the frequency phase response in Figure 7. As compared with the baseline FIR filter, we find that the composite filter can achieve an almost linear phase response over the filter passband $[0, \omega_p]$.

FIGURE 6. A comparison of the RMS group delay errors by using different prototype filters. [Curves: Chebyshev Type 2 LPF and Least-pth-Norm LPF; axes: RMS Group Delay Error versus $M$.]

FIGURE 7. A comparison of the frequency phase responses for a least-squares FIR filter and our composite filter. The prototype filter is a Chebyshev type 2 IIR filter. [Axes: Phase Response (rad) versus Normalized Frequency.]

Figure 8 illustrates the group delays of the composite filter and those of its component filters. In this case, the group delay of the composite filter over the passband is around 58.5, which is smaller than that of the baseline FIR filter. The component filter parameters of the low-pass composite filter are tabulated in Table 1.

FIGURE 8. The group delays of the composite filter and associated shaping filters. [Curves: Prototype Filter, Shaping Filters 1–3, and Composite Filter.]

Table 1. The component filter parameters of the low-pass composite filter.

Filter Parameter                          | Prototype Filter | Shaping Filter 1 | Shaping Filter 2 | Shaping Filter 3
Filter order                              | 12               | 2                | 2                | 2
Passband peak-to-peak ripple (dB)         | 0.05             | 0                | 0                | 0
Stopband attenuation (dB)                 | 73               | 0                | 0                | 0
Passband edge frequency (radians/sample)  | 0.1π             | N/A              | N/A              | N/A
Stopband edge frequency (radians/sample)  | 0.15π            | N/A              | N/A              | N/A

Table 2 provides a comparison of the complexity among five LPFs, which are 1) our composite LPF, 2) a least-squares FIR filter of order 200, 3) an equiripple FIR filter of order 134, 4) an FIR filter of order 182 designed by the window design method, and 5) a narrow transition-band FIR filter designed in [9]. From (8), we know that one shaping APF needs four multipliers and four adders. With an order 12 prototype filter, the composite filter needs a total of 36 (4 × 3 + 12 × 2 = 36) multipliers and 36 (4 × 3 + 12 × 2 = 36) adders.
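The shaping-filter part of this design can be checked numerically from (8) and (9) alone. The following is a minimal Python sketch, not the authors' code: the function names are illustrative, the reported angles are assumed to be in radians (they all lie below the passband edge of 0.1π), and the prototype filter's own group delay is left out because its coefficients are not listed in the text.

```python
import cmath
import math


def shaping_apf_response(r, theta, omega):
    """Frequency response of the real-coefficient shaping APF defined in (8)."""
    z_inv = cmath.exp(-1j * omega)
    if theta == 0.0:
        return (z_inv - r) / (1 - r * z_inv)
    if theta == math.pi:
        return (z_inv + r) / (1 + r * z_inv)
    a = r * cmath.exp(1j * theta)          # complex pole a = r e^{j theta}
    a_conj = a.conjugate()
    first = (z_inv - a_conj) / (1 - a * z_inv)
    second = (z_inv - a) / (1 - a_conj * z_inv)
    return first * second


def shaping_apf_group_delay(r, theta, omega):
    """Group delay in samples of the shaping APF, following (9)."""
    def section(angle):
        return (1 - r * r) / (1 + r * r - 2 * r * math.cos(omega - angle))

    if theta == 0.0 or theta == math.pi:
        return section(theta)              # first-order section
    return section(theta) + section(-theta)  # conjugate pole pair


# Parameters reported in the text for M = 3 (angles assumed to be radians).
R = (0.8820, 0.8869, 0.8892)
THETA = (0.0514, 0.1553, 0.2625)
OMEGA_P = 0.1 * math.pi                    # passband edge, specification 1


def shaping_total_group_delay(omega):
    """Sum over m of grd[H_{S,m}], i.e. the second term of (12)."""
    return sum(shaping_apf_group_delay(r, t, omega) for r, t in zip(R, THETA))


# Sample the passband [0, omega_p] and report the range of the added delay.
grid = [OMEGA_P * k / 200 for k in range(201)]
delays = [shaping_total_group_delay(w) for w in grid]
print("added delay range (samples):", min(delays), max(delays))
```

Adding the prototype filter's group delay to `delays`, sample by sample, would give the composite group delay of (12) on the same grid.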

The other three low-pass FIR filters (i.e., cases 2, 3, and 4) are designed using MATLAB. All five of them meet the same design specifications, e.g., passband/stopband edge frequencies, passband peak-to-peak ripple, suppression in the stopband, and so on, as outlined in the preceding. Clearly, the composite filter is significantly simpler than the other four FIR filters. Note that we fold the FIR filter architectures for cases 2, 3, and 4 by utilizing the symmetric filter coefficients presented in [5] for reduced complexity. Also shown in Table 2 is a comparison of the timing complexity and the group delay of the five designs. Cases 1–4 all have constant filter coefficients, and there is no need to update them constantly. Thus, the timing control and timing complexity for all five designs are low. From Figure 8, we know that the group delay for the signals inside the passband of the composite filter is around 58.5, which is smaller than for the other four designs.

Table 2. A comparison of the complexity among five LPFs.

Complexity/Group Delay  | Composite LPF, M = 3 | Least-Squares FIR, Order 200 | Equiripple FIR, Order 134 | Window Method FIR, Order 182 | Interpolated Bandpass Method [9]
Number of multipliers   | 36                   | 101                          | 68                        | 97                           | 52
Number of adders        | 36                   | 200                          | 134                       | 182                          | 100
Timing complexity       | Low                  | Low                          | Low                       | Low                          | Low
Group delay (samples)   | 58.5                 | 100                          | 67                        | 91                           | 85.5

Table 2 is based on the central idea of using the minimum filter order for each design so as to satisfy the same design specifications. DSP practitioners then leverage the constraints on the implementation platform and freely choose among the feasible designs. The philosophy of our comparison is fairly common and widely used in commercial software packages, e.g., the Filter Design and Analysis Tool in MATLAB, although we, indeed, can set all the filter orders to be fixed and compare their frequency responses. In addition, our comparisons belong to the architecture level. This means we can bypass circuit-level concerns, and it is a fair comparison.

Filtering results
Figure 9(a) shows an input sequence $x(n)$ consisting of three narrow-band pulses of sinusoids. The pulses are given as follows:

$$x_1(n) = w(n)\cos(\omega_1 n) \tag{19a}$$
$$x_2(n) = w(n)\cos\!\left(\omega_2 n - \frac{\pi}{2}\right) \tag{19b}$$
$$x_3(n) = w(n)\cos\!\left(\omega_3 n + \frac{\pi}{5}\right) \tag{19c}$$

where $\omega_1 = 0.07\pi$ (radians/sample), $\omega_2 = 0.02\pi$ (radians/sample), $\omega_3 = 0.4\pi$ (radians/sample), and $w(n)$ is a Hamming window; $\omega_1$ and $\omega_2$ are located at the filter passband, while $\omega_3$ is at the stopband. The Hamming window of length $L + 1$ is defined as [5], [14]

$$w(n) = \begin{cases} 0.54 - 0.46\cos\!\left(\dfrac{2\pi n}{L}\right), & 0 \le n \le L \\[1.5ex] 0, & \text{otherwise.} \end{cases} \tag{20}$$

The complete input sequence is defined as

$$x(n) = x_1(n) + x_2(n - L - 1) + x_3(n - 2L - 2), \quad n \ge 0. \tag{21}$$

In fact, $x(n)$ can be regarded as a simulation of an ECG record from a field trial. Parts $x_1(n)$ and $x_2(n)$ denote the intended signals, while $x_3(n)$ is an interfering signal. A composite filter with an almost linear phase response does preserve the intended waveform and remove the interfering signal. This can be vital for correct diagnosis. The discrete-time Fourier transform (DTFT) of windowed sinusoids equals the convolution of the DTFT of the window with that of an infinitely long sinusoid. A windowed sinusoid thus has a spread of spectrum, depending on the window length, around the center frequency of the sinusoid. The corresponding DTFT magnitude $|X(\omega)|$ when $L = 60$ is in Figure 9(b). Note that the filter passband is full of a wideband signal, with an accompanied out-of-band signal around $\omega = \omega_3 = 0.4\pi$ (radians/sample).

FIGURE 9. The input sequence for verifying example filters: the (a) waveform of signal $x[n]$ and (b) corresponding discrete-time Fourier transform magnitude $|X(\omega)|$. [Axes: (a) $x(n)$ versus Sample Number, $n$; (b) $|X(\omega)|$ versus Normalized Frequency.]

Figure 10 compares the output sequences filtered by 1) a Chebyshev type 2 filter, 2) the composite filter, and 3) a least-squares FIR filter. All three filters meet the filter magnitude specifications. The output sequences for each filter are denoted as $y_1(n)$, $y_2(n)$, and $y_3(n)$,

respectively. The Chebyshev type 2 filter is actually the prototype filter of the composite filter; it is of filter order 12. The composite filter has three shaping APFs and is of filter order 18. The least-squares FIR filter has a linear phase response over the whole frequency band and is of filter order 200; it is used as a benchmark for performance comparison. Comparing Figure 10(a) with Figure 9(a), we can see that the Chebyshev type 2 filter is unable to maintain an undistorted waveform for the wideband signal, due to its nonlinear phase response. Comparing $y_2(n)$ with $y_3(n)$, we see that both filters do suppress the out-of-band signals while preserving the in-band signals with high fidelity. The output sequences for both filters are similar except that the baseline FIR filter produces an additional delay of approximately 42 samples (100 − 58 = 42). This is due to the composite filter having a lower in-band group delay as compared with that of the least-squares FIR filter.

Pole-zero diagram of the composite filter
The pole-zero diagram of the composite filter for $M = 3$ is presented in Figure 11. There is a total of six ($2 \times M$) pole-zero pairs scattered over the band $[-0.1\pi, 0.1\pi]$ as expected. All the poles are within the unit circle, and the composite filter is stable in any case.
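As a rough illustration of why cascading shaping APFs leaves both stability and the magnitude response untouched, the sketch below models one shaping APF as a standard second-order all-pass section with poles at $r e^{\pm j\theta}$. The section form, the function names, and the evaluation frequency are assumptions made for illustration (the article's own APF definition lies outside this passage); the $(r, \theta)$ pairs are the ones the article reports for the four shaping APFs of its high-pass example.

```python
import cmath
import math


def apf_response(r, theta, w):
    """Frequency response of a second-order all-pass section (assumed form)
    with poles at r*exp(+/-j*theta), evaluated at frequency w (rad/sample)."""
    z1 = cmath.exp(-1j * w)                  # z^-1 on the unit circle
    c = 2.0 * r * math.cos(theta)
    return (r * r - c * z1 + z1 * z1) / (1.0 - c * z1 + r * r * z1 * z1)


def apf_poles(r, theta):
    """The two complex-conjugate poles of the section."""
    return [cmath.rect(r, theta), cmath.rect(r, -theta)]


def apf_group_delay(r, theta, w):
    """Closed-form group delay in samples: one term per pole."""
    return sum((1.0 - r * r) / (1.0 - 2.0 * r * math.cos(w - t) + r * r)
               for t in (theta, -theta))


# (r, theta) pairs quoted in the article for the high-pass example (M = 4).
SHAPING = [(0.8928, 3.0135), (0.9175, 2.822), (0.9038, 2.92), (0.8845, 3.1005)]

for r, theta in SHAPING:
    stable = all(abs(p) < 1.0 for p in apf_poles(r, theta))
    gain = abs(apf_response(r, theta, 0.3))
    print(f"r={r}, theta={theta}: stable={stable}, |H|={gain:.6f}")

# Group delays of cascaded sections simply add (hypothetical probe frequency).
w_probe = 0.95 * math.pi
added = sum(apf_group_delay(r, theta, w_probe) for r, theta in SHAPING)
print(f"group delay added by the four APFs at 0.95*pi: {added:.2f} samples")
```

Because each section has unit magnitude at every frequency, only the group delay of the prototype is reshaped; the poles stay inside the unit circle whenever $0 \le r < 1$.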
[Figure 10: three panels showing $y_1(n)$, $y_2(n)$, and $y_3(n)$ versus sample number $n$ (0 to 250); panel (a) is annotated "Distortion".]
FIGURE 10. A comparison of the output sequences filtered by a (a) Chebyshev type 2 IIR filter, (b) composite filter, and (c) least-squares FIR filter.

[Figure 11: pole-zero plot, imaginary part versus real part (−1 to 1), with the shaping APFs marked.]
FIGURE 11. A pole-zero diagram of the composite filter.

Designing high-pass and other types of composite filters
The procedures for designing a high-pass composite filter are similar to those for a low-pass composite filter. Assume that the filter specifications are
1) Stopband edge frequency $\omega_s =$
2) Passband edge frequency $\omega_p =$
3) Maximum passband peak-to-peak ripple $A_p = 1$ (dB).
4) Minimum stopband suppression $A_s = 70$ (dB).
5) The RMS group delay error over the passband $[\omega_p, \pi]$ is less than 0.3.
6) The total filter order is less than 28.
7) The number of shaping APFs is no greater than four; i.e., $M \le 4$.

We can choose any IIR filter that meets specifications 1–4 as the prototype filter. For example, we choose a Butterworth IIR filter and a Chebyshev type 2 IIR filter as the candidate filters, which results in filter orders of 20 and 10, respectively. The objective function of the optimization problem can be formulated as

$$\min_{r_1, \ldots, r_M,\ \theta_1, \ldots, \theta_M} \Delta_{GD}(r_1, \ldots, r_M, \theta_1, \ldots, \theta_M) \tag{22}$$

subject to

$$1 \le M \le 4 \tag{23}$$

$$0 \le r_1, \ldots, r_M \le 1 \tag{24}$$

$$0 \le \theta_1, \ldots, \theta_M \le \pi. \tag{25}$$

For the two candidate prototype filters, specification 6 is met by the constraint (23).
The frequency magnitude responses of the two candidate prototype filters

are shown in Figure 12. Also shown in Figure 12 for comparison is an equiripple FIR filter of filter order 82. The frequency magnitude responses of the composite filters are the same as those of their corresponding prototype filters.

[Figure 12: magnitude response (dB, −200 to 0) versus normalized frequency (0 to 1). Legend: Equiripple (FIR), order 82; Butterworth (IIR), order 20; Chebyshev type 2 (IIR), order 10.]
FIGURE 12. A comparison of the frequency magnitude responses for an equiripple FIR filter, a Butterworth IIR filter, and a Chebyshev type 2 IIR filter.

Figure 13 demonstrates the group delays of the three filters. The Chebyshev type 2 IIR filter has a strictly decreasing group delay over the passband $[\omega_p, \pi]$ and is easier to compensate with the shaping APFs than the Butterworth IIR filter for a given number of shaping APFs.

[Figure 13: group delay versus normalized frequency (0 to 1). Legend: Equiripple (FIR); Butterworth (IIR); Chebyshev type 2 (IIR).]
FIGURE 13. A comparison of the group delays for an equiripple FIR filter, a Butterworth IIR filter, and a Chebyshev type 2 IIR filter.

Figure 14 compares the RMS group delay errors for the two candidate prototype filters. We see that the RMS group delay error of the Chebyshev type 2 filter is lower than that of the Butterworth filter. Cascading the Chebyshev type 2 filter with $M = 4$ shaping APFs results in an RMS group delay error of 0.2191, and the resultant filter order is only 18. In contrast to the baseline FIR filter of filter order 82, the composite filter shows a significant reduction in complexity. However, when using the Butterworth IIR filter as the prototype filter, the RMS group delay error for $M = 4$ is 0.7421, which fails to meet the design specifications.

[Figure 14: RMS group delay error versus $M$ (1 to 4). Legend: Chebyshev type 2 HPF; Butterworth HPF.]
FIGURE 14. A comparison of the RMS group delay errors for two candidate prototype filters.

We already obtained $M = 4$, $(r_1, r_2, r_3, r_4, \theta_1, \theta_2, \theta_3, \theta_4) = (0.8928, 0.9175, 0.9038, 0.8845, 3.0135, 2.822, 2.92, 3.1005)$. Figure 15 describes how the group delay of the composite filter is synthesized by the five component filters (one prototype filter and four shaping filters). The in-band group delay of the composite filter is around 66. We confirm that the composite HPF is stable in any case. The component filter parameters of the high-pass composite filter are tabulated in Table 3. This completes the design of a high-pass composite filter.

Table 4 provides a comparison of the complexity among four HPFs, which are 1) the composite HPF, 2) an equiripple FIR filter of order 82, 3) a generalized equiripple FIR filter of order 104, and 4) an FIR filter of order 146 designed by the window design method. With an order 10 prototype filter, the composite filter needs a total of 36 ($4 \times 4 + 10 \times 2 = 36$) multipliers and a total of 36 ($4 \times 4 + 10 \times 2 = 36$) adders. The three high-pass FIR filters are all designed using MATLAB, and all meet the same design specifications as the composite filter. Clearly, the composite filter is the simplest among the four filters. Notice that we already

simplified the three FIR filters by using the folded architectures presented in [5].

If we want to design a bandpass or band-stop composite filter, the design procedures are the same as those of the two example filters. We find that Chebyshev type 2 filters inherently have relatively lower group delay margins for compensation than the least-pth-norm and Butterworth filters. It is beneficial to select a Chebyshev type 2 filter as the prototype filter when designing a composite filter.

The proposed composite filter can be used to replace any ordinary FIR filter with fixed filter coefficients. This means our composite filter is not suited for adaptive filters that are constantly changing filter coefficients according to some optimization algorithms. This is because the recursive structures of IIR filters inherently have a good memory for samples. Nevertheless, a recursive structure does not necessarily result in a slow operating speed from a very large-scale integration (VLSI) hardware perspective. In fact, the peak operating speed of a filter is constrained by its critical path. The peak operating speed of an IIR filter can be increased by cutting down the critical path by using, e.g., pipelining or retiming techniques, and so can that of the composite filter [19].

Conclusions
This article presented two tricks for approaching a perfect filter (i.e., flat passband, sharp transition band, highly suppressed stopband, and linear phase) with reduced complexity. This goal was realized through first designing a prototype filter to meet the design specifications regarding the frequency magnitude response. The phase function of the prototype filter was then remodeled by a cascade of delicately designed shaping filters. After cascading the prototype IIR filter with the shaping APFs, we obtained a composite filter with an almost linear phase response over the filter passband. Two example filters with highly reduced complexity were demonstrated. We found that the Chebyshev type 2 filter is an appealing candidate for the prototype filter of the composite filter. Our composite filter shows filtering performance quite similar to that of the baseline FIR filter of significantly higher complexity. The composite filter provides a way to approach perfect filtering using limited complexity and is especially useful for replacing any ordinary FIR filter with fixed filter coefficients. Further VLSI implementations focusing on operating speed, hardware cost, and so on can be the next steps to further investigate the benefit of the proposed composite filter.

[Figure 15: group delay versus normalized frequency (0 to 1), with an inset covering 0.85 to 1 and group delays of 64 to 68. Legend: Composite Filter; Prototype Filter; Shaping Filter 1; Shaping Filter 2; Shaping Filter 3; Shaping Filter 4.]
FIGURE 15. The synthesis of the group delay of the composite filter.
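The multiplier and adder counts compared in Table 4 follow from simple arithmetic, sketched below. The composite-filter rule is the article's own ($4 \times 4 + 10 \times 2 = 36$). The folded-FIR rule (order/2 + 1 multipliers and one adder per unit of order for an even-order linear-phase filter) is an assumption that happens to reproduce the tabulated figures; the function names are hypothetical.

```python
def composite_cost(prototype_order, num_apfs, per_apf=4, per_prototype_order=2):
    """Multipliers and adders of the composite filter, using the article's
    arithmetic: 4 per shaping APF plus 2 per unit of prototype order."""
    count = num_apfs * per_apf + prototype_order * per_prototype_order
    return count, count


def folded_fir_cost(order):
    """Assumed cost of an even-order linear-phase FIR filter in folded form:
    coefficient symmetry halves the multipliers, the adders are unchanged."""
    return order // 2 + 1, order


print("composite HPF, M = 4:", composite_cost(10, 4))
for name, order in [("equiripple", 82), ("generalized equiripple", 104), ("window", 146)]:
    print(f"{name} FIR, order {order}:", folded_fir_cost(order))
```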
Table 3. The component filter parameters of the high-pass composite filter.
Filter parameter | Prototype filter | Shaping filter 1 | Shaping filter 2 | Shaping filter 3 | Shaping filter 4
Filter order | 10 | 2 | 2 | 2 | 2
Passband peak-to-peak ripple (dB) | 1 | 0 | 0 | 0 | 0
Stopband attenuation (dB) | 70 | 0 | 0 | 0 | 0
Passband edge frequency (radians/sample) | 0.88π | N/A | N/A | N/A | N/A
Stopband edge frequency (radians/sample) | 0.82π | N/A | N/A | N/A | N/A

Table 4. A comparison of the complexity among four HPFs.
Complexity | Composite HPF, M = 4 | Equiripple FIR, order 82 | Generalized equiripple FIR, order 104 | Window method FIR, order 146
Number of multipliers | 36 | 42 | 53 | 74
Number of adders | 36 | 82 | 104 | 146

Authors
David Shiung (davids@cc.ncue.edu.tw) received his Ph.D. degree from National Taiwan University, Taipei, in 2002. He is an associate professor at National Changhua University of Education, Changhua 500, Taiwan. His research interests include signal processing for wireless communication and astronomical imaging. He is a Member of IEEE.
Jeng-Ji Huang (hjj2005@ntnu.edu.tw) received his Ph.D. degree from National Taiwan University, Taipei, in 2004. He is a professor at National Taiwan Normal University, Taipei 106, Taiwan. His research interests include 5G, LoRaWAN, and vehicular ad hoc networks. He is a Member of IEEE.
Ya-Yin Yang (ivyyang64@gmail.com) received her Ph.D. degree in

electrical engineering from National Taiwan University, Taipei, in 2009. She is currently an assistant researcher with the Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan 701, Taiwan. Her research interests include channel estimation, radio resource allocation, and interference cancellation for wireless communication systems.

References
[1] M. Vaezi, Z. Ding, and H. V. Poor, Multiple Access Techniques for 5G Wireless Networks and Beyond, 1st ed. Cham, Switzerland: Springer-Verlag, 2019.
[2] A. Ghosh et al., “Reconfigurable signal processing and DSP hardware generator for 5G transmitters,” in Proc. IEEE Nordic Circuits Syst. Conf. (NorCAS), Oslo, Norway, Oct. 2022, pp. 1–7, doi: 10.1109/NorCAS57515.2022.9934696.
[3] S. Bose, A. De, and I. Chakrabarti, “Area-delay-power efficient VLSI architecture of FIR filter for processing seismic signal,” IEEE Trans. Circuits Syst., II, Exp. Briefs, vol. 68, no. 11, pp. 3451–3455, Nov. 2021, doi: 10.1109/TCSII.2021.3081257.
[4] T. M. Chieng, Y. W. Hau, and Z. Omar, “The study and comparison between various digital filters for ECG de-noising,” in Proc. IEEE-EMBS Conf. Biomed. Eng. Sci. (IECBES), Sarawak, Malaysia, Dec. 2018, pp. 226–232, doi: 10.1109/IECBES.2018.8626661.
[5] R. G. Lyons, Understanding Digital Signal Processing, 3rd ed. Boston, MA, USA: Pearson, 2011.
[6] W.-S. Lu and T. Hinamoto, “Design of least-squares and minimax composite filters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 3, pp. 982–991, Mar. 2018, doi: 10.1109/TCSI.2017.2772345.
[7] Y. Neuvo, D. Cheng-Yu, and S. K. Mitra, “Interpolated finite impulse response filters,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 3, pp. 563–570, Jun. 1984, doi: 10.1109/TASSP.1984.1164348.
[8] S. Roy and A. Chandra, “A survey of FIR filter design techniques: Low-complexity, narrow transition-band and variable bandwidth,” Integration, vol. 77, pp. 193–204, Mar. 2021, doi: 10.1016/j.vlsi.2020.12.001.
[9] S. Roy and A. Chandra, “Design of narrow transition band digital filter: An analytical approach,” Integration, vol. 68, pp. 38–49, Sep. 2019, doi: 10.1016/j.vlsi.2019.06.002.
[10] S. Roy and A. Chandra, “On the order minimization of interpolated bandpass method based narrow transition band FIR filter design,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 11, pp. 4287–4295, Nov. 2019, doi: 10.1109/TCSI.2019.2928052.
[11] J. G. Proakis and M. Salehi, Fundamentals of Communication Systems, 2nd ed. Boston, MA, USA: Pearson, 2014.
[12] S. R. Aghazadeh, H. Martinez, and A. Saberkari, “5GHz CMOS all-pass filter-based true time delay cell,” Electronics, vol. 8, no. 1, Jan. 2019, Art. no. 16, doi: 10.3390/electronics8010016.
[13] S. M. Perera et al., “Wideband N-beam arrays using low-complexity algorithms and mixed-signal integrated circuits,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 2, pp. 368–382, May 2018, doi: 10.1109/JSTSP.2018.2822940.
[14] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed. Boston, MA, USA: Pearson, 2010.
[15] D. Shiung, “A trick for designing composite filters with sharp transition bands and highly suppressed stopbands,” IEEE Signal Process. Mag., vol. 39, no. 5, pp. 70–76, Sep. 2022, doi: 10.1109/MSP.2022.3165960.
[16] D. Shiung, Y.-Y. Yang, and C.-S. Yang, “Cascading tricks for designing composite filters with sharp transition bands,” IEEE Signal Process. Mag., vol. 33, no. 1, pp. 151–162, Jan. 2016, doi: 10.1109/MSP.2015.2477420.
[17] D. Shiung, Y.-Y. Yang, and C.-S. Yang, “Improving FIR filters by using cascade techniques,” IEEE Signal Process. Mag., vol. 33, no. 3, pp. 108–114, May 2016, doi: 10.1109/MSP.2016.2519919.
[18] D. Shiung, P.-H. Hsieh, and Y.-Y. Yang, “Parallels between wireless communication and astronomical observation,” in Proc. IEEE 29th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Bologna, Italy, Sep. 2018, pp. 1–6, doi: 10.1109/PIMRC.2018.8580926.
[19] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, 1st ed. New York, NY, USA: Wiley, 1999.
Ruiming Guo and Thierry Blu
Super-Resolving a Frequency Band
Digital Object Identifier 10.1109/MSP.2023.3311592
Date of current version: 3 November 2023

This article introduces a simple formula that provides the exact frequency of a pure sinusoid from just two samples of its discrete-time Fourier transform (DTFT). Even when the signal is not a pure sinusoid, this formula still holds to a very good approximation (optimally after a single refinement), paving the way for the high-resolution frequency tracking of quickly varying signals or simply improving the frequency resolution of the peaks of a discrete Fourier transform (DFT).

Single-Frequency Estimation
We learn (and teach!) in college that the frequency content of a signal is encrypted in its Fourier transform (FT) and that the main frequency modes of a sampled signal are the peaks of its DFT. We also learn that the accuracy of these frequency values is limited by the inverse of the time range of the signal (Heisenberg uncertainty), which correlates nicely with the inherent frequency resolution of a DFT: $2\pi/N$, if $N$ is the number of samples of the signal. This knowledge is so deeply rooted that it is hard to reconcile with the fact that the frequencies of a signal made of a sum of $K$ complex exponentials can be recovered exactly from as few as $2K$ samples by using a two-century-old method due to Gaspard de Prony. This apparent contradiction is resolved by recognizing that Heisenberg uncertainty relies on a much weaker signal assumption (basically, that its time and frequency uncertainties are finite) than the sum-of-exponentials model. And it is only when this model is inexact that the estimated frequencies may be inaccurate, with their uncertainty now given by the Cramér-Rao lower bound, which assumes unbiased estimators and known noise statistics.

Box 1: Notations
• Single-frequency signal: $x_n = a_0 e^{jn\omega_0}$, $n = 0, 1, \ldots, N - 1$
• Uncertainty band: $\omega_0 \in [\omega_1, \omega_2]$, with $\omega_2 - \omega_1 = \text{integer} \times 2\pi/N$
• Discrete-time Fourier transform (DTFT): $X(\omega) = \sum_{n=0}^{N-1} x_n e^{-jn\omega}$
• Discrete Fourier transform (DFT): $X(2\pi k/N)$, where $k = 0, 1, \ldots, N - 1$
• Peak of the DFT: $k_0 = \arg\max_k |X(2\pi k/N)| \rightarrow \omega_{\mathrm{DFT}} = 2\pi k_0/N$
The contrast between the Fourier approach (analytic, intuitive, but Heisenberg limited) and Prony’s method (algebraic, black box, but exact) has made it difficult to envision a higher frequency resolution that would rely on the DFT. Yet, given that the DFT coefficients are just samples of a very smooth function (the DTFT), analytic considerations suggest that this function can be approximated locally by a quadratic polynomial, leading to an estimate of its off-grid peak location; as few as three samples around the max of the DFT already provide a very good estimate of this frequency [1], [2].

[Figure 1: one full period (0 to $2\pi$) of the DTFT magnitude versus frequency $\omega$, with the uncertainty band $[\omega_1, \omega_2]$, the endpoint values $|X(\omega_1)|$ and $|X(\omega_2)|$, and formula (1) overlaid.]
FIGURE 1. The frequency $\omega_0$ (blue) of a single-frequency signal is obtained from the DTFT values at the endpoints (red) of a frequency interval known to contain $\omega_0$ (assumption: $\omega_2 - \omega_1 = \text{integer} \times 2\pi/N$). Dotted line: a full period of the DTFT $X(\omega)$ of the signal.

The Trick
The main motivation for this article is to put forward a formula that provides the frequency value of the maximum of the DTFT from just two DTFT coefficients of a single-frequency signal; see the detailed setting in “Box 1: Notations.” Not only is this formula exact, but it is also very robust to the inaccuracies of the model, as we shall see later. More precisely, if we assume that the unknown frequency $\omega_0$ of the signal $x_n$ lies inside an “uncertainty band” $[\omega_1, \omega_2]$ where $\omega_2 - \omega_1$ is an integer multiple of $2\pi/N$, this formula specifies how the amplitudes of the DTFT of $x_n$ at $\omega_1$ and $\omega_2$ should be combined so as to recover $\omega_0$ (see the visualization in Figure 1):

$$\omega_0 = \frac{\omega_1 + \omega_2}{2} + 2\arctan\left(\tan\left(\frac{\omega_2 - \omega_1}{4}\right)\frac{|X(\omega_2)| - |X(\omega_1)|}{|X(\omega_2)| + |X(\omega_1)|}\right). \tag{1}$$

A proof is provided in “Box 2: Step-by-Step Didactic Proof of (1),” requiring only elementary electrical and electronics engineering math knowledge.

Box 2: Step-by-Step Didactic Proof of (1)
1. Geometric sum: $\sum_{n=0}^{N-1} z^n = \dfrac{z^N - 1}{z - 1}$ with $z = e^{j(\omega_0 - \omega)}$ leads to $X(\omega) = a_0 \dfrac{e^{jN(\omega_0 - \omega)} - 1}{e^{j(\omega_0 - \omega)} - 1}$.
2. Euler’s formula: $\sin\theta = \dfrac{e^{j\theta} - e^{-j\theta}}{2j}$ leads to $X(\omega_0 - \omega) = a_0 e^{j\frac{N-1}{2}\omega}\dfrac{\sin(N\omega/2)}{\sin(\omega/2)}$, then $|X(\omega_0 - \omega)| = |a_0| \left|\dfrac{\sin(N\omega/2)}{\sin(\omega/2)}\right|$.
3. Notation: $\omega_{12} = (\omega_1 + \omega_2)/2$, $B = (\omega_2 - \omega_1)/4$, and $u = (\omega_0 - \omega_{12})/2$ lead to $|X(\omega_1)| = |a_0| \left|\dfrac{\sin(N(u + B))}{\sin(u + B)}\right|$ and $|X(\omega_2)| = |a_0| \left|\dfrac{\sin(N(u - B))}{\sin(u - B)}\right|$.
4. $\pi$-periodicity of $|\sin x|$: $B = \text{integer} \times \dfrac{\pi}{2N}$ leads to $\dfrac{|X(\omega_2)|}{|X(\omega_1)|} = \left|\dfrac{\sin(u + B)}{\sin(u - B)}\right|$.
5. Sign of $\sin$: $\pm u \le B$ leads to $\dfrac{|X(\omega_2)|}{|X(\omega_1)|} = \dfrac{\sin(B + u)}{\sin(B - u)}$.
6. Trigonometry: $\sin(a \pm b) = \sin a \cos b \pm \cos a \sin b$ leads to $\dfrac{|X(\omega_2)|}{|X(\omega_1)|} = \dfrac{\tan B + \tan u}{\tan B - \tan u}$.
7. Algebraic resolution: $\tan u = \tan B \times \dfrac{|X(\omega_2)| - |X(\omega_1)|}{|X(\omega_2)| + |X(\omega_1)|}$, which leads to $u = \dfrac{\omega_0 - \omega_{12}}{2} = \arctan\left(\tan\left(\dfrac{\omega_2 - \omega_1}{4}\right)\dfrac{|X(\omega_2)| - |X(\omega_1)|}{|X(\omega_2)| + |X(\omega_1)|}\right)$.

This formula becomes even simpler when the uncertainty bandwidth is small (i.e., $\omega_2 - \omega_1 \ll \pi$):

$$\omega_0 \approx \omega_1 \times \frac{|X(\omega_1)|}{|X(\omega_1)| + |X(\omega_2)|} + \omega_2 \times \frac{|X(\omega_2)|}{|X(\omega_1)| + |X(\omega_2)|}$$

[Figure 2: flowchart of two successive passes.
First pass: choose* $\omega_1$ and $\omega_2 = \omega_1 + 2\pi/N$ such that $\omega_1 \le \omega_0 \le \omega_2$; calculate the magnitude of the DTFT at $\omega_1$ and $\omega_2$; calculate the estimate $\bar{\omega}_0$ of the frequency $\omega_0$ using (1).
Second pass: choose $\bar{\omega}_1 = \bar{\omega}_0 - \pi/N$ and $\bar{\omega}_2 = \bar{\omega}_0 + \pi/N$; calculate the magnitude of the DTFT at $\bar{\omega}_1$ and $\bar{\omega}_2$; calculate the refined estimate $\bar{\bar{\omega}}_0$ of the frequency $\omega_0$ using (1).
*For instance, $[\omega_1, \omega_2] = \omega_{\mathrm{DFT}} + [-\pi/N, \pi/N]$.]
FIGURE 2. A refined trick: a double application of (1) achieves a quality that is equivalent to the best unbiased single-frequency estimation algorithm.
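Formula (1) and its small-bandwidth approximation can be checked numerically in a few lines. The sketch below is a minimal illustration, not the authors' code: the function names, the signal length, the amplitude, and the placement of the uncertainty band are all hypothetical choices, and the DTFT is evaluated by direct summation.

```python
import cmath
import math


def dtft(x, w):
    """X(w) = sum_n x[n] exp(-j n w) for a finite sequence, by direct summation."""
    return sum(xn * cmath.exp(-1j * n * w) for n, xn in enumerate(x))


def two_point_frequency(x, w1, w2):
    """Formula (1): frequency from the DTFT magnitudes at the band edges.
    Assumes w2 - w1 is an integer multiple of 2*pi/N and w1 <= w0 <= w2."""
    m1, m2 = abs(dtft(x, w1)), abs(dtft(x, w2))
    ratio = (m2 - m1) / (m2 + m1)
    return 0.5 * (w1 + w2) + 2.0 * math.atan(math.tan(0.25 * (w2 - w1)) * ratio)


def weighted_mean_frequency(x, w1, w2):
    """Small-bandwidth approximation: magnitude-weighted mean of the band edges."""
    m1, m2 = abs(dtft(x, w1)), abs(dtft(x, w2))
    return (w1 * m1 + w2 * m2) / (m1 + m2)


# Hypothetical noiseless test signal x_n = a0 * exp(j n w0), n = 0..N-1.
N = 16
w0 = 1.234
a0 = 0.7 * cmath.exp(0.3j)
x = [a0 * cmath.exp(1j * n * w0) for n in range(N)]

# A band of width 2*pi/N that contains w0 but is not centered on it.
w1 = w0 - 0.3 * (2.0 * math.pi / N)
w2 = w1 + 2.0 * math.pi / N

exact = two_point_frequency(x, w1, w2)
approx = weighted_mean_frequency(x, w1, w2)
print(f"true {w0:.6f} | formula (1) {exact:.6f} | weighted mean {approx:.6f}")
```

On noiseless data the first estimate agrees with the true frequency up to rounding, whereas the weighted mean is only approximate because the band is not negligibly small compared with $\pi$.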
This small-bandwidth form is what intuition would suggest: a weighting of the end frequencies based on the relative magnitude of their DTFT.

In practice, this formula is most useful when the uncertainty band is smallest, i.e., $\omega_2 - \omega_1 = 2\pi/N$. Then, a straightforward procedure for estimating a single frequency from $N$ uniform signal samples is to
1) determine $\omega_{\mathrm{DFT}}$, the frequency of the peak of the signal DFT
2) apply (1) with $\omega_1 = \omega_{\mathrm{DFT}} - \pi/N$ and $\omega_2 = \omega_{\mathrm{DFT}} + \pi/N$.

This works because, for a single-frequency signal, the peak of the DTFT is always within $\pm\pi/N$ of the maximum of the DFT. But it would also be possible to bypass the computation of the full DFT if a rough estimate of $\omega_0$ were available, e.g., when the frequency of the signal is continuously changing (tracking between successive signal windows, as in radar applications) or when the frequency is a priori known up to some perturbation (physical resonance experiments, laser-based optical measurements, etc.).

Robustness to Inaccuracies
When the single-exponential model is not exact, (1) is still a very robust estimator of its frequency. This is particularly so when the uncertainty bandwidth $\omega_2 - \omega_1$ is reduced to $2\pi/N$, an assumption that we will make from now on. Indeed, consider a noise model $x_n = a_0 e^{jn\omega_0} + b_n$ where the complex-valued samples $b_n$ are independent realizations of a Gaussian random variable with variance $\sigma^2$ (the “additive white Gaussian noise” assumption). When the number of samples $N$ is large enough, a linearization of (1) makes it possible to calculate (tediously; not shown here) the standard deviation $\Delta\omega$ of the frequency estimation error. In particular, in the case where $\omega_0$ is at the center of the interval $[\omega_1, \omega_2]$, this error behaves according to

$$\Delta\omega = \frac{\pi^2}{2\sqrt{2}\,N\sqrt{N}\,\mathrm{SNR}} = \frac{\pi^2}{4\sqrt{6}}\,\Delta\omega_{\mathrm{CR}} \tag{2}$$

where $\mathrm{SNR} = |a_0|/\sigma$ and where $\Delta\omega_{\mathrm{CR}}$ is the Cramér-Rao lower bound of the problem (see [3] for a calculation). Obviously, $\Delta\omega$ is very close to $\Delta\omega_{\mathrm{CR}}$, within less than 1% (since $\pi^2/(4\sqrt{6}) \approx 1.007$). When $\omega_0$ is closer to the extremities of the interval $[\omega_1, \omega_2]$, $\Delta\omega$ deviates from the Cramér-Rao lower bound by up to 80%, still a very low error in absolute terms. In fact, a simple refinement of the trick, as depicted in Figure 2, shows how to attain this lower bound, underlining the near optimality of this procedure; no other unbiased single-frequency estimation algorithm would be able to improve this performance by more than 1%.

This is confirmed in Figure 3 by simulations that consist of 1 million tests where
1) The number of samples, $N$, is random (uniform) between 10 and 1,000.
2) The frequency $\omega_0$ is random (uniform) between $-\pi$ and $\pi$.
3) $a_0 = 1$.
4) The standard deviation $\sigma$ of the noise is such that the SNR is random (uniform) between −5 dB and 40 dB.
5) The noise $b_n$ is drawn from an independent identically distributed statistic (Gaussian).

In each test, the uncertainty band before refinement is set by $\omega_1 = \omega_{\mathrm{DFT}} - \pi/N$ and $\omega_2 = \omega_{\mathrm{DFT}} + \pi/N$. For comparison purposes, Figure 3 also shows the distribution of errors of Jacobsen’s estimator [1], [2], which is based on three consecutive DFT coefficients around $\omega_{\mathrm{DFT}}$. The better performance of our formula is likely due to the higher SNR enjoyed by the two DFT coefficients around the maximum of the DTFT in comparison to the three DFT coefficients used by Jacobsen’s formula, one of which has a significantly lower SNR because it is further away from the DTFT peak by more than $2\pi/N$. A somewhat milder difference is also that (1) assumes an exact single-frequency model, whereas Jacobsen’s formula is but a local quadratic approximation.

Beyond just noise, when the single-frequency model is rendered inaccurate due to, e.g., quantization, sample windowing, or the addition of other sinusoidal/polynomial terms, the frequency estimation error of (1) is controlled by a single quantity, $\varepsilon$.
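Before turning to the deterministic bound on $\varepsilon$, the noise experiment described above can be reproduced in miniature. The sketch below is only a reduced-scale illustration under stated assumptions: a fixed $N = 32$ and a fixed SNR of 10 (20 dB) instead of random draws, 200 trials instead of 1 million, frequencies drawn in $[0, 2\pi)$, and real and imaginary noise parts each drawn with standard deviation $\sigma$, which is the convention under which $\Delta\omega_{\mathrm{CR}} = 2\sqrt{3}\,N^{-3/2}\,\mathrm{SNR}^{-1}$. All function names are hypothetical.

```python
import cmath
import math
import random


def dtft_mag(x, w):
    """|X(w)| for a finite sequence x, by direct summation."""
    return abs(sum(xn * cmath.exp(-1j * n * w) for n, xn in enumerate(x)))


def apply_formula_1(x, w1, w2):
    """Formula (1) applied to the band [w1, w2]."""
    m1, m2 = dtft_mag(x, w1), dtft_mag(x, w2)
    return 0.5 * (w1 + w2) + 2.0 * math.atan(
        math.tan(0.25 * (w2 - w1)) * (m2 - m1) / (m2 + m1))


def estimate_frequency(x, refine=True):
    """DFT peak, then (1) on a band of width 2*pi/N, then one optional refinement."""
    N = len(x)
    half = math.pi / N
    k0 = max(range(N), key=lambda k: dtft_mag(x, 2.0 * math.pi * k / N))
    w_dft = 2.0 * math.pi * k0 / N
    w_hat = apply_formula_1(x, w_dft - half, w_dft + half)
    if refine:  # second pass: recenter the band on the first estimate
        w_hat = apply_formula_1(x, w_hat - half, w_hat + half)
    return w_hat


def normalized_rms_error(N=32, snr=10.0, trials=200, refine=True, seed=0):
    """RMS frequency error divided by the Cramer-Rao bound (a_0 = 1)."""
    rng = random.Random(seed)
    sigma = 1.0 / snr
    crb = 2.0 * math.sqrt(3.0) * N ** -1.5 / snr
    total = 0.0
    for _ in range(trials):
        w0 = rng.uniform(0.0, 2.0 * math.pi)
        x = [cmath.exp(1j * n * w0)
             + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for n in range(N)]
        # Wrap the error into [-pi, pi) before normalizing.
        err = (estimate_frequency(x, refine) - w0 + math.pi) % (2.0 * math.pi) - math.pi
        total += (err / crb) ** 2
    return math.sqrt(total / trials)


single = normalized_rms_error(refine=False)
refined = normalized_rms_error(refine=True)
print(f"normalized RMS error: trick {single:.2f}, refined trick {refined:.2f}")
```

With so few trials the printed ratios fluctuate, so they should be read as a plausibility check against the standard deviations reported in Figure 3 rather than as a replication.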

Here, $\varepsilon$ is the maximum error of the magnitude of the DTFT at the frequencies $\omega_1$ and $\omega_2$, and the estimation error satisfies

$$|\bar{\omega}_0 - \omega_0| \le \frac{4\varepsilon\tan\left(\frac{\pi}{2N}\right)}{N|a_0|} \approx \frac{2\pi\varepsilon}{N^2|a_0|}\ \text{for large } N \tag{3}$$

where $|a_0|$ is the amplitude of the single-frequency “ground-truth” signal. (See the proof in the “Error Bound (3)” section in “Other Proofs.”)

Note that because it is valid for every single “noise” instance, this bound is of a very different nature than the statistical result (2), which is an average over infinitely many additive white Gaussian noise realizations. Interestingly, the inaccuracies of the model outside the uncertainty band do not contribute to the estimation error, which suggests that (3) can be used to predict the accuracy of a multiple-frequency estimation problem that uses the single-frequency trick.

[Figure 3: PDF-normalized count versus normalized estimation error $\Delta\omega/\Delta\omega_{\mathrm{CR}}$ (−6 to 6). Legend: Jacobsen [1]; Trick; Refined Trick; Cramér-Rao Bound.]
FIGURE 3. Histograms from 1 million random tests (number of samples $N$, SNR, frequency, and noise). The error that results from using Jacobsen’s frequency estimator [1], the trick (1), or the refined trick (Figure 2) is further normalized by the Cramér-Rao lower bound of the estimation problem ($\Delta\omega_{\mathrm{CR}} = 2\sqrt{3}\,N^{-3/2}\,\mathrm{SNR}^{-1}$). The standard deviations of the three estimators are 1.5325, 1.3008, and 1.0092, respectively. PDF: probability density function.

Multiple Frequencies
This formula can easily be used in a multiple-frequency scenario provided that the frequencies to estimate are sufficiently separated. A straightforward procedure consists of first locating the isolated peaks of the DFT of the signal (e.g., using MATLAB’s findpeaks function) and then applying (1) to refine each frequency individually; see an example in Figure 4. The estimation error of each frequency can be quantified by using the bound (3) where, in the absence of other noise, the data inaccuracy $\varepsilon$ in the neighborhood of that frequency is essentially caused by the tail of the DTFT of the other frequencies. An example of such a calculation is shown in the “Error Bound (4)” section in “Other Proofs,” leading to the following statement. Assume that the frequencies $\omega_k$ of the signal are distant from each other (modulo $(-\pi, \pi]$) by at least $\delta\omega > \pi/N$ and that the amplitude of the dominant sinusoid is $A$; then, the estimation error of any of the frequencies of the signal is bounded according to

$$|\bar{\omega}_k - \omega_k| \le \underbrace{\frac{2\pi}{N}}_{\text{DFT resolution}} \times \underbrace{\frac{2(K - 1)\tan(\pi/(2N))}{\pi\sin((\delta\omega - \pi/N)/2)}\,\frac{A}{|a_k|}}_{\text{“super-resolution” coefficient}}. \tag{4}$$

Here, $\bar{\omega}_k$, $\omega_k$, and $|a_k|$ are the estimated frequency, the ground-truth frequency, and its amplitude, respectively. Despite its coarseness (see Figure 4), this inequality already demonstrates superresolution potential since the “superresolution” coefficient is usually smaller than one and, in fact, tends to zero when $N$ tends to infinity (for fixed $\delta\omega$).
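The multiple-frequency procedure and the bound (4) can be tried on a small synthetic example. The sketch below is an illustration under stated assumptions, not the setup of Figure 4: the three frequencies, the amplitudes, and $N = 32$ are hypothetical, and the peak picker is a plain circular local-maximum search written here in place of MATLAB's findpeaks.

```python
import cmath
import math


def dtft(x, w):
    """X(w) for a finite sequence x, by direct summation."""
    return sum(xn * cmath.exp(-1j * n * w) for n, xn in enumerate(x))


def apply_formula_1(x, w1, w2):
    """Formula (1) applied to the band [w1, w2]."""
    m1, m2 = abs(dtft(x, w1)), abs(dtft(x, w2))
    return 0.5 * (w1 + w2) + 2.0 * math.atan(
        math.tan(0.25 * (w2 - w1)) * (m2 - m1) / (m2 + m1))


def dft_peak_bins(x, count):
    """Indices of the `count` largest circular local maxima of the DFT magnitude."""
    N = len(x)
    mags = [abs(dtft(x, 2.0 * math.pi * k / N)) for k in range(N)]
    local = [k for k in range(N)
             if mags[k] > mags[k - 1] and mags[k] >= mags[(k + 1) % N]]
    local.sort(key=lambda k: mags[k], reverse=True)
    return sorted(local[:count])


def bound_4(N, K, delta_w, A, a_k):
    """Right-hand side of (4): DFT resolution times the coefficient."""
    coeff = 2.0 * (K - 1) * math.tan(math.pi / (2 * N)) * A / (
        math.pi * math.sin(0.5 * (delta_w - math.pi / N)) * abs(a_k))
    return (2.0 * math.pi / N) * coeff


def circular_distance(a, b):
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


# Hypothetical signal: three off-grid complex exponentials, N = 32 samples.
N = 32
bin_width = 2.0 * math.pi / N
true_w = [3.3 * bin_width, 8.75 * bin_width, 16.1 * bin_width]
amps = [1.0, 0.8 * cmath.exp(1.1j), 0.6 * cmath.exp(-0.4j)]
x = [sum(a * cmath.exp(1j * n * w) for a, w in zip(amps, true_w)) for n in range(N)]

bins = dft_peak_bins(x, len(true_w))
estimates = [apply_formula_1(x, k * bin_width - math.pi / N, k * bin_width + math.pi / N)
             for k in bins]

K = len(true_w)
delta_w = min(circular_distance(true_w[i], true_w[j])
              for i in range(K) for j in range(i + 1, K))
A = max(abs(a) for a in amps)
bounds = [bound_4(N, K, delta_w, A, a) for a in amps]

for w, est, b in zip(true_w, estimates, bounds):
    print(f"true {w:.5f} | estimate {est:.5f} | "
          f"error/bin {abs(est - w) / bin_width:.4f} | bound/bin {b / bin_width:.4f}")
```

Errors and bounds are printed in units of the DFT bin width $2\pi/N$, so any value below one indicates resolution beyond the DFT grid.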
Empirically, a minimum value of $4\pi/N$ for $\delta\omega$, or two DFT bins, seems to be sufficient to obtain good frequency estimates. Of course, this cheap approach to high-resolution multifrequency estimation is not optimal, yet it could be used as the starting point of any iterative algorithm designed to maximize the likelihood of the problem.

[Figure 4: DTFT magnitude versus frequency $\omega$ (0 to $2\pi$). Legend: DTFT; DFT Samples; Uncertainty Band; Endpoint DTFT; Estimated Frequency; Ground-Truth Frequency.]
FIGURE 4. Multiple frequencies (three complex exponentials, 12 samples) can be accurately estimated by first locating the isolated peaks (e.g., using findpeaks in MATLAB) and then applying (1) individually. The actual estimation errors of the three frequencies (from left to right) are roughly $(0.01, 0.02, 0.03) \times 2\pi/12$, well below the resolution $2\pi/12$ of the DFT; for comparison, the upper bound (4) provides the much more conservative values $(0.66, 0.45, 0.58) \times 2\pi/12$.

Conclusion
The frequency of a single complex exponential can be found exactly using

the magnitude of only two samples of its DTFT, as this article shows. In the presence of noise or other inaccuracies, the trick that we provide is very robust and can even be iterated once to reach the theoretical optimum (Cramér-Rao lower bound) to within less than 1%. The robustness of this formula makes it possible to, e.g., refine the peaks of the DFT of a signal, but we also anticipate that it can be used as a tool for high-resolution frequency estimation. For teaching purposes, we provide a step-by-step proof that requires only undergraduate signal processing knowledge.

Other Proofs
Error Bound (3)
A direct proof of this inequality uses the fact that $|\arctan a - \arctan b| \le |a - b|$ and the triangle inequality $|a + b| \le |a| + |b|$. More specifically, denoting by $X_1, X_2$ the discrete-time Fourier transform (DTFT) of the “ground-truth” signal at $\omega_1, \omega_2$, and by $\varepsilon_1, \varepsilon_2$ the errors (caused by noise or otherwise) on $|X_1|, |X_2|$, we have

$$
\begin{aligned}
|\bar{\omega}_0 - \omega_0| &= \left| 2\arctan\left(\tan\left(\tfrac{\pi}{2N}\right)\frac{|X_1| + \varepsilon_1 - |X_2| - \varepsilon_2}{|X_1| + \varepsilon_1 + |X_2| + \varepsilon_2}\right) - 2\arctan\left(\tan\left(\tfrac{\pi}{2N}\right)\frac{|X_1| - |X_2|}{|X_1| + |X_2|}\right) \right| \\
&\le 2\tan\left(\tfrac{\pi}{2N}\right)\left| \frac{|X_1| + \varepsilon_1 - |X_2| - \varepsilon_2}{|X_1| + \varepsilon_1 + |X_2| + \varepsilon_2} - \frac{|X_1| - |X_2|}{|X_1| + |X_2|} \right| \\
&= 2\tan\left(\tfrac{\pi}{2N}\right)\left| \frac{2\varepsilon_1(|X_2| + \varepsilon_2) - 2\varepsilon_2(|X_1| + \varepsilon_1)}{(|X_1| + |X_2|)(|X_1| + \varepsilon_1 + |X_2| + \varepsilon_2)} \right| \\
&\le 2\tan\left(\tfrac{\pi}{2N}\right)\frac{\max(|2\varepsilon_1|, |2\varepsilon_2|)\,(|X_1| + \varepsilon_1 + |X_2| + \varepsilon_2)}{(|X_1| + |X_2|)(|X_1| + \varepsilon_1 + |X_2| + \varepsilon_2)} \\
&= 4\tan\left(\tfrac{\pi}{2N}\right)\frac{\max(|\varepsilon_1|, |\varepsilon_2|)}{|X_1| + |X_2|}
\end{aligned}
$$

which leads to the inequality (3) after noticing that $|X_1| + |X_2| \ge N|a_0|$ (because $\omega_0 \in [\omega_1, \omega_2]$).

Error Bound (4)
Denoting by $\omega_1, \omega_2, \ldots, \omega_K$ the $K$ different frequencies and $a_1, a_2, \ldots, a_K$ the associated (complex-valued) amplitudes, the DTFT of the samples $x_n$ is given by

$$X(\omega) = \sum_{k=1}^{K} a_k \frac{e^{jN(\omega_k - \omega)} - 1}{e^{j(\omega_k - \omega)} - 1}.$$

Evaluating the estimation error of the frequency $\omega_{k_0}$ using (1) requires calculating the bound $\varepsilon$ in (3), i.e., the maximum error between $X(\omega)$ and the DTFT of a single-frequency model, when $|\omega - \omega_{k_0}| \le \pi/N$ (with the hypothesis that the minimum distance between $\omega_{k_0}$ and the other $\omega_k$ is at least $\delta\omega > \pi/N$):

$$
\begin{aligned}
\left| X(\omega) - a_{k_0}\frac{e^{jN(\omega_{k_0} - \omega)} - 1}{e^{j(\omega_{k_0} - \omega)} - 1} \right| &= \left| \sum_{k \ne k_0} a_k \frac{e^{jN(\omega_k - \omega)} - 1}{e^{j(\omega_k - \omega)} - 1} \right| \\
&\le \sum_{k \ne k_0} |a_k| \left| \frac{\sin(N(\omega_k - \omega)/2)}{\sin((\omega_k - \omega)/2)} \right| && \text{(using the triangle inequality and Euler’s formula)} \\
&\le \sum_{k \ne k_0} \frac{A}{|\sin((\omega_k - \omega)/2)|} && \text{(denoting } A = \max_{k = 1 \ldots K} |a_k|\text{)} \\
&\le \frac{(K - 1)A}{\min_{k \ne k_0} |\sin((\omega_k - \omega)/2)|} \\
&\le \frac{(K - 1)A}{\sin((\delta\omega - \pi/N)/2)} && \text{(since } |\omega - \omega_{k_0}| \le \pi/N < \delta\omega\text{).}
\end{aligned}
$$

The right-hand side of the last inequality provides an upper bound for $\varepsilon$, which we can use in (3) to find

$$|\bar{\omega}_{k_0} - \omega_{k_0}| \le \frac{2\pi}{N} \times \frac{2(K - 1)\tan(\pi/(2N))}{\pi\sin((\delta\omega - \pi/N)/2)}\,\frac{A}{|a_{k_0}|}.$$

Acknowledgment
Thierry Blu is the corresponding author.

Authors
Ruiming Guo (ruiming.guo@imperial.ac.uk) received his Ph.D. degree in electronic engineering at the Chinese University of Hong Kong (CUHK), Hong Kong Special Administrative Region, China, in 2021. He is currently a postdoctoral research associate at the Department of Electrical and Electronic Engineering, Imperial College London, SW7 2AZ London, U.K., working with Prof. Ayush Bhandari on computational imaging and modulo sampling. He worked as a postdoctoral research fellow with Prof. Thierry Blu at the Electronic Engineering (EE) Department at CUHK, from 2021 to 2022. He received the Postgraduate Student Research Excellence Award from the EE Department of CUHK in 2022. His research interests include sparse signal processing, sampling theory, inverse problems, modulo sampling, and computational sensing and imaging.
Thierry Blu (thierry.blu@m4x.org) received his Ph.D. degree from Télécom Paris (ENST) in 1996. He is a professor in the Department of Electronic Engineering, the Chinese University of Hong Kong, Sha Tin New Territories, Hong Kong Special Administrative Region, China, where he has been since 2008. He received two Best Paper Awards (2003 and 2006) and is the coauthor of a paper that received a Young Author Best Paper Award (2009), all from the IEEE Signal Processing Society. His research interests include wavelets, approximation and sampling theory, sparse representations, biomedical imaging, optics, and wave propagation. He is a Fellow of IEEE.

References
[1] E. Jacobsen and P. Kootsookos, “Fast, accurate frequency estimators [DSP Tips & Tricks],” IEEE Signal Process. Mag., vol. 24, no. 3, pp. 123–125, May 2007, doi: 10.1109/MSP.2007.361611.
[2] Ç. Candan, “A method for fine resolution frequency estimation from three DFT samples,” IEEE Signal Process. Lett., vol. 18, no. 6, pp. 351–354, Jun. 2011, doi: 10.1109/LSP.2011.2136378.
[3] P. Stoica and A. Nehorai, “MUSIC, maximum likelihood, and Cramér-Rao bound,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 5, pp. 720–741, May 1989, doi: 10.1109/29.17564.

Shlomo Engelberg
Implementing Moving Average Filters Using Recursion
Digital Object Identifier 10.1109/MSP.2023.3294721
Date of current version: 3 November 2023

Moving average filters output the average of N samples, and it is easy to see (and to prove) that they are low-pass filters. A simple, causal moving average filter satisfies

$$y_n = \frac{1}{N}\sum_{k=0}^{N-1} x_{n-k}. \tag{1}$$

Because of their simplicity and intuitive appeal, they are often preferred to more complicated low-pass filters when one would like to remove high-frequency noise and the demands on the filter are not too great.

Background
In this column, we explain how implementing a moving average filter using a recursive formulation (where the current output is a function of the current and previous inputs and previous outputs) is possible. We then show why it can be problematic and how to deal with that problem (and this is the “tip”). Finally, we describe circumstances under which the problem does not exist even though one might have thought that it should (and this is the “trick”).

A Tip and a Trick

The Moving Average Filter
Let $x_k = x(kT_s)$, $T_s > 0$, be samples of $x(t)$. One can define a moving average filter by using (1), and this can be used as a template for implementing a moving average filter; looking at (1), one would be inclined to calculate each sample of the output by summing the last N values sampled and dividing the sum by N.

One can, however, express the output of a moving average filter, $y_n$, as

$$\begin{aligned}
y_n &= \frac{1}{N}\sum_{k=0}^{N-1} x_{n-k} \\
    &= \frac{x_n}{N} + \frac{1}{N}\sum_{k=1}^{N-1} x_{n-k} \\
    &= \frac{x_n}{N} + \frac{1}{N}\sum_{k=0}^{N-2} x_{(n-1)-k} \\
    &= \frac{x_n}{N} + \frac{1}{N}\sum_{k=0}^{N-1} x_{(n-1)-k} - \frac{x_{n-N}}{N}
\end{aligned}$$

which is equivalent to

$$y_n = \frac{x_n}{N} + y_{n-1} - \frac{x_{n-N}}{N} \tag{2}$$

and this points to a different possible implementation. One can implement the filter by setting the current output, $y_n$, equal to the sum of the previous output value, $y_{n-1}$, and the current sample of the input divided by N, $x_n/N$, less the oldest sample of the input that was “part of” the previous value of the output divided by N, $x_{n-N}/N$. If one would like to minimize the number of operations needed to calculate each value of the output, $y_n$ (if an efficient implementation of the moving average filter is important), the method based on (2) is to be preferred. (See, for example, [2].)

Stability
Is a moving average filter stable? The short answer is that it must be. Considering (1), it is clear that the filter is a finite-impulse-response (FIR) filter, and all FIR filters are stable [1].

Considering the (two-sided) Z-transform of (2), we find that

$$Y(z) = X(z)/N + z^{-1}Y(z) - z^{-N}X(z)/N.$$

The transfer function of the filter would seem to be

$$H(z) = \frac{1}{N}\,\frac{1 - z^{-N}}{1 - z^{-1}}, \quad z \neq 0, 1.$$

Note, however, that the apparent singularity at z = 1 is removable, and in fact, it is not properly speaking a singular point of the transfer function. Dividing the numerator of H(z) by the denominator, one finds that

$$H(z) = \left(1 + z^{-1} + \cdots + z^{-(N-1)}\right)/N, \quad z \neq 0$$

and this is the version of the transfer function one arrives at if one starts from (1) and proceeds naïvely. Here it is clear that H(1) is perfectly well defined and is equal to 1. If one would like to write the transfer function in closed form, the correct way to do so is to write

$$H(z) = \begin{cases} \dfrac{1}{N}\,\dfrac{1 - z^{-N}}{1 - z^{-1}}, & z \neq 0, 1 \\[1ex] 1, & z = 1. \end{cases} \tag{3}$$

When written this way, it is clear that 1 is contained in the transfer function’s region of convergence, and this is yet another way to see that the filter is stable [1]. As we find in the section “An Implementation Issue With the Recursive Implementation,” when rounding errors are added to the picture, the system’s stability becomes a somewhat more involved question.

Rounding Errors in Floating Point Calculations
Floating point numbers such as C’s float data type are versatile, but when adding a small floating point number to a large one, one generally loses some accuracy. Floating point numbers generally have a certain number of bits allocated to storing a “number,” called the mantissa, and a certain number of bits allocated to storing an exponent, the power of two by which the mantissa, a binary fraction, is multiplied. If num is the number to be stored, we find that $\text{num} = \text{mantissa} \times 2^{\text{exponent}}$.
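The loss of accuracy described above can be observed directly in C. The following is a minimal sketch, assuming that float is an IEEE 754 single-precision number with a 24-bit significand and round-to-nearest behavior (true on most current platforms, but an assumption nonetheless). The function names add_as_float, rounding_error, and check_absorption are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Adds two floats and returns the single-precision result.
 * The volatile store forces the sum to be rounded to float even on
 * platforms that evaluate expressions in higher precision. */
float add_as_float(float large, float small)
{
    volatile float sum = large + small;
    return sum;
}

/* Rounding error of the float sum, measured against the same sum
 * computed in double precision (computed minus reference). */
double rounding_error(float large, float small)
{
    return (double)add_as_float(large, small)
         - ((double)large + (double)small);
}

/* Self-check, valid under the IEEE 754 assumption stated above. */
void check_absorption(void)
{
    /* 2^24 + 1 needs 25 significant bits, so the 1 is lost entirely. */
    assert(add_as_float(16777216.0f, 1.0f) == 16777216.0f);
    assert(rounding_error(16777216.0f, 1.0f) == -1.0);

    /* 10 + 1.25 needs only six significant bits, so it is exact. */
    assert(rounding_error(10.0f, 1.25f) == 0.0);
}
```

Calling check_absorption() from a test program exercises both cases: a sum that no longer fits in the significand, and a sum that fits comfortably and therefore incurs no error.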

The absolute value of the mantissa is often required to be greater than or equal to one and less than two, so that its first digit is always a one (which makes it possible not to store that binary digit).

In order to fix ideas, consider a simple, nonpractical, example. Suppose one has four bits dedicated to the mantissa and two to the exponent. Consider the sum $1.01_b \times 2^{11_b} + 1.01_b \times 2^{00_b}$. Writing the summands out in binary, we find that we are adding $1010_b$ to $1.010_b$. As the mantissa is limited to four bits, the sum is $1011_b = 1.011_b \times 2^{11_b}$. Because of the limited number of bits the mantissa has, the sum is not accurate. It is off by $0.01_b$.

An Implementation Issue With the Recursive Implementation
Suppose that one uses the recursive implementation based on (2), but at each stage, there is a small amount of noise caused by rounding errors. Then (2) becomes

$$y_n = \frac{x_n}{N} + y_{n-1} - \frac{x_{n-N}}{N} + r_n \tag{4}$$

where $r_n$ is the noise term. The Z-transform of (4) is

$$Y(z) = X(z)/N + z^{-1}Y(z) - z^{-N}X(z)/N + R(z)$$

or

$$N(z - 1)Y(z) = \left(z - z^{-(N-1)}\right)X(z) + NzR(z).$$

Thus, we find that

$$Y(z) = \frac{1}{N}\,\frac{z - z^{-(N-1)}}{z - 1}\,X(z) + \frac{z}{z - 1}\,R(z), \quad z \neq 0, 1.$$

It is clear that the first term on the right-hand side is the same as the transfer function in (3), and it is the transfer function of a stable system. The second term is, however, another story. This term truly has a single pole at z = 1, and this is the sign of a marginally stable system. In fact,

$$\frac{z}{z - 1}, \quad |z| > 1$$

is the transfer function of a summer. Thus, the output of a moving average filter that is implemented recursively and that starts operating at time n = 0 is composed of the desired output (related to the input signal, $x_n$) and the sum of all the previous rounding errors; it is

$$y_n = \frac{1}{N}\sum_{k=0}^{N-1} x_{n-k} + \sum_{k=0}^{n} r_k.$$

If there are no rounding errors, the last term does not cause any problems. If there are rounding errors, as one would expect when adding and subtracting floating point numbers, for example, then one expects to see the sum of these (generally very small) errors affect the output of the system by causing a small change to the output, and one expects the maximum size of this change to increase with time.

As $y_n$ is the sum of N terms of the form $x_n/N$ and of noise-related terms, $y_n$ should be substantially larger than any given $x_n/N$. As we saw in the section “Rounding Errors in Floating Point Calculations,” adding small floating point numbers to large ones can lead to rounding errors. Thus, (4), which includes a noise term, seems to be an appropriate model for a moving average filter implemented using floating point numbers. The moving average filter is marginally stable with respect to rounding errors, and we expect the output voltage to tend to drift slightly with time as the sum of the (very, very small) rounding errors changes (very slowly).

The Tip: Avoiding Error the Fixed-Point Way
If one stores all of one’s numbers as fixed-point numbers or integers, then as long as there is no overflow or malfunction when adding and subtracting, there is no “rounding error,” (2) is a complete description of the filter, and the filter is stable. Thus, if one works entirely in integer or fixed point arithmetic, the filter is properly characterized by (2), and one can implement the filter recursively without any stability issues.

Thus, the tip is to use some form of fixed point or integer arithmetic when implementing a moving average filter via recursion. When working in the C programming language, for example, use ints rather than floats in the moving average filter. (See [2], for example, for a somewhat different presentation of the problem and this solution.)

When implementing a moving average filter in C, it is often actually convenient to work with C ints rather than C floats. On the microcontroller we used (the ADuC841, a member of the 8051 family), when working with ints, it is convenient to store the (integer) values read from the analog to digital converter (ADC) in an array and not convert the values to voltages. When the time comes to output values to the digital to analog converter (DAC), the values are already properly scaled, as the ADC and DAC can be set to scale values in the same way. Working with ints makes a lot of sense from the point of view of effective and efficient programming, even if it leads to there being no point at which the int-based program “knows” the numerical value of the voltage of the input or the output (in volts) and to the engineer needing to work with unnormalized quantities and not volts.

The Trick: The Unreasonable Effectiveness of floats
We now consider a case where the tip turns out to be unnecessary, and that brings us to the trick. We wanted to write a program to demonstrate the problem with using floating point numbers when implementing a moving average filter recursively by implementing such a filter on an ADuC841, which has a 12-bit ADC. To calculate the voltage corresponding to the 12-bit value returned by the ADC, one must multiply the 12-bit value (when considered as an integer) by the voltage of the ADuC841’s reference voltage, 2.5 V, and divide by $2^{12}$. In order to demonstrate the problem with using floats, we stored the measured voltages and used them (and not the 12-bit values returned by the ADC) in our calculations. We confidently expected to find that the output of the filter would be the expected output and a small “dc drift” because of the buildup of the rounding error. We did not find such a drift.

The reason is actually fairly clear and has to do with the way floating point operations are implemented. In the C we used, a floating point number is stored as a single sign-bit, (what is effectively) a 24-bit mantissa (where the mantissa is actually stored in 23 bits, and the first bit of the mantissa, the “unstored bit,” is always one), and an 8-bit exponent [3]. Multiplication by 2.5 is the same as adding twice a number to half the number, and for binary numbers, this is the same as shifting the number left by one and adding it to the number shifted right by one. Multiplying a floating point number by 2.5 lengthens its mantissa by two bits. As the number read from the ADC uses not more than 12 out of the 24 bits of the mantissa that it is “entitled to,” this is not a problem. Similarly, dividing by $4{,}096 = 2^{12}$ is simply changing the exponent (reducing it by 12), and again, no precision is lost. In total, at most 14 bits of the mantissa are in use, so even after adding 32 such numbers to one another (as we do in our moving average filter), no more than 19 of the 24 bits are in use. Adding the new measurement does not lead to any imprecision, and the subtractions also come off without a hitch.

If you want to see the effects of rounding error, divide by 3,000 (and multiply by it when necessary) instead of by 4,096. Then you should see a very slow drift in the “dc” value of the signal.

Numerical Example
We wrote a program that implements a moving average filter that averages the last 32 samples using C floats, and we examined the output of the filter when its input was a sine wave “riding on” 1 V (dc). The formula used to convert samples to floating point was

    sample_val = (ADCDATAH*256 + ADCDATAL) * 2.5 / CONVERSION_FACTOR;

where ADCDATAL contains the lower eight bits of the ADC reading, and ADCDATAH contains the upper four bits of the ADC reading (and in the program we wrote, its four most significant bits are always zero). At compile time, the user could choose to make CONVERSION_FACTOR either 4,096.0 (the correct conversion factor) or 3,000.0 (as though the correct conversion factor was 3,000.0) by making use of the following preprocessor commands and either leaving in or commenting out the first line (that #defines EXACT).

    #define EXACT
    #ifdef EXACT
    #define CONVERSION_FACTOR 4096.0
    #else
    #define CONVERSION_FACTOR 3000.0
    #endif

When CONVERSION_FACTOR was set to 4,096.0, we expected to see the dc offset remain constant. When CONVERSION_FACTOR was set to 3,000.0, we expected to see a very slow drift because of the information in the least significant bits that was ignored.

We measured the output of the system using the PicoScope 2204A. When setting CONVERSION_FACTOR to 4,096.0 and inputting a 10-Hz sine wave for a period of just over 2 h, we found a change in the dc value of about 1.0 mV (and from time to time, the dc value returned to the value from which it started). When using a CONVERSION_FACTOR of 3,000.0, we found that after a bit more than 1 h and 40 min, the dc value had increased by almost 17 mV.

Conclusions
In this column, we provide a description of a problem that moving average filters can suffer from, and we explain why that problem should not actually rear its head if the values read in from the A/D are stored as integers but that we expect trouble from floating point numbers. The tip, which is not new, is to use ints rather than floats. We then explained why under certain conditions, one can store the numbers as floats and not suffer any ill effects. In particular, when using the ADuC841, the most natural formula for conversion from the integer value of the A/D reading to the floating point value of the measured voltage leads to no loss of accuracy, to no rounding errors, and even adding thirty-two such measurements does not bring us to the point where there will be any rounding errors. To receive a copy of the code used to implement the moving average filter, please send an e-mail to shlomoe@g.jct.ac.il.

Author
Shlomo Engelberg (shlomoe@g.jct.ac.il) received his bachelor’s and master’s degrees in engineering from The Cooper Union, New York, and his Ph.D. degree in mathematics from New York University’s Courant Institute. He is a member of the Department of Electrical and Electronics Engineering, Jerusalem College of Technology, Jerusalem 9116001, Israel. His research interests include applied mathematics, instrumentation and measurement, signal processing, coding theory, and control theory. He is a Senior Member of IEEE.

References
[1] S. Engelberg, Digital Signal Processing: An Experimental Approach. London, U.K.: Springer-Verlag, 2008.
[2] S. W. Smith, The Scientist and Engineer’s Guide to Digital Signal Processing, 2nd ed. San Diego, CA, USA: California Technical Publishing, 1999. Accessed: Dec. 18, 2022. [Online]. Available: https://www.analog.com/en/education/education-library/scientist_engineers_guide.html
[3] “Floating-point numbers.” ArmKeil. Accessed: Dec. 19, 2022. [Online]. Available: https://www.keil.com/support/man/docs/c51/c51_ap_floatingpt.htm
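The tip and the trick can both be explored on a desktop machine. The sketch below is not the author's ADuC841 program (which is available from him on request); it is a hypothetical stand-in that assumes IEEE 754 single-precision floats, uses a made-up triangle-wave input (test_code) instead of a measured sine wave, and introduces the names max_drift and TAPS purely for illustration. It runs the recursion (2) twice in parallel, once on float voltages and once on integer ADC codes, and reports the largest gap between the float output and the voltage implied by the integer running sum.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 32  /* N: the article's filter averages the last 32 samples */

/* Hypothetical test input: a triangle wave riding on roughly 1 V dc,
 * expressed as 12-bit ADC codes (1 V is about code 1638 when the
 * reference voltage is 2.5 V). */
static int test_code(long n)
{
    long phase = n % 200;
    return 1638 + 4 * (int)(phase < 100 ? phase : 200 - phase);
}

/* Runs recursion (2) on float voltages and on integer codes in parallel.
 * Returns the largest absolute gap between the float output and the
 * average implied by the integer running sum (evaluated in double). */
double max_drift(float conversion_factor, long n_samples)
{
    float term_history[TAPS] = {0};  /* x_{n-k}/N stored as floats      */
    long code_history[TAPS] = {0};   /* x_{n-k} stored as integer codes */
    volatile float y = 0.0f;         /* volatile forces rounding to float */
    long code_sum = 0;               /* integer running sum: no rounding  */
    size_t oldest = 0;
    double worst = 0.0;

    assert(conversion_factor > 0.0f && n_samples >= 0);
    for (long n = 0; n < n_samples; ++n) {
        int code = test_code(n);
        float term = (float)code * 2.5f / conversion_factor / (float)TAPS;

        y = y + term - term_history[oldest];      /* recursion (2), floats   */
        code_sum += code - code_history[oldest];  /* recursion (2), integers */
        term_history[oldest] = term;
        code_history[oldest] = code;
        oldest = (oldest + 1) % TAPS;

        double reference = (double)code_sum * 2.5
                         / ((double)conversion_factor * TAPS);
        double gap = (double)y - reference;
        if (gap < 0.0) gap = -gap;
        if (gap > worst) worst = gap;
    }
    return worst;
}
```

Under the stated assumptions, max_drift(4096.0f, n) should return exactly 0.0, because every intermediate value fits in the 24-bit mantissa, whereas max_drift(3000.0f, n) returns a nonzero gap, since dividing by 3,000 already rounds the very first sample. How the gap evolves with the run length depends on the input signal and is best observed by experiment.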

Yeonwoo Jeong, Behnam Tayebi, and Jae-Ho Han
Sub-Nyquist Coherent Imaging Using an Optimizing Multiplexed Sampling Scheme

Date of current version: 3 November 2023

Several techniques have been developed to overcome the limitation of sensor bandwidth for 2D signals [1]. Though compressive sensing is an attractive technique that reduces the number of measurements required to record information on a sparse signal basis [2], [3], recording information beyond the Nyquist frequency remains difficult when working with nonsparse signals. Given this constraint, this article focuses on the use of the physical bandwidth of a coherent signal in the complex form instead of its intensity form. The resulting trick combines holographic multiplexing with sampling scheme optimization to obtain the information in a 2D coherent signal from beyond the Nyquist frequency range. The prerequisites for understanding this article are a knowledge of basic algebra and the Fourier transform. Familiarity with holography is also beneficial.

Background
As shown in Figure 1, when two light waves, $E_1$ and $E_2$, approach each other and finally meet at the recording point, P, this situation can be mathematically described by using the principle of superposition of electromagnetic fields and the wave intensity at that point. The principle of superposition in optics states that when multiple waves overlap in a medium, the resultant amplitude is equal to the algebraic sum of the individual wave amplitudes. The intensity is defined as the radiant power density of the light signal detected by a device such as a camera or an eye, and it can be expressed as the time average of the wave amplitude squared. As this article deals only with coherent light, electromagnetic waves can be assumed to maintain a fixed phase displacement over a period of time. Thus, the principle of superposition and the intensity form the basis for recording the interference of coherent light waves and comprise the core of holography [4].

[Figure 1: two light waves, $E_1$ and $E_2$, travel distances $d_1$ and $d_2$ and meet at point P. The figure states that $E_1 = |E_1|\,e^{-i(kd_1 - \omega t + \varphi_1)}$ and $E_2 = |E_2|\,e^{-i(kd_2 - \omega t + \varphi_2)}$ with $k = 2\pi/\lambda$, and that $E_P = E_1 + E_2$ by the principle of superposition. The intensities are $I_1 = E_1E_1^* = |E_1|^2$ and $I_2 = E_2E_2^* = |E_2|^2$, where $^*$ means complex conjugate, and $I_P = E_PE_P^* = (E_1 + E_2)(E_1^* + E_2^*) = E_1E_1^* + E_2E_2^* + E_1E_2^* + E_2E_1^* = I_1 + I_2 + I_{12}$, where $I_{12} = E_1E_2^* + E_2E_1^*$.]
FIGURE 1. Interference of two light waves and the intensity. Here $d$ denotes the distance traveled by each beam, $\omega$ is the angular velocity, $k$ is the angular wavenumber, and $\varphi$ represents the phase of the light at time 0. The intensity at point P contains the interference term, $I_{12}$, in addition to the individual intensities.

The holography technique was developed to preserve the depth information in an object signal, which cannot be captured by a normal camera [5]. When an object image is captured by an image sensor, the recorded intensity is proportional to the square of the object signal amplitude, causing the phase information of the object signal to be lost. By contrast, holography maintains the phase information using a reference signal. For example, a microscopy technique using holography, as illustrated in Figure 2, can be divided into two setup components: magnification and frequency modulation. Light from the coherent light source is scattered and passed through the object, and the objective lens magnifies the resulting object signal. The reference beam, $E_R$, can be subsequently produced using a grating to duplicate the magnified signal and reorient it to a specific angle. The two signals after grating are physically transformed into the frequency domain at the Fourier plane, while an analog filter removes all information except the center intensity of the duplicated signal to obtain the reference signal, $E_R$. Finally, a second lens is applied to cause the original signal to interfere with the reference signal, with the intensity of this combined signal, $|E_R + E_O|^2$, recorded by the image sensor. The recorded object signal $E_O$ can be represented with its complex magnitude, $|E_O(x, y)|$, and the phase component, $e^{i\theta_O(x, y)}$, while the magnitude and the phase of the recorded reference signal, $E_R$, are $|E_R|$ and $e^{i\theta_R x}$, respectively, where $\theta_R$ is obtained by dividing $2\pi$ by the grating period.

As indicated by the recorded frequency scheme shown in Figure 2(b), the radius of the physical bandwidth circle describing the object signal intensity is twice that of the object signal amplitude because the Fourier transform of the two multiplied signals is equivalent to the convolution operation between the individual Fourier transformed signals. The phase information of the object is preserved in the bandwidths of the twin sidelobes, expressed as $\tilde{U}_O(f_x \pm \theta, f_y)$, simply reflecting the original frequency information of the object shifted by $\pm\theta$. Therefore, in contrast to the case of direct imaging [Figure 2(a)], which preserves only the magnitude of the object signal, holography also facilitates the full recovery of the phase information from the object signal by programmatically extracting the bandwidth from one of the sidelobes [6].

[Figure 2: optical setups and frequency-domain diagrams. Panel (a), direct imaging: the image sensor records $I = E_OE_O^* = |E_O|^2$, with $F(E_O) = \tilde{U}_O(f_x, f_y)$ and $F(I) = \tilde{U}_O(f_x, f_y) * \tilde{U}_O(f_x, f_y)$ ($*$: convolution). Panel (b), the single hologram scheme: a laser (coherent light source), the object, an objective lens, a grating of period $\Lambda$, a first lens, an analog filter at the Fourier plane, a second lens, and the image sensor, which records $I = |E_R + E_O|^2$ with $E_O = |E_O(x, y)|\,e^{i\theta_O(x, y)}$, $E_R = |E_R|\,e^{i\theta_R x}$, and $\theta_R = 2\pi/\Lambda$. The Fourier-domain diagram marks a constant intensity at the center, the autocorrelation (center lobe) $\tilde{U}_O(f_x, f_y) * \tilde{U}_O(f_x, f_y)$, and the side bandwidths $\tilde{U}_O(f_x - \theta, f_y)$ and $\tilde{U}_O(f_x + \theta, f_y)$. Legend: $U_0$, the range of the sensor bandwidth for direct imaging; $K_p$, the range of the physical bandwidth of the object signal intensity; $k_p$, the range of the physical bandwidth of the object signal amplitude; $f_{max}$, the maximum digitized frequency; the digitized bandwidths (Shannon–Nyquist theorem applied) and the sensor bandwidth dead zone are also marked.]
FIGURE 2. The frequency scheme of (a) direct imaging and (b) a single hologram with an optical system.

According to the Nyquist theorem, distortion can be avoided in digital intensity recording systems by ensuring that the sampling rate used to digitize a signal is at least twice the maximum frequency range denoting the physical bandwidth of the signal. Consequently, when an object signal is captured directly, the frequency range of the sensor bandwidth $U_0$ in Figure 2(a) should be at least four times the frequency range denoting the physical bandwidth of the object signal amplitude $k_p$ (i.e., at least twice the frequency range denoting the physical bandwidth of the object signal intensity $K_p$). As a result, the full sensor bandwidth required to record the signal intensity without distortion should be more than 16 (4 × 4) times the physical bandwidth of the amplitude, as indicated by the blue box in Figure 2(a); thus, the cost of direct intensity recording is extremely high.

The situation is more adverse when considering the Nyquist theorem for a single hologram. As shown in Figure 2(b), the recorded intensity and its Fourier transform can be respectively expressed as follows:

$$I = E_RE_R^* + E_OE_O^* + E_OE_R^* + E_RE_O^* \tag{1}$$

$$F(I) = F\{E_RE_R^*\} + F\{E_OE_O^*\} + F\{E_OE_R^*\} + F\{E_RE_O^*\}. \tag{2}$$

The frequency information recorded by the sensor includes the amplitudes of the twin sidelobes in the single-hologram scheme shown in Figure 2(b) expressed as the last two terms on the right side of (2). In addition, the direct intensity information of the object, which is represented as the central lobe in Figure 2(b), is expressed as the second term on the right side of (2). After applying the Nyquist theorem, the radii of the sidelobes double, causing the required frequency range for the digitizing sensor in the single hologram scheme, defined as $U_1$ in Figure 2(b), to be at least twice that in the direct-imaging scheme [$U_0$ in Figure 2(a)], and expanding the required sensor bandwidth accordingly. Therefore, this article aims to introduce a solution to increase the ratio of the recorded amplitude bandwidth to the sensor bandwidth to overcome the inefficiencies otherwise associated with the Nyquist constraint.

Solution
Two primary techniques have been developed to improve the available bandwidth with respect to the total frequency area of an image sensor: one based on frequency multiplexing [7] and the other based on sampling scheme optimization [8]. Frequency multiplexing is accomplished by obtaining multiple holograms using one reference beam. With this approach, the recorded intensity of N independent holograms at the image sensor can be expressed as

$$\begin{aligned}
I &= |E_R + E_1 + E_2 + \cdots + E_N|^2 \\
  &= E_RE_R^* + \sum_{i=1}^{N}\sum_{j=1}^{N} E_iE_j^* + \sum_{i=1}^{N} E_RE_i^* + \sum_{i=1}^{N} E_iE_R^*.
\end{aligned} \tag{3}$$

Furthermore, the following theoretical constraint is set to ensure that the sum of the N hologram signals is equal to the total object signal:

$$E_O = \sum_{i=1}^{N} E_i. \tag{4}$$

Thus, when N = 4, as illustrated in Figure 3, frequency multiplexing of holograms can be realized by magnifying and separating the object images into four distinct patches and recording them simultaneously by overlapping the associated signals. Compared to the normal imaging in Figure 3(a), as magnification is inversely proportional to the recorded frequency range, the frequency range required to capture the object decreases according to the magnification of the image, as shown in Figure 3(b). In addition, overlapping the signals enables the image sensor, which uses the same size as for direct imaging, to capture the full area of the object without decreasing the field of view (FOV), i.e., the recorded image area, which is possible because the bandwidth of each signal has been located in a separate area using the holography technique, facilitating the reconstruction of the original object image by extracting each of these signals from the frequency area and combining them together, as shown in Figure 3(c). However, increasing the number of holograms does not necessarily decrease the size of the dead zone in the sensor frequency domain.

Indeed, the sensor bandwidth required for the two holograms in Figure 4(c) is larger than that for direct imaging, as shown in Figure 4(a) (i.e., $U_2 > U_0$), with a considerable portion of the frequency domain remaining unutilized. Figure 4(d) shows a geometrically optimized scheme using a single hologram that exploits the repetitive pattern of the Fourier domain to improve the utilization of the frequency; however, $U_3 > U_0$. Notably, none of the sampling schemes in Figure 4(b)–(d) can record amplitude information beyond the Nyquist constraint, regardless of their utilization of the frequency domain.

A relationship between bandwidths must be established to optimize the bandwidth available for recording the amplitude of the signal beyond the Nyquist frequency. Here, the required total bandwidth is defined as the sum of the bandwidths of sidelobes of the same size, and each bandwidth in the 2D frequency domain is defined as the square of the length along a single direction of the frequency range. Thus, the maximum frequency range along one axis of the total bandwidth $k_d$ can be written as

$$k_d^2 = N\left(k_{d,i}^2\right) \tag{5}$$

where $k_{d,i}$ represents the maximum frequency range along one axis of sidelobe $i$ and can be expressed by

$$k_{d,i} = \frac{k_d}{\sqrt{N}}. \tag{6}$$

The relationship between $U_0$ and $k_{d,i}$ can be expressed by

$$U_0 = c\,k_{d,i} \tag{7}$$

where $c$ denotes the optimal geometrical factor of the 2D sampling scheme.
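The expansion in (3) can be checked numerically. The following is a minimal sketch in C, in which each field is reduced to a single complex number at one sensor pixel; the type cplx and all function names are hypothetical and introduced only for this illustration. It evaluates the left side of (3) directly and the right side term by term; with N = 1 the same functions reproduce the single-hologram expression (1).

```c
#include <assert.h>
#include <stddef.h>

/* A complex field value at one sensor pixel (hypothetical helper type). */
typedef struct { double re, im; } cplx;

cplx cplx_add(cplx a, cplx b)
{
    cplx sum = { a.re + b.re, a.im + b.im };
    return sum;
}

/* Returns a * conj(b). */
cplx cplx_mul_conj(cplx a, cplx b)
{
    cplx prod = { a.re * b.re + a.im * b.im, a.im * b.re - a.re * b.im };
    return prod;
}

/* Intensity E E^* of a single field value. */
double intensity(cplx e)
{
    return cplx_mul_conj(e, e).re;
}

/* Left side of (3): |E_R + E_1 + ... + E_N|^2. */
double hologram_intensity_direct(cplx e_ref, const cplx *e, size_t n)
{
    cplx total = e_ref;
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        total = cplx_add(total, e[i]);
    return intensity(total);
}

/* Right side of (3), summed term by term. The imaginary parts of the
 * double sum and of the two cross sums cancel, so only the real parts
 * are accumulated. */
double hologram_intensity_expanded(cplx e_ref, const cplx *e, size_t n)
{
    double sum = intensity(e_ref);                 /* E_R E_R^*          */
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j)
            sum += cplx_mul_conj(e[i], e[j]).re;   /* E_i E_j^*          */
        sum += cplx_mul_conj(e_ref, e[i]).re;      /* E_R E_i^*          */
        sum += cplx_mul_conj(e[i], e_ref).re;      /* E_i E_R^*          */
    }
    return sum;
}
```

For example, with the arbitrary values $E_R = 2$ and $E_1, \ldots, E_4 = 1 + i,\ -2i,\ 3,\ -1 + 2i$, both functions return 26: the reference contributes 4, the double sum contributes $|3 + i|^2 = 10$, and the two cross sums together contribute 12.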

As kd can be considered the digitized yielding the optimal geometrical factor one axis of the sensor bandwidth can be
amplitude of the signal, denoted Kp, (7) c 1d = 2 ^ N + 1 h . expressed as U 0 = 1.92 ^2.71/ 2 h K p .
can be rewritten as Furthermore, as K p, 1d = Nk d, i, 1d , (9) As the resulting C is less than 2, this
can be rewritten as optimization achieves the sub-Nyquist
c
2 ^ N + 1h
U0 = K p / CK p . (8) condition. Similarly, Figure 4(f) and
N
U 0, 1d = K d, i, 1d (10) (g) illustrates the optimal sampling
N
Note that (8) relies upon the quan- schemes for three and four holograms
titative relationship between the maxi- indicating that the effective coherent with U 0 = 1.87K p and U 0 = 1.73K p ,
mum range along one axis of the sensor Nyquist factor for the 1D case, C 1d , respectively. In an extreme case with 16
bandwidth and the maximum amplitude is 2 ^ N + 1h /N. Therefore, for a large tightly packed holograms [Figure 4(i)],
range of the signal, expressed by c. number of holograms, the single-axis the values of c and C are 6 and 1.5,
Thus, the ratio of c to the square root of range of the sensor cannot be less than respectively. Indeed, all optimized
the number of holograms can be defined twice that of the single intensity because schemes in Figure 4(e)–(i) can be used
as the effective coherent Nyquist factor, only an infinite number of holograms to design a sub-Nyquist coherent imag-
C, to quantify the efficiency of the sen- ^ N " 3 h can achieve a C 1d value of 2. ing system.
sor bandwidth utilization. The higher dimensional scheme op-
The ideal case in a 1D scheme can timization does not exhibit such simple Discussion
be evaluated to provide a logical basis behavior. For example, in contrast to the To evaluate the efficiency of the pro-
for optimization in a higher dimensional nonoptimized scheme in Figure 4(c), posed method, the percentage of the sen-
scheme. As shown in Figure 5, assum- Figure 4(e) shows the optimal frequen- sor occupied by the required bandwidth
ing that the sidelobes for N holograms cy-sampling scheme for a two-hologram for digitizing the signal, relative to that
are tightly packed in the 1D Fourier technique, which can be easily pro- occupied by the bandwidth required for
domain, the relationship between the duced based on the fact that the radius the direct imaging, can be expressed as
maximum frequency range of the sen- of the central lobe should be twice that N
c / k d, i m
2
sor, U0,1d, and that of the digital band- of the sidelobes, corresponding to a c
width of each sidelobe, kd,i,1d, can be of 2.71. Accordingly, the relationship T = 100 # k 2d i=1
(11)
k 2d, 0
expressed as between the frequency range along the
axis of the digitized amplitude signal where k d, 0 is k d for the scheme of the
U 0, 1d = 2 ^ N + 1h k d, i, 1d (9) and its minimum frequency range on direct imaging.
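The sub-Nyquist condition can be evaluated without taking a square root. Since the effective coherent Nyquist factor is defined as the ratio of $c$ to the square root of the number of holograms, $C^2 = c^2/N$, and the condition $C < 2$ is equivalent to $c^2/N < 4$ for positive $c$. The sketch below, with hypothetical function names, encodes this together with the 1D factor $C_{1d} = 2(N + 1)/N$ stated in the text; the pair $c = 4$, $N = 4$ in the usage note is an invented boundary case, not a scheme from the article.

```c
#include <assert.h>

/* Square of the effective coherent Nyquist factor: C^2 = c^2 / N. */
double nyquist_factor_squared(double c, int n_holograms)
{
    assert(n_holograms > 0);
    return c * c / (double)n_holograms;
}

/* Sub-Nyquist condition C < 2, tested as C^2 < 4 (valid for c > 0). */
int is_sub_nyquist(double c, int n_holograms)
{
    return nyquist_factor_squared(c, n_holograms) < 4.0;
}

/* Effective coherent Nyquist factor of the ideal 1D case,
 * C_1d = 2(N + 1)/N, which exceeds 2 for every finite N. */
double nyquist_factor_1d(int n_holograms)
{
    assert(n_holograms > 0);
    return 2.0 * (n_holograms + 1) / n_holograms;
}
```

With the values quoted in the text, is_sub_nyquist(2.71, 2) and is_sub_nyquist(6.0, 16) both hold, the latter with $C^2 = 2.25$ (that is, $C = 1.5$). The invented case is_sub_nyquist(4.0, 4) fails because it gives exactly $C = 2$, and nyquist_factor_1d(4) evaluates to 2.5.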
[Figure 3: three panels, each showing an object, its FOV, and the image sensor. Panel (b) is labeled “Decreasing FOV.” Panel (c) is divided into “Multiplexing” (separating and overlapping signals) and “Image Reconstruction” (recorded image, masking in the frequency domain, moving to center, inverse Fourier transform, reconstruction image).]
FIGURE 3. Comparison of direct imaging and the method using frequency multiplexing. (a) Direct imaging, (b) direct imaging of the magnified object, and (c) imaging the combined four patches of the object using frequency multiplexing with image reconstruction. FOV: field of view.

Table 1 compares the efficiencies of different sampling schemes when recording frequency information using the same sensor. According to the Nyquist theorem, the metric T should remain less than 25%. However, the values of T for the multiplexed holography based on the sampling schemes presented in Figure 4(c)–(i) are 12.5%, 15.26%, 27.23%, 28.76%, 44.44%, and 33.33%, respectively. Therefore, except for the first four cases listed in Table 1, the frequency
[Figure 4: nine frequency-domain diagrams, (a)–(i). Panels (b)–(d) are annotated with sensor ranges larger than $U_0$, and panels (e)–(i) with sensor ranges smaller than $U_0$.]
FIGURE 4. 2D frequency schemes of (a) direct imaging, (b) single hologram without optimization, (c) two holograms without optimization, (d) single hologram with optimization, and (e) two holograms with optimization. The sampling schemes are optimized for (f) three, (g) four, (h) 12, and (i) 16 holograms. The physical bandwidth of the intensity and amplitude of the single hologram are shown in brown and gray, respectively, and its digitized intensity and amplitude are shown in violet and black, respectively. The digitized intensity and amplitude of the multiplexed hologram are shown in light and dark green, respectively. SH: single hologram; MH: multiple holograms.
[Figure 5 labels: N sidelobes (holograms); $U_{0,\mathrm{1d}}$; $k_{d,i,\mathrm{1d}}$; $2k_{d,i,\mathrm{1d}}$.]
FIGURE 5. The ideal case in the 1D scheme.

information was successfully recorded beyond the Nyquist constraint.

As mentioned previously, increasing the number of holograms can be expected to reduce the value of C to less than two. The ideal C for the sampling scheme can be deduced from Figure 4(i), and it can be expressed as

$$C \approx \sqrt{2\left(\frac{N+2}{N}\right)}. \tag{12}$$

Thus, for an infinite number of holograms, $C = \sqrt{2}$, and the maximum value of T is 50%. However, as this value corresponds to an infinite sensor area with an infinite number of holograms, it cannot be achieved owing to the presence of an autocorrelation term and the resolution limits in the frequency domain. Therefore, the lower limit of the effective coherent Nyquist factor is $\sqrt{2}$.

Figure 6 depicts C as a function of the number of captured holograms. The red squares show the optimal value of C manually obtained for the Fourier domain schemes using the single and multiple holograms in Figure 4(d)–(g). The results in Table 1 indicate that an increase in N causes the value of C to decrease toward a lower limit of $\sqrt{2}$, demonstrating that multiplexing the signal and optimizing the sampling scheme significantly improved the quantity of information recorded by the sensor.

As an example implementation, Figure 7 shows a sub-Nyquist coherent system using four holograms. In the first stage of the optical system, the object signal is magnified and divided into four distinct patches using an objective lens and masks. Although the FOV decreases in inverse proportion to the magnification, in this implementation, all patch signals are represented by holograms and finally gathered at the image sensor. If only one reference beam is used, as in (3), the sideband cannot be moved independently. Therefore, the light path is split using mirrors and beam splitters to create multiple reference beams with the same number of holograms. Accordingly, the modified intensity expression is given by

$$I = \left| E_{R_1} + E_{S_1} + E_{R_2} + E_{S_2} + E_{R_3} + E_{S_3} + E_{R_4} + E_{S_4} \right|^2 = \sum_{i=1}^{4}\sum_{j=1}^{4} E_{R_i} E_{R_j}^{*} + \sum_{i=1}^{4}\sum_{j=1}^{4} E_{S_i} E_{S_j}^{*} + \sum_{i=1}^{4}\sum_{j=1}^{4} E_{R_i} E_{S_j}^{*} + \sum_{i=1}^{4}\sum_{j=1}^{4} E_{S_i} E_{R_j}^{*} \tag{13}$$

where $E_{S_i}$ and $E_{R_i}$ represent the object and reference signals, respectively. Each reference and object patch signal can be realized using two gratings and an analog filter. As the grating replicates the incoming light at specified angles and different intensities along a single axis, two gratings with appropriate specifications can set the required modulation
frequency in the 2D Fourier domain. In the physical Fourier plane, the signals duplicated by the gratings are passed through analog filters, leaving only two signals. The four zeroth-order signals are preserved as they represent the object signal, while the first-order signals remain as they represent the modulation frequency. Thus, using this technique, four different sets of object patches and their corresponding reference signals can be provided, allowing the sidebands to be located independently.

Table 1. Comparison of different sampling schemes.
No.  2D Frequency Scheme  Number of Holograms  C     T       Ref.
1    Figure 4(a)          Direct imaging       2     25%     –
2    Figure 4(b)          1                    4     6.25%   [6]
3    Figure 4(c)          2                    2.83  12.5%   [7]
4    Figure 4(d)          1                    2.56  15.26%  [8]
5    Figure 4(e)          2                    1.92  27.23%  [9]
6    Figure 4(f)          3                    1.87  28.76%  –
7    Figure 4(g)          4                    1.73  33.41%  –
8    Figure 4(h)          12                   1.5   44.44%  –
9    Figure 4(i)          16                   1.73  33.33%  –
10   Infinite hologram    ∞                    1.41  50%     –
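As a numerical sanity check of the intensity expression (13), the following sketch draws four random complex reference fields and four random complex object fields and confirms that the squared magnitude of their sum equals the four double sums. The 8 × 8 grid size, the random fields, and the helper names `random_field` and `double_sum` are assumptions made only for illustration; they do not model the optical system.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (8, 8)  # hypothetical sensor grid, chosen only for the demo


def random_field():
    """Return a random complex field standing in for E_R or E_S."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


E_R = [random_field() for _ in range(4)]  # reference signals E_R1..E_R4
E_S = [random_field() for _ in range(4)]  # object signals E_S1..E_S4

# Left-hand side of (13): intensity of the summed field.
intensity = np.abs(sum(E_R) + sum(E_S)) ** 2


def double_sum(a, b):
    """Sum of a_i * conj(b_j) over all index pairs (i, j)."""
    return sum(a_i * np.conj(b_j) for a_i in a for b_j in b)


# First and second terms of (13): central (autocorrelation) part.
central = double_sum(E_R, E_R) + double_sum(E_S, E_S)
# Third and fourth terms of (13): the twin sidelobes.
sidelobes = double_sum(E_R, E_S) + double_sum(E_S, E_R)

expanded = central + sidelobes
print(np.allclose(intensity, expanded.real), np.allclose(expanded.imag, 0.0))
```

The third and fourth terms are complex conjugates of each other, which is why the expanded sum is real even though each sidelobe term on its own is complex.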
The first and second terms of (13) represent the bandwidth of the central circle, whereas the third and fourth terms represent the twin sidelobe bandwidths. The required information is contained in the four sidebands, each capturing a patch of the object image. However, according to (13), as the four holograms share the same reference beam and are therefore intermingled, they must be separated. For example, as indicated by

$$\sum_{i=1}^{4} E_{S_i} E_{R_1}^{*} = \left( E_{S_1} + E_{S_2} + E_{S_3} + E_{S_4} \right) E_{R_1}^{*}, \tag{14}$$

the bandwidth of the first hologram carried by the first reference signal is colocated with the other three holograms because they are also carried by the first reference signal.

[Figure 6 plot: effective coherent Nyquist factor (vertical axis, 1.4 to 2.6) versus number of holograms (horizontal axis, 0 to 20); series: Manual, Eq. (12).]
FIGURE 6. Effective coherent Nyquist factor (C) as a function of the number of captured holograms. The red squares (manual) show the optimal value of C for recording one, two, three, and four holograms.
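The trend plotted in Figure 6 can be tabulated with a short sketch. It rests on two assumptions that are not stated verbatim in the text: (12) is read as C ≈ √(2(N + 2)/N), the form that reproduces the stated limit of √2 for an infinite number of holograms, and T is taken as 100/C², a relation inferred from the C and T columns of Table 1. The function names are hypothetical.

```python
import math


def ideal_nyquist_factor(n_holograms: int) -> float:
    """Ideal effective coherent Nyquist factor, assuming (12) reads
    C = sqrt(2 * (N + 2) / N)."""
    if n_holograms < 1:
        raise ValueError("at least one hologram is required")
    return math.sqrt(2.0 * (n_holograms + 2) / n_holograms)


def sensor_usage_percent(c: float) -> float:
    """Metric T in percent, assuming T = 100 / C**2 (inferred from Table 1)."""
    return 100.0 / c ** 2


for n in (1, 2, 3, 4, 16, 1000):
    c = ideal_nyquist_factor(n)
    print(f"N={n:>4d}  C={c:.3f}  T={sensor_usage_percent(c):.2f}%")
```

Under these assumptions C falls monotonically with N and approaches √2, so T approaches but never reaches 50%.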

To solve this problem, each set of object and reference signals must be marked to avoid interference. Therefore, for the optical system in Figure 7, two light sources with different wavelengths, polarization beam splitters, and dichroic mirrors are used. The dichroic mirror is a specially designed mirror that reflects light in a specific band, and the polarization beam splitter is also a specially designed beam splitter that splits light into two orthogonally polarized beams. The reason for using those components is that two light sources with different wavelengths do not interfere with each other, and neither do two orthogonally polarized beams. Exploiting the light properties, the two dichroic mirrors at the front of the optical system split one light path into two paths having different wavelengths, respectively. Next, the polarization beam splitter before the masks splits each light path again into the two orthogonally polarized beams. Therefore, each reference beam ends up carrying only the corresponding patch signal, and (14) is changed to

$$\sum_{i=1}^{4} E_{S_i} E_{R_1}^{*} = E_{S_1} E_{R_1}^{*}, \quad \left( \because\ E_{S_2} E_{R_1}^{*} = E_{S_3} E_{R_1}^{*} = E_{S_4} E_{R_1}^{*} = 0 \right). \tag{15}$$

In summary, after dividing the magnified object signal into several patches using masks, an optimal scheme can be realized by freely moving the sidebands via the gratings and manipulating the interference by selecting or changing the state of light using multiple light sources and specially designed optical components. Likewise, as the number of holograms increases, the additional components required to avoid interference among holograms, and the space to install them, inevitably increase.

FIGURE 7. An example implementation of the sub-Nyquist coherent system using four holograms. It consists of three parts: magnification, multiplexing with optimization, and recording. The optimized sampling scheme for four holograms can be implemented by physically splitting and combining the magnified object signal.

Conclusions
This article presented a trick for realizing sub-Nyquist coherent imaging based

on an optimized multiplexing hologram scheme. The Nyquist theorem dictates that the digital imaging bandwidth must span four times the frequency range of the physical bandwidth of the object signal amplitude to avoid distortion, as the intensity is proportional to the square of the amplitude when using coherent light sources. Thus, the full sensor bandwidth must be at least 16 (4 × 4) times the bandwidth of the original object signal amplitude to record intensity, wasting significant frequency space. To overcome this limitation, a sub-Nyquist imaging scheme was proposed by exploiting multiplexed holograms with optimization. Several optimized schemes were presented based on two to 16 sub-Nyquist holograms, and a theoretical effective coherent Nyquist factor limit of $\sqrt{2}$ was derived from the ideal cases. Finally, an example implementation of the proposed sub-Nyquist coherent imaging technique was provided using four holograms with two different coherent light sources.

Acknowledgment
This work was supported by the Ministry of Science and ICT, Korea, under the Information Technology Research Center support program (IITP-2023-RS-2022-00156225) and under the ICT Creative Consilience program (IITP-2023-2020-0-01819) supervised by the Institute for Information and Communications Technology Planning and Evaluation (IITP). Correspondence should be sent to Behnam Tayebi and Jae-Ho Han (corresponding authors). Yeonwoo Jeong and Behnam Tayebi contributed equally to this work.

Authors
Yeonwoo Jeong (forresearch4220@gmail.com) received his B.S. degree from the School of Electrical Engineering, Korea University, Seoul, South Korea, in 2017. He is currently pursuing his Ph.D. degree with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, South Korea. His research interests include artificial intelligence and signal processing.

Behnam Tayebi (behnamty@gmail.com) received his Ph.D. degree in applied physics and optics from Yonsei University, Seoul, South Korea, in 2015. He was with Korea University as a research professor, and he is currently working as an optical engineering lead at Inscopix, Mountain View, CA 94043 USA. His current research interests include nanoimaging, holography, and signal processing.

Jae-Ho Han (hanjaeho@korea.ac.kr) received his Ph.D. degree in electrical and computer engineering from Johns Hopkins University, Baltimore, MD, USA. He is a full professor with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, South Korea. His current research interests include novel optical imaging technologies and image processing for various fields in biomedicine and neuroscience research. He is a Member of IEEE.

References
[1] M. Mishali and Y. C. Eldar, “Sub-Nyquist sampling,” IEEE Signal Process. Mag., vol. 28, no. 6, pp. 98–124, Nov. 2011, doi: 10.1109/MSP.2011.942308.
[2] R. G. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, Jul. 2007, doi: 10.1109/MSP.2007.4286571.
[3] H. E. A. Laue, “Demystifying compressive sensing,” IEEE Signal Process. Mag., vol. 34, no. 4, pp. 171–176, Jul. 2017, doi: 10.1109/MSP.2017.2693649.
[4] F. L. Pedrotti, L. M. Pedrotti, and L. S. Pedrotti, “Interference of light,” in Introduction to Optics, 3rd ed. London, U.K.: Pearson Education, 2006.
[5] J. W. Goodman, “Holography,” in Introduction to Fourier Optics, 2nd ed. New York, NY, USA: McGraw-Hill, 1996.
[6] B. Bhaduri et al., “Diffraction phase microscopy: Principles and applications in materials and life sciences,” Adv. Opt. Photon., vol. 6, no. 1, pp. 57–119, 2014, doi: 10.1364/AOP.6.000057.
[7] P. Girshovitz and N. T. Shaked, “Doubling the field of view in off-axis low-coherence interferometric imaging,” Light Sci. Appl., vol. 3, no. 3, Mar. 2014, Art. no. e151, doi: 10.1038/lsa.2014.32.
[8] K. Ishizuka, “Optimized sampling schemes for off-axis holography,” Ultramicroscopy, vol. 52, no. 1, pp. 1–5, Sep. 1993, doi: 10.1016/0304-3991(93)90017-R.
[9] B. Tayebi, F. Sharif, A. Karimi, and J.-H. Han, “Stable extended imaging area sensing without mechanical movement based on spatial frequency multiplexing,” IEEE Trans. Ind. Electron., vol. 65, no. 10, pp. 8195–8203, Oct. 2018, doi: 10.1109/TIE.2018.2803721.
DSP HISTORY (continued from page 16)

[13] D. Pantalony and R. B. Evans, “Seeing a voice: Com. [Online]. Available: https://griffonagedotcom. [Online]. Available: http://www.firstsounds.org/
Rudolph Koenig’s instruments for studying vowel wordpress.com/2017/04/23/daguerreotyping-the-voice- sounds/scott.php
sounds,” Amer. J. Psychol., vol. 117, no. 3, pp. 425– leon-scotts-phonautographic-aspirations/#_edn13 [24] R. N. Bracewell, The Fourier Transform and Its
442, 2004, doi: 10.2307/4149009. [19] É.-L. Scott de Martinville, “Brevet d’invention no. Applications. New York, NY, USA: McGraw-Hill, 1965.
[14] W. Koenig, H. K. Dunn, and L. Y. Lacy, “The 17897 (1857) and certificat d’addition no. 31470 [25] T. W. Körner, Fourier Analysis. Cambridge,
sound spectrograph,” J. Acoustical Soc. Amer., vol. (1859),” Institut National de la Propriété Industrielle, U.K.: Cambridge Univ. Press, 1988.
18, no. 1, pp. 19–49, 1946, doi: 10.1121/1.1916342. Paris, France, 2007. [Online]. Available: http://www.
firstsounds.org/publications/facsimiles/FirstSounds [26] M. Vetterli, J. Kovačević, and V. K. Goyal, Fourier
[15] E. Garber, “Reading mathematics, constructing phys- and Wavelet Signal Processing. Cambridge, U.K.:
ics: Fourier and his readers, 1822-1850,” in No Truth _Facsimile_02.pdf
Cambridge Univ. Press, 2013. [Online]. Available:
Except in the Details, A. J. Cox and D. M. Siegel, Eds. [20] R. Gribonval and E. Bacry, “Harmonic decom- https://fourierandwavelets.org/FWSP_a3.2_2013.pdf
Dordrecht, The Netherlands: Springer-Verlag, 1995, pp. position of audio signals with matching pursuit,”
31–54. IEEE Trans. Signal Process., vol. 51, no. 1, pp. [27] W. Thomson, “Harmonic analyzer,” Proc. Roy.
101–111, Jan. 2003, doi: 10.1109/TSP.2002.806592. Soc. London, vol. 27, pp. 371–373, May 1878. [Online].
[16] É.-L. Scott, Le Problème de la Parole s’Ecrivant Available: https://www.jstor.org/stable/113690
Elle-Même. Paris, France: Chez l’auteur, 1878. [21] É.-L. Scott, “Inscription automatique des sons de
[Online]. Available: https://www.firstsounds.org/ l’air au moyen d’une oreille artificielle,” in Proc. [28] A. A. Michelson and S. W. Stratton, “A new har-
publications/facsimiles/FirstSounds_Facsimile_08.pdf Comptes Rendus Hebdomadaires Séances Acad. Sci., monic analyzer,” Amer. J. Sci., vol. 5, no. 25, pp.
1861, vol. LIII, pp. 108–111. [Online]. Available: 1–13, 1898, doi: 10.2475/ajs.s4-5.25.1.
[17] É.-L. Scott de Martinville, “Principe de phonau-
tographie,” Académie des sciences de l’Institut de h t t p s : / / w w w. fi r s t s o u n d s . o r g / p u b l i c a t i o n s / [29] J. W. Cooley and J. W. Tukey, “An algorithm
France, Paris, France, Pli cacheté No. 1639, 1857. facsimiles/FirstSounds_Facsimile_06.pdf for the machine calculation of complex Fourier
[Online]. Available: https://www.academie-sciences. [22] First Sounds. [Online]. Available: https://www. series,” Math. Comput., vol. 19, no. 90, pp. 297–301,
fr/pdf/dossiers/pli_cachete/pli1639.pdf firstsounds.org/ 1965, doi: 10.2307/2003354.
[18] P. Feaster. Daguerrotyping the Voice: Léon Scott’s [23] “The phonautograms of Édouard-Léon Scott de
Phonautographic Aspirations. (2017). Griffonage-Dot- Martinville.” First Sounds. Accessed: Jul. 27, 2023. SP

SP EDUCATION
Sharon Gannot, Zheng-Hua Tan, Martin Haardt, Nancy F. Chen, Hoi-To Wai, Ivan Tashev, Walter Kellermann, and Justin Dauwels
Data Science Education: The Signal Processing Perspective

Date of current version: 3 November 2023
1053-5888/23©2023IEEE IEEE SIGNAL PROCESSING MAGAZINE | November 2023 | 89

In the last decade, the signal processing (SP) community has witnessed a paradigm shift from model-based to data-driven methods. Machine learning (ML) methodologies, and more specifically deep learning, are nowadays widely used in all SP fields, e.g., audio, speech, image, video, multimedia, and multimodal/multisensor processing, to name a few. Many data-driven methods also incorporate domain knowledge to improve problem modeling, especially when computational burden, training data scarceness, and memory size are important constraints.

Data science (DS), as a research field, emerged from several scientific disciplines, namely, mathematics (mainly statistics and optimization), computer science, electrical engineering (primarily SP), industrial engineering, biomedical engineering, and information technology. Each discipline offers an independent teaching program in its core domain with a segment dedicated to DS studies. In recent years, numerous institutes worldwide have started to provide dedicated and comprehensive DS teaching programs with diverse applications.

Motivation and significance
We believe that there is a unique SP perspective of DS that should be reflected in the education given to our students. Moreover, we think that now is the right time to start defining our needs and inspirations that will reflect the direction the field of SP will take in years to come.

In this article, following a successful panel at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022) held in Singapore, we focus on these education aspects and draft a manifesto for an SP-oriented DS curriculum. We hope this article will encourage discussions among SP educators worldwide and promote new teaching programs in the field.

DS, ML, and SP: Interrelations
DS is an interdisciplinary field that can be taught from different perspectives. Indeed, DS-oriented material can be a segment of many existing teaching programs in science, technology, engineering, and mathematics. In this article, we aim at the more ambitious task of defining a complete and comprehensive teaching program in DS that takes the unique SP perspective.

To put things in context, SP is concerned with extracting information and knowledge from signals. Common SP tasks are the analysis, modification, enhancement, prediction, and synthesis of signals [see also https://signalprocessingsociety.org/volunteers/constitution (Article II)]. In parallel to the evolution of the SP methodology, we are witnessing a fast-growing interest in the field of ML. ML is not a new field of knowledge. Perhaps its most widely known definition dates back to 1959 (paraphrased from Arthur Samuel [1]): “Learning algorithms to build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.”

ML is, thus, a method of data analysis that uses algorithms to enable computer systems to identify patterns in data, learn from them, and make predictions or decisions based on that learning.

For the SP community, data come in the form of signals. While the definition of signals as the carriers of information remains unchanged, the variety of signal types is rapidly growing. Signals can be either 1D or multidimensional; can be defined over a regular grid (time or pixels) or on an irregular graph; can be packed as vectors, matrices, or higher dimensional tensors; and can represent multimodal data.

As discussed, a significant component of SP is dedicated to extracting, representing, and transforming (raw) data to information that accentuates certain properties beneficial to downstream tasks. While, traditionally, SP focuses on processing raw data that have a physical grounding on planet Earth [e.g., audio, speech, radar, sonar, image, video, electrocardiogram, electroencephalogram, magnetoencephalography, and econometric data], one may not need to be limited to this standard practice.
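The signal types listed above (1D or multidimensional, on a regular grid or an irregular graph, packed as vectors, matrices, or tensors) can be sketched as array shapes. This is only an illustration: the sizes, the four-node graph, and the variable names are assumptions, and the Laplacian quadratic form is one common way to measure how smoothly a signal varies over a graph.

```python
import numpy as np

# 1D signal on a regular time grid: a vector of samples.
n_samples = 100
audio = np.sin(2 * np.pi * 5 * np.arange(n_samples) / n_samples)

# 2D signal on a regular pixel grid: a matrix.
image = np.zeros((32, 32))

# Higher-dimensional tensor: frames x height x width x color channels.
video = np.zeros((10, 32, 32, 3))

# Signal on an irregular graph: one value per node plus the graph itself,
# given here by a (hypothetical) symmetric adjacency matrix.
adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
graph_signal = np.array([0.5, 1.0, -0.3, 2.0])

# Combinatorial graph Laplacian L = D - A. The quadratic form x^T L x sums
# the squared differences of the signal across all edges.
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
smoothness = graph_signal @ laplacian @ graph_signal

print(audio.shape, image.shape, video.shape, graph_signal.shape)
print(f"graph smoothness: {smoothness:.2f}")
```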

A broader and more general definition of signals should include semantic data. Semantic information ultimately stems from the cognitive space in the human mind, which originates from neurophysiological activities in our brains. Cognitive neuroscience is currently not advanced enough to pinpoint how to map semantic information represented in a text to brain activation. Still, this limitation does not prevent one from applying the essence of SP approaches to understanding, representing, and modeling text data or, more generally, semantic information. (Text is, ultimately, just a human-made representation for encoding language and knowledge.) Moreover, multimodal signals are jointly analyzed and processed in some modern applications. Audiovisual SP is an excellent example of two physical signals that are jointly processed. Image captioning is a good example that involves both physical signals and semantic information and, hence, should be processed using methodologies adopted from both computer vision and natural language processing (NLP) disciplines.

We, therefore, claim that the ICASSP 2020 motto, “From Sensors to Information, at the Heart of Data Science,” can be further extended to all types of data: physical, which is indeed captured by sensors, as well as cognitive and semantic. The essence of the processing tasks and the underlying methods remain similar.

The principles of DS education from SP and ML perspectives
This section is dedicated to our view of the essential principles of DS education. Among other topics, we highlight the importance of SP and ML in the DS discipline.

SP and ML methods
Traditionally, we may think of two complementary lists of DS methods stemming from the SP and ML disciplines. A noncomprehensive list can include the following:
■ SP: convolution, time-frequency analysis (Fourier transform and wavelets), linear systems, state-space representations, the fusion of modalities and sensors, Wiener and Kalman filters, and graph SP.
■ ML: (variational) expectation maximization, deep learning, reinforcement learning, end-to-end processing, attention, transformers, graphical neural networks, generative models, dimensionality reduction, kernel methods, and subspace and manifold learning.
This dichotomy between the lists is rather artificial. Recent trends have shown that these two paradigms are converging and are now strongly interrelated by routinely borrowing ideas and practices from each other.

We believe that modern teaching programs should, therefore, emphasize the SP and ML aspects of DS without sacrificing other essential and fundamental elements, namely, optimization, statistics, linear algebra, multilinear (tensor) algebra, artificial intelligence (AI), algorithms, data handling, transmission and storage, and programming skills.

From the SP perspective, rigorous training in DS should bring students to think more fundamentally about where the data at hand come from; what the data points and distributions represent; and how to model, sample, represent, and visualize such information robustly so that it is insensitive to various sources and types of noise for different applications and tasks.

Just as important as the technical skills, students must become aware of the ethical issues related to DS, e.g., the privacy of the data, biases in collecting the data, and the implications their future techniques and developments might impose on society and humanity.

Teaching methodologies
All modern teaching programs, and perhaps DS teaching programs in particular, should give special attention to teaching methodologies that can be more relevant and attractive to the younger generation of students. While we certainly do not claim that “traditional” teaching methods (namely, a teacher lecturing in front of a class) should be abandoned, we encourage educators to incorporate diverse teaching techniques in their curricula. A nonexhaustive list of teaching methodologies may include online courses, labs with interactive programming exercises, flipped classrooms, and hands-on experience that may involve projects and teamwork. As the DS discipline is vast and cannot usually be fully covered by one institute, we encourage educators to consider student exchange programs and joint programs between universities (especially with other countries) and to include internships in the industry. Needless to say, science has no borders, and students will greatly benefit from learning in different schools and listening to many points of view from world-leading experts in their respective disciplines.

Learning outcomes
The graduates of the program are expected to master the theory and practice (including programming skills) of modern and classical SP and ML tools for handling various types of data, most notably, data that originate from signals. They are also expected to thoroughly understand the field’s underlying mathematical and statistical foundations as well as related fields, e.g., data handling, storage (databases and clouds), and transmission (over the network), including reliability and privacy preservation. With rigorous training in SP and ML, graduates will be able to identify and apply the correct tools for DS problems. Graduates should specialize in several advanced topics in the general field of DS and become acquainted with several domain-specific applications. Graduates will, thus, be able to address complex DS problems considering ethical aspects and the sustainability of our global environment.

DS undergraduate curriculum: A proposal from the SP perspective
In this section, we draft a proposed curriculum for DS studies from the SP perspective. We are, of course, aware of the different education systems around the world. Nevertheless, we hope that such a list can serve as a source of inspiration to educators and policy leaders in academic institutes.

In the following, we propose a four-year program (in Europe, it is common to have three years of undergraduate

studies plus two years of graduate studies) comprising three layers:
■ mandatory: a strong background in math and statistics, hands-on programming skills, basic data handling and AI, SP and ML, and ethics
■ elective tracks: data sharing and communication over networks; advanced algorithms and optimization; security, reliability, and privacy preservation; and ML and deep learning (DL) hardware and software tools
■ DS applications: in diverse domains.
We next discuss each layer in detail and give a list of relevant courses. Naturally, each institute will pave its own way toward the most suitable curriculum.

Mandatory areas (with lists of proposed courses)
We believe that each student should be extensively exposed to the field’s theoretical foundations and develop basic hands-on and programming skills:
■ mathematics: calculus, linear algebra, combinatorics, set theory and logic, harmonic analysis, differential equations (ordinary and partial), numerical analysis, numerical algebra, multilinear algebra, algebraic structures, optimization, and complex functions
■ statistics: probability theory, statistics, random processes, information theory, parameter estimation, and statistical theory
■ computer skills and algorithms: programming basics, data structures and algorithms, Python (including libraries and packages such as PyTorch, NumPy, SciPy, and more), object-oriented programming, computer architecture, computability, and cloud computing
■ hands-on: labs and tools as well as annual projects with real data
■ data handling and AI: introduction to DS (including the data processing cycle), meetings with industry (R&D in DS, ethics, practical and real-world problems, and needs), data analysis and visualization, data mining, data representations, and introduction to AI
■ SP and ML: representations and types of signals and systems, SP in the time-frequency domain (Fourier and wavelet transform, filter banks), ML and pattern recognition, statistical algorithms in SP, statistical and model-based algorithms in ML, adaptive SP, generative models, supervised and unsupervised learning, deep learning, time series and sequence analysis and processing, graphical models, and ML operations
■ ethics: ethical and legal aspects of DS, explainability, General Data Protection Regulation, bias, privacy, and approval processes.

Elective specialization tracks (with lists of proposed courses)
Students should elect courses from two or three specialization tracks to advance their knowledge in the field. Specialization tracks may include advanced SP, ML, and optimization algorithms (we split the SP and ML courses into two lists: basic materials and an elective specialization track); dedicated DS-related hardware and software tools; and data sharing and storing methodologies considering security and privacy preservation:
■ data sharing and communication over networks: detection theory, communication, wireless communication, ML for communications, computer networks, mathematical analysis of networks, social networks, cloud data handling, and federated learning
■ advanced algorithms and optimization: online algorithms, advanced algorithms, streaming algorithms, big data, quantum learning, graph theory, advanced databases, game theory, deterministic and stochastic methods in operations research, analysis and mining of processes, distributed computation, and cloud computing
■ advances in SP and ML: array SP, blind source separation and independent component analysis, data fusion (multiple sensors/modalities), reinforcement learning, distributed processing over networks (federated learning), graph SP, and graph neural networks
■ security, reliability, and privacy preservation: coding, cryptography, privacy-preserving computing and communications, safe computing, and anomaly detection
■ ML and DL hardware and software tools: digital signal processors; field programmable gate arrays; CPUs; GPUs; neuromorphic processing systems; parallel computing architectures; parallel computing platform and application programming interface (CUDA); and Python, C, and C++ computer languages.

Domain-specific DS applications
This track offers a noncomprehensive list of courses in knowledge domains that extensively apply DS tools. Students are encouraged to take several courses from this list to become acquainted with real-life applications: econometrics, business intelligence, smart cities, blockchain and cryptocurrency, electro-optics, materials, bioinformatics, AI in health care and medical data mining, biomedical SP, DS in brain imaging, audio/speech analysis and processing, music SP and music information retrieval, NLP, image processing and computer vision, computer graphics, wireless communications, and autonomous vehicles. [Students are required to choose only a small number of courses (e.g., three or four) from the list to become acquainted with several domains that apply DS methodologies.]

Summary and further reading
In this article, we proposed a DS curriculum focusing on SP and ML. We believe such a program can be relevant to many educators and researchers in the IEEE Signal Processing Society. This article follows a panel held at ICASSP’22 [2].

There have been several attempts to define the DS discipline and the required curriculum for a major in DS. Interested readers may refer to recent reports by the U.S. National Academies of Sciences, Engineering, and Medicine [3]; Park City Math Institute [4]; and Israeli Academy of Sciences and Humanities [5]. An overview of the history of DS, its prospective future, and some guidelines for educating in the discipline can be found in [6]. All these references

address the DS discipline in general. In our article, we attempt to focus on the SP and ML perspectives. Readers are also referred to an interesting discussion between Prof. Alfred Hero and Prof. Anders Lindquist about the impact of ML on SP and control systems, which can be found online [7].

Several institutes worldwide already offer study programs in DS with an SP flavor. A nonexhaustive list of study programs follows. The electrical and computer engineering faculty at the University of Michigan offers an ML curriculum [8]. A new undergraduate program proposed by Bar-Ilan University, Israel, follows the guidelines proposed in this article. This program will be opened in the 2023–2024 academic year. (The full program in Hebrew can be found in [9].) A recent presentation on AI curriculum [10] is exploring several AI and DS teaching programs at both the undergraduate and graduate levels, including at the Technical University of Denmark [11], Carnegie Mellon [12], and the Massachusetts Institute of Technology [13]. Friedrich-Alexander University Erlangen-Nuremberg offers an elite M.Sc. degree program in advanced SP and communications [14]. The corresponding M.Sc. degree program in communications and SP has been taught at Ilmenau University of Technology since 2009 [15], and a similar M.Sc. degree program in signals and systems at Delft University of Technology [16]. While not attempting to be exhaustive, this list demonstrates the broad interest of leading academic institutes in developing study programs in SP-oriented DS for both the undergraduate and graduate levels.

The authors of this article hope that the ideas and guidelines presented here can inspire DS and SP educators to develop new teaching programs in this fascinating field.

Acknowledgment
The authors are grateful to Prof. Mor Peleg from the University of Haifa, Israel, for fruitful discussions and for drawing our attention to some of the references listed in the article as well as to Dr. Ran Gelles for fruitful discussions about the new undergraduate program proposed by Bar-Ilan University, Israel.

Authors
Sharon Gannot (sharon.gannot@biu.ac.il) received his Ph.D. degree in electrical engineering from Tel-Aviv University, Israel, in 2000. He is a professor with the Faculty of Engineering, Bar-Ilan University, Ramat-Gan 5290002, Israel, where he heads the data science program; he also serves as the faculty vice dean and served as the deputy director of the Data Science Institute. He served as the chair of the Audio and Acoustic Signal Processing Technical Committee. He will be the general cochair of Interspeech, to be held in Jerusalem in 2024; currently, he is the chair of the IEEE Signal Processing Society (SPS) Data Science Initiative and a member of the SPS Education Center Editorial Board, and EURASIP Signal Processing for Multisensor Systems TAC. He also serves as associate editor and senior area chair for several journals. He is a Fellow of IEEE.

Zheng-Hua Tan (zt@es.aau.dk) received his Ph.D. degree in electronic engineering from Shanghai Jiao Tong University. He is a professor in the Department of Electronic Systems, the Machine Learning Research Group leader, and a cohead of the Centre for Acoustic Signal Processing Research at Aalborg University, 9220 Aalborg, Denmark, as well as a colead of the Pioneer Centre for AI, Denmark. He is an associate editor for the IEEE Journal of Selected Topics in Signal Processing inaugural special series on “Artificial Intelligence in Signal and Data Science—Toward Explainable, Reliable, and Sustainable Machine Learning.” He is a TPC vice chair for ICASSP 2024 and was the general chair for IEEE MLSP 2018 and a TPC cochair for IEEE SLT 2016. His work has been recognized by the prestigious IEEE Signal Processing Society 2022 Best Paper Award. His research interests include deep representation learning. He is a Senior Member of IEEE.

Martin Haardt (martin.haardt@tu-ilmenau.de) received his Doktor-Ingenieur (Ph.D.) degree from Munich University of Technology in 1996. He has been a full professor in the Department of Electrical Engineering and Information Technology and head of the Communications Research Laboratory at Ilmenau University of Technology, 98684 Ilmenau, Germany, since 2001. He received the 2009 Best Paper Award from the IEEE Signal Processing Society; the Vodafone (formerly Mannesmann Mobilfunk) Innovations Award for outstanding research in mobile communications; the ITG Best Paper Award from the Association of Electrical Engineering, Electronics, and Information Technology; and the Rohde & Schwarz Outstanding Dissertation Award. He has served as a senior editor for IEEE Journal of Selected Topics in Signal Processing since 2019. His research interests include wireless communications, array signal processing, high-resolution parameter estimation, and tensor-based signal processing. He is a Fellow of IEEE.

Nancy F. Chen (nfychen@i2r.a-star.edu.sg) received her Ph.D. degree in biomedical engineering from the Massachusetts Institute of Technology and Harvard in 2011. She is a Fellow, Senior Principal Scientist, Principal Investigator and Group Leader at the Institute for Infocomm Research and Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), Singapore, 138632. She leads research efforts in generative artificial intelligence with a focus on speech language technology with applications in education, healthcare, and defense. Speech evaluation technology from her team has been deployed at the Ministry of Education in Singapore to support home-based learning and led to commercial spin-offs. She has received numerous awards, including being named among the Singapore 100 Women in Tech in 2021, the Young Scientist Best Paper Award at MICCAI 2021, the Best Paper Award at SIGDIAL 2021, and the 2020 P&G Connect + Develop Open Innovation Award. She is currently the program chair of the International

Conference on Learning Representations (ICLR), IEEE Distinguished Lecturer, a Board Member of the International Speech Communication Association (ISCA), and a Senior Area Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing. She is a Senior Member of IEEE.

Hoi-To Wai (htwai@cuhk.edu.hk) received his Ph.D. degree from Arizona State University (ASU) in electrical engineering in 2017. He is an assistant professor with the Department of Systems Engineering and Engineering Management at the Chinese University of Hong Kong, Hong Kong, China. His research interests include signal processing, machine learning, and distributed optimization, with a focus on their applications to network science. His dissertation received the 2017 Dean’s Dissertation Award from the Ira A. Fulton Schools of Engineering at ASU, and he was a recipient of a Best Student Paper Award at ICASSP 2018. He is a Member of IEEE.

Ivan Tashev (ivantash@microsoft.com) received his Ph.D. degree in computer science from the Technical University of Sofia, Bulgaria, in 1990. He is a partner software architect and leads the Audio and Acoustics Research Group in Microsoft Research, Redmond, WA 98052 USA; is an affiliate professor at the University of Washington in Seattle; and is an honorary professor at the Technical University of Sofia, Bulgaria. He also coordinates the Brain–Computer Interfaces project in Microsoft Research. He has published two books, two book chapters, and more than 100 scientific papers, and he is listed as an inventor for 50 U.S. patents. His research interests include audio signal processing, machine learning, multichannel transducers, and biosignal processing. He is a member of the Audio Engineering Society and the Acoustical Society of America and a Fellow of IEEE.

Walter Kellermann (walter.kellermann@fau.de) received his Dr.-Ing. degree in electrical engineering from Technical University Darmstadt, Germany, in 1988. He is a professor of communications at the University of Erlangen-Nuremberg, 91058 Erlangen, Germany. His service to the IEEE Signal Processing Society includes Distinguished Lecturer (2007–2008), Chair of the Technical Committee for Audio and Acoustic Signal Processing (2008–2010), Member of the IEEE James L. Flanagan Award Committee (2011–2014), Member at Large SPS Board of Governors (2013–2015), Vice President Technical Directions (2016–2018), Member SPS Nominations Appointments Committee (2019–2022), and Member of the SPS Fellow Evaluation Committee (2023–). He has served as the general chair of eight mostly IEEE-sponsored workshops and conferences. He is a corecipient of 10 best paper awards, was awarded the Julius von Haast Fellowship by the Royal Society of New Zealand in 2012, and received the Group Technical Achievement Award of the European Association for Signal Processing (EURASIP) in 2015. His research interests include speech signal processing, array signal processing, and machine learning, especially for acoustic signal processing. He is a fellow of EURASIP and a Life Fellow of IEEE.

Justin Dauwels (j.h.g.dauwels@tudelft.nl) received his Ph.D. degree in electrical engineering from the Swiss Polytechnical Institute of Technology in Zurich in 2005. He is an associate professor of signal processing systems, Department of Microelectronics, Delft University of Technology, 2628 CD Delft, The Netherlands. He is an associate editor of IEEE Transactions on Signal Processing, an associate editor of the Elsevier journal Signal Processing, a member of the editorial advisory board of International Journal of Neural Systems, and an organizer of IEEE conferences and special sessions. His research team has won several best paper awards at international conferences and from journals. His research interests include data analytics with applications to intelligent transportation systems, autonomous systems, and analysis of human behavior and physiology.

References
[1] A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM J. Res. Develop., vol. 3, no. 3, pp. 210–229, Jul. 1959, doi: 10.1147/rd.33.0210.
[2] S. Gannot, Z.-H. Tan, M. Haardt, N. F. Chen, H.-T. Wai, and I. Tashev, “Data science education: The signal processing perspective,” Panel at IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022. [Online]. Available: https://rc.signalprocessingsociety.org/conferences/icassp-2022/SPSICASSP22VID1984.html?source=IBP
[3] Envisioning the Data Science Discipline: The Undergraduate Perspective. Washington, DC, USA: National Academy Press, 2018.
[4] R. D. De Veaux et al., “Curriculum guidelines for undergraduate programs in data science,” Annu. Rev. Statist. Appl., vol. 4, no. 1, pp. 15–30, Mar. 2017, doi: 10.1146/annurev-statistics-060116-053930.
[5] N. Ahituv, J. Ben-Dov, Y. Benjamini, Y. Bronner, Y. Dudai, D. Raban, and R. Sharan, Teaching Data Science in Universities in All Disciplines. Jerusalem, Israel: Israel Academy of Sciences and Humanities, 2020. [Online]. Available: https://www.academy.ac.il/SystemFiles2015/2-1-21-English.pdf
[6] D. Donoho, “50 years of data science,” J. Comput. Graphical Statist., vol. 26, no. 4, pp. 745–766, Aug. 2017, doi: 10.1080/10618600.2017.1384734.
[7] C. June, “Machine learning and systems: A conversation with 2020 Field Award winners Alfred Hero and Anders Lindquist,” Elect. Comput. Eng., Univ. of Michigan, Ann Arbor, MI, USA, Oct. 2019. [Online]. Available: https://ece.engin.umich.edu/stories/machine-learning-and-systems-a-conversation-with-2020-field-award-winners-al-hero-and-anders-lindquist
[8] C. June, “Teaching machine learning in ECE,” Elect. Comput. Eng., Univ. of Michigan, Ann Arbor, MI, USA, Mar. 2022. [Online]. Available: https://ece.engin.umich.edu/stories/teaching-machine-learning-in-ece
[9] “Data engineering - Bachelor’s degree,” Faculty Eng., Bar-Ilan Univ., Ramat Gan, Israel, 2023. [Online]. Available: https://engineering.biu.ac.il/datascience
[10] Z.-H. Tan, “On artificial intelligence curriculum and problem-based learning [Slides],” Aalborg Univ., Aalborg, Denmark, 2021. [Online]. Available: https://people.es.aau.dk/~zt/online/AI-curriculum-Tan.pdf
[11] [Online]. Available: https://www.dtu.dk/english/education/undergraduate/undergraduate-programmes-in-danish/bsc-eng-programmes/artificial-intelligence-and-data
[12] “B.S. in artificial intelligence,” School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA, 2023. [Online]. Available: https://www.cs.cmu.edu/bs-in-artificial-intelligence/
[13] “Interdisciplinary programs,” Massachusetts Inst. Technol., Cambridge, MA, USA, 2022–2023. [Online]. Available: http://catalog.mit.edu/interdisciplinary/undergraduate-programs/degrees/
[14] “Elite master’s study programme: Advanced signal processing and communications engineering,” Inst. Digit. Commun., Erlangen, Germany, 2023. [Online]. Available: https://www.asc.studium.fau.de
[15] “Master of science in communications and signal processing,” TU Ilmenau, Ilmenau, Germany, 2023. [Online]. Available: https://www.tu-ilmenau.de/mscsp
[16] “Track: Signals & systems,” TU Delft, Delft, The Netherlands, 2023. [Online]. Available: https://www.tudelft.nl/en/education/programmes/masters/electrical-engineering/msc-electrical-engineering/track-signals-systems

SP COMPETITIONS
Davide Cozzolino, Koki Nagano, Lucas Thomaz,
Angshul Majumdar, and Luisa Verdoliva
Synthetic Image Detection

Highlights from the IEEE Video and Image Processing Cup 2022 Student Competition
Date of current version: 3 November 2023

The Video and Image Processing (VIP) Cup is a student competition that takes place each year at the IEEE International Conference on Image Processing (ICIP). The 2022 IEEE VIP Cup asked undergraduate students to develop a system capable of distinguishing pristine images from generated ones. The interest in this topic stems from the incredible advances in the artificial intelligence (AI)-based generation of visual data, with tools that allow the synthesis of highly realistic images and videos. While this opens up a large number of new opportunities, it also undermines the trustworthiness of media content and fosters the spread of disinformation on the Internet. Recently, there has been strong concern about the generation of extremely realistic images by means of editing software that includes the recent technology on diffusion models [1], [2]. In this context, there is a need to develop robust and automatic tools for synthetic image detection.

In the literature, there has been an intense research effort to develop effective forensic image detectors, and many of them, if properly trained, appear to provide excellent results [3]. Such results, however, usually refer to ideal conditions and rarely stand the challenge of real-world application. First of all, testing a detector on images generated by the very same models seen in the training phase leads to overly optimistic results. In fact, this is not a realistic scenario. With the evolution of technology, new architectures and different ways of generating synthetic data are continuously proposed [4], [5], [6], [7], [8]. Therefore, detectors trained on some specific sources will end up working on target data of a very different nature, often with disappointing results. In these conditions, the ability to generalize to new data becomes crucial to keep providing a reliable service. Moreover, detectors are often required to work on data that have been seriously impaired in several ways. For example, when images are uploaded on social networks, they are normally resized and compressed to meet internal constraints. These operations tend to destroy important forensic traces, calling for detectors that are robust to such events and whose performance degrades gracefully. To summarize, to operate successfully in the wild, a detector should be robust to image impairments and, at the same time, able to generalize well on images coming from diverse and new models.

In the scientific community, there is still insufficient (although growing) awareness of the centrality of these aspects in the development of reliable detectors. Therefore, we took the opportunity of this VIP Cup to push further along this direction. In designing the challenge, we decided to consider an up-to-date, realistic setting with test data including 1) both fully synthetic and partially manipulated images and 2) images generated by both established generative adversarial network (GAN) models and newer architectures, such as diffusion-based models. With the first dichotomy, we ask that the detectors be robust to the occurrence of images that are only partially synthetic, thus with limited data on which to base the decision. As for architectures, there is already a significant body of knowledge on the detection of GAN-generated images [9], but new text-based diffusion models are now gaining the spotlight, and generalization becomes the central issue. With the 2022 IEEE VIP Cup, we challenged teams to design solutions that are able to work in the wild as only a fraction of the generators used in the test data are known in advance.

In this article, we present an overview of this challenge, including the competition setup, the teams, and their technical approaches. Note that all of the teams were composed of a professor, at most one graduate student (tutor), and undergraduate students (from a minimum of three to a maximum of 10 students).

Tasks, resources, and evaluation criteria

Tasks
The challenge consisted of two phases: an open competition (split into two parts), in which any eligible team could participate, and an invitation-only final. Phase 1 of the open competition was designed to provide teams with a simplified version of the problem at hand to familiarize themselves with the task, while phase 2 was designed to tackle a

more challenging task: synthetic data generated using architectures not present in the training. The synthetic images included in phase 1 were generated using five known techniques, while the generative models used in phase 2 were unknown. During the final competition, the three highest-scoring teams from the open competition were selected and were allowed to provide another submission graded on a new test set. Information about the challenge is also available at https://grip-unina.github.io/vipcup2022/.

Resources
Participants were provided with a labeled training dataset of real and synthetic images. In particular, the dataset available for phase 1 comprised real images from four datasets (FFHQ [4], Imagenet [17], COCO [18], and LSUN [19]), while synthetic images were generated using five known techniques: StyleGAN2 [11], StyleGAN3 [12], GLIDE [5], Taming Transformers [10], and inpainted images with Gated Convolution [13]. All the images of the test data were randomly cropped and resized to 200 × 200 pixels and then compressed using JPEG at different quality levels. This pipeline was used to simulate a realistic scenario where images were randomly resized and compressed as happens when they are uploaded to a social network. In addition, they all had the same dimensions to avoid leaking information on the used generators (some models only generate data at certain specific resolutions). Some examples of generated images used during the competition are shown in Figure 1.

FIGURE 1. Examples of synthetic images from the datasets used in the open competition. The first row shows samples from GLIDE [5], Taming Transformers [10], StyleGAN2 [11], StyleGAN3 [12], and inpainting with Gated Convolution [13]. The second row shows samples from BigGAN [14], DALL-e mini [6], Ablated Diffusion Model [15], Latent Diffusion [7], and LaMa [16]. The images in the fifth column are only locally manipulated (the regions outlined in red are synthetic).

Teams were provided with Python scripts to apply these same operations to the training dataset. For phase 2, there were no available datasets since the generative models in this case were unknown to the teams. However, participants were free to use any external data, besides the competition data. In addition, participants were allowed to use any available state-of-the-art methods and algorithms to solve the problems of the challenge.

Teams were requested to provide the executable code to the organizers to test the algorithms on the evaluation datasets. The Python code was executed inside a Docker container with a GPU of 16 GB with a time limit of one hour to process a total of 5,000 images. The teams were allowed to submit their code and evaluate their performance five times during the period from 8 August to 5 September 2022.

Evaluation criteria
The submitted algorithms were scored by means of balanced accuracy for the detection task (score = 0.7 × accuracy phase 1 + 0.3 × accuracy phase 2). The three highest-scoring teams from the open competition stage were selected as finalists. These teams had the opportunity to make an additional submission on 8 October on a new dataset and were invited to compete in the final stage of the challenge at ICIP 2022 on 16 October 2022 in Bordeaux. Due to some travel limitations, on that occasion, they could make a live or prerecorded presentation, followed by a round of questions from a technical committee. The event was hybrid to ensure a wide participation and allow teams who had visa issues to attend virtually. In the final phase of the challenge, the judging committee considered the following parameters for the final evaluation (maximum score was 12 points):
■■ the innovation of the technical solution (one to three points)
■■ the performance achieved in phase 1 of the competition, where only known models were used to generate synthetic data (one to three points)
■■ the performance achieved in phase 2 of the competition, where unknown

models were used to generate synthetic data (one to three points)
■■ the quality and clarity of the final report, a four-page full conference paper in the IEEE format (one to three points)
■■ the quality and clarity of the final presentation (either prerecorded or live), a 15-min talk (one to three points).

2022 VIP Cup statistics and results
The VIP Cup was run as an online class through the Piazza platform, which allowed easy interaction with the teams. In total, we received 82 registrations for the challenge, 26 teams accessed the Secure CMS platform, and 13 teams made at least one valid submission. Teams were from 10 different countries across the world: Bangladesh, China, Germany, Greece, India, Italy, Poland, Sri Lanka, United States of America, and Vietnam.

Figure 2 presents the accuracy results obtained by the 13 teams participating in the two phases of the open competition. First, we can observe that the performance on test set 1, which includes images from known generators, was much higher than that obtained in an open set scenario, where generators are unknown. More specifically, there were accuracy drops of around 10% for the best techniques, confirming the difficulty of detecting synthetic images coming from unknown models. Then, we noted that even for the simpler scenario, only four teams were able to achieve an accuracy above 70%, which highlights that designing a detector that can operate well on both fully and locally manipulated images is not an easy task.

FIGURE 2. The anonymized results in terms of accuracy of the 13 teams on the two open competition datasets. (Two panels, one per test set: accuracy (%) on test set 1 and accuracy (%) on test set 2 for each anonymized team ID.)

In Figure 3, we present some additional analyses of all of the submitted algorithms. Figure 3(a) aims at understanding how much computational complexity (measured by the execution time to process 10,000 images) impacts the final score. Interestingly, there is only a weak correlation between computation effort and performance, with methods that achieve the same very high score (around 90%) with very different execution times. Figure 3(b), instead, shows the results of each method on test set 1 and test set 2. In this case, a strong correlation is observed: if an algorithm performs well/poorly on test set 1, the same happens on test set 2, even if the datasets do not overlap and are completely separated in terms of generating models.

Finally, in Figure 4, we study in some more detail the performance of the three best performing techniques, reporting the balanced accuracy for each method on each dataset. For test set 1 (known models), the most difficult cases are those involving local manipulations. The same holds for test set 2 (unknown models) with the additional problem of images fully generated using diffusion models, where performances are on average lower than those obtained on images created by GANs. We also provide results in terms of area under the receiver operating characteristic curve in Figure 5. In this situation, we can note that the first and second places reverse on test set 2, which underlines the importance of properly setting the threshold for the final decision. A proper choice of the validation set is indeed very important to carry out a good calibration.

Highlights of the technical approaches
In this section, we present an overview of the approaches proposed by all of the participating teams for the challenge. All proposed methods relied on learning-based approaches and trained deep neural networks on a large dataset of real and synthetic images. Many diverse architectures were considered: GoogLeNet, ResNet, Inception, Xception, DenseNet, EfficientNet, MobileNet, ResNeXt, ConvNeXt, and the more recent Vision Transformers. The problem was often treated as a binary classification task (real versus fake), but some teams approached it as a multiclass classification problem with the aim to increase the degrees of

freedom for the predicting model and also to include an extra class for unknown models.

FIGURE 3. The results of all of the submitted algorithms: (a) score versus time and (b) accuracy on test set 1 versus accuracy on test set 2. (Axes: (a) score (%) against time (min); (b) accuracy (%) on test set 2 against accuracy (%) on test set 1.)

To properly capture the forensic traces that distinguish pristine images from generated ones, the networks considered multiple inputs, not just the RGB image. Indeed, it is well known that generators fail to accurately reproduce the natural correlation among color bands [20] and also that the upsampling operation routinely performed in most generative models gives rise to distinctive spectral peaks in the Fourier domain [21]. Therefore, some solutions considered as input the image represented in different color spaces, i.e., HSV and YCbCr, or computed the co-occurrence matrices on the color channels. Moreover, to exploit frequency-based features, two-stream networks have been adopted, using features extracted from the Fourier analysis in the second stream. A two-branch network was also used to work both on local and global features, which were fused by means of an attention module as done in [22]. In general, attention mechanisms have been included in several solutions. Likewise, the ensembling of multiple networks was largely used to increase diversity and boost performance. Different aggregation strategies have been pursued with the aim to generalize to unseen models and favor decisions toward the real image class, as proposed in [23].

The majority of the teams trained their networks on the data made available for the challenge; however, some of them increased this dataset by generating additional synthetic images using new generative models, such as other architectures based on GANs and new ones based on diffusion models. Of course, including more generators during training helped to improve the performance, even if some approaches were able to obtain good generalization ability even when adding only a few more models. In addition, augmentation was always carried out to increase diversity and improve generalization. Beyond standard operations, like image flipping, cropping, resizing, and rotation, most teams used augmentation based on Gaussian blurring and JPEG compression, found to be especially helpful in the literature [24], but also changes of saturation, contrast, and brightness, as well as CutMix and random cutout.

FIGURE 4. The balanced accuracy of the three best performing methods on images from test set 1 and test set 2. (Two panels, one per test set: balanced accuracy (%) per generator, namely Taming Transformers, GLIDE, StyleGAN2, StyleGAN3, GC inpainting, BigGAN, DALL-e mini, ADM, Latent Diffusion, and LaMa, plus the average (AVG); legend: teams 27,686, 27,749, and 27,768.)

FIGURE 5. The area under the receiver operating characteristic curve (AUC) of the three best performing methods on images from test set 1 and test set 2. (Two panels, one per test set: AUC (%) per generator, with the same generators, average (AVG), and team legend as in Figure 4.)

Finalists
The final phase of the 2022 IEEE VIP Cup took place at ICIP in Bordeaux, on 16 October 2022. Figure 6 shows the members of the winning team while receiving the award. In the following, we describe the three finalist teams listed according to their final ranking: FAU Erlangen-Nürnberg (first place), Megatron (second place), and Sherlock
(third place). We will also present some details on their technical approach.

FAU Erlangen-Nürnberg
■■ Affiliation: Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
■■ Supervisor: Christian Riess
■■ Tutor: Anatol Maier
■■ Students: Vinzenz Dewor, Luca Beetz, ChangGeng Drewes, and Tobias Gessler
■■ Technical approach: an ensemble of vision transformers pretrained on Imagenet-21k and fine-tuned on a large dataset of 400,000 images. To extract generalizable features, a procedure based on weighted random sampling was adopted during training aimed at balancing the data distribution. Models included during training were the five known techniques StyleGAN2 [11], StyleGAN3 [12], GLIDE [5], Taming Transformers [10], and inpainted images with Gated Convolution [13]. In addition, images generated using DALL∙E [25] and VQGAN [10] were used.

FIGURE 6. The winning team (FAU Erlangen-Nürnberg) during the award ceremony at ICIP 2022 in Bordeaux.

Megatron
■■ Affiliation: Bangladesh University of Engineering and Technology, Bangladesh
■■ Supervisor: Shaikh Anowarul Fattah
■■ Students: Md Awsafur Rahman, Bishmoy Paul, Najibul Haque Sarker, and Zaber Ibn Abdul Hakim
■■ Technical approach: a multiclass classification scheme and an ensemble of convolutional neural networks and transformer-based architectures. An extra class was introduced to detect synthetic images coming from unknown models. Knowledge distillation and test time augmentation were also included in the proposed solution. The training set included, beyond the five known techniques, additional images coming from the following generators: ProGAN [26], ProjectedGAN [27], CycleGAN [28], DDPM [29], Diffusion-GAN [30], Stable Diffusion [31], Denoising Diffusion GAN [32], and GauGAN [33].
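
As an illustration only, the following minimal Python sketch shows one way a multiclass ensemble with an extra "unknown generator" class and test-time augmentation could be reduced to a real-versus-synthetic decision. It is not the code of any team. The class layout (index 0 is "real"), the horizontal flip as the only augmentation, plain averaging of probabilities, the 0.5 threshold, and all function names are assumptions made for this example; knowledge distillation is not shown.

```python
from typing import Callable, List, Sequence

# Toy stand-ins, all hypothetical: an "image" is a list of pixel rows and a
# "model" is any callable that maps an image to class probabilities.
Image = List[List[float]]
Model = Callable[[Image], List[float]]

# Assumed class layout: [real, known generator(s)..., unknown generator].
REAL_CLASS = 0


def horizontal_flip(image: Image) -> Image:
    """Mirror each pixel row; a label-preserving view for test-time augmentation."""
    return [row[::-1] for row in image]


def mean_probabilities(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average several probability vectors class by class."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def ensemble_predict(models: Sequence[Model], image: Image) -> List[float]:
    """Average each model over the augmented views, then average across models."""
    views = [image, horizontal_flip(image)]
    per_model = [
        mean_probabilities([model(view) for view in views]) for model in models
    ]
    return mean_probabilities(per_model)


def synthetic_probability(probabilities: Sequence[float]) -> float:
    """Collapse the multiclass output: every class except 'real' counts as synthetic."""
    return 1.0 - probabilities[REAL_CLASS]


def is_synthetic(models: Sequence[Model], image: Image, threshold: float = 0.5) -> bool:
    """Binary decision; the threshold is an assumption and would need calibration."""
    return synthetic_probability(ensemble_predict(models, image)) >= threshold


if __name__ == "__main__":
    # Two dummy "models" with fixed outputs, used only to exercise the functions.
    def model_a(image: Image) -> List[float]:
        return [0.6, 0.3, 0.1]

    def model_b(image: Image) -> List[float]:
        return [0.2, 0.3, 0.5]

    toy_image = [[0.2, 0.8], [0.5, 0.5]]
    print(ensemble_predict([model_a, model_b], toy_image))
    print(is_synthetic([model_a, model_b], toy_image))
```

Summing all non-real classes, including the extra "unknown" class, keeps the final decision binary while letting the networks train on the richer multiclass labels. The threshold is left as a parameter because, as noted in the results section, setting it properly on a suitable validation set matters when generators are unseen.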

Sherlock
■■ Affiliation: Bangladesh University of Engineering and Technology, Bangladesh
■■ Supervisor: Mohammad Ariful Haque
■■ Students: Fazle Rabbi, Asif Quadir, Indrojit Sarkar, Shahriar Kabir Nahin, Sawradip Saha, and Sanjay Acharjee
■■ Technical approach: a two-branch convolutional neural network that took as input features extracted in the spatial and in the Fourier domain. The adopted architectures were EfficientNet-b7 and MobileNet-v3. In addition, strong augmentation was performed, which included CutMix in addition to standard operations. During training, only the five known generation techniques were considered.

Conclusions
This article describes the 2022 VIP Cup that took place last October at ICIP. The aim of the competition was to foster research on the detection of synthetic images, in particular, focusing on images generated using the recent diffusion models [7], [8], [15], [34]. These architectures have shown an impressive ability to generate images guided by textual descriptions or pilot sketches, and there is very limited work on their detection [35], [36], [37]. Below, we highlight the main take-home messages that emerged from the technical solutions developed in this competition:
■■ The best-performing models are pretrained very deep networks that rely on a large dataset of real and synthetic images coming from several different generators. Indeed, increasing diversity during training was a key aspect of the best approaches.
■■ Augmentation represents a fundamental step to make the model more robust to post-processing operations and make it work in realistic scenarios.
■■ Generalization is still a main issue in synthetic image detection. In particular, it has been observed that one main problem is how to set the correct threshold in the more challenging scenario of generators unseen during training.
■■ The detection task can benefit from attribution, which aims at identifying the model that was used for synthetic generation.

We believe that the availability of the dataset (https://github.com/grip-unina/DMimageDetection) created during the challenge can stimulate research on synthetic image detection and motivate other researchers to work in this interesting field. The advancements in generative AI make the distinction between real and fake very thin, and it is very important to push the community to continuously search for effective solutions [38]. In particular, the VIP Cup has shown the need to develop models that can be used in the wild to detect synthetic images generated by new architectures, such as the recent diffusion models. In this respect, it is important to design explainable methods that can highlight which forensic artifacts the detector is exploiting [39]. We hope that more and more methods will be published in the research community and will be inspired by the challenge proposed in the 2022 IEEE VIP Cup at ICIP.

Acknowledgment
The organizers express their gratitude to all participating teams, to the local organizers at ICIP 2022 for hosting the VIP Cup, and to the IEEE Signal Processing Society Membership Board for the continuous support. Special thanks go to Riccardo Corvi and Raffaele Mazza from University Federico II of Naples, who helped to build the datasets. The authors also acknowledge the projects that support this research: DISCOVER within the SemaFor program funded by DARPA under Agreement FA8750-20-2-1004; Horizon Europe vera.ai funded by the European Union, Grant Agreement 101070093; a TUM-IAS Hans Fischer Senior Fellowship; and PREMIER, funded by the Italian Ministry of Education, University, and Research within the PRIN 2017 program. This work is also funded by FCT/MCTES through national funds and, when applicable, cofunded EU funds under Projects UIDB/50008/2020 and LA/P/0109/2020.

Authors
Davide Cozzolino (davide.cozzolino@unina.it) is an assistant professor with the Department of Electrical Engineering and Information Technology, University Federico II, 80125 Naples, Italy. He was cochair of the IEEE CVPR Workshop on Media Forensics in 2020. He was part of the teams that won the 2013 IEEE Image Forensics Challenge (both detection and localization) and the 2018 IEEE Signal Processing Cup on camera model identification. His research interests include image processing and deep learning, with main contributions in multimedia forensics. He is a Member of IEEE.

Koki Nagano (knagano@nvidia.com) is a senior research scientist at NVIDIA Research, Santa Clara, CA 95051 USA. He works at the intersection of graphics and AI, and his research focuses on realistic digital human synthesis and trustworthy visual computing including the detection and prevention of visual misinformation. He is a Member of IEEE.

Lucas Thomaz (lucas.thomaz@co.it.pt) is a researcher at Instituto de Telecomunicações, 2411-901 Leiria, Portugal, and an associate professor in the School of Technology and Management, Polytechnic of Leiria, Leiria 2411-901, Portugal. He is a member of the Student Services Committee of the IEEE Signal Processing Society, supporting the VIP Cup and the IEEE Signal Processing Cup, and the chair of the Engagement and Career Training Subcommittee. He
is a consulting associate editor for IEEE Open Journal of Signal Processing. He is a Member of IEEE.

Angshul Majumdar (angshul@iiitd.ac.in) received his Ph.D. from the University of British Columbia. He is a professor at Indraprastha Institute of Information Technology, New Delhi 110020, India. He has been with the institute since 2012. He is currently the director of the Student Services Committee of the IEEE Signal Processing Society. He has previously served the Society as chair of the Chapter’s Committee (2016–2018), chair of the Education Committee (2019), and member-at-large of the Education Board (2020). He is an associate editor for IEEE Open Journal of Signal Processing and Elsevier’s Neurocomputing. In the past, he was an associate editor for IEEE Transactions on Circuits and Systems for Video Technology. He is a Senior Member of IEEE.

Luisa Verdoliva (verdoliv@unina.it) is a professor with the Department of Electrical Engineering and Information Technology, University Federico II, 80125 Naples, Italy. She was an associate editor for IEEE Transactions on Information Forensics and Security (2017–2022) and is currently deputy editor in chief for the same journal and senior area editor for IEEE Signal Processing Letters. She is the recipient of a Google Faculty Research Award for Machine Perception (2018) and a TUM-IAS Hans Fischer Senior Fellowship (2020–2024). She was chair of the IFS TC (2021–2022). Her scientific interests are in the field of image and video processing, with main contributions in the area of multimedia forensics. She is a Fellow of IEEE.

References
[1] A. Mahdawi, “Nonconsensual deepfake porn is an emergency that is ruining lives,” Guardian, Apr. 2023. [Online]. Available: https://www.theguardian.com/commentisfree/2023/apr/01/ai-deepfake-porn-fake-images
[2] J. Vincent, “After deepfakes go viral, AI image generator Midjourney stops free trials citing ‘abuse’,” Verge, Mar. 2023. [Online]. Available: https://www.theverge.com/2023/3/30/23662940/deepfake-viral-ai-misinformation-midjourney-stops-free-trials
[3] L. Verdoliva, “Media forensics and deepfakes: An overview,” IEEE J. Sel. Topics Signal Process., vol. 14, no. 5, pp. 910–932, Aug. 2020, doi: 10.1109/JSTSP.2020.3002101.
[4] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2019, pp. 4396–4405, doi: 10.1109/CVPR.2019.00453.
[5] A. Q. Nichol et al., “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,” in Proc. Int. Conf. Mach. Learn., 2022, pp. 16,784–16,804.
[6] B. Dayma et al. “DALL-E mini.” GitHub. Accessed: Dec. 14, 2022. [Online]. Available: https://github.com/borisdayma/dalle-mini
[7] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2022, pp. 10,684–10,695.
[8] Y. Balaji et al., “eDiff-I: Text-to-image diffusion models with ensemble of expert denoisers,” 2022, arXiv:2211.01324.
[9] D. Gragnaniello, D. Cozzolino, F. Marra, G. Poggi, and L. Verdoliva, “Are GAN generated images easy to detect? A critical analysis of the state-of-the-art,” in Proc. IEEE Int. Conf. Multimedia Expo, 2021, pp. 1–6, doi: 10.1109/ICME51207.2021.9428429.
[10] P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high-resolution image synthesis,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2021, pp. 12,873–12,883, doi: 10.1109/CVPR46437.2021.01268.
[11] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2020, pp. 8110–8119.
[12] T. Karras et al., “Alias-free generative adversarial networks,” in Proc. Adv. Neural Inf. Process. Syst., 2021, vol. 34, pp. 852–863.
[13] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Free-form image inpainting with gated convolution,” in Proc. IEEE/CVF Int. Conf. Comput. Vision, 2019, pp. 4471–4480, doi: 10.1109/ICCV.2019.00457.
[14] A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–11.
[15] P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” in Proc. Adv. Neural Inf. Process. Syst., 2021, vol. 34, pp. 8780–8794.
[16] R. Suvorov et al., “Resolution-robust large mask inpainting with Fourier convolutions,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vision, 2022, pp. 3172–3182, doi: 10.1109/WACV51458.2022.00323.
[17] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2009, pp. 248–255, doi: 10.1109/CVPR.2009.5206848.
[18] T.-Y. Lin et al., “Microsoft COCO: Common objects in context,” in Proc. Eur. Conf. Comput. Vision, 2014, pp. 740–755.
[19] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, “LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop,” 2015, arXiv:1506.03365.
[20] H. Li, B. Li, S. Tan, and J. Huang, “Identification of deep network generated images using disparities in color components,” Signal Process., vol. 174, Sep. 2020, Art. no. 107616, doi: 10.1016/j.sigpro.2020.107616.
[21] X. Zhang, S. Karaman, and S.-F. Chang, “Detecting and simulating artifacts in GAN fake images,” in Proc. IEEE Int. Workshop Inf. Forensics Secur., 2019, pp. 1–6, doi: 10.1109/WIFS47025.2019.9035107.
[22] Y. Ju, S. Jia, L. Ke, H. Xue, K. Nagano, and S. Lyu, “Fusing global and local features for generalized AI-synthesized image detection,” in Proc. IEEE Int. Conf. Image Process., 2022, pp. 3465–3469, doi: 10.1109/ICIP46576.2022.9897820.
[23] S. Mandelli, N. Bonettini, P. Bestagini, and S. Tubaro, “Detecting GAN-generated images by orthogonal training of multiple CNNs,” in Proc. IEEE Int. Conf. Image Process., 2022, pp. 3091–3095, doi: 10.1109/ICIP46576.2022.9897310.
[24] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. Efros, “CNN-generated images are surprisingly easy to spot… for now,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2020, pp. 8692–8701, doi: 10.1109/CVPR42600.2020.00872.
[25] A. Ramesh et al., “Zero-shot text-to-image generation,” in Proc. Int. Conf. Mach. Learn., 2021, pp. 8821–8831.
[26] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–12.
[27] A. Sauer, K. Chitta, J. Müller, and A. Geiger, “Projected GANs converge faster,” in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 1–13.
[28] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vision, 2017, pp. 2242–2251, doi: 10.1109/ICCV.2017.244.
[29] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 6840–6851.
[30] Z. Wang, H. Zheng, P. He, W. Chen, and M. Zhou, “Diffusion-GAN: Training GANs with diffusion,” in Proc. Int. Conf. Learn. Representations, 2023, pp. 1–13.
[31] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. “Stable diffusion.” GitHub. Accessed: Dec. 14, 2022. [Online]. Available: https://github.com/CompVis/stable-diffusion
[32] Z. Xiao, K. Kreis, and A. Vahdat, “Tackling the generative learning trilemma with denoising diffusion GANs,” in Proc. Int. Conf. Learn. Representations, 2022, pp. 1–15.
[33] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit., 2019, pp. 2332–2341.
[34] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with clip latents,” 2022, arXiv:2204.06125.
[35] R. Corvi, D. Cozzolino, G. Zingarini, G. Poggi, K. Nagano, and L. Verdoliva, “On the detection of synthetic images generated by diffusion models,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2023, pp. 1–5.
[36] Z. Sha, Z. Li, N. Yu, and Y. Zhang, “DE-FAKE: Detection and attribution of fake images generated by text-to-image diffusion models,” 2022, arXiv:2210.06998.
[37] J. Ricker, S. Damm, T. Holz, and A. Fischer, “Towards the detection of diffusion model deepfakes,” 2022, arXiv:2210.14571.
[38] M. Barni et al., “Information forensics and security: A quarter century long journey,” IEEE Signal Process. Mag., vol. 40, no. 5, pp. 67–79, Jul. 2023, doi: 10.1109/MSP.2023.3275319.
[39] R. Corvi, D. Cozzolino, G. Poggi, K. Nagano, and L. Verdoliva, “Intriguing properties of synthetic images: From generative adversarial networks to diffusion models,” in Proc. IEEE Comput. Vision Pattern Recognit. Workshops, 2023, pp. 973–982.

DATES AHEAD
Please send calendar submissions to:
Dates Ahead, Att: Samantha Walter, Email: walter.samantha@ieee.org

2023

NOVEMBER
Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023)
31 October–3 November, Taipei, Taiwan.
General Chairs: Jing-Ming Guo, Gwo-Giun Lee, Shih-Fu Chang, and Anthony Kuh
URL: https://www.apsipa2023.org

19th International Conference on Advanced Video and Signal-Based Surveillance (AVSS 2023)
6–9 November, Daegu, South Korea.
General Chairs: Jenq-Neng Hwang and Michael S. Ryoo
URL: https://www.avss2023.org

DECEMBER
IEEE International Workshop on Information Forensics and Security (WIFS 2023)
4–7 December, Nuremberg, Germany.
General Chairs: Marta Gomez-Barrero and Christian Riess
URL: https://wifs2023.fau.de/

Ninth IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2023)
10–13 December, Costa Rica.
General Chairs: M. Haardt and André de Almeida
URL: https://www.tuwien.at/etit/tc/en/camsap-2023/

Workshop on Automatic Speech Recognition and Understanding (ASRU 2023)
16–20 December, Taipei, Taiwan.
General Chairs: Chi-Chun Lee, Yu Tsao, and Hsin-Min Wang
URL: http://www.asru2023.org

2024

APRIL
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)
14–19 April, Seoul, Korea.
General Chairs: Hanseok Ko and Monson Hayes
URL: https://2024.ieeeicassp.org/

MAY
IEEE Conference on Computational Imaging Using Synthetic Apertures (CISA 2024)
3–6 May, Boulder, CO, USA.
General Chairs: Alexandra Artusio-Glimpse, Paritosh Manurkar, Sam Berweger, Kumar Vijay Mishra, and Peter Vouras
URL: https://2024.ieeecisa.org/

IEEE International Symposium on Biomedical Imaging (ISBI 2024)
27–30 May, Athens, Greece.
General Chairs: Konstantina S. Nikita and Christos Davatzikos
URL: https://biomedicalimaging.org/2024/

JUNE
IEEE Conference on Artificial Intelligence (CAI 2024)
25–27 June, Singapore.
General Chairs: Ivor Tsang, Yew Soon Ong, and Hussein Abbass
URL: https://ieeecai.org/2024/

JULY
IEEE 13th Sensor Array and Multichannel Signal Processing Workshop (SAM 2024)
8–11 July, Corvallis, OR, USA.
General Chairs: Yuejie Chi and Raviv Raich
URL: https://attend.ieee.org/sam-2024/

IEEE International Conference on Multimedia and Expo (ICME 2024)
15–19 July, Niagara Falls, Canada.
General Chairs: Junsong Yuan, Jiebo Luo, and Xiao-Ping Zhang
URL: https://2024.ieeeicme.org/

SEPTEMBER
IEEE 25th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2024)
10–13 September, Lucca, Italy.
General Chair: Luca Sanguinetti
URL: https://spawc2024.org/

OCTOBER
IEEE International Conference on Image Processing (ICIP 2024)
27–30 October, Abu Dhabi, UAE.
General Chairs: Mohammed Al-Mualla and Moncef Gabbouj
URL: https://2024.ieeeicip.org/

Photo caption: The IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024) will be held in Seoul, Korea, 14–19 April 2024.

Date of current version: 3 November 2023
