IEEE Signal Processing Magazine - November 2023
IEEE Signal Processing Magazine - November 2023
IEEE Signal Processing Magazine - November 2023
COLUMNS
Normalized Angular Frequency Ω/π
0.9 0
0.8 –5
0.7 –10
0.6 –15 11 DSP History
Gain/[dB]
PG. 18 PG. 38
IEEE SIGNAL PROCESSING MAGAZINE (ISSN 1053-5888) (ISPREG) is published bimonthly by the Institute of Electrical and Electronics Engineers, Inc., 3 Park Avenue, 17th Floor, New York,
NY 10016-5997 USA (+1 212 419 7900). Responsibility for the contents rests upon the authors and not the IEEE, the Society, or its members. Annual member subscriptions included in Society fee.
Nonmember subscriptions available upon request. Individual copies: IEEE Members US$20.00 (first copy only), nonmembers US$248 per copy. Copyright and Reprint Permissions: Abstracting
is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright Law for private use of patrons: 1) those post-1977 articles that carry a code at the bot-
tom of the first page, provided the per-copy fee is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without fee. Instructors are
permitted to photocopy isolated articles for noncommercial classroom use without fee. For all other copying, reprint, or republication permission, write to IEEE Service Center, 445 Hoes Lane,
Piscataway, NJ 08854 USA. Copyright © 2023 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY, and at additional mailing offices.
Postmaster: Send address changes to IEEE Signal Processing Magazine, IEEE, 445 Hoes Lane, Piscataway, NJ 08854 USA. Canadian GST #125634188 Printed in the U.S.A.
IEEEIEEE
SIGNAL PROCESSING
SIGNAL MAGAZINE| November
MAGAZINE
PROCESSING | July 2023
2023| | 1
IEEE Signal Processing Magazine
DEPARTMENTS EDITOR-IN-CHIEF
Christian Jutten—Université Grenoble Alpes,
ASSOCIATE EDITORS—COLUMNS AND FORUM
Ulisses Braga-Neto—Texas A&M University,
France USA
4 From the Editor Cagatay Candan—Middle East Technical
SPS Members, You Are All Heirs of Fourier! AREA EDITORS University, Turkey
Christian Jutten Feature Articles Wei Hu—Peking University, China
Laure Blanc-Féraud—Université Côte d’Azur, Andres Kwasinski—Rochester Institute of
7 President’s Message Technology, USA
France
Reflections on the Poland Xingyu Li—University of Alberta, Edmonton,
Chapter Celebration Special Issues
Alberta, Canada
Athina Petropulu Xiaoxiang Zhu—German Aerospace Center,
Xin Liao—Hunan University, China
Germany
Piya Pal—University of California San Diego,
Cover 3 Dates Ahead Columns and Forum USA
Rodrigo Capobianco Guido—São Paulo Hemant Patil—Dhirubhai Ambani Institute
State University (UNESP), Brazil of Information and Communication
H. Vicky Zhao—Tsinghua University, Technology, India
R.P. China Christian Ritz—University of Wollongong,
e-Newsletter Australia
Hamid Palangi—Microsoft Research Lab (AI),
USA ASSOCIATE EDITORS—e-NEWSLETTER
Social Media and Outreach Abhishek Appaji—College of Engineering, India
Emil Björnson—KTH Royal Institute of Technology, Subhro Das—MIT-IBM Watson AI Lab,
Sweden IBM Research, USA
Behnaz Ghoraani—Florida Atlantic University, USA
EDITORIAL BOARD Panagiotis Markopoulos—The University of Texas at
San Antonio, USA
Massoud Babaie-Zadeh—Sharif University of
Technology, Iran
Waheed U. Bajwa—Rutgers University,
IEEE SIGNAL PROCESSING SOCIETY
Athina Petropulu—President
USA
Min Wu—President-Elect
Caroline Chaux—French Center of National
Ana Isabel Pérez-Neira—Vice President,
Research, France
Conferences
Mark Coates—McGill University, Canada
Roxana Saint-Nom—VP Education
cover 3 Laura Cottatellucci—Friedrich-Alexander
Kenneth K.M. Lam—Vice President, Membership
University of Erlangen-Nuremberg, Germany
Marc Moonen—Vice President, Publications
Davide Dardari—University of Bologna, Italy
Alle-Jan van der Veen—Vice President, Technical
Mario Figueiredo—Instituto Superior Técnico,
©SHUTTERSTOCK.COM/SAYAN URANAN
Directions
University of Lisbon, Portugal
Sharon Gannot—Bar-Ilan University,
IEEE SIGNAL PROCESSING SOCIETY STAFF
Israel
Richard J. Baseil—Society Executive Director
Yifan Gong—Microsoft Corporation, USA
William Colacchio—Senior Manager, Publications
Rémi Gribonval—Inria Lyon, France and Education Strategy and Services
Joseph Guerci—Information Systems Rebecca Wollman—Publications Administrator
Laboratories, Inc., USA
Ian Jermyn—Durham University, U.K.
The IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP 2024) will be held in Seoul, Korea, Ulugbek S. Kamilov—Washington University, USA
14–19 April 2024. Patrick Le Callet—University of Nantes, IEEE PUBLISHING OPERATIONS
France Sharon M. Turk, Journals Production Manager
Sanghoon Lee—Yonsei University, Korea Katie Sullivan, Senior Manager,
Danilo Mandic—Imperial College London, U.K. Journals Production
Michalis Matthaiou—Queen’s University Belfast, Gail A. Schnitzer, Associate Art Director
U.K. Theresa L. Smith, Production Coordinator
Phillip A. Regalia—U.S. National Science Mark David, Director, Business Development -
Foundation, USA Media & Advertising
Gaël Richard—Télécom Paris, Institut
Felicia Spagnoli, Advertising Production Manager
Polytechnique de Paris, France
Peter M. Tuohy, Director, Production Services
Reza Sameni—Emory University, USA
Ervin Sejdic—University of Pittsburgh, USA Kevin Lisankie, Director, Editorial Services
Dimitri Van De Ville—Ecole Polytechnique Dawn M. Melley, Senior Director,
Fédérale de Lausanne, Switzerland Publishing Operations
Henk Wymeersch—Chalmers University
of Technology, Sweden
SCOPE: IEEE Signal Processing Magazine publishes tutorial-style articles on signal processing research and
applications as well as columns and forums on issues of interest. Its coverage ranges from fundamental principles
to practical implementation, reflecting the multidimensional facets of interests and concerns of the community. Its
mission is to bring up-to-date, emerging, and active technical developments, issues, and events to the research,
IEEE prohibits discrimination, harassment, and bullying. educational, and professional communities. It is also the main Society communication platform addressing important
For more information, visit issues concerning all members.
http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
National Institute of Standards and Technology| Boulder | Colorado | 20-23 May 2024
The IEEE Signal Processing Society, the IEEE Synthetic Aperture Standards Committee, and the IEEE Synthetic Aperture Technical Working Group
enthusiastically invite you to the scenic campus of the National Institute of Standards and Technology (NIST) in Boulder, Colorado to a unique
gathering of researchers and engineers engaged in cutting-edge research on computational imaging and sensing using synthetic apertures
(SAs). The term SA refers generically to a discrete
measurement scheme together with an inverse problem
solution that yields imaging or sensing performance better
than the hardware system is inherently capable of, e.g.,
wider field-of-view, higher angular resolution. An SA may
sample a propagating wavefield or environmental
Important Dates parameters in the signal domain via linear motion of an
Special session proposals due: October 20, 2023 antenna or transducer, as in synthetic aperture radar (SAR),
Initial 2-page abstract submissions due: January 19, 2024 sonar (SAS), or channel sounding. Alternatively, an SA may
Tutorial proposals due: January 26, 2024 sample in the k-space domain via different look angles
Acceptance notification: March 1, 2024 around an object or scene, as in computed tomography,
Camera-ready 4+1-page paper due: April 5, 2024 spotlight SAR or Fourier ptychography. Lastly, an SA may be
constructed from a sparse array of sensors as in radiometry,
seismology, or radio astronomy. The front end of an SA may
be a conventional antenna, acoustic transducer, or a quantum sensor, such as a Rydberg probe, in advanced implementations. CISA will
highlight advances in the theoretical development, engineering practice, and standardization of all aspects of SA imaging and sensing.
Suggested topics for CISA are listed below.
Radar: Automotive SAR, mmWave and THz SAR, Magnetic resonance imaging: Image reconstruction from
polarimetric SAR, ISAR, 3-D imaging, High-dimensional under-sampled measurements
feature processing using tensors Ultrasound: Flow and velocity estimation
Sonar: Micronavigation and position uncertainty, Distributed sensors: Networked coherent radars, sonars
Bathymetry, Wideband regimes Power beaming: Wireless power transfer to UAVs
Optics: Phase retrieval, Ptychography, Holography, Coded Radiometry and remote sensing: 5G signal interference
diffraction imaging, Coded aperture imaging, Wirtinger Quantum receivers: Rydberg probes, Lithium-niobate
flow, Deep learning techniques piezoelectric sensors
5G: Channel sounding, Over-the-air calibration, MIMO Integrated sensing and communications: Coherent UAV
antenna testbeds, Intelligent reflecting surfaces, Near-field swarms
beam focusing Radio astronomy: Low-noise receivers, Satellite
Seismology: Wave migration and localization techniques interference mitigation
Inverse problems: Deconvolution and hardware de- Point cloud processing: LiDAR, 4D mmWave radar in
embedding, Neuromorphic computing methods robotics, autonomous driving
Data-driven signal processing: SAR focusing techniques Model-based image reconstruction: Regularization
Prospective authors should visit https://2024.ieeecisa.org/ for more details and to submit manuscripts. All manuscripts must adhere to IEEE
formatting guidelines and accepted papers will appear in IEEE Xplore. The 2024 CISA conference will be an in-person event and authors must
attend to present their papers live at NIST. For additional questions, please contact the co-chairs, Alexandra Artusio-Glimpse (alexandra.artusio-
glimpse@nist.gov), Paritosh Manurkar (paritosh.manurkar@nist.gov), Samuel Berweger (samuel.berweger@nist.gov), Peter Vouras
(synthetic_aperture_twg@ieee.org), or Kumar Vijay Mishra (kvm@ieee.org).
M
y three years of service as the editor- Such ideas and the book Digital Signal note that all the articles in this issue ex-
in-chief (EIC) of Signal Processing Processing [1] were revolutionary at a plicitly mention Fourier’s legacy.
Magazine (SPM) are now coming to time when computers were in their in- You all know what an eigenvalue de-
a close. During the past three years, many fancy. In fact, the concept of digital signal composition (EVD) is and some of its
of us were deeply affected by serious po- processing was met with mixed reviews uses, but do you know what a polyno-
litical, social, and environmental events and skepticism. mial EVD (PEVD) is? In feature article
such as the war in Ukraine; protests for But long before this came the contri- [A2], you will learn about PEVD and
freedom in Iran; coups d’état in Africa; butions of Jean-Baptiste Joseph Fourier its application in many problems involv-
the COVID-19 pandemic; seisms in Tur- who developed for our understanding the ing multichannel broadband signals.
key, Syria, and Morocco; huge floods in propagation of heat. His most famous Denoising is an essential task in SIP. Cur-
Libya and India; gigantic fires in North book [2], published 201 years ago, in rently, many methods for image denois-
America and Southern Europe; and an 1822, contains the basics of the Fourier ing use convolutional neural networks
avalanche of stones in the Alps, to name series and transform and their ability to (CNNs). Feature article [A3] proposes
a few. In such a context, I believe that the represent a large range of signals. Fou- an in-depth understanding of encoding-
IEEE slogan, “Advancing Technology rier’s ideas were also “out of the box,” decoding CNN architectures (convo-
for Humanity,” is incredibly relevant and and they were also received with reserva- lution, down/upsampling structures, ac-
timely. It also must be viewed in a wider tions from eminent scientists who could tivation functions, etc.) following signal
sense, including the preservation of Earth not understand how and why a sum of processing principles.
and sustainable development. In point continuous functions could approximate This issue contains four “Tips and
of fact, what would become of human- noncontinuous functions. Later, in 1829, Tricks” columns. In [A4], the authors pro-
ity without Earth? I believe that we must Dirichlet presented the theoretical results pose two tricks for approaching a perfect
always have this in mind when contem- concerning the convergence of Fourier filter with reduced complexity. In [A5], the
plating our future projects, asking for series [A1]. Fourier’s life is a real novel, authors present a trick for robust estima-
funding, and while teaching. which the curious reader can discover in tion of the frequency of a single complex
The year 2023 also marks the 75th this well-documented and fun work (un- exponential using the magnitude of only
anniversary of the IEEE Signal Process- fortunately, only in French) [3]. two samples of its discrete-time Fourier
ing Society (SPS), and this too offers transform. In [A6], the author shows that
us an opportunity to think about the In this issue coding numbers as integers rather than as
signal processing domain and ponder During this year in which we celebrated floating points can avoid rounding errors
its roots and its dazzling evolution. It is the 75th anniversary of the SPS, it was in the implementation of moving average
also interesting to think about the early mandatory to recall Fourier, and I warmly filters. Note that a copy of the code for this
contributions that are of the highest im- thank Patrick Flandrin for his article [A1], trick can be obtained by sending an e-mail
portance in our domain and became its which gives many historical details on to the author. Finally, [A7] presents a trick
pillars. During ICASSP 2023 in Rhodes, some of Fourier’s contributions and their for realizing sub-Nyquist coherent imag-
Alan Oppenheim, Ron Schafer, and Tony impact on sound analysis and record- ing based on an optimized multiplexing
Constantinides recounted the adventure ings. The article also highlights tricks for hologram scheme.
of digital signal processing in the 1970s. implementing computation before the In SP Education column [A8], the au-
computer era with amazing machines. As thors reflect on education in data science
obvious proof of the importance of Fou- (DS), including signal processing and
Digital Object Identifier 10.1109/MSP.2023.3318848
Date of current version: 3 November 2023 rier in signal and image processing (SIP), machine learning, with the objective that
Wollman for the valuable, efficient, and diversity of articles and Special Issues and Process. Mag., vol. 40, no. 7, pp. 64–73, Nov. 2023, doi:
10.1109/MSP.2023.3290772.
timely help she provided to the authors, also the quality of the work done by the [A5] R. Guo and T. Blu, “Super-resolving a frequency
teams of guest editors (GEs), and mem- design and editorial teams. band,” IEEE Signal Process. Mag., vol. 40, no. 7,
bers of the Editorial Board. I won’t forget Professor Tulay Adali, from the Uni- pp. 73–77, Nov. 2023, doi: 10.1109/MSP.2023.
3311592.
Rupal Blatt, webpage manager, for her versity of Maryland, Baltimore County, [A6] S. Engelberg, “Implementing moving average fil-
reactivity in updating the SPM webpages, will be taking over as EIC on 1 January ters using recursion,” IEEE Signal Process. Mag., vol.
40, no. 7, pp. 78–80, Nov. 2023, doi: 10.1109/
adding templates, adding calls for articles 2024. You will be able to read about her MSP.2023.3294721.
for Special Issues etc. vision for the magazine in her editorial in [A7] Y. Jeong, B. Tayebi, and J.-H. Han, “Sub-Nyquist
I would like to thank all of the authors the January 2024 issue. I know her very coherent imaging using an optimizing multiplexed sam-
pling scheme,” IEEE Signal Process. Mag., vol. 40,
who contributed feature articles and col- well; she is a great scientist, and she has no. 7, pp. 81–88, Nov. 2023, doi: 10.1109/
umns and forums. And finally, I warmly also served the SPS in different positions MSP.2023.3310710.
thank the guest editors who proposed for many years. She is now inviting sci- [A8] S. Gannot, Z.-H. Tan, M. Haardt, N. F. Chen,
H.-T. Wai, I. Tashev, W. Kellermann, and J. Dauwels,
exciting Special Issues and thank them entists to join her as area editors, and she “Data science education: The signal processing per-
for their efforts in managing the reviews will present herself and her team in more spective,” IEEE Signal Process. Mag., vol. 40, no. 7,
pp. 89–93, Nov. 2023, doi: 10.1109/MSP.2023.
from white papers to full articles and for detail in her first editorials. With her as 3294709.
providing the final manuscripts in due EIC, I know that SPM is in good hands. [A9] D. Cozzolino, K. Nagano, L. Thomaz, A.
time. SPM needs high-level tutorial-like Majumdar, and L. Verdoliva, “Synthetic image detec-
tion: Highlights from the IEEE video and image pro-
contributions covering SIP methods and Appendix: Related Articles cessing cup 2022 student competition,” IEEE Signal
[A1] P. Flandrin, “Fourier and the early days of sound anal- Process. Mag., vol. 40, no. 7, pp. 94–100, Nov. 2023,
applications, following trends in DS and ysis,” IEEE Signal Process. Mag., vol. 40, no. 7, pp. 11–16, doi: 10.1109/MSP.2023.3294720.
machine learning but always under the Nov. 2023, doi: 10.1109/MSP.2023.3297313.
SIP umbrella. Keynote speakers and or- [A2] V. W. Neo, S. Redif, J. G. McWhirter, J. Pestana, I.
ganizers of tutorials and special sessions
K. Proudler, S. Weiss, and P. A. Naylor, “Polynomial References
eigenvalue decomposition for multichannel broadband [1] A. V. Oppenheim and R. W. Schafer, Digital Signal
in conferences and workshops—you are signal processing,” IEEE Signal Process. Mag., vol. 40, Processing. Englewood Cliffs, NJ, USA: Prentice-Hall,
no. 7, pp. 18–37, Nov. 2023, doi: 10.1109/ 1975.
all potential candidates for SPM articles. MSP.2023.3269200.
[2] J. Fourier, Théorie Analytique de la Chaleur. Paris,
Don’t hesitate to contact the area editors [A3] L. A. Zavala-Mondragón, P. H. N. de With, and F. France: Firmin Didot, 1822. [Online]. Available:
to refine and concretize your draft article van der Sommen, “A signal processing interpretation of https://gallica.bnf.fr/ark:/12148/bpt6k1045508v.
noise-reduction convolutional neural networks,” IEEE texteImage
or idea for a Special Issue. Following the Signal Process. Mag., vol. 40, no. 7, pp. 38–63, Nov.
ideas of previous EICs, I added the covers 2023, doi: 10.1109/MSP.2023.3300100. [3] E. Marie and E. Cerisier, Les Oscillations de Joseph
[A4] D. Shiung, J.-J. Huang, and Y.-Y. Yang, “Tricks for Fourier. Nantes, France: Editions Petit à Petit, 2018.
of the 19 SPM issues published during the designing a cascade of infinite impulse response filters
last three years (Figure 1), illustrating the with an almost linear phase response,” IEEE Signal SP
M
y end of term as IEEE Signal Pro- Our Society has made many strides one should join the SPS in an era with
cessing Society (SPS) president to level that playing field by providing abundant freely available informa-
is fast approaching. It has been many initiatives to grow and diversify t ion, nu merous platforms for shar-
an incredible experience that has pro- our membership. I’ve discussed these ing scientific work, and a multitude of
vided me with so many opportunities initiatives in my past messages, and conference options? Some argued for
to engage with our members around I’ll detail some recent programs be- the significance of in-person network-
the globe, forge relationships with other low, but there are still many ques- ing and professional development, un-
IEEE Societies, and meet a diverse tions that require novel solutions. derlining the importance of providing
range of people that I hope will become conference discounts and travel grants.
active members of our Society in the War and peace Ultimately, the paramount value that
future. It has been a great privilege to This past September, I had the privilege the SPS provides is the assurance of
be at the helm of a Society that garners of visiting Poland to commemorate high quality—in publications, confer-
such a high level of worldwide respect the 20th anniversary of the IEEE SPS ences, technical activities, educational
and recognition. It has also provided Poland Chapter. During this event, I offerings, and high ethical standards
me with the chance to learn, identify presented the history of the SPS and and guidelines.
the challenges we still face, anticipate its impactful role in signal processing.
future challenges, and work to find Additionally, I had the opportunity to Attracting young minds
solutions that will make our Society, learn about the journey of the Poland There was also a discussion on how
and the world, a better place. Chapter and its various activities. The to engage students and young pro-
The SPS has a unique dual role. We anniversary celebration coincided with fessionals in SPS initiatives. While
strive to grow and advance technologi- the Signal Processing Symposium older generations viewed Society
cal innovation and problem-solve at (SPSympo). Since its inception in 2003, membership as their only option to
the scientific level—from the bench SPSympo has consistently attracted be connected with the outside sci-
to the applications of these technolo- attendees from Poland and neighboring entific community, today’s students
gies in the real world. We need to be countries, particularly Ukraine. Unfor- may need special encouragement to
mindful that our scientific pursuits tunately, due to geopolitical events, become members.
don’t exist in a vacuum, that they have researchers from Ukraine were unable The SPS is providing several events
many social, political, and ethical to travel abroad. It begs the question: designed to spark the interest of young-
implications, and that their very ex- Can we find innovative methods to er generations and foster visibility and
istence is often shaped by an uneven help our members and nonmembers growth. Those include the Signal Pro-
playing field—for scientists that are in countries in the midst of conflict, cessing Cup, the Video and Image Pro-
isolated within their research silos or warfare, and humanitarian crises? Per- cessing Cup, the 5-Minute Video Clip
by geopolitical events, for women and haps we could implement humanitar- Contest, hackathons, and society level
ethnic minorities, for citizens of so- ian programs or grants for accessing awards in the areas of Best PhD Disser-
called low-income countries, and for conferences via online attendance or tation Awards and Young Author Best
young people with economic or cul- open access to postconference tran- Paper Awards and at the conference
tural restraints. scripts and other options to help them level there are and best student paper
overcome these barriers? awards, all of which bring recogni-
During discussions at the Poland tion to students and generate a lot of
Digital Object Identifier 10.1109/MSP.2023.3322050
Date of current version: 3 November 2023 Chapter meeting, some wondered why excitement. There are also o pportunities
cost-effective compared to digital multi- for the SPS. Going forward, we need United States and some other coun-
channel phased arrays while delivering to make extra efforts to safeguard that tries, which features negative ads and
comparable estimation performance. quality and the climate that holds the disinformation.
The two-pronged goal of the SA TWG SPS to such high standards. Despite the many challenges, SPS
is to support theoretical and empirical Alongside the need to continually membership growth has been strong
techniques that underpin the estima- grow inclusivity, diversity, and inter- throughout 2023. In October 2023, the
tion of parameters of propagating waves connectedness in our membership and membership of SPS soared past our
through various media using SAs and in our scientific pursuits, we must con- goal of 20,000, reaching the highest
also identify novel applications for SAs tinually adapt and promote high ethical point in SPS history. This underscores
that are enabled by the precise measure- standards and guidelines with both our the enduring relevance and value that
ment and estimation of environmental technological innovations and within SPS has to offer.
parameters. The SA TWG, under the the SPS leadership. With my term as SPS president con-
leadership of Dr. Peter Vouras, is work- As innovation advances at an expo- cluding this year, I’m optimistic that the
ing on developing IEEE standards on nential rate, ethical concerns surround- SPS will strive to turn challenges into
SAs, establishing a shared repository for ing current and emerging technologies opportunities, so that we can grow and
data and algorithms, delivering webinars grow proportionally. It is crucial to con- diversify our membership, and provide
on the topic, organizing special issues front the impact of technology on pri- even more value to our members and
in journals, and providing challenges vacy, security, and the environment. The our communities.
and competitions that promote the urgency to cultivate researchers and en-
adoption of SAs in engineering school gineers with strong ethical foundations
curricula as well as job training for has never been greater; they may serve
graduating students. as a crucial line of defense in navigating
In an exciting development this these complex challenges.
year, the SA TWG, working with the Another challenge involves lead- Acknowledgment
IEEE Synthetic Aperture Standards ership. In an effort to energize the I would like to thank Theresa Argi-
Committee, will offer the inaugural members, our Society has embraced a ropoulos and Rich Baseil for their help
NIST–IEEE Conference on Compu- member-driven election for the presi- with this article.
tational Imaging Using Synthetic Ap- dent-elect. Yet the strength and agility
ertures. The conference will be held of this approach requires a continued
References
20–23 May 2024, at the scenic campus effort to increase membership diver- [1] “Mentoring Experiences for Underrepresented
of the National Institute of Standards sity by growing our global appeal. Young Researchers (ME-UYR) Program,” in IEEE
Signal Process. Soc., 2023. [Online]. Available: https://
and Technology, in Boulder, CO, USA. We should also put policies into place signalprocessingsociety.org/community-involvement/
to safeguard the election process and me-uyr-mentoring-experiences-underrepresented
-young-researchers-program
Ethical standards help prevent negative electioneering [2] “K-12 Outreach Initiatives,” in IEEE Signal
In my tenure as the SPS president, it has campaigns that increase internal divi- Process. Soc., May 2022. [Online]. Available: https://
signalprocessingsociety.org/community-involvement/
been amazing to experience firsthand siveness. If unchecked, electioneering k-12-outreach-initiatives
that people around the world have such can lead to the same behaviors ob-
high levels of respect and recognition served in the political climate of the SP
J
oseph Fourier’s methods (and their from the outset. This turned out not to Fourier theory, from heat
variants) are omnipresent in audio be the case, the whole project of Fouri- to sound
signal processing. However, it turns er being devoted to a different physical
out that the underlying ideas took some problem, namely, the theory of heat, Fourier
time to penetrate the field of sound anal- and to mathematical developments As mentioned in the preceding, the
ysis and that different paths were first attached to it. Whereas many attempts scientific work of Fourier culminated
followed in the period immediately fol- had been made before Fourier (by Ber- in his Théorie Analytique de la Chal-
lowing Fourier’s pioneering work, with noulli, d’Alembert, Euler, Lagrange, eur (Analytical Theory of Heat) [2] that
or without reference to him. This illus- and others) to solve the problem of was eventually published in its final
trates the interplay between mathematics vibrating strings and express solutions form in 1822, i.e., 11 years after hav-
and physics as well as the key role played by means of sine/cosine expansions, ing been first presented as a memoir
by instrumentation, with notable inven- Fourier himself seemed to have devel- to the French Academy of Sciences.
tions by outsiders to academia, such as oped almost no interest in applying Although its value was recognized at
Rudolph Koenig and Édouard-Léon his results in this direction. Indeed, that time by awarding Fourier a prize,
Scott de Martinville. while his 1822 treatise on the analyti- this contribution was received by the
cal theory of heat is more than 600 examiners (including Lagrange) with
Introduction pages long, there is only one sentence some reservations concerning rigor,
Fourier analysis, Fourier series, (Fast) evoking such a possibility: “If we apply raising convergence issues that were
Fourier transform. … Fourier has today those principles to the question of the eventually resolved in full generality
something of a common name. If his motion of vibrating strings, we shall by Dirichlet and others. Fourier’s semi-
presence is now ubiquitous in almost overcome the difficulties first encoun- nal work established, nevertheless, the
all fields of science and technol- tered in Daniel Bernoulli’s analysis.” foundations of modern harmonic analy-
ogy, the name of Fourier is especially It was only 20 years later that Fourier sis, a branch of mathematics that flour-
unavoidable for all those interested ideas entered explicitly the field of ished in the 19th and 20th centuries and
in the theory and practice of signal acoustics, thanks to Georg Simon Ohm proved to be of utmost importance in
processing. In particular, the meth- (most famous for his law of electrical numerous applications. Starting from a
ods he developed—and the attached conductivity, established in 1827). This problem in physics and considering that
fundamental concepts, such as that of was, however, not a fully shared recog- [2, p. 13] “the profound study of nature
spectral representation—are the cor- nition, and, between theory and experi- is the most fertile source of mathemati-
nerstone of audio signal processing ments, the following years witnessed a cal discovery,” Fourier is generally
(speech, music, and so on). This might number of developments aimed at ana- considered the creator of mathematical
suggest that they were developed in lyzing sounds, with or without a refer- physics [3]. Eager to solve physics prob-
connection with the idea of analyzing ence to Fourier. This is what this text is lems by giving solutions based on firm
and/or synthesizing sounds or at least about. In complement to the immedi- mathematical grounds, Fourier was also
that such an application was envisaged ate post-Fourier influences in acoustics deeply concerned with effective calcu-
discussed here, a comprehensive study lations, claiming explicitly that [2, p. 11]
of the (pre-Fourier) acoustics origins of “the [proposed] method does not leave
Digital Object Identifier 10.1109/MSP.2023.3297313
Date of current version: 3 November 2023 harmonic analysis can be found in [1]. anything vague and indefinite in its
FIRSTSOUNDS.ORG.
tion and the poor interest that his own
invention had received 20 years ear-
lier, Scott self-edited a long plea for his
rights and his vision of speech analysis FIGURE 7. Scott’s “waveform dictionary” [19].
just before his death the following year
[16]. Of course, one of the main reasons a number of annotated phonautograms (Figure 8), registered by Scott himself
why the phonograph attracted so much with the French Academy of Sci- on 9 April 1860, can now be heard [23],
more attention than the phonautograph ences, in 1861 [21]. These recordings the first recording of a human voice,
was that the former allowed for the were properly archived and preserved, 17 years before Edison.
replay of recorded sounds, which the yet forgotten until 2007, when David
latter did not. As Scott had said many Giovannoni—a historian specializing in Conclusions
times, reproducing sound was not part old recordings, who had learned of their The purpose of this text was not to dis-
of his program at all, his only goal being existence—had the idea of transforming cuss Fourier’s achievements per se: this
to decipher phonautograms. One can them into truly audible sounds. This was can be found in many textbooks, from
imagine that he would have found little made possible—within the First Sounds different perspectives (see, e.g., [24]
interest in regenerating real sounds from project [22] and thanks, in particular, for a classical introduction, [25] for a
his phonautograms, and yet this is what to Patrick Feaster—by getting high- more mathematically oriented treatise,
happened … in 2008. quality scans of the tracings and trans- or [26] for a modern treatment, includ-
forming them into digital files through ing recent variations). What was at stake
Hearing Scott modern signal processing techniques. was to see how Fourier’s ideas, which
Indeed, before deciding not to pursue This is how the folk song “Au Clair de today seem indissociable from sound
his project any further, Scott deposited la Lune” (“By the Light of the Moon”) analysis, were not immediately adopted
References
[1] O. Darrigol, “The acoustics origins of harmonic
analysis,” Arch. Hist. Exact Sci., vol. 61, no. 4, pp.
343–424, Jul. 2007, doi: 10.1007/s00407-007-0003-9.
[2] J. Fourier, Théorie Analytique de la Chaleur. Paris,
France: Firmin Didot, 1822. [Online]. Available: https://
g a l l i c a . b n f . f r /a r k : / 1 214 8 / b p t 6 k 10 4 55 0 8 v.
FIRSTSOUNDS.ORG.
texteImage
[3] J. Dhombres and J.-B. Robert, Fourier, Créateur
(b) De la Physique Mathématique. Paris, France: Belin,
1998.
FIGURE 8. Scott’s phonautogram of the folk song “Au Clair de la Lune” [19]. (a) The complete [4] G. S. Ohm, “Über die Definition des Tones, nebst
daran geknüpfter Theorie der Sirene und ähnlicher
recording and (b) an enlargement showing (on three successive revolutions of the drum) plots of tonbildender Vorrichtungen,” Ann. Phys. Chem.,
the recorded voice and the reference oscillation given by the tuning fork. vol. 135, no. 8, pp. 513–565, 1843, doi: 10.1002/andp.
18431350802.
[5] R. S. Turner, “The Ohm-Seebeck dispute,
in this context and that parallel path- networks when one wants to go beyond Hermann von Helmholtz, and the origins of physio-
logical acoustics,” Brit. J. Hist. Sci., vol. 10, no. 1, pp.
ways, based on different approaches, a black box. 1–24, Mar. 1977, doi: 10.1017/S0007087400015089.
have been followed. It is striking to We choose to close the piece of his- [6] M. J. Kromhout, “The unmusical ear: Georg
Simon Ohm and the mathematical analysis of sound,”
observe that some of the options con- tory that has been outlined here in 1878, Isis, vol. 111, no. 3, pp. 471–492, Sep. 2020, doi:
sidered then are still relevant today. For when Edison opened a new chapter. 10.1086/710318.
instance, an approach à la Koenig is Many things happened in the follow- [7] H. von Helmholtz, “Über combinationstöne,”
Ann. Phys. Chem., vol. 175, no. 12, pp. 497–540,
based, in a first step, on the extraction ing years, with progressive and more 1856, doi: 10.1002/andp.18561751202.
of some features (in his case, the Fou- and more pervasive use of Fourier tech- [8] L. de Broglie, Certitudes et Incertitudes de la
rier modes, even if not named as such) niques in many domains. In the same Science. Paris, France: Albin Michel, 1966.
upon which the analysis is performed in year, 1878, Lord Kelvin constructed [9] G. Rilling and P. Flandrin, “One or two frequen-
cies? The empirical mode decomposition answers,”
a second step, whereas an approach à la his harmonic analyzer [27] that proved IEEE Trans. Signal Process., vol. 56, no. 1, pp.
Scott bypasses such a preprocessing and instrumental during decades for analyz- 85–95, Jan. 2008, doi: 10.1109/TSP.2007.906771.
[10] R. Koenig, Catalogue des Appareils d’Acoustique
relies directly on the raw data, as can be ing and predicting tides. Other mechan- construits par Rudolph Koenig. Paris, France: Chez
the case in modern end-to-end recogni- ical [28], electromechanical [14], and, l’auteur, 1889. [Online]. Available: https://sound
a nd sc ienc e.d e /t ext /cat a log ue - d e s- a p p a r e i l s
tion systems. Another important issue later, electronical [29] systems followed, -dacoustique-construits-par-rudolph-koenig
is the quest for interpretability when eventually giving Fourier the full credit [11] R. Koenig, Quelques Expériences d’acoustique.
confronting experimental results with he deserves in the information era, but Paris, France: Chez l’auteur, 1882. [Online]. Available:
https://gallica.bnf.fr/ark:/12148/bpt6k5688601m.
formal descriptions, i.e., physics with this is another story. texteImage
mathematics (this was at the heart of [12] D. Pantalony, “Rudolf Koenig’s workshop of
the Ohm–Seebeck dispute). Yet, under Author sound: Instruments, theories, and the debate over
combination tones,” Ann. Sci., vol. 62, no. 1, pp.
different forms extended to algorithms Patrick Flandrin (flandrin@ens-lyon.fr) 57–82, 2005, doi: 10.1080/00033790410001712183.
and computational issues, such a ques- received his Ph.D. degree from INP
tion of understanding is today of para- Grenoble, France, in 1982. He is cur-
mount importance, e.g., in deep neural rently a CNRS emeritus research direc- (continued on page 88)
Polynomial Eigenvalue
Decomposition for
Multichannel Broadband
Signal Processing
A mathematical technique offering new insights and solutions
©SHUTTERSTOCK.COM/MARISHA
T
his article is devoted to the polynomial eigenvalue decom- health-care monitoring, astronomy and seismic surveillance,
position (PEVD) and its applications in broadband multi- and military technologies, including radar, sonar, and commu-
channel signal processing, motivated by the optimum nications [3]. The success of these applications often depends
solutions provided by the EVD for the narrowband case on the performance of signal processing tasks, including data
[1], [2]. In general, we would like to extend the utility of the compression [4], source localization [5], channel coding [6],
EVD to also address broadband problems. Multichannel broad- signal enhancement [7], beamforming [8], and source separa-
band signals arise at the core of many essential commercial tion [9]. In most cases and for narrowband signals, performing
applications, such as telecommunications, speech processing, an EVD is the key to the signal processing algorithm.
Therefore, this article aims to introduce the PEVD as a novel
mathematical technique suitable for many broadband signal
Digital Object Identifier 10.1109/MSP.2023.3269200
Date of current version: 3 November 2023 processing applications.
Algebra of Functions
We are interested in matrices whose entries are more gen- lar, if f (z) and g (z) are analytic on a domain D 1 C,
eral than complex numbers. Specifically, we are interested then f (z) + g (z) is analytic; that is, it can be expressed as
in entries that are analytic functions: matrices whose a locally convergent power series for any z ! D.
entries are analytic functions rather than real and complex Similarly, f (z) - g (z) and f (z) $ g (z) are analytic. Things
numbers, and the algebraic manipulation of these matrices become a little more complicated when we consider quo-
may, at first, seem a little exotic, but many operations for tients of the form f (z) /g (z), but the result is analytic every-
real and complex numbers carry over to this setting. where except at zeros of g (z), as might be expected. This
There are several different classes of functions, depend- “closure” is important since it means that as we manipu-
ing on what properties they have. For example, there are late analytic functions, we do not need to worry if the
discontinuous functions, continuous but nondifferentiable result is also analytic. Note, as well, that if the product
functions, functions that are continuous and differentiable f (z) $ g (z) / 0 on D, then f (z) / 0 on D or g (z) / 0
up to a certain order, and functions that are continuous on D.
and differentiable for all orders. The class of analytic func- If we now restrict our attention to polynomials in z, which
tions, by definition, has locally convergent power series. are analytic everywhere, and Laurent polynomials, which
Consequently, the functions are infinitely differentiable and are analytic everywhere except z = 0, then we can say
easier to work with than other types of functions. These something more. Indeed, if f (z) and g (z) are (Laurent)
series might have a finite number of terms, but in general, polynomials, then f (z) + g (z), f (z) - g (z), and f (z) $ g (z)
there are infinitely many. The truncation of these series are not just analytic but are also (Laurent) polynomials.
results in polynomial approximations of the underlying Now, however, we must exercise some care when consid-
analytic function. ering quotients f (z) /g (z) since the result will be analytic in
Analytic functions can be algebraic or transcendental. An D [except at the zeros of g (z) ] but not be a (Laurent)
algebraic function f (x) is a function that is a root of a poly- polynomial in general. However, f (z) /g (z), and, indeed,
nomial equation. More specifically, f is algebraic if it satis- any analytic function, can be arbitrarily well approximated
fies p (x, f (x)) = 0 for some irreducible polynomial p (x, y) by polynomials, as discussed in the preceding.
with coefficients in some field. Examples of algebraic func- Let us now consider matrices R(z) whose entries are
tions include rational functions and nth roots of polynomials. analytic functions in D 1 C. We start by noting that for
Note that the inverse function of an algebraic function (if it any fixed z 0 ! C, the matrix R (z 0) is simply a matrix of
exists) is also algebraic. An analytic function that is not alge- complex numbers that can be manipulated in the usual
braic is called a transcendental function. Examples include ways. For example, we can multiply R (z 0) by another
e x, sin (x), and cos (x) . Such functions have power series (conformable) matrix or vector, and we can compute the
representations with an infinite number of terms. eigenvalue decomposition (EVD) of R (z 0) . When we
Let us first consider analytic functions on their own. A instead allow z to vary, it is still possible to form, say,
function f (z) is (complex) analytic in a domain (an open matrix–matrix and matrix–vector products with R (z) .
set) D 1 C if at each point it can be written as a locally Indeed, using the arguments in the previous paragraphs, if
convergent Taylor series. (Note that this means that such a R (z) has analytic (polynomial) entries, then the resulting
function is infinitely differentiable.) The set D is known as matrix or vector will also have analytic (polynomial)
the domain of analyticity of f (z). We note that two differ- entries. However, it is not immediately obvious that we can
ent analytic functions f (z) and g (z) may have different write down a single z -dependent EVD of R (z) that holds
domains of analyticity, say, D f and D g . When we oper- for all values of z ! D. That this is true in certain circum-
ate on these functions, we assume that D f and D g over- stances is proved in a remarkable result from Rellich [S1].
lap, i.e., that they have a nontrivial intersection, and restrict
f (z) and g (z) to this common domain D = D f + D g . Reference
[S1] F. Rellich, “Störungstheorie der Spektralzerlegung. I. Mitteilung.
Then, we can perform certain fundamental operations on Analytische Störung der isolierten Punkteigenwerte eines beschränkten
Operators,” Mathematische Annalen, vol. 113, pp. DC–DCXIX, 1937, doi:
analytic functions, and the result will also be an analytic 10.1007/BF01571652. [Online]. Available: https://eudml.org/
function with the same domain of analyticity D. In particu- doc/159886
FIGURE 4. Each PEVD iteration involves the following four steps. (a) The polynomial matrix is first searched for the maximum off diagonal across all
lags. (b) The second delay step brings the largest element to the principal z 0-plane. (c) The third is the zeroing step, which transfers energy from the
off-diagonal elements to the diagonal. (d) The final trimming step discards negligibly small coefficients in the outer matrix slices.
U P (z) U (z) = U (z) U P (z) = I, (13) a predefined threshold, a delay polynomial matrix is applied to
bring the element to the principal z 0-plane, as shown in Fig-
where I is the identity matrix. The similarity transform U (z) ure 4. A unitary matrix, designed to zero out two elements on
may be calculated via an iterative algorithm, such as the the zero-lag plane, is applied to the entire polynomial matrix.
SBR2, and sequential matrix diagonalization (SMD) [46]. Note that applying one elementary paraunitary transformation
Here, a sequence of elementary paraunitary transformations may make some previously small off-diagonal elements larger,
G i (z) (i = 1, f) are applied to R (z) until the polynomial but overall, the algorithm converges to a diagonal matrix. As
matrix becomes approximately diagonal; i.e., starting from observed in Figure 4, the delay step can increase the polynomi-
Ru 0 (z) = R (z), the expression al order and make it unnecessarily large. Therefore, a trimming
u i (z) = G Pi (z) R
u i - 1 (z) G i (z) (14) procedure [21] is used to control the growth of the polynomial
R
order by discarding negligibly small coefficients in the outer
u N I (z) is approximately diagonal for some
is iterated until R planes, e.g., z -4 and z 4 in Figure 4. Furthermore, the similarity
N I . An elementary paraunitary transformation takes the form transformations in (12) affect a pair of dominant elements so
of the product of a unitary transformation and a polynomial that the search space can be halved due to the preservation of
delay matrix, diag{1, f, 1, z n, 1, f, 1} . symmetry. The algorithm terminates when the magnitudes of
Figure 4 gives the steps involved during every iteration of all off-diagonal elements fall below the preset threshold and
the SBR2. At each iteration, the algorithm searches for the when a user-defined maximum number of iterations is reached.
off-diagonal element with the largest magnitude across all This has led to a family of time-domain algorithms based on
z-planes, as marked in red in Figure 4. If the magnitude exceeds the SBR2 [21] and SMD [46]. The computational complexity
of these numerical algorithms is at least O (M 3 T) due to matrix
multiplication applied to every lag [47]. The additional complex-
5 ity incurred over the EVD approach is essential for the temporal
γ SBR2,m(e jΩ)
3π /8
using the SMD algorithm to process the matrix used to generate
π /4 Figure 3. In Figure 5, the eigenvalue in blue is always greater
" π /8 than the one in red. In contrast, the (analytic) eigenvalues in Fig-
0 ure 3 intersect and are not spectrally majorized. As described in
0 π /4 π /2 3π /4 π 5π /4 3π /2 7π /2 2π Figure 5(b), forcing a spectrally majorized solution for the eigen-
Normalized Angular Frequency Ω values leads to the eigenvectors having discontinuities that are
(b) difficult to approximate with polynomials. To get an accurate
m=1 m=2
result, high-order polynomials are required. This, in turn, has
consequences for the implementation cost of any signal process-
FIGURE 5. The results of using SMD to decompose the matrix in Figure 3: ing based on the output of these algorithms. Note, however, that
(a) eigenvalues on the unit circle and (b) Hermitian angles of the corre- spectral majorization can be advantageous in some situations;
sponding analytic eigenvectors, measured against a reference vector [23]. see the “PEVD-Based Subband Coding” section.
[52] aims to extract maximally smooth eigenvalues and eigenvec- shifts that each sensor experiences with respect to the ,th
tors, which can target the extraction of the analytic solution. source. If at least two sensors satisfy the spatial sampling the-
orem, and for a particular frequency X , = X 0, this steering
vector is unique with respect to the DOA of the ,th source.
We want to process the array data x [n] ! C M by a vector
Key Statement of filters w [n] %s w (z), with w P(z) = [w 1 (z), f, w M (z)] and
w m (z) :V w m [n] is the filter processing the mth sensor signal
Approximating analytic functions by polynomials
such that the array output y [n] is the sum of the filtering oper-
allows the development of PEVD algorithms based on
ations, y [n] = R v w H [- o] x [n - o] = R m, v w m [o] x m [n - o] .
an elementary paraunitary operator. The resulting algo-
The definition of the filter vector w [n], with its time-reversed
rithms are guaranteed to produce polynomial parauni-
and conjugated weights, may seem cumbersome, but it follows
tary eigenvectors but tend to generate spectrally
similar conventions for complex-valued data [53] and will later
majorized eigenvalues. This property has benefits as
simplify the z-transform notation.
well as drawbacks.
Narrowband beamforming
In the narrowband case, the delay filters can be replaced
Example applications using PEVD by complex coefficients in a vector w H = [w 1, f, w M] ! C M
This section highlights three application cases that demon- that implement phase shifts. To generate a different gain
strate key examples where PEVD-based approaches can offer f, , , = 1, f, L with respect to each of the L sources, the
advantages over state-of-the-art processing. In the “PEVD- beamformer defined by w must satisfy the constraint equation
Based Adaptive Beamforming” section, we demonstrate how,
a 1H (e jX1) f1
> h H w = > h H, (15)
for adaptive beamforming, the computational complexity is
decoupled from the TDL length that otherwise determines the
a HL (e jX L)
8
cost of a broadband adaptive beamformer. The “PEVD-Based fL
1442443
C f
Subband Coding” section shows how, in subband coding, the
L#M L
PEVD can generate a system with optimized coding gain and where C ! C and f ! C are the constraint matrix and
helps to formulate optimum compaction filter banks that pre- associated gain vector for the L constraints. In the presence of
viously could be stated only for the two-channel case. Finally, spatially white noise, the minimum mean-square error
the “Polynomial Subspace Speech Enhancement” section (MMSE) solution is the quiescent beamformer w q = C @ f
addresses how the preservation of spectral coherence can pro- [53], where C @ is the pseudo-inverse of C. If the noise is spa-
vide perceptually superior results over DFT-based speech tially correlated, then the LCMV formulation
E" ; y [n];2 ,
enhancement algorithms.
min
w
s.t. Cw = f (16)
PEVD-based adaptive beamforming provides the MMSE solution, now constrained by (15).
To explore PEVD-based beamforming, we first recall some Solutions to (16) include, for example, the Capon beamform-
aspects of narrowband beamforming before defining a linearly er, w opt = (R [0]) -1 C H [C (R [0]) -1 C H] -1 f, and the general-
constrained minimum variance (LCMV) beamformer using ized sidelobe canceller (GSC). For the GSC, a “quiescent
both TDL- and PEVD-based formulations. We work with an beamformer” w q implements the constraints in C, and a “block-
arbitrary geometry of M sensors but, for simplicity, assume ing matrix” B is constructed such that CB = 0. When operating
free-space propagation and that the array is sufficiently far on the array data x [n], the blocking matrix output is free of any
field to neglect any loss in amplitude across its sensors. desired signal components protected by the constraints. All that
wq
where R (z) is the CSD matrix of x [n] . The evaluation of (18)
q (z) at a single frequency X 0 leads back to the narrowband formu-
lation via the substitution z = e jX0 .
x[n] + y [n]
– The solution to the broadband LCMV problem can
B wa be found as the equivalent of the Capon beamformer
B(z) a (z) w opt (z) = R -1 (z) C P(z) {C (z) R -1 (z) C P(z)} -1 f (z), which is a
direct extension of the narrowband formulation. To access this
(a) solution, the inversion of the para-Hermitian matrices R (z)
and, subsequently, C (z) R -1 (z) C P(z) can be accomplished via
vq
PEVDs [58]. Once factorized, the resulting paraunitary matri-
ces are straightforward to invert, and it remains to invert the
individual eigenvalues; for this, recall the comment on analytic
TDL
Gain/[dB]
tational cost in both the TDL- and PEVD-based approaches 0.5 –20
–25
is governed by the blocking matrix. In the TDL-based case, 0.4
–30
it requires T 2 (M 2 - ML) multiplications and additions, while 0.3
–35
the PEVD-based blocking matrix expends only J 2 (M 2 - ML) 0.2 –40
such operations. With typically J 2 . T, the PEVD-based real- 0.1 –45
ization is less expensive by a factor of approximately the length 0
–90 –40 –10 0 30 45 80 –90
of the TDL, T. Angle of Arrival ϕ /[°]
(a)
Numerical example
Gain in Look Direction/[dB]
0.15
A linear array with M = 8 elements spaced by half the wave-
0.1
length of the highest-frequency component has a look direc-
0.05
tion toward j = 30c, which is protected by a constraint. Three
0
“unknown” interferers with directions j = {- 40c, - 10c, 80c}
–0.05
are active over a frequency range of 0.2 # X/r # 0.9 at –0.1
-20-dB signal-to-interference-plus-noise and need to be adap- –0.15
tively suppressed. The data are further corrupted by spatially 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6
and temporally white additive noise 50 dB below the signal Normalized Angular Frequency Ω/π
levels of the interferers. A TDL-based GSC operates with a TDL Based PEVD Based TDL Constraints
TDL length of T = 175. For a PEVD-based GSC, the adap-
(b)
tive filter uses the same temporal dimension J 3 = T, but to
match the MSE performance of the TDL-based version, a FIGURE 7. The (a) directivity pattern for the adapted PEVD-based GSC
length of J 1 = 51 for the fractional delay filters in the quies- and (b) gain response in the look direction ({ = 30°) for the PEVD- and
cent beamformer and a temporal dimension of J 2 = 168 for TDL-based GSC.
Ensemble results
To demonstrate the wider benefit of the proposed subband
coder design, a randomly generated ensemble of 100 moving
average processes of order 14 [MA(14)] produces signals x [o]
Magnitude |Hm(e )|
jΩ
π /4
/8
π /2
1
/8
3π
/4
/8
5π
π
3π
7π
15
Normalized Angular Frequency Ω
PSD/[dB]
10
5
FIGURE 9. The magnitude responses ; H m (e jX) ;, m = 1, f, 4 of the
0 M = 4-channel filter bank equivalent to the polyphase analysis matrix
–5 Q (z), with the theoretical PCFB of infinite order shown in gray.
0 π /4 π /2 3π /4 π 5π /4 3π /2 7π /4 2π
Normalized Angular Frequency Ω
(a)
15 1
10log10 |λ m(e )|
10
Normalized Coding Gain
jΩ
0.95
Ensemble-Averaged
"
5
0.9
0
0.85
SBR2C
–5 SMD
0 π /4 π /2 3π /4 π 5π /4 3π /2 7π /4 2π 0.8 AEVD
Normalized Angular Frequency Ω KLT
FIGURE 8. The (a) PSD of input x [o] and (b) eigenvalues extracted by the FIGURE 10. The averaged normalized coding gain in its dependence on
SMD algorithm for the subband coding problem; the M = 4-times-folded the length (order: plus one) of Q (z) for an ensemble of random MA(14)
PSD of the input signal is shown in gray. processes and the case of demultiplexing with M = 4.
the direct path propagation of the speech signal from the speak-
er to the microphone as well as reverberation due to multipath where U (n) %–: U (z) ! C M # M. Since the polynomial eigen-
reflections from objects and walls in enclosed rooms. vector matrix U (z) is constructed from a series of delay
Background noise is then added to each microphone. The signal and unitary matrices, each vector u m [n] has a filter-and-
model in the “Signal Model” section, with , = 1, can describe sum structure.
this situation. Across M microphones, the signal vector For a single source, the signal subspace has a dimen-
x [n] ! C M is used to compute the space-time covariance sion of one. Therefore, the enhanced signal can be extracted
matrix R x [x] in (6) and its z transform R x (z) in (7). from the first channel of the processed outputs y [n]. The
Exploiting the reverberation model in [62], the early reflec- enhanced output y 1 [n], associated with the signal subspace,
tions in the AIR represent closely spaced distinct echoes that includes mainly speech components originally distributed
perceptually reinforce the direct path component and may over all microphones but now summed coherently. In con-
improve speech intelligibility in certain conditions. On the trast, the noise subspace is dominated by ambient noise and
other hand, the late reflections in the AIR consist of randomly the late reverberation in the acoustic channels. The orthogo-
distributed small amplitude components, and the associated nality between subspaces is a result of strong decorrelation,
late reverberant signal components are commonly assumed to expressed as R y (z) = K (z), where R y (z) :-% R y [x] is com-
be mutually uncorrelated with the direct path and early signal puted from R y [x] = E {y [n] y H [n - x]}.
components [28], [62]. Thus, (7) can be written as In practice, assuming quasi stationarity, the speech signals
are processed frame by frame such that R x [x] in (6) can be
R x (z) = au (z) au P (z) rs (z) + R l (z) + R v (z) (19) recursively estimated. Additionally, the two-sided z transform
= R su (z) + R vu (z), (20) of R x [x] in (7) can be approximated by some truncation win-
dow W, which determines the extent of the supported temporal
where rs (z) :–% rs [x] and rs [x] is the autocorrelation sequence correlation of the speech signal. The time-domain PEVD algo-
of the source. The space-time covariance matrices of the late rithms, such as the SBR2 and SMD, are used to compute (21)
reverberation R l (z) are modeled as a spatially diffuse field. because they preserve spectral coherence of the speech signals
8 8
7 7
6 6
Frequency (kHz)
Frequency (kHz)
5 5
4 4
3 3
2 2
1 1
0 0
1 2 3 4 5 1 2 3 4 5
Time (s) Time (s)
(a) (b)
8 8
7 7
6 6
Frequency (kHz)
Frequency (kHz)
5 5
4 4
3 3
2 2
1 1
0 0
1 2 3 4 5 1 2 3 4 5
Time (s) Time (s)
(c) (d)
FIGURE 11. Spectrograms, with corresponding time-domain signals, for the processing of a noisy reverberant speech example in ACE lecture room 2 and
5-dB babble noise. Dotted cyan boxes highlight noise- and reverberation-suppressed regions as a result of processing. Solid white boxes highlight regions
where speech structures are lost using the COLSUB but not PEVD processing. Listening examples are available in [28]. The (a) clean speech signal s [n],
(b) noisy speech signal x 1 [n], (c) COLSUB-enhanced signal y 1 [n], and (d) PEVD-enhanced signal y 1 [n].
5 5
∆NSRR (dB)
0
0
∆STOI
0
–5
–5 –0.1 –10
–15
–10
–0.2 –20
–10 –5 0 5 10 20 –10 –5 0 5 10 20 –10 –5 0 5 10 20
SNR (dB) SNR (dB) SNR (dB)
(a) (b) (c)
FIGURE 12. A comparison of speech enhancement performance for recorded AIR and babble noise in ACE lecture room 2, with a 1.22-s reverberation
time: the (a) TFwSegSNR (higher is better), (b) TSTOI (higher is better), and (c) TNSRR (higher is better).
[29] J. Corr, J. Pestana, S. Weiss, I. K. Proudler, S. Redif, and M. Moonen, [55] W. Liu and S. Weiss, Wideband Beamforming: Concepts and Techniques.
“Investigation of a polynomial matrix generalised EVD for multi-channel Wiener fil- Hoboken, NJ, USA: Wiley, 2010.
tering,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2016, pp. 1354–1358, doi: [56] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming,” IEEE
10.1109/ACSSC.2016.7869596. Trans. Signal Process., vol. 53, no. 5, pp. 1684–1696, May 2005, doi: 10.1109/
[30] S. Redif, S. Weiss, and J. G. McWhirter, “Relevance of polynomial matrix TSP.2005.845436.
decompositions to broadband blind signal separation,” Signal Process., vol. 134, pp. [57] K. M. Buckley, “Broad-band beamforming and the generalized sidelobe cancel-
76–86, May 2017, doi: 10.1016/j.sigpro.2016.11.019. ler,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 1322–1323,
[31] S. Weiss, N. J. Goddard, S. Somasundaram, I. K. Proudler, and P. A. Naylor, Oct. 1986, doi: 10.1109/TASSP.1986.1164927.
“Identification of broadband source-array responses from sensor second order statis- [58] S. Weiss, A. P. Millar, and R. W. Stewart, “Inversion of parahermitian matri-
tics,” in Proc. Sens. Signal Process. Defence Conf. (SSPD), 2017, pp. 1–5, doi: ces,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), Aug. 2010, pp. 447–451.
10.1109/SSPD.2017.8233237. [59] P. P. Vaidyanathan, “Theory of optimal orthonormal subband coders,” IEEE
[32] W. Coventry, C. Clemente, and J. Soraghan, “Enhancing polynomial MUSIC Trans. Signal Process., vol. 46, no. 6, pp. 1528–1543, Jun. 1998, doi:
algorithm for coherent broadband sources through spatial smoothing,” in Proc. Eur. 10.1109/78.678466.
Signal Process. Conf. (EUSIPCO), 2017, pp. 2448–2452. [60] B. Xuan and R. Bamberger, “FIR principal component filter banks,” IEEE
[33] A. Oppenheim and R. W. Schafer, Digital Signal Processing, 2nd ed. Trans. Signal Process., vol. 46, no. 4, pp. 930 –940, Apr. 1998, doi:
Englewood Cliffs, NJ, USA: Prentice-Hall, 1993. 10.1109/78.668547.
[34] B. Girod, R. Rabebstein, and A. Stenger, Signals and Systems. New York, NY, [61] A. Kirac and P. P. Vaidyanathan, “Theory and design of optimum FIR compac-
USA: Wiley, 2001. tion filters,” IEEE Trans. Signal Process., vol. 46, no. 4, pp. 903–919, Apr. 1998,
[35] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd doi: 10.1109/78.668545.
ed. New York, NY, USA: McGraw-Hill, 1991. [62] P. A. Naylor and N. D. Gaubitch, Eds., Speech Dereverberation. London,
[36] I. Gohberg, P. Lancaster, and L. Rodamn, Matrix Polynomials, 2nd ed. U.K.: Springer-Verlag, 2010.
Philadelphia, PA, USA: SIAM, 2009. [63] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L.
[37] V. Kučera, Analysis and Design of Discrete Linear Control Systems. Dahlgren, and V. Zue, “TIMIT acoustic-phonetic continuous speech corpus,”
Englewood Cliffs, NJ, USA: Prentice-Hall, 1991. Linguistic Data Consortium, Philadelphia, PA, USA, Corpus LDC93S1,
1993.
[38] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing.
Englewood Cliffs, NJ, USA: Prentice-Hall, 1983. [64] J. Eaton, N. D. Gaubitch, A. H. Moore, and P. A. Naylor, “Estimation of room
acoustic parameters: The ACE challenge,” IEEE/ACM Trans. Audio, Speech,
[39] M. Davis, “Factoring the spectral matrix,” IEEE Trans. Autom. Control, vol. 8, Language Process., vol. 24, no. 10, pp. 1681–1693, Oct. 2016, doi: 10.1109/
no. 4, pp. 296–305, Oct. 1963, doi: 10.1109/TAC.1963.1105614. TASLP.2016.2577502.
[40] “The polynomial toolbox.” Polyx. Accessed: Apr. 30, 2023. [Online]. [65] F. Jabloun and B. Champagne, “A multi-microphone signal subspace approach
Available: http://www.polyx.com for speech enhancement,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process.
[41] J. J. Shynk, “Frequency-domain and multirate adaptive filtering,” IEEE Signal (ICASSP), 2001, pp. 205–208, doi: 10.1109/ICASSP.2001.940803.
Process. Mag., vol. 9, no. 1, pp. 14–37, Jan. 1992, doi: 10.1109/79.109205. [66] Y. Hu and P. C. Loizou, “A subspace approach for enhancing speech corrupted
[42] A. Gilloire and M. Vetterli, “Adaptive filtering in subbands with critical sam- by colored noise,” IEEE Signal Process. Lett., vol. 9, no. 7, pp. 204–206, Jul. 2002,
pling: Analysis, experiments, and application to acoustic echo cancellation,” IEEE doi: 10.1109/LSP.2002.801721.
Trans. Signal Process., vol. 40, no. 8, pp. 1862–1875, Aug. 1992, doi: [67] S. Doclo and M. Moonen, “GSVD-based optimal filtering for single and multi-
10.1109/78.149989. microphone speech enhancement,” IEEE Trans. Signal Process., vol. 50, no. 9, pp.
[43] F. Rellich, Perturbation Theory of Eigenvalue Problems. New York, NY, 2230–2244, Sep. 2002, doi: 10.1109/TSP.2002.801937.
USA: Gordon & Breach, 1969. [68] T. Nakatani and K. Kinoshita, “A unified convolutional beamformer for simul-
[44] T. Kato, Perturbation Theory for Linear Operators. Singapore: Springer, 1980. taneous denoising and dereverberation,” IEEE Signal Process. Lett., vol. 26, no. 6,
pp. 903–907, Jun. 2019, doi: 10.1109/LSP.2019.2911179.
[45] G. Barbarino and V. Noferini, “On the Rellich eigendecomposition of para-Her-
mitian matrices and the sign characteristics of *-palindromic matrix polynomials,” [69] S. Weiss, J. Corr, K. Thompson, J. G. McWhirter, and I. K. Proudler.
Linear Algebra Appl., vol. 672, pp. 1–27, Sep. 2023, doi: 10.1016/j.laa.2023.04.022. “Polynomial EVD toolbox.” Polynomial Eigenvalue Decomposition. Accessed:
Apr. 30, 2023. [Online]. Available: http://pevd-toolbox.eee.strath.ac.uk
[46] S. Redif, S. Weiss, and J. G. McWhirter, “Sequential matrix diagonalisation
algorithms for polynomial EVD of para-Hermitian matrices,” IEEE Trans. Signal
Process., vol. 63, no. 1, pp. 81–89, Jan. 2015, doi: 10.1109/TSP.2014.2367460. SP
A Signal Processing
Interpretation of Noise-
Reduction Convolutional
Neural Networks
Exploring the mathematical formulation of encoding-decoding CNNs.
©SHUTTERSTOCK.COM/DABARTI CGI
E
ncoding-decoding convolutional neural networks (CNNs) striven to explain the internal operation of these CNNs. Still,
play a central role in data-driven noise reduction and can these ideas are either scattered and/or may require significant
be found within numerous deep learning algorithms. How- expertise to be accessible for a bigger audience. To open up
ever, the development of these CNN architectures is often this exciting field, this article builds intuition on the theory of
done in an ad hoc fashion and theoretical underpinnings for deep convolutional framelets (TDCFs) and explains diverse
important design choices are generally lacking. Up to now, encoding-decoding (ED) CNN architectures in a unified theo-
there have been different existing relevant works that have retical framework. By connecting basic principles from signal
processing to the field of deep learning, this self-contained
material offers significant guidance for designing robust and
Digital Object Identifier 10.1109/MSP.2023.3300100
Date of current version: 3 November 2023 efficient novel CNN architectures.
L r=0 P
Table 1. Relevant symbols used in this article.
Here, the symbol defines the convolution between two matri-
)
Symbol Meaning ces (images).
f(2 .) ($) Downsampling operation In this article, images that are 2D arrays (matrices) are often
f(2 -) ($) Upsampling operation convolved with 4D tensors. When this operation is performed,
I Convolution identity images are considered to have dimensions of (1 # 1 # N V # N H).
K Encoding convolution kernel In addition, in this article, matrix I is the identity signal for the
Ku Decoding convolution kernel convolution operator, which, for a 2D image, is the Kronecker
W Filters for the forward discrete wavelet transform delta/discrete impulse (an image with a single nonzero pixel
W u Filters for the inverse discrete wavelet transform with unity amplitude at the center of the image). Furthermore,
WH High-pass filters of the forward discrete wavelet transform we indicate that variables in the decoding path of a CNN are
WL Low-pass filter of the forward discrete wavelet transform u , bu ).
distinguished with a tilde (e.g., K
x Noiseless image
Additional symbols that are used throughout the article are
y Noisy image
the down and upsampling operations by a factor s, which are
h Additive noise
b Bias vector
denoted by f(s .) ($) and f(s -) ($) for downsampling and upsam-
t Threshold level pling, respectively. In this article, both operations are defined in
) Image convolution the same way as in multirate filter banks. For example, consider
Kx Tensor convolution between tensor K and signal x the signal
($)R Transpose of a tensor
x = ^1, 2, 3, 4, 5, 6, 7, 8, 9, 10 h.(4)
($) + ReLU activation
x ($) ($) Generic thresholding/shrinkage operation If we apply the downsampling operator to x by a factor of two,
C ($) ($) Generic clipping operation it results in
z = f(2 .) ( x) = ^1, 3, 5, 7, 9h (5)
Tensor Convolution Sum where z is the downsampled version of x. Conversely, the result
of applying the upsample operator f(2 -) (·) gives the result
∗ +
not always contain down-/upsampling layers, in which case, the y= / ( u n v nR) $ v [n](15)
n=0
decimation factor s is unity, which causes f(1-) (x) = f(1.) (x) = x
for any matrix x. Furthermore, it should also be noted that we in which N SV is the number of singular values, n is a scalar in-
assume that the number of channels of the code C N is always dex, while u n and v n are the nth left and right singular vectors,
larger than the previous one: C N - 1 . Furthermore, it should be respectively. Furthermore, vector v contains the singular values,
noted that a single encoder layer E n (·) and its corresponding and each of its entries v [n] is the weight assigned to every basis
decoder layer D n (·) can be considered a single-layer encoder- pair u n, v n . This means that the product ( u n v Rn ) contributes more
decoder network/pair. to the image content for higher values of v [n]. It is customary
For this article, the encoding convolution filter for a given for the singular values to be ranked in descending order and for
layer K has dimensions of (N o # N i # N h # N v), where N i and the amplitudes of the singular values v to be sparse, therefore,
N o are the number of input and output channels for a convolution v [0] & v [N SV - 1]. The reason for this sparsity is because im-
layer, respectively. Similarly, N h and N v are the number of ele- age (patches) intrinsically have high correlation. For example,
ments in the horizontal and vertical directions, respectively. Note many images contain repetitive patterns (e.g., a wall with bricks,
that the encoder increases the number of channels of the signal a fence, rooftop tiles, or a zebra’s stripes) or uniform regions (for
(e.g., N o 2 N i), akin to Ranzato’s design [22]. Furthermore, it is example, the sky or a person’s skin). This means that an image
assumed that the decoder is symmetric in the number of channels patch may contain only a few linearly independent vectors that
to the encoder, therefore, the dimensions of the decoding con- describe most of the image’s content. Consequently, a higher
volution kernel K u R are (N i # N o # N h # N v). The motivation of weight is assigned to such image bases.
this symmetry is to emphasize the similarity between the signal Given that the amplitudes of the singular values of y in SVD are
processing and the CNN elements. sparse, it is possible approximate yt with only a few bases: ( u n v Rn ).
Note that this procedure reduces the rank of signal y, and hence it
Signal processing fundamentals is known as low-rank approximation. This process is equivalent to
As shown by Ye et al. [12], within encoding-decoding CNNs,
N LR - 1
the signal is treated akin to well-known sparse representations, yt = / ( u n v nR) $ v [n](16)
where the coefficients used for the transformation are directly n=0
learned from the training data. Prior to addressing this impor-
tant concept in more detail, relevant supporting concepts such as where N SV 2 N LR . Note that this effectively cancels the product
sparsity, sparse transformations, and nonlinear signal estimation ( u n v Rn ), where the weight given by v [n] is low. Alternatively, it
in the wavelet domain are explained. is possible to assign a weight of zero to the product ( u n v Rn ) for
n $ N LR .
Sparsity The low-rank representation of a matrix is desirable for
A sparse image is a signal where most of the coefficients are small diverse applications, among which we can find image denois-
and the relatively few large coefficients capture most of the infor- ing. The motivation for using low-rank approximation for this
mation [23]. This characteristic allows to discard low-amplitude application results from the fact that, as mentioned earlier, natu-
components with relatively small perceptual changes. Hereby, the ral images are considered low rank due to the strong spatial cor-
use of sparse signals is attractive for applications such as image relation between pixels, whereas noise is high rank (it is spatially
compression, denoising, and suppression of artifacts. uncorrelated). As a consequence, reducing the rank/number of
Despite the convenient characteristics of sparse signals, natu- singular values decreases the presence of noise while still provid-
ral images are often nonsparse. Still, there are numerous transfor- ing a good approximation of the noise-free signal, as exemplified
mations that allow for mapping the signal to a sparse domain and in Figure 3.
that are analogous to the internal operations of CNNs. For exam-
ple, SVD factorizes the image in terms of two sets of orthogonal Framelets
bases, of which few basis pairs contain most of the energy of Just as with SVD, framelets are also commonly used for image
the image. An alternative transformation is based on framelets, processing. In a nutshell, a framelet transform is a signal rep-
where an image is decomposed in a multichannel representation, resentation that factorizes/decomposes an arbitrary signal into
whereby each resulting channel contains a fragment of the Fou- multiple bands/channels. Each of these channels contains a seg-
rier spectrum. In the following sections, we address all of these ment of the energy of the original signal. In image and signal
representations in more detail. processing, the framelet bands are the result of convolving the
analyzed signal with a group of discrete filters that have finite
Sparse signal representations length/support. In this article, the most important characteristic
that the filters of the framelet transform should comply with is
SVD and low-rank approximation that the bands they generate capture all the energy contained on
Assume that an image (patch) is represented by a matrix y with the input to the decomposition. This is important to avoid the loss
dimensions of (N r # N c), where N r and N c are the number of of information of the decomposed signal. In this text, we refer to
rows and columns, respectively. Then, the SVD factorizes y as framelets that comply with the previous characteristics as tight
y y
Wy f (2↓)(Wy ) f (2↓) A f (2↓)(Wy )B W f (2↑) A f (2↓)(Wy )B
W W
F{wLL ∗ y} Ff (2↓){wLL ∗ y} F{ f (2↑) A f (2↓) (wLL ∗ y)B } F{ wLL ∗ f (2↑) A f (2↓) (wLL ∗ y)B }
F{wLH ∗ y} Ff (2↓){wLH ∗ y} F{ f (2↑) A f (2↓) (wLH ∗ y)B } F{ wLH ∗ f (2↑) A f (2↓) (wLH ∗ y)B }
Input F{ W f (2↑) A f (2↓) (Wy )B }
F{y} y
November 2023
|
F{wHH ∗ y} Ff (2↓){wHH ∗ y} F{ f (2↑) A f (2↓) (wHH∗ y)B } F{ wHH ∗ f (2↑) A f (2↓) (wHH ∗ y)B }
FIGURE 4. 2D spectrum analysis of the decimated discrete framelet decomposition and reconstruction. In the figure, function F " $ , stands for the amplitude Fourier spectrum of the input argument. The yellow
squares indicate a region in the low-frequency area of the Fourier spectrum, while the orange, purple and blue squares indicate the high-pass/detail bands. For these images, ideal orthogonal bases are assumed.
Note that the forward transform is composed by two steps. First, the signal is convolved with the wavelet basis ^ Wy h . Afterward, downsampling is applied to the signal ^ f(2 .) (Wy) h . During the inverse transfor-
u
mation, the signal is upsampled by inserting zeros between each sample ^ f(2 -) (f(2 .) (Wy)) h, which causes spatial aliasing (dashed blocks). Finally, the spatial aliasing is removed by the inverse transform filter W
and all the channels are added ^ W u R f(2 -) (f(2 .) (Wy)) h .
For reference and further understanding, Figure 6 portrays
P (d ) ? exp c - m, (26)
;d ;
vd the elements composing the noise model of (22), signal-transfer
where v d is the dispersion measure of the Laplace distribution. characteristics of the ReLU, soft-shrinkage and clipping func-
For reference, Figure 5 portrays an example of both a Gaussian tions, and the effect that these functions have on the signal of the
and a Laplacian PDF. Note that the Laplacian distribution has a observed noisy detail band z.
higher probability of zero elements occurring than the Gaussian
distribution for the same standard deviation. Finally, substituting ReLU
(25) and (26) in (24) results in If (27) is solved for d while constraining the estimator to be posi-
tive, the noiseless estimate dt becomes
(z - d ) 2 ; d ;
ln (P (d ; z)) ? - - .(27)
2v 2h vd dt = (z - t) + (28)
1
Amplitude
−1
0 5 10 0 5 10 0 5 10
Sample Index
(a)
Soft (·)
(· − t)+ t(t) C(t)(·)
0.4
Output Amplitude
0.2
−0.2
−0.4
1
Amplitude
−1
0 5 10 0 5 10 0 5 10
Sample Index
(c)
FIGURE 6. Signals involved in the additive noise model, input/output transfer characteristics of activation layers and estimates produced by the activation
layers when applied to the noise-contaminated signal. (a) The signals involved in the additive noise model. (b) The output amplitude of activation func-
tions with respect to the input amplitude. (c) Finally, the application of the activation functions to the noisy observation z.
Here, x Soft
(t) ( $ ) denotes the soft-shrinkage/-thresholding function, where the semihard clipping function with the DoG C DoG
($) ( $ ) is
which is often also written in the form given by
DoG DoG
Transfer t(t) (·) Transfer C(t) (·)
1
0.5
Output Amplitude
−0.5
−1
−1 −0.75 −0.5 −0.25 0 0.25 0.5 0.75 1 −1 −0.75 −0.5 −0.25 0 0.25 0.5 0.75 1
Gar Gar
Transfer t(t) (·) Transfer C(t) (·)
1
0.5
Output Amplitude
−0.5
−1
−1 −0.75 −0.5 −0.25 0 0.25 0.5 0.75 1 −1 −0.75 −0.5 −0.25 0 0.25 0.5 0.75 1
FIGURE 7. Transfer characteristics of the semihard thresholds based on the difference of Gaussians and of the garrote shrink as well as their clipping
counterparts. Note that in contrast with the soft-shrinkage and clipping functions shown in Figure 6, the semihard thresholds tend to unity for large
values, while the semihard clipping functions tend to zero for large signal intensities.
A visual example of low-rank approximation in ReLU CNNs represents the threshold value.
is shown in Figure 8, illustrating the operation of an idealized In addition to the soft-shrinkage function, note that the
single-layer encoding-decoding ReLU CNN operating in both clipping function described by (34) also can be expressed by
a residual and nonresidual way. It can be noted ReLU activation (42) if K = ^^I - I I - IhhR, K u R = ^^I - I - I Ihh, and
suppresses specific channels in the sparse decomposition pro- b = ^0 0 - t - t hR . It can be noted that representing the clip-
vided by the encoder, thereby preserving the low-rank structures ping function in convolutional form requires four-times-more
in the nonresidual network. Alternatively, in the residual exam- channels than the original input signal.
ple, the ReLUs eliminate the feature maps with high activation, It should be noted that the ability of ReLUs to approximate
which results in a noise estimate that is subtracted from the input other signals has also been observed by Daubechies et al. [29],
to estimate the noiseless signal. who have proven that deep ReLU CNNs are universal function
approximators. In addition, Ye and Sung [13] have demonstrated
Shrinkage and clipping-based CNNs that the ReLU function is the main source of the high-approxi-
As in ReLU networks, the encoder of shrinkage networks [17], mation power of CNNs.
[18], [19] separates the input signal in a multichannel representa-
tion. As a second processing stage, shrinkage networks estimate Additional links between encoding-decoding CNNs and
the noiseless encoded signal by canceling the low-amplitude pix- existing signal processing techniques
els in the feature maps in a process akin to the MAP estimate Up to now, it has been assumed that operation of the encoding
of the “Soft Shrinkage/Thresholding” section. As final step, the and decoding convolution filters is limited to mapping the input
encoder reconstructs the estimated noiseless image. Note that image to a multichannel representation and to reconstructing it
the use of shrinkage functions reduces the number of channels (i.e., K and K u R comply with K u R (K) + = I $ c) . Still, it is pos-
required by ReLU counterparts to achieve perfect signal recon- sible that, in addition to performing decomposition and synthesis
struction because the shrinkage activation preserves positive and tasks, the encoding-decoding structure also filters/colors the sig-
negative values, while ReLUs preserve only the positive part of nal in a way that improves image estimates. It should be noted
the signal. that this implies that the perfect reconstruction encoding-decod-
As shown in the “Signal Model and Noise-Reduction Con- ing structure is no longer preserved. For example, consider the
figurations” section, in residual learning, a given encoding- following linear encoding-decoding structure
decoding network estimates the noise signal h so that it can be u R (Ky) (43)
xt = K
subtracted from the noisy observation y to generate the noiseless
estimate xt . As shown in the “Soft Clipping” section, in the fra- which can be reduced to
melet domain, this is achieved by preserving the low-amplitude xt = k ) y.(44)
–wLL
+wLL
–wLH
–wHL
+wLH
+wHL
–wHH
+wHH
–wHL
0
Bias
+wHH
–wHH –2
1 2 3 4 5 6 7 8
Channel Index
Residual Network
Encoder Decoder
|
–wLL ∗ y ReLU Decoding η
y Encoding (–wLL ∗ y + b[1])+ X
Convolution +wLH ∗ y Activation Convolution
(+wLH ∗ y + b[2])+
−
–wLH ∗ y (–wLH ∗ y + b[3])+
November 2023
+wHL ∗ y (+wHL ∗ y + b[4])+
|
–wHL ∗ y (–wHL ∗ y + b[5])+ K
+wLL +
–wLL +wHH ∗ y (+wHH ∗ y + b[6])+
+wLH
–wLH –wHH ∗ y (–wHH ∗ y + b[7])+
b
K +wHL 2
–wLL
+wLL
–wLH
–wHL
+wLH
+wHL
–wHH
+wHH
–wHL
+wHH 0
–wHH
Bias b [n]
–2
0 1 2 3 4 5 6 7
Channel Index n
FIGURE 8. Operation of a simplified denoising (non)residual ReLU CNN according to the TDCFs. In the figure, the noisy observation y is composed by two vertical bars plus uncorrelated Gaussian noise. Further-
more, for this example, the encoding and decoding convolution filters (K and K u , respectively) are the Haar basis of the 2D DWT and its phase-inverted counterparts. Given the content of the image, the image
in the decomposed domain Ky produces only a weak activation for the vertical and diagonal filters (w LH and w HH, respectively), and those feature maps contain mainly noise. In the case of the nonresidual
network, the ReLUs and biases suppress the channels with low activation [see the ^ Ky + b h+, column], which is akin to the low-rank approximation. In contrast, in the residual example, the channels with image
content are suppressed while preserving the uncorrelated noise. Finally, the decoding section reconstructs the noise-free estimate xu for the nonresidual network or the noise estimate ht for the residual example,
where it is subtracted from y to compute the noiseless estimate xt .
49
50
Nonresidual Network
Encoder Decoder
Ky DoG DoG
τ (b) (Ky) K τ (b) (Ky)
Input
Shrinkage Decoding
y Encoding DoG X
wLL ∗ y Activation τ (b[0]) (wLL ∗ y) Convolution
Convolution
wLH ∗ y DoG
τ (b[1]) (wLH ∗ y)
wHL ∗ y DoG
τ (b[2]) (wHL ∗ y)
wHH ∗ y DoG
τ (b[3]) (wHH ∗ y)
b K
+wLL 0.5
K –wLH
0
Bias
+wHL
+wLL
–wHH
–wLH
+wHL
–wHH
–0.5
1 2 3 4 5 6 7
Channel Index
Residual Network
Encoder Decoder
|
(b) (Ky) K (b) (yK)
Input
y Encoding Clipping DoG Decoding η X
Convolution wLL ∗ y Activation (b[0]) (wLL ∗ y)
Convolution
November 2023
wLH ∗ y DoG −
|
(b[1]) (wLH ∗ y)
wHL ∗ y DoG
(b[2]) (wHL ∗ y)
wHH ∗ y DoG
(b[3]) (wHH ∗ y)
b
0.5 K +
+wLL
K –wLH 0
Bias
+wHL
–wHH –0.5
+wLL
–wLH
+wHL
–wHH
1 2 3 4 5 6 7
Channel Index
FIGURE 9. Operation of denoising in shrinkage and clipping networks. In the nonresidual configuration, the noisy signal y is decomposed by a set of convolution filters, which, for this example, are the 2D Haar
basis functions of the DWT (Ky). As a second step, the semihard shrinkage produces an MAP estimate of the noiseless detail bands/feature maps ^ x DoG ( b ) (Ky) h . As third and final step, the decoder maps the
estimated noiseless encoded signal to the original image domain. In the residual network, the behavior is similar, but the activation layer is a clipping function that performs an MAP estimate of the noise in the
feature maps, which is reconstructed by the decoder to generate the noise estimate ht . After reconstruction, the noise estimate is subtracted from the noisy observation y to generate the noise-free estimate xu .
Here, k = K u R K is optimized to reduce the distance between y computational requirements needed to execute each network,
and the ground truth x. Consequently, the equivalent filter k can and its overall complexity.
be considered a Wiener filter. It should be noted that this article Signal reconstruction analysis provides a theoretical indica-
is not the first to address the potential Wiener-like behavior of a tion that a given CNN design can propagate any arbitrary signal
CNN. For example, Mohan et al. [14] suggested that by elimi- when considering the use of ideal filters (i.e., they provide perfect
nating the bias of the convolution layers, the CNN could behave reconstruction and are maximally sparse). In other words, for a
more akin to the Wiener filter and be able to generalize better to fixed network architecture, there exists a selection of parameters
unseen noise levels. It should be noted that by doing so, the CNN (weights and biases) that make the neural network equal to the
can also behave akin to the switching behavior described by the identity function. This result is important because a design that
TDCFs, which can be described by the equation cannot propagate arbitrary signals under ideal conditions will
potentially distort the signals that propagate through it by design.
(z) + = '
z, if z $ 0, Consequently, this cannot be fixed by training with large datasets
(45)
0 if z 1 0 and/or with the application of any special loss term. To better
understand the signal reconstruction analysis, we provide a brief
where z is a pixel that belongs to the signal z = k ) x. It can be example where it is a nonresidual CNN G ($), where we propa-
observed that in contrast with the low-rank behavior described gate a noiseless signal x contaminated with noise h so that
in the “TDCFs” section, in this case, the switching behavior is
only dependent on the correlation between signal x and filter k. x . G (x + h) .(46)
Consequently, if the value of z is positive, its value is preserved.
On the contrary, if the correlation between x and k is negative, Here, an ideal CNN allows us to propagate any x while cancel-
then the value of z is canceled. Consequently, the noise reduc- ing the noise component h, irrespective of the content of x. If we
tion becomes independent/invariant of the noise level. It can be switch our focus to an ideal residual CNN R ($), it is possible to
observed that this effect can be considered a nonlinear extension observe that
of signal annihilation filters [30].
It should be noted that aside from the low-rank approxima- xt . R (y) = y - G (y) .(47)
tion interpretation of ReLU-based CNNs, additional links to
other techniques can be derived. For example, the decomposi- Here, G ($) is the encoding-decoding section of the residual net-
tion and synthesis provided by the encoding-decoding structure work R ($) . Consequently, it is desirable that the network G ($) is
is also akin to the nonnegative matrix factorization [31], in which able to propagate the noise h, while suppressing the noiseless
a signal is factorized as a weighted sum of positive bases. In this signal x, which is equivalent to
conception, feature maps are the bases, which are constrained
to be positive by the ReLU function. Furthermore, an additional h . G (x + h) . (48)
interpretation of encoding-decoding CNNs can be obtained by
analyzing them from a low-dimensional manifold representa- It should be noted that in both residual and nonresidual cases,
tion perspective [8]. Here, the convolution layers of CNNs are there are two behaviors. On one hand, there is a signal that the
interpreted as two operations. On one hand they can provide network decomposes and reconstructs (almost) perfectly, and on
a Hankel representation, and on the other they can provide a the other a signal is suppressed. Signal reconstruction analysis
bottleneck that reduces dimensionality of the manifold of image focuses on the signals that the network can propagate or recon-
patches. It should be noted that the Hankel-like structure attrib- struct, rather than the signal cancelation behavior. Consequently,
uted to the convolution layers of CNNs has also been noted by we focus on the linear part of G ($) (i.e., its convolution structure),
the TDCFs [12]. Two final connections with signal process- of which, according to the “TDCFs” section, we assume that it
ing and CNNs are the variational formulation combined with handles decomposition and reconstruction of the signal within
kernel-based methods [15] and the convolutional sparse coding the CNN. It should be noted that the idealized model assumed
interpretation of CNNs [16]. here is only considered for analysis purposes as practical imple-
mentations do not guarantee that this exact behavior is factually
Mathematical analysis of relevant designs obtained. For more information, see the “Additional Links Be-
To demonstrate an application of the principles summarized in tween Encoding-Decoding CNNs and Existing Signal Process-
the “Signal Processing Fundamentals” and “Bridging the Gap ing Techniques” section and “Fitting Low-Rank Approximation
Between Signal Processing and CNNS: Deep Convolutional in Rectified Linear Unit Convolutional Neural Networks” and
Framelets and Shrinkage-Based CNNs” sections, this section an- “Network Depth.”
alyzes relevant designs of ReLUs and shrinkage CNNs. The anal- To test the perfect reconstruction in nonresidual CNNs, we
yses focus on three main aspects: 1) overall descriptions of the propose the following procedure. 1) We assume an idealized
network architecture, 2) the signal reconstruction characteristics model G ($), where its convolution filters, K n and K u n, comply
provided by the convolutional layers of the encoder and decoder with the phase-complementary tight-framelet condition, and
subnetworks, and 3) the number operations O ($) executed by the where the biases and nonlinearities suppress low-amplitude
trainable parts of the network, as this will give insight into the (and negative for ReLU activations) samples from the feature
y= | (u n
v Rn ) $ v [n] . sented to the network during training.
n=0
Multilayer designs
Given that left and right singular vector pairs u n v Rn gener- It should be noted that CNNs contain multiple layers,
ate an image D [n], then (15) can be rewritten as which recursively decompose/reconstruct the signal. This
N SV - 1 may pose an advantage with respect to conventional low-
y= | D [n] $ v [n] (S1) rank approximation algorithms for a few reasons. First, the
n=0
data-driven nature of CNNs allows us to learn the basis
where tensor D = ^ ( u 0 v R0 ) f ( u N - 1 v RN - 1) hR contains the
SV SV
functions, which optimally decompose and suppress noise
products of the left and right singular vectors and has in the signal. Second, as networks are deep, the incoming
dimensions of (N SV # 1 # M # N ) . Furthermore, the equa- signal is recursively decomposed and sparsified. This multi-
tion can be further reformulated to decomposition scheme is very similar to the designs used
in noise-reduction algorithms based on framelets. It can be
u R D (S2)
y=K noted that, in the past, recursive sparsifying principles
in which K u R = ^ ( v [0]) f ( v [N SV - 1]) h, where the brackets have been observed in methods such as the (learned) itera-
of the (1 # 1) filters have been excluded for simplicity. In tive soft-thresholding algorithm [27], [28] as well as convo-
addition, it is now assumed that it is desirable to perform a lutional sparse coding. In fact, the convolutional
low-rank approximation of signal y based on the reformu- sparse-coding approach has been used for interpreting the
lation of (43). If we assume that D ! R $N 0 , then the low- operation of CNNs [16].
rank approximation can be expressed by What about practical implementations?
When training a CNN, the parameters of the model (i.e.,
u R (D + b ) + (S3)
yt = K
K, K u R , and b ) are updated to reduce the loss between
in which the values b are set to zero for the channels of the processed noisy signal and the ground truth, which
D that have high contributions to the image content. does not warranty that the numerical values of the convolu-
Conversely, the channels of D [n] with less perceptual rele- tion filters and biases of the trained model comply with the
vance are then canceled by assigning large negative val- assumptions performed here. This is because CNNs do
ues to the corresponding entries of b. As a final not have mechanisms to enforce that filters have properties
reformulation, we can assume that the basis images D such as sparsity or perfect reconstruction and negative val-
are the result of decomposing the input image y with a set ues for the biases. Consequently, CNNs may not necessar-
of convolution filters, i.e., D = Ky ; this transforms (44) into ily perform a low-rank approximation of the signal,
although the mathematical formulation of the low-rank
u R (Ky + b ) + . (S4)
xt = K approximation and the single-layer encoding decoding
are similar. Hence, the analysis presented here should be
Here, it is visible that (S4) is analogous to the encoding- treated as insight into the mathematical formulation and/or
decoding architecture defined in (10)–(14), and the encoder potential properties that can be enforced for specific appli-
and decoder filters are akin to the framelet formulation pre- cations, and not as a literal description of what trained
sented in the “Framelets” section. Note that (S4) assumes models do.
Network Depth
It should be noted that one of the key elements of convolu- Summary
tional neural networks (CNNs) is their network depth, which In conclusion, the mathematical formulation of deep net-
we address in this section. To illustrate the effect of network works is analogous to a recursive data-driven low-rank
depth, assume an arbitrary N-layer encoding-decoding approximation, where the input to the successive encod-
CNN, in which the encoding layers are defined by ing-decoding pairs is the low-rank approximated encoded
signal generated by the encoder of the previous level. Still,
E 0 = (K 0 y + b 0 ) + , as mentioned in “Fitting Low-Rank Approximation in
E 1 = ( K 1 E 0 + b 1) + , Rectified Linear Unit Convolutional Neural Networks,” low-
E 2 = (K 2 E 1 + b 2 ) + , (S5)
rank approximation algorithms and CNNs are similar in
h
terms of mathematical formulation, but we cannot ensure
E N - 1 = ( K N - 1 E N - 2 + b N - 1) +
that the values obtained during training for the encoding
E n = (K n E n - 1 + b n) + . (S6) and decoding filters and their biases have the properties
needed to ensure that a CNN is an exact recursive data-
Here, E n represents the encoded signal at the nth decom- driven low-rank approximation. For example, it is possible
position level, while K n, b n are the convolution weights that the filters of the encoder and decoder do not recon-
and biases for the nth encoding layer, respectively. As struct the signal perfectly because this may not be neces-
addressed in the “ReLU” and “TDCFs” sections, the role of sary to reduce the loss function used to optimize the
the rectified linear unit activations is to enforce sparsity network.
and nonnegativity, which can be interpreted as the pro-
cess of suppressing noninformative bases in the low-rank Is it possible to impose a tighter relationship between
approximation algorithm. Consequently, every encoded low-rank approximation and CNNs?
signal E n is an encoded sparsified version of the signal In specific applications where signal preservation and
E n - 1 . To recover the signal, we apply the decoder part of interpretability is required (e.g., medical imaging), it is
the CNN, given by desirable that the operation of CNNs is closer to the low-
rank approximation description. To achieve this, the
E u RN - 1 E N - 1 + bu N - 1) +,
u N - 1 = (K CNNs embedded in frameworks such as the convolution-
h al analysis operator [S1] and Fast Iterative Soft
Eu 1 = (Ku R2 Eu 2 + bu 2) +, (S7) Thresholding Algorithm Network [S2] explicitly train the
E 0 = (K 1 E 1 + bu 1) +,
u u R u
filters K n and K u n to have properties such as perfect sig-
xu = (Ku 0R Eu 0 + bu 0) + nal reconstruction and sparsity. By enforcing these char-
Eu n - 1 = (K u n + bu n) + . (S8)
u nR E acteristics, the mathematical descriptions of low-rank
behavior and of CNNs are more similar and the models
Here, xt is the low-rank estimate/denoised version of the become inherently more interpretable and predictable in
input signal y, while E u Rn , bu n are the decoded signal
u n, K their operation.
components at the nth composition level and the decod-
References
er convolution weights and biases for the nth layer, [S1] I. Y. Chun, Z. Huang, H. Lim, and J. Fessler, “Momentum-net: Fast and
respectively. In (S8), every decoded signal E u n is the convergent iterative neural network for inverse problems,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4915–4931, Apr. 2023,
low-rank estimate of the encoded layer E (n - 1) . It should doi: 10.1109/TPAMI.2020.3012955.
be noted that the activation of each of the decoder lay- [S2] J. Xiang, Y. Dong, and Y. Yang, “FISTA-net: Learning a fast iterative
shrinkage thresholding network for inverse problems in imaging,” IEEE
ers ($ + bu n) + can further enforce sparsity on the low-rank Trans. Med. Imag., vol. 40, no. 5, pp. 1329–1339, May 2021, doi:
estimates E u (n - 1) . 10.1109/TMI.2021.3054167.
K 0 b0 b1 K0
Q Q
R1(Q) Rdnc
Renc K 1 b1 K 0 (Q)
0 (y) 0 R (y)
LWFSN
Encoding Shrinkage Decoding
y Inverse DWT x
Forward DWT
K0 K0
+
De WH WH
cim t0
ate
dP
ath WL WL L (y)
FIGURE 10. Simplified structure of encoding-decoding ReLU CNNs. The displayed networks are the U-Net/filtered backprojection network the encoder-
decoder residual CNN (RED) and finally, the learned wavelet-frame shrinkage network (LWFSN). Note that for all the designs, the encoding-decoding
structures are indicated by dashed blocks. It should be kept in mind that the drawings are simplified, they do not contain normalization layers, are shal-
low, commonly appearing dual convolutions are drawn as one layer.
whereas in the filtered backprojection network, U ($) is used in a Here, it can be noted that if K 0 is a phase-complementary tight
residual configuration, which is equivalent to framelet, then P {U d} (y) approximates a low-pass version of y,
or equivalently,
xt = y - U (y) .(50)
P {U d} (y) . W L y $ c 1 (61)
If we now switch our focus to the encoding-decoding structure of
the U-Net U (y), it can be shown that it is described by where W L is a low-pass filter. Finally, substituting (65) and (69)
in (63) results in
U (y) = U u (y) + U d (y) (51)
P {U} (y) . (I $ c 0 + W L $ c 1) y.(62)
where U u (y) corresponds to the undecimated path and is defined
by This result proves that the design of the U-Net cannot evenly re-
construct all the frequency of y unless c 1 = 0, in which case, the
u u0)R (K 0 y + b ) + (52)
U u (y) = (K whole low-frequency branch of the network is ignored. Note that
-0
this limitation is inherent to its design and cannot be circumvent-
while the decimated path is ed by training with large datasets and/or with any loss function.
u LR f(2 -) ^(K
u 0d)R W
U d (y) = (K u 1R (K 1 Z + b ) + + bu ) +h .(53) U-Net—number of operations
-1 -1
It can be noted that encoding filter K 0 convolves x at its
Here, signal Z is defined by original resolution and maps it to a tensor with C 0 channels.
Therefore, the number of operations O ($) for kernel K 0 is
Z = f(2 .) (W L (K 0 y + b- 0) +) .(54) O (K 0) = C 0 $ N r $ N c $ N 2f floating-point operations (FLOPs).
Conversely, due to the symmetry between encoder and d ecoder
Note that the decimated path contains two nested encoding- filters, O (K u u0) = O (Ku d0) = O (K 0) . Furthermore, for this de-
decoding architectures, as observed by Jin et al. [21], who has sign, filter K 1 processes the signal encoded by K 0, which
acknowledged that the nested filtering structure is akin to the is downsampled by a factor of one half, and maps it from C 0
(learned) iterative soft-thresholding algorithm [27], [28]. to C 1 channels. This results in the estimated operation cost
O (K 1) = O (K u 1) = C 0 $ C 1 $ N r $ N c $ N 2f $ (2)-2 [FLOPs]. Fi-
U-Net—signal reconstruction analysis nally, adding the contributions of filters K 0, K u u0, K
u d0, K 1, and
To prove whether the U-Net can perfectly reconstruct any signal, Ku 1 results in
we assume that the biases are equal to zero; on this condition, the
network P {U} (y) is defined by O (U ) = (3 + 2 -1 $ C 1) $ C 0 $ N r $ N c $ N 2f [FLOPS] .(63)
Residual encoder-decoder CNN—overview of the design Just as with R 1 ($), it is assumed that the convolution kernels are
The residual encoder-decoder CNN shown in Figure 10 consists tight framelets. Therefore, (73) becomes
of nested single-layer residual encoding-decoding networks. For
example, in the network showcased in Figure 10, we can see that P " R 0 , (y) = y.(74)
network R 1 ($) is nested into R 0 ($) . Furthermore, for this case,
the image estimate is given by Consequently, R 0 ($) and R 1 ($) can reconstruct any arbitrary sig-
nal under complementary-phase tight-frame assumptions.
xt = (y + R 0 (y) + bu 0) + (64)
Residual encoder-decoder CNN—number of operations
in which R 0 ($) is the outer residual network and bu 0 is the bias for In this case, all the convolution layers operate at the original
the output layer. Note that the ReLU placed at the output layer resolution of image x. Therefore, the number of operations O ($)
intrinsically assumes that the estimated signal xt is positive. for kernel K 0 and K u 0 is O (K 0) = O (K
u 0) = C 0 $ N r $ N c $ N 2f
From (64), the output of the subnetwork R 0 ($) is defined by [FLOPs], while K 1 and K u 1 require O (K 1) = O (K u 1) =
2
C 0 $ C 1 $ N r $ N c $ N f [FLOPs]. By adding the contributions of
Z = R dec t
0 (Q) . (65) both encoding-decoding pairs, the total operations for the residu-
al encoder-decoder becomes
Here, the decoder R dec
0 ($) is defined by
O (R) = 2 $ (1 + C 1) $ C 0 $ N r $ N c $ N 2f [FLOPS] .(75)
R dec t
0 (Q) = K t .(66)
u R0 Q
Residual encoder-decoder CNN—concluding remarks
t is the noiseless estimate of the intermediate signal Q
In (66), Q The residual encoder-decoder network consists of a set of nested
and is defined by single-resolution residual encoding-decoding CNNs. The single-
resolution design increases its computation cost with respect to
t = (Q + R 1 (Q) + bu ) + (67)
Q multiresolution designs, such as the U-Net. In addition, it should
1
be noted that the use of an ReLU as the output layer of the en-
where the network R 1 ($) is coder-decoder residual network forces the signal estimates to
be positive, but this is not always convenient. For example, in
u 1 (K 1 Q + b ) + .(68)
R 1 (Q) = K
R
computerized tomography imaging, it is common that images
1
contain positive and negative values.
Furthermore, Q represents the signal encoded by R 0 ($), or equiv-
alently, LWFSN
Q= R enc
0 (y) (69)
LWFSN—description of the architecture
where R enc
0 ($) is defined by The LWFSN network is a multiresolution architecture in which
the DWT is used for down-/upsampling and also as part of the
R 0enc (y) = K 0 y.(70) decomposition where shrinkage is applied. In this CNN, the
noiseless estimates are produced by
Residual encoder-decoder CNN—signal
xt = L (y) (76)
reconstruction analysis
As mentioned earlier, the residual encoder-decoder CNN is com- where L ($) represents the encoding-decoding structure of the
posed by nested residual blocks, which are independently ana- LWFSN, and the encoding-decoding network L ($) is
lyzed to study the reconstruction characteristics of this network.
First, block R 1 ($), is given by L (y) = L L (y) + L H (y) .(77)
P " R 1 , (Q) = K
u 1 (K 1 Q) + .(71)
R
Here, the high-frequency path is given by
which shows that the encoder and decoder R 1 ($) can approxi- L L ($) is
mately reconstruct any signal. Now, switching to R 0, it can be
observed that the linear part is L L ( y) = K u L f(2 -) ^ f(2 .) ^W L K 0 y hh .(79)
u 0W R R
P " L , (y) = K
u 0 K 0 y.(84) Properties of convolution kernels and low-rank approximation
R
Residual LWFSN
y + x
–
Encoding Clipping Decoding
FIGURE 11. The residual version of the LWFSN. It can be noticed that the low-frequency branch of the network is nulled. In deeper networks, it would
further decomposed and the nulling would be activated at the deepest level (lowest resolution).
Initial K2 = K̃ 2 Initial K2 = K̃ 2
1 1
0.75 0.75
0.5 0.5
0.25 0.25
0 0
–0.25 –0.25
–0.5 –0.5
–0.75 –0.75
–1 –1
(a) (b)
FIGURE 13. The phase-complementary tight-framelet test for the trained-toy network, initialized with random weights. (a) The product Ku R2 (K 2) +, where
the initialization of K 2 and K u 2 is different. It can be seen that the pair (K 2, K u 2) does not comply with the complementary-phase framelet criterion of
(95). This contrasts with (b), which displays the result of the product K u R2 (K 2) +, for the same CNN, but where the initial values of K
u 2 and K 2 are identi-
cal. For this initialization, the filters approximate the complementary-phase tight-framelet criterion.
Generalization
From the explanations in the “Nonlinear Signal Estimation in
the Framelet Domain” section, it can be noted that the bias/
threshold used in CNNs can modulate how much of the signal
is suppressed by the nonlinearities. In addition, the “Addition-
al Links Between Encoding-Decoding CNNs and Existing SNR = 15.33 (dB) SNR = 23.04 (dB)
Signal Processing Techniques” section established that there (a) (b)
are additional mechanisms for noise reduction within the Ground Truth
Init K n = K̃ n
CNN, such as the Wiener-like behavior observed by Mohan et
al. [14]. This raises the question of how robust conventional
CNNs are to noise-level changes, different from the level at
which the model has been trained. To perform such an experi-
ment, we trained two variants of the toy model. The first vari-
ant was inspired by the multiscale sparse coding network of
Mentl et al. [17], where the biases of each of the nonlinearities
(in this case, an ReLU) are multiplied by an estimate of the SNR = 23.09 (dB)
standard deviation of the noise. In the design of this example, (c) (d)
the noise estimate vt h, which, in accordance to Chang et al.,
Init K n = K̃ n, bn = 0 Init K n = K̃ n, bn = 0
[1] is defined by
Input, ση = 0.1 Input, ση = 0.15 Input, ση = 0.175 Input, ση = 0.2 Input, ση = 0.225
SNR = 15.31 (dB) SNR = 11.79 (dB) SNR = 10.45 (dB) SNR = 9.27 (dB) SNR = 8.25 (dB)
(a)
Original, ση = 0.1 Original, ση = 0.15 Original, ση = 0.175 Original, ση = 0.2 Original, ση = 0.225
SNR = 23.09 (dB) SNR = 21.24 (dB) SNR = 19.90 (dB) SNR = 18.49 (dB) SNR = 17.16 (dB)
(b)
Adaptive, ση = 0.1 Adaptive, ση = 0.15 Adaptive, ση = 0.175 Adaptive, ση = 0.2 Adaptive, ση = 0.225
SNR = 23.09 (dB) SNR = 21.10 (dB) SNR = 20.23 (dB) SNR = 19.45 (dB) SNR = 18.89 (dB)
(c)
Bias Free, ση = 0.1 Bias Free, ση = 0.15 Bias Free, ση = 0.175 Bias Free, ση = 0.2 Bias Free, ση = 0.225
SNR = 23.14 (dB) SNR = 21.57 (dB) SNR = 20.82 (dB) SNR = 20.04 (dB) SNR = 19.44 (dB)
(d)
FIGURE 15. A comparison of the baseline (original) toy model against its adaptive and bias-free variants. The models are evaluated in the cameraman
picture with increasing noise levels. (a) The noisy input. (b) The images processed with the original toy model. (c) Results of the adaptive toy model.
(d) Finally, results corresponding to the bias-free model. It can be observed that the performance original toy model degrades as the noise level in-
creases, while the performance-adaptive and bias-free models degrade less with increased noise levels, resulting in pictures with lower noise levels.
Table 2. Design elements and their impact on performance and computation cost.
Design Elements Expressivity Performance Number of Parameters Receptive Field Per Layer
Activation ReLU High Best High Not applicable (N/A)
— Shrinkage Low Good Medium N/A
— Clipping Low Good Medium N/A
Scale Single scale High Good High Big
— Multiscale High Good Medium/high Small
Topology Nonresidual High Good Higher N/A
— Residual High Best Lower N/A
D
esigning filters with perfect frequency tions. When filtering with a linear phase their recursive structures, IIR filters are
responses (i.e., flat passbands, sharp response, the intended signal merely unable to provide linear phase responses.
transition bands, highly suppressed experiences a constant group delay and In this article, we provide our solution to
stopbands, and linear phase responses) preserves its waveform. This is a vital the problem of perfect filtering with re-
is always the ultimate goal of any digi- feature for many applications, e.g., the duced complexity. Our solution is real-
tal signal processing (DSP) practitioner. denoising of electrocardiography (ECG) ized through cascading a prototype IIR
High-order finite impulse response (FIR) records [4] and seismologic signals [3]. filter with a few shaping APFs for an
filters may meet these requirements when Traditionally, high-performance fil- almost linear phase response over the
we put no constraint on implementation tering with linear phase responses is passband. The proposed composite fil-
complexity. In contrast to FIR filters, infi- achievable by high-order FIR filters. ter can be used to replace any ordinary
nite impulse response (IIR) filters, owing This, in turn, increases the system com- FIR filter with fixed filter coefficients.
to their recursive structures, provide an plexity (the number of adders and mul- We then approach perfect filtering with
efficient way for high-performance filter- tipliers), although there are techniques reduced complexity.
ing at reduced complexity. However, also to cut the system complexity in half by APFs have wide applications in the
due to their recursive structure, IIR filters folding the symmetric filter coefficients fields of DSP and communication [11],
inherently have nonlinear phase respons- [5]. The filter cascade technique can be [12], [13]. Ideally, an APF has a constant
es, and this does restrain their applicabil- used for designing filters with reduced magnitude response over the whole fre-
ity. In this article, we propose two tricks complexity, e.g., the composite filter in quency bands. The novelty of this ar-
regarding cascading a prototype IIR filter [6] and interpolated FIR filters [5], [7]. A ticle is that a cascade of some IIR filters
with a few shaping all-pass filters (APFs) comprehensive survey regarding the de- can produce a composite filter with an
for an almost linear phase response over sign techniques of FIR filters is presented almost linear phase response over the
its passband. After performing a delicate in [8]. Among these design techniques, filter passband. In particular, the filter-
design on the prototype and shaping fil- the works in [9] and [10] extend the idea ing performance of this composite filter
ters, we approach perfect filtering with of interpolated FIR filters by first design- can perform quite close to a high-order
reduced complexity. ing a bandpass filter and then modifying (thus, high-complexity) FIR filter. This
it by shaping the model filter by using composite filter provides a way to perfect
Preliminaries some masking filters. This technique is filtering by using limited complexity. We
Over the past decades, we have wit- an attractive candidate for obtaining a first introduce the design of a composite
nessed the power of DSP in various filter with a sharp transition band. The low-pass filter (LPF) through first de-
fields of applications, e.g., wireless results show supremacy over other de- signing a prototype filter that meets the
communication [1], [2], seismology [3], signs in terms of filtering performance design specifications for the magnitude
and biomedical sciences [4]. Digital fil- and implementation complexity. Howev- response. Then, the prototype filter is cas-
tering undoubtedly plays an important er, there still exists room for further im- caded with a few shaping APFs to rebuild
role in realizing these fancy applica- provement. In contrast to FIR filters, IIR an almost linear phase response over the
filters do achieve high-performance fil- filter passband. This idea is then extended
tering with low system complexity due to to designing a high-pass filter (HPF). The
Digital Object Identifier 10.1109/MSP.2023.3290772
Date of current version: 3 November 2023 their recursive structures. But also due to example filter shows that the intended
r = 0.9, θ = π
14
^~ 0 ^n - x ph ^~ 0 hhh. (6) 12
By definition, the group delay of 10
H ap (~) can be presented as 8
6
grd 6H ap ^~h@ = 1 - r2
2
1 + r - 2r cos (~ - i) 4
= 1 - r2 (7)
2
2
1 - re ji e -j~ 0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Since 0 # r 1 1 for a stable APF, by Normalized Frequency
(7), we confirm that the group delay is FIGURE 1. The group delays of the shaping APFs for ^ r, i h = ^ 0.9, 0 h, ^ 0.9, 0.2r h, ^ 0.8, 0.3r h, and
always positive for all frequency bands. ^ 0.9, r h, respectively.
0.15r (radians/sample).
3) The passband peak-to-peak ripple is
+ +
less than 0.05 dB.
ωp ωp 4) The suppression in the stopband is
no less than 73 dB.
5) The RMS group delay error over
60, ~ p@ is less than 0.2; i.e.,
T GD # 0.2.
6) The total filter order number is no
ωp π greater than 20.
Frequency, ω 7) The number of shaping APFs is no
greater than three; i.e., M # 3.
FIGURE 3. Compensating the group delay of a prototype filter within ~ p by using a number of The prototype filter is for meet-
shaping APFs. ing the frequency magnitude response
60
60
0
This means we can bypass circuit-level
concerns, and it is a fair comparison. –0.5
–1
0 50 100 150 200 250
Filtering results Sample Number, n
Figure 9(a) shows an input sequence (a)
x (n) consisting of three narrow-band 25
pulses of sinusoids. The pulses are given 20
as follows: 15
|X(ω)|
where ~ 1 = 0.07r (radians/sample), FIGURE 9. The input sequence for verifying example filters: the (a) waveform of signal x [n] and
~ 2 = 0.02r (radians/sample), ~ 3 = (b) corresponding discrete-time Fourier transform magnitude | X (~)|.
0
–0.5 Distortion The procedures for designing a high-
–1 pass composite filter are similar to those
0 50 100 150 200 250
for a low-pass composite filter. Assume
Sample Number, n
(a)
that the filter specifications are
1) Stopband edge frequency ~ s =
1
0.82r (radians/sample).
0.5
2) Passband edge frequency ~ p =
y2(n)
0
0.88r (radians/sample).
–0.5
3) Maximum passband peak-to-peak
–1
0 50 100 150 200 250 ripple A p = 1 (dB).
Sample Number, n 4) Minimum stopband suppression
(b) A s = 70 (dB).
1 5) The RMS group delay error over the
0.5 passband 6~ p, r@ is less than 0.3.
y3(n)
0.2 subject to
0
–0.2 1 # M # 4 (23)
–0.4 0 # r1, frM # 1(24)
–0.6 0 # i 1, f, i M # r. (25)
–0.8
–1 For the two candidate prototype fil-
–1 –0.5 0 0.5 1
ters, specification 6 is met by the con-
Real Part straint (23).
The frequency magnitude responses
FIGURE 11. A pole-zero diagram of the composite filter. of the two candidate prototype filters
sign specifications.
50
We already obtained M = 4, (r1, r2,
r3, r4, i 1, i 2, i 3, i 4) = (0.8928, 0.9175, 40
0.9038, 0.8845, 3.0135, 2.822, 2.92, 30
3.1005). Figure 15 describes how the
20
group delay of the composite filter is
synthesized by the five component fil- 10
ters (one prototype filter and four shap- 0
ing filters). The in-band group delay
of the composite filter is around 66. –10
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
We confirm that the composite HPF is Normalized Frequency
stable in any case. The component filter
parameters of the high-pass composite FIGURE 13. A comparison of the group delays for an equiripple FIR filter, a Butterworth IIR filter, and
filter are tabulated in Table 3. This com- a Chebyshev type 2 IIR filter.
pletes the design of a high-pass compos-
ite filter.
Table 4 provides a comparison of 6
Group Delay Error
66
of the composite filter. Our composite
Group Delay (Samples)
50 64
filter shows quite similar filtering per-
0.85 0.9 0.95 1 formance as the baseline FIR filter of
40 significantly higher complexity. The
Composite Filter composite filter provides a way to ap-
30 Prototype Filter
Shaping Filter 1
proach perfect filtering using limited
20 Shaping Filter 2 complexity and is especially useful for
Shaping Filter 3 replacing any ordinary FIR filter with
Shaping Filter 4
10 fixed filter coefficients. Further VLSI
implementations focusing on operating
0 speed, hardware cost, and so on can be
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Normalized Frequency the next steps to further investigate the
benefit of the proposed composite filter.
FIGURE 15. The synthesis of the group delay of the composite filter.
Authors
Table 3. The component filter parameters of the high-pass composite filter. David Shiung (davids@cc.ncue.edu.
tw) received his Ph.D. degree from
Filter Type Prototype Shaping Shaping Shaping Shaping National Taiwan University, Taipei, in
Filter Parameter Filter Filter 1 Filter 2 Filter 3 Filter 4
2002. He is an associate professor at
Filter order 10 2 2 2 2 National Changhua University of
Passband peak-to-peak 1 0 0 0 0 Education, Changhua 500, Taiwan. His
ripple (dB) research interests include signal pro-
Stopband attenuation (dB) 70 0 0 0 0 cessing for wireless communication and
Passband edge frequency 0.88r N/A N/A N/A N/A astronomical imaging. He is a Member
(radians/sample) of IEEE.
Stopband edge frequency 0.82r N/A N/A N/A N/A Jeng-Ji Huang (hjj2005@ntnu.edu.
(radians/sample) tw) received his Ph.D. degree from
National Taiwan University, Taipei, in
2004. He is a professor at National
Table 4. A comparison of the complexity among four HPFs. Taiwan Normal University, Taipei 106,
Filter Type Composite Equiripple FIR, Generalized Equiripple Window Method Taiwan. His research interests include
Complexity HPF, M = 4 Order 82 FIR, Order 104 FIR, Order 146 5G, LoRaWAN, and vehicular ad hoc
Number of multipliers 36 42 53 74 networks. He is a Member of IEEE.
Number of adders 36 82 104 146 Ya-Yin Yang (ivyyang64@gmail.
com) received her Ph.D. degree in
T
his article introduces a simple formula modes of a sampled signal are the peaks that it is hard to reconcile with the fact
that provides the exact frequency of a of its DFT. We also learn that the accu- that the frequencies of a signal made
pure sinusoid from just two samples racy of these frequency values is limited of a sum of K complex exponentials
of its discrete-time Fourier transform by the inverse of the time range of the can be recovered exactly from as few
(DTFT). Even when the signal is not a signal (Heisenberg uncertainty), which as 2K samples by using a two-century-
pure sinusoid, this formula still works in a correlates nicely with the inherent fre- old method due to Gaspard de Prony.
very good approximation (optimally after quency resolution of a DFT: 2r/N, if This apparent contradiction is resolved
a single refinement), paving the way for N is the number of samples of the sig- by recognizing that Heisenberg uncer-
the high-resolution frequency tracking of nal. This knowledge is so deeply rooted tainty relies on a much weaker signal
quickly varying signals or simply improv-
ing the frequency resolution of the peaks
of a discrete Fourier transform (DFT).
frequency content of a signal is encrypt- • Uncertainty band: ~ 0 ! 6~ 1, ~ 2@, with ~ 2 - ~ 1 = integer # 2r/N
ed in its FT and that the main frequency • Discrete-time Fourier transform (DTFT): X (~) = R nN=-01 x n e -jn~
• Discrete Fourier transform (DFT): X (2rk/N ), where k = 0, 1, f, N - 1
Digital Object Identifier 10.1109/MSP.2023.3311592 • Peak of the DFT: k 0 = argmax k X (2rk/N ) A ~ DFT = 2rk 0 /N
Date of current version: 3 November 2023
0 ω1 ω2 The Trick
2π
The main motivation for this article is to
Uncertainty Band put forward a formula that provides the
Frequency ω
frequency value of the maximum of the
FIGURE 1. The frequency ~ 0 (blue) of a single-frequency signal is obtained from the DTFT DTFT from just two DTFT coefficients
values at the endpoints (red) of a frequency interval known to contain ~ 0 (assumption: of a single-frequency signal; see the de-
~ 2 - ~ 1 = integer # 2r/N ) . Dotted line: a full period of the DTFT X (~) of the signal. tailed setting in “Box 1: Notations.” Not
only is this formula exact, but it is also
very robust to the inaccuracies of the
model, as we shall see later. More pre-
Box 2: Step-by-Step Didactic Proof of (1) cisely, if we assume that the unknown
frequency ~ 0 of the signal x n lies inside
Steps
N-1
zN - 1 e jN(~ - ~) - 1 an “uncertainty band” 6~ 1, ~ 2@ where
|z
0
e -e
ji -ji this formula specifies how the ampli-
2. Euler’s formula: sin i = 2j leads to tudes of the DTFT of x n at ~ 1 and ~ 2
N-1 sin (N~/2) should be combined so as to recover ~ 0
X ( ~ 0 - ~) = a 0 e j 2 ~
sin (~/2) (see the visualization in Figure 1)
.
sin (N~/2) ~1 + ~2
then X (~ 0 - ~) = ; a 0 ; ~0 = + 2 arctan
sin (~/2) 2
]Z]
c tan ` ~ 2 - ~ 1 j m.
Z] ~ 12 = (~ 1 + ~ 2) /2 ]] X (~ 1) = ; a 0 ; sin (N (u + B)) X (~ 2) - X (~ 1)
]] ]] sin (u + B) 4 X (~ 2) + X (~ 1)
3. Notation: ][] B = (~ 2 - ~ 1) /4 leads to [] . (1)
]] ]] sin (N (u - B))
] u = (~ 0 - ~ 12) /2 ]] X (~ 2) = ; a 0 ;
\ ] sin (u - B)
\ A proof is provided in “Box 2: Step-
r ; X (~ 2) ; sin (u + B)
4. r- Periodicity of sin x : B = integer # 2N leads to ; X (~ ) ; = sin (u - B) . by-Step Didactic Proof of (1),” requiring
1
only elementary electrical and electron-
; X (~ 2) ; sin (B + u)
5. Sign of sin: !u # B leads to ; X (~ ) ; = sin (B - u) . ics engineering math knowledge. This
1
; X (~ 2) ; formula becomes even simpler when
6. Trigonometry: sin (a ! b) = sin a cos b ! cos a sin b leads to ; X (~ ) ; =
1 the uncertainty bandwidth is small (i.e.,
tan B + tan u
tan B - tan u . ~ 2 - ~ 1 % r)
; X (~ 2) ; - ; X (~ 1) ;
7. Algebraic resolution: tan u = tan B # ; X (~ ) ; + ; X (~ ) ; which leads to X (~ 1)
2 1
~0 . ~1 #
X (~ 1) + X (~ 2)
u = ~ 0 - ~ 12 = arctan c tan ` ~ 2 - ~ 1 j m.
; X (~ 2) ; - ; X (~ 1) ;
2 4 ; X (~ 2) ; + ; X (~ 1) ; X (~ 2)
+ ~2 #
X (~ 1) + X (~ 2)
FIGURE 2. A refined trick: a double application of (1) achieves a quality that is equivalent to the best unbiased single-frequency estimation algorithm.
which is what intuition would sug- the complex-valued samples b n are 1) The number of samples, N, is ran-
gest: a weighting of the end frequen- independent realizations of a Gauss- dom (uniform) between 10 and
cies based on the relative magnitude of ian random variable with variance 1,000.
2
their DTFT. v -the “additive white Gaussian 2) The frequency ~ 0 is random (uni-
In practice, this formula is most noise” assumption. When the number form) between -r and r.
useful when the uncertainty band is of samples N is large enough, a lin- 3) a 0 = 1.
smallest, i.e., ~ 2 - ~ 1 = 2r/N. Then, a earization of (1) makes it possible to 4) The standard deviation v of the noise
straightforward procedure for estimat- calculate (tediously; not shown here) is such that the SNR is random (uni-
ing a single frequency from N uniform the standard deviation T~ of the fre- form) between -5 dB and 40 dB.
signal samples is to quency estimation error. In particular, 5) The noise b n is drawn from an inde-
1) determine ~ DFT, the frequency of in the case where ~ 0 is at the center pendent identically distributed statis-
the peak of the signal DFT of the interval 6~ 1, ~ 2@, this error be- tic (Gaussian).
2) apply (1) with ~ 1 = ~ DFT - r/N haves according to In each test, the uncertainty band before
and ~ 2 = ~ DFT + r/N. refinement is set by ~ 1 = ~ DFT - r/N
This works because, for a single-fre- 2 r2 and ~ 2 = ~ DFT + r/N. For compari-
2
quency signal, the peak of the DTFT T~ = 4 = r T~ CR (2) son purposes, Figure 3 also shows the
is always within ! r/N of the maxi- N N SNR 4 6 distribution of errors of Jacobsen’s es-
mum of the DFT. But it would also be timator [1], [2], which is based on three
possible to bypass the computation of where SNR = ; a 0 ; /v and where T~ CR consecutive DFT coefficients around
the full DFT if a rough estimate of ~ 0 is the Cramér-Rao lower bound of ~ DFT . The better performance of our
were available, e.g., when the frequen- the problem (see [3] for a calcula- formula is likely due to the higher SNR
cy of the signal is continuously chang- tion). Obviously, T~ is very close enjoyed by the two DFT coefficients
ing (tracking between successive signal to T~ CR, within less than 1%. When around the maximum of the DTFT
windows, as in radar applications) or ~ 0 is closer to the extremities of the in comparison to the three DFT coef-
when the frequency is a priori known interval 6~ 1, ~ 2@, T~ deviates from ficients used by Jacobsen’s formula,
up to some perturbation (physical res- the Cramér-Rao lower bound by up to one of which has a significantly lower
onance experiments, laser-based opti- 80%, still a very low error in absolute SNR because it is further away from
cal measurements, etc.). terms. In fact, a simple refinement the DTFT peak by more than 2r/N. A
of the trick, as depicted in Figure 2, somewhat milder difference is also that
Robustness to Inaccuracies shows how to attain this lower bound, (1) assumes an exact single-frequency
When the single-exponential model outlining the near optimality of this model, whereas Jacobsen’s formula is
is not exact, (1) is still a very ro- procedure; no other unbiased single- but a local quadratic approximation.
bust estimator of its frequency. This frequency estimation algorithm would Beyond just noise, when the single-
is particularly so when the uncer- be able to improve this performance frequency model is rendered inaccu-
tainty bandwidth ~ 2 - ~ 1 is reduced by more than 1%. rate due to, e.g., quantization, sample
to 2r/N an assumption that we will This is confirmed in Figure 3 by windowing, or the addition of other sinu-
make from now on. Indeed, consider simulations that consist of 1 million soidal/polynomial terms, the frequency
a noise model x n = a 0 e jn~ 0 + b n where tests where estimation error of (1) is controlled by f,
0.3
ment. Assume that the frequencies ~ k
0.25 Jacobsen [1]
Trick of the signal are distant from each other
0.2 Refined Trick (modulo (-r, r]) by at least d~ 2 r/N
Cramér-Rao and that the amplitude of the dominant
0.15 Bound
sinusoid is A; then, the estimation error
0.1 of any of the frequencies of the signal is
0.05 bounded according to
0 ~r k - ~ k #
2r
8
N
–6 –4 –2 0 2 4 6
Normalized Estimation Error ∆ω /∆ω CR DFT resolution
2 (K - 1) tan (r/(2N )) A
# . (4)
FIGURE 3. Histograms from 1 million random tests (number of samples N, SNR, frequency, and r sin ((d~ - r/ N ) /2) ; a k ;
noise). The error that results from using Jacobsen’s frequency estimator [1], the trick (1), or the 14444444244444443
`super-resolution_ coefficient
refined trick (Figure 2) is further normalized by the Cramér-Rao lower bound of the estimation prob-
lem (T~ CR = 2 3 N -3/2 SNR -1). The standard deviations of the three estimators are 1.5325, 1.3008,
Here, ~r k , ~ k , ; and a k ; are the estimated
and 1.0092, respectively. PDF: probability density function.
frequency, the ground-truth frequency,
and its amplitude, respectively. Despite
its coarseness (see Figure 4), this inequal-
ity already demonstrates superresolution
potential since the “superresolution” coef-
ficient is usually smaller than one and, in
DTFT
fact, tends to zero when N tends to infinity
DFT Samples
(for fixed d~) .
Uncertainty Band
Empirically, a minimum value of
Endpoint DTFT
4r/N for d~, or two DFT bins, seems
Estimated Frequency
to be sufficient to obtain good fre-
Ground-Truth Frequency
quency estimates. Of course, this cheap
approach to high-resolution multifre-
quency estimation is not optimal, yet
0 2π it could be used as the starting point
Frequency ω
of any iterative algorithm designed to
maximize the likelihood of the problem.
FIGURE 4. Multiple frequencies (three complex exponentials, 12 samples) can be accurately estimat-
ed by first locating the isolated peaks (e.g., using findpeaks in MATLAB) and then applying
(1) individually. The actual estimation errors of the three frequencies (from left to right) are roughly Conclusion
(0.01, 0.02, 0.03) # 2r/12, well below the resolution 2r/12 of the DFT; for comparison, the upper The frequency of a single complex ex-
bound (4) provides the much more conservative values (0.66, 0.45, 0.58) # 2r/12. ponential can be found exactly using
.
A direct proof of this inequality uses the fact that k=1 e j(~ - ~) - 1
k
- 2 arctan d tan ( 2N ) n
X1 - X2 jN (~ k - ~) jN (~ k - ~)
X (~) - a k 0 e j (~ k - ~) - 1 = a k e j (~ k - ~) - 1
/
r 0
X1 + X2 e 0
-1 k ! k0 e -1
# 2 tan a 2N k
r X1 + f1 - X2 - f2 X1 - X2 sin (N (~ k - ~) /2) (using triangle inequality
X1 + f1 + X2 + f2
-
X1 + X2 # / ;ak ;
144444444444444444424444444444444444443 k ! k0 sin ((~ k - ~) /2) and Euler’s formula)
2 f 1 ( X 2 + f 2) - 2 f 2 ( X 1 + f 1 ) A
= # / (denoting A = max a k )
( X 1 + X 2 ) ( X 1 + f 1 + X 2 + f 2) k ! k0 sin (( ~ k - ~) /2) k = 1fK
the magnitude of only two samples of Hong Kong Special Administrative Hong Kong, Sha Tin New Territories,
its DTFT, as this article shows. In the Region, China, in 2021. He is currently a Hong Kong Special Administrative
presence of noise or other inaccuracies, postdoctoral research associate at the Region, China, where he has been since
the trick that we provide is very robust Department of Electrical and Electronic 2008. He received two Best Paper
and can even be iterated once to reach Engineering, Imperial College London, Awards (2003 and 2006) and is the coau-
the theoretical optimum (Cramér-Rao SW7 2AZ London, U.K., working with thor of a paper that received a Young
lower bound)—up to less than 1%. The Prof. Ayush Bhandari on computational Author Best Paper Award (2009), all
robustness of this formula makes it pos- imaging and modulo sampling. He from the IEEE Signal Processing
sible to, e.g., refine the peaks of the DFT worked as a postdoctoral research fellow Society. His research interests include
of a signal, but we also anticipate that it with Prof. Thierry Blu at the Electronic wavelets, approximation and sampling
can be used as a tool for high-resolution Engineering (EE) Department at CUHK, theory, sparse representations, biomedi-
frequency estimation. For teaching pur- from 2021 to 2022. He received the Post cal imaging, optics, and wave propaga-
poses, we provide a step-by-step proof graduate Student Research Excellence tion. He is a Fellow of IEEE.
that requires only undergraduate signal Award from the EE Department of
processing knowledge. CUHK in 2022. His research interests
References
include sparse signal processing, sam- [1] E. Jacobsen and P. Kootsookos, “Fast, accurate
Acknowledgment pling theory, inverse problems, modulo frequency estimators [DSP Tips & Tricks],” IEEE
Signal Process. Mag., vol. 24, no. 3, pp. 123–125,
Thierry Blu is the corresponding author. sampling, and computational sensing May 2007, doi: 10.1109/MSP.2007.361611.
and imaging. [2] Ç. Candan, “A method for fine resolution frequen-
Authors Thierry Blu (thierry.blu@m4x.org) cy estimation from three DFT samples,” IEEE Signal
Process. Lett., vol. 18, no. 6, pp. 351–354, Jun. 2011,
Ruiming Guo (ruiming.guo@imperial. received his Ph.D. degree from Télécom doi: 10.1109/LSP.2011.2136378.
ac.uk) received his Ph.D. degree in elec- Paris (ENST) in 1996. He is a professor [3] P. Stoica and A. Nehorai, “MUSIC, maximum like-
lihood, and Cramér-Rao bound,” IEEE Trans. Acoust.,
tronic engineering at the Chinese in the Department of Electronic Speech, Signal Process., vol. 37, no. 5, pp. 720–741,
University of Hong Kong (CUHK), Engineering, the Chinese University of May 1989, doi: 10.1109/29.17564.
M
oving average filters output the av- N-1 Note, however, that the apparent
erage of N samples, and it is easy yn = 1 / xn - k singularity at z = 1 is removable, and
N k=0
to see (and to prove) that they are N-1 in fact, it is not properly speaking a
low-pass filters. A simple, causal moving = x n /N + 1 / xn - k singular point of the transfer function.
N k=1
average filter satisfies N-2
Dividing the numerator of H(z) by the
N-1 = x n /N + 1 / x ( n - 1) - k denominator, one finds that
yn = 1 / x n - k .(1) N k=0
H (z) = ^1 + z -1 + g z - (N - 1) h /N, z ! 0
N k=0 N-1
= x n /N + 1 / x (n - 1) - k - x n - N / N
Because of their simplicity and intui- N k=0
tive appeal, they are often preferred to and this is the version of the transfer
more complicated low-pass filters when which is equivalent to function one arrives at if one starts from
one would like to remove high-frequency (1) and proceeds naïvely. Here it is clear
noise and the demands on the filter are y n = x n /N + y n - 1 - x n - N /N (2) that H(0) is perfectly well defined and
not too great. is equal to 1. If one would like to write
and this points to a different possi- the transfer function in closed form, the
Background ble implementation. One can imple- correct way to do so is to write
In this column, we explain how imple- ment the filter by setting the current
menting a moving average filter us- output, y n , equal to the sum of the 1 1 - z -N , z ! 0, 1
ing a recursive formulation (where previous output value, y n - 1, and the H (z) = * N 1 - z -1 . (3)
the current output is a function of the current sample of the input divided 1 z=1
current and previous inputs and previ- by N, x n /N, less the oldest sample
ous outputs) is possible. We then show of the input that was “part of” the When written this way, it is clear that
why it can be problematic and how to previous value of the output divided 1 is contained in the transfer function’s
deal with that problem (and this is by N, x n - N /N. If one would like to region of convergence, and this is yet
the “tip”). Finally, we describe cir- minimize the number of operations another way to see that the filter is stable
cumstances under which the problem needed to calculate each value of [1]. As we find in the section “An Im-
does not exist even though one might the output, y n (if an efficient imple- plementation Issue with the Recursive
have thought that it should (and this is mentation of the moving average fil- Implementation,” when rounding errors
the “trick”). ter is important), the method based are added to the picture, the system’s
on (2) is to be preferred. (See, for stability becomes a somewhat more in-
A Tip and a Trick example, [2].) volved question.
S
everal techniques have been devel- the Fourier transform. Familiarity with phase displacement over a period of time.
oped to overcome the limitation of holography is also beneficial. Thus, the principle of superposition and
sensor bandwidth for 2D signals [1]. the intensity form the basis for recording
Though compressive sensing is an attrac- Background the interference of coherent light waves
tive technique that reduces the number of As shown in Figure 1, when two light and comprise the core of holography [4].
measurements required to record infor- waves, E1 and E2, approach each other The holography technique was de-
mation on a sparse signal basis [2], [3], re- and finally meet at the recording point, veloped to preserve the depth informa-
cording information beyond the Nyquist P, this situation can be mathematically tion in an object signal, which cannot
frequency remains difficult when work- described by using the principle of super be captured by a normal camera [5].
ing with nonsparse signals. Given this position of electromagnetic fields and the When an object image is captured by
constraint, this article focuses on the use wave intensity at that point. The princi- an image sensor, the recorded intensity
of the physical bandwidth of a coherent ple of superposition in optics states that is proportional to the square of the ob-
signal in the complex form instead of its when multiple waves overlap in a me- ject signal amplitude, causing the phase
intensity form. The resulting trick com- dium, the resultant amplitude is equal to information of the object signal to be
bines holographic multiplexing with sam- the algebraic sum of the individual wave lost. By contrast, holography maintains
pling scheme optimization to obtain the amplitudes. The intensity is defined as the phase information using a refer-
information in a 2D coherent signal from the radiant power density of the light sig- ence signal. For example, a microscopy
beyond the Nyquist frequency range. The nal detected by a device such as a cam- technique using holography, as illus-
prerequisites for understanding this ar- era or an eye, and it can be expressed as trated in Figure 2, can be divided into
ticle are a knowledge of basic algebra and the time average of the wave amplitude two setup components: magnification
squared. As this article deals only with and frequency modulation. Light from
coherent light, electromagnetic waves the coherent light source is scattered
Digital Object Identifier 10.1109/MSP.2023.3310710
Date of current version: 3 November 2023 can be assumed to maintain a fixed and passed through the object, and the
E1 = E1e –i (kd 1 – ω t + φ 1) 2π
, k=
P λ
E2 = E2e –i (kd 2 – ω t + φ 2)
EP = E1 + E2 by the Principle of Superposition
d1 Intensity
FIGURE 1. Interference of two light waves and the intensity. Here d is the distances traveled by the beams, ~ is the angular velocity, k is the angular wave-
number, and z represents the phase of the light at time 0. The intensity at point P has the interference term, I 12 other than the individual intensities.
Sensor Plane
LASER
Objective Image
Lens EO Sensor
Object
Grating
(b)
First Lens Second Lens
I = ER + EO2
∗
= (ER + EO)(ER + EO)
= (ER + EO)(ER∗ + EO∗ )
fmax = ERER∗ + EO EO∗ + EOE∗R + ER EO∗ Autocorrelation (Center Lobe)
U1
Physical Bandwidth of Object Signal Amplitude Digitized Bandwidth of Object Signal Amplitude
Physical Bandwidth of Object Signal Intensity (Shannon–Nyquist Theorem Applied)
Sensor Bandwidth Dead Zone Digitized Bandwidth of Object Signal Intensity
fmax Maximum Digitized Frequency (Shannon–Nyquist Theorem Applied)
FIGURE 2. The frequency scheme of (a) direct imaging and (b) a single hologram with an optical system.
Object
Masking
Frequency Domain
Separating and
Decreasing FOV
Overlapping Signals
Moving to Center
FOV
FIGURE 3. Comparison of direct imaging and the method using frequency multiplexing. (a) Direct imaging, (b) direct imaging of the magnified object, and
(c) imaging the combined four patches of the object using frequency multiplexing with image reconstruction. FOV: field of view.
1
2
U0
(a) U2 > U0
(c)
U1 > U0
(b)
1 2 3 4
FIGURE 4. 2D frequency schemes of (a) direct imaging, (b) single hologram without optimization, (c) two holograms without optimization, (d) single
hologram with optimization, and (e) two holograms with optimization. The sampling schemes are optimized for (f) three, (g) four, (h) 12, and (i) 16 holo-
grams. The physical bandwidth of the intensity and amplitude of the single hologram are shown in brown and gray, respectively, and its digitized intensity
and amplitude are shown in violet and black, respectively. The digitized intensity and amplitude of the multiplexed hologram are shown in light and dark
green, respectively. SH: single hologram; MH: multiple holograms.
N Sidelobes (Holograms)
U0,1d
kd,i,1d kd,i,1d
2kd,i,1d
= ^ E S 1 + E S 2 + E S 3 + E S 4h E )R 1
4
1.6 / E S E )R
i 1
(14)
i=1
1.4
0 5 10 15 20
the bandwidth of the first hologram
Number of Holograms
carried by the first reference signal
FIGURE 6. Effective coherent Nyquist factor ^ C h as a function of the number of captured holo- is colocated with the other three holo-
grams. The red squares (manual) show the optimal value of C for recording one, two, three, and grams because they are also carried by
four holograms. the first reference signal.
Signal Intensity
in Frequency Domain
: Analog Filter
LASER Zeroth Order
Mirror
Object
Objective
ER3 Grating
Lens
ES3
Recording
Magnified
Object ER4
Image
ES4 Sensor
Mask
ER2
ES2
ER1
ES1
FIGURE 7. An example implementation of the sub-Nyquist coherent system using four holograms. It consists of three parts: magnification, multiplexing
with optimization, and recording. The optimized sampling scheme for four holograms can be implemented by physically splitting and combining the
magnified object signal.
Acknowledgment USA. His current research interests [8] K. Ishizuka, “Optimized sampling schemes for off-
axis holography,” Ultramicroscopy, vol. 52, no. 1, pp.
This work was supported by the Ministry of include nanoimaging, holography, and 1–5, Sep. 1993, doi: 10.1016/0304-3991(93)90017-R.
Science and ICT, Korea, under the Informa- signal processing. [9] B. Tayebi, F. Sharif, A. Karimi, and J.-H. Han,
“Stable extended imaging area sensing without
tion Technology Research Center support Jae-Ho Han (hanjaeho@korea.ac.kr) mechanical movement based on spatial frequency
program (IITP-2023-RS-2022-00156225) received his Ph.D. degree in electrical multiplexing,” IEEE Trans. Ind. Electron., vol. 65,
no. 10, pp. 8195–8203, Oct. 2018, doi: 10.1109/
and under the ICT Creative Consilience and computer engineering from Johns TIE.2018.2803721.
program (IITP-2023-2020-0-01819) super- Hopkins University, Baltimore, MD, SP
vised by the Institute for Information and USA. He is a full professor with the
I
n the last decade, the signal process- Moreover, we think that now is the right of knowledge. Perhaps its most widely
ing (SP) community has witnessed a time to start defining our needs and in- known definition dates back to 1959
paradigm shift from model-based to spirations that will reflect the direction (paraphrased from Arthur Samuel [1]):
data-driven methods. Machine learn- the field of SP will take in years to come. “Learning algorithms to build a model
ing (ML)—more specifically, deep In this article, following a success- based on sample data, known as train-
learning—methodologies are nowadays ful panel at IEEE International Confer- ing data, in order to make predictions or
widely used in all SP fields, e.g., audio, ence on Acoustics, Speech, and Signal decisions without being explicitly pro-
speech, image, video, multimedia, and Processing (ICASSP 2022) held in Sin- grammed to do so.”
multimodal/multisensor processing, to gapore, we focus on these education ML is, thus, a method of data analy-
name a few. Many data-driven methods aspects and draft a manifesto for an sis that uses algorithms to enable com-
also incorporate domain knowledge to SP-oriented DS curriculum. We hope puter systems to identify patterns in
improve problem modeling, especially this article will encourage discussions data, learn from them, and make predic-
when computational burden, training among SP educators worldwide and pro- tions or decisions based on that learning.
data scarceness, and memory size are mote new teaching programs in the field. For the SP community, data come in
important constraints. the form of signals. While the definition
Data science (DS), as a research field, DS, ML, and SP: Interrelations of signals as the carriers of information
emerged from several scientific disci- DS is an interdisciplinary field that can remains unchanged, the variety of sig-
plines, namely, mathematics (mainly be taught from different perspectives. nal types is rapidly growing. Signals can
statistics and optimization), computer Indeed, DS-oriented material can be a be either 1D or multidimensional; can
science, electrical engineering (pri- segment of many existing teaching pro- be defined over a regular grid (time or
marily SP), industrial engineering, bio- grams in science, technology, engineer- pixels) or on an irregular graph; can be
medical engineering, and information ing, and mathematics. In this article, packed as vectors, matrices, or higher
technology. Each discipline offers an in- we aim at the more ambitious task of dimensional tensors; and can represent
dependent teaching program in its core defining a complete and comprehensive multimodal data.
domain with a segment dedicated to DS teaching program in DS that takes the As discussed, a significant compo-
studies. In recent years, numerous insti- unique SP perspective. nent of SP is dedicated to extracting,
tutes worldwide have started to provide To put things in context, SP is con- representing, and transforming (raw)
dedicated and comprehensive DS teach- cerned with extracting information data to information that accentuates cer-
ing programs with diverse applications. and knowledge from signals. Com- tain properties beneficial to downstream
mon SP tasks are the analysis, modi- tasks. While, traditionally, SP focuses on
Motivation and significance fication, enhancement, prediction, and processing raw data that have a physi-
We believe that there is a unique SP per- synthesis of signals [see also https:// cal grounding on planet Earth [e.g., au-
spective of DS that should be reflected signalprocessingsociety.org/volunteers/ dio, speech, radar, sonar, image, video,
in the education given to our students. constitution (Article II)]. In parallel to electrocardiogram, electroencephalo-
the evolution of the SP methodology, we gram, magnetoencephalography, and
are witnessing a fast-growing interest in econometric data], one may not need
Digital Object Identifier 10.1109/MSP.2023.3294709
Date of current version: 3 November 2023 the field of ML. ML is not a new field to be limited to this standard practice.
T
he Video and Image Processing scenario. With the evolution of technol- architectures, such as diffusion-based
(VIP) Cup is a student competition ogy, new architectures and different models. With the first dichotomy, we
that takes place each year at the ways of generating synthetic data are ask that the detectors be robust to the
IEEE International Conference on Image continuously proposed [4], [5], [6], [7], occurrence of images that are only par-
Processing (ICIP). The 2022 IEEE VIP [8]. Therefore, detectors trained on tially synthetic, thus with limited data
Cup asked undergraduate students to some specific sources will end up work- on which to base the decision. As for
develop a system capable of distinguish- ing on target data of a very different architectures, there is already a signifi-
ing pristine images from generated ones. nature, often with disappointing results. cant body of knowledge on the detec-
The interest in this topic stems from the In these conditions, the ability of gener- tion of GAN-generated images [9], but
incredible advances in the artificial intel- alizing to new data becomes crucial to new text-based diffusion models are
ligence (AI)-based generation of visual keep providing a reliable service. More- now gaining the spotlight, and general-
data, with tools that allow the synthesis over, detectors are often required to ization becomes the central issue. With
of highly realistic images and videos. work on data that have been seriously the 2022 IEEE VIP Cup, we challenged
While this opens up a large number of impaired in several ways. For example, teams to design solutions that are able
new opportunities, it also undermines the when images are uploaded on social to work in the wild as only a fraction of
trustworthiness of media content and networks, they are normally resized and the generators used in the test data are
fosters the spread of disinformation on compressed to meet internal constraints. known in advance.
the Internet. Recently, there has been These operations tend to destroy impor- In this article, we present an over-
strong concern about the generation of tant forensic traces, calling for detectors view of this challenge, including the
extremely realistic images by means of that are robust to such events and competition setup, the teams, and their
editing software that includes the recent degrade performance gracefully. To technical approaches. Note that all of
technology on diffusion models [1], [2]. summarize, to operate successfully in the teams were composed of a profes-
In this context, there is a need to develop the wild, a detector should be robust to sor, at most one graduate student
robust and automatic tools for synthetic image impairments and, at the same (tutor), and undergraduate students
image detection. time, able to generalize well on images (from a minimum of three to a maxi-
In the literature, there has been an coming from diverse and new models. mum of 10 students).
intense research effort to develop effec- In the scientific community, there is
tive forensic image detectors, and many still insufficient (although growing) Tasks, resources, and evaluation
of them, if properly trained, appear to awareness of the centrality of these criteria
provide excellent results [3]. Such aspects in the development of reliable
results, however, usually refer to ideal detectors. Therefore, we took the Tasks
conditions and rarely stand the chal- opportunity of this VIP Cup to push The challenge consisted of two phases:
lenge of real-world application. First of further along this direction. In design- an open competition (split into two
all, testing a detector on images gener- ing the challenge, we decided to con- parts), in which any eligible team could
ated by the very same models seen in sider an up-to-date, realistic setting participate, and an invitation-only final.
the training phase leads to overly opti- with test data including 1) both fully Phase 1 of the open competition was
mistic results. In fact, this is not a realistic synthetic and partially manipulated designed to provide teams with a simpli-
images and 2) images generated by fied version of the problem at hand to
both established generative adversarial familiarize themselves with the task,
Digital Object Identifier 10.1109/MSP.2023.3294720
Date of current version: 3 November 2023 network (GAN) models and newer while phase 2 was designed to tackle a
FIGURE 1. Examples of synthetic images from the datasets used in the open competition. The first row shows samples from GLIDE [5], Taming Transform-
ers [10], StyleGAN2 [11], StyleGAN3 [12], and inpainting with Gated Convolution [13]. The second row shows samples from BigGAN [14], DALL-e mini
[6], Ablated Diffusion Model [15], Latent Diffusion [7], and LaMa [16]. The images in the fifth column are only locally manipulated (the regions outlined in
red are synthetic).
100 100
90 90
Accuracy on Test Set 1 (%)
80 80
70 70
60 60
50 50
40 40
27,753
27,707
27,759
27,754
27,730
27,751
27,700
27,740
27,701
27,770
27,749
27,768
27,686
27,759
27,751
27,753
27,707
27,700
27,754
27,701
27,740
27,730
27,770
27,768
27,749
27,686
FIGURE 2. The anonymized results in terms of accuracy of the 13 teams on the two open competition datasets.
Score (%)
multiple inputs, not just the RGB image. 70 70
Indeed, it is well known that generators
60 60
fail to accurately reproduce the natural
correlation among color bands [20] and 50 50
also that the upsampling operation rou-
tinely performed in most generative 40 40
0 10 20 30 40 50 60 70 40 50 60 70 80 90 100
models gives rise to distinctive spectral Time (min) Accuracy (%) on Test Set 1
peaks in the Fourier domain [21]. There- (a) (b)
fore, some solutions considered as input
the image represented in different color FIGURE 3. The results of all of the submitted algorithms: (a) score versus time and (b) accuracy on
spaces, i.e., HSV and YCbCr, or com- test set 1 versus accuracy on test set 2.
puted the co-occurrence matrices on the
color channels. Moreover, to exploit fre- The majority of the teams trained resizing, and rotation, most teams
quency-based features, two-stream net- their networks on the data made avail- used augmentation based on Gaussian
works have been adopted, using features able for the challenge; however, some blurring and JPEG compression,
extracted from the Fourier analysis in of them increased this dataset by gen- found to be especially helpful in the
the second stream. A two-branch net- erating additional synthetic images literature [24], but also changes of sat-
work was also used to work both on using new generative models, such as uration, contrast, and brightness, as
local and global features, which were other architectures based on GANs well as CutMix and random cutout.
fused by means of an attention module and new ones based on diffusion
as done in [22]. In general, attention models. Of course, including more Finalists
mechanisms have been included in sev- generators during training helped to The final phase of the 2022 IEEE VIP
eral solutions. Likewise, the ensembling improve the performance, even if Cup took place at ICIP in Bordeaux, on
of multiple networks was largely used to some approaches were able to obtain 16 October 2022. Figure 6 shows the
increase diversity and boost perfor- good generalization ability even add- members of the winning team while
mance. Different aggregation strategies ing a few more models. In addition, receiving the award. In the following,
have been pursued with the aim to gen- augmentation was always carried out we describe the three finalist teams list-
eralize to unseen models and favor deci- to increase diversity and improve gen- ed according to their final ranking:
sions toward the real image class, as eralization. Beyond standard opera- FAU Erlangen-Nürnberg (first place),
proposed in [23]. tions, like image flipping, cropping, Megatron (second place), and Sherlock
90 90
Balanced Accuracy (%)
80 80 27,686
27,749
70 70 27,768
60 60
50 50
Taming Tran.
Latent Diff.
GLIDE
StyleGAN2
StyleGAN3
GC Inpainting
AVG
BigGAN
DALL-e mini
ADM
LaMa
AVG
FIGURE 4. The balanced accuracy of the three best performing methods on images from test set 1 and test set 2.
90 90
80 80 27,686
AUC (%)
AUC (%)
27,749
70 70 27,768
60 60
50 50
Taming Tran.
Latent Diff.
GLIDE
StyleGAN2
StyleGAN3
GC Inpainting
AVG
BigGAN
DALL-e mini
ADM
LaMa
AVG
FIGURE 5. The area under the receiver operating characteristic curve (AUC) of the three best performing methods on images from test set 1 and test set 2.
Megatron
■■ Affiliation: Bangladesh University of
Engineering and Technology,
Bangladesh
■■ Supervisor: Shaikh Anowarul Fattah
■■ Students: Md Awsafur Rahman,
Bishmoy Paul, Najibul Haque
Sarker, and Zaber Ibn Abdul Hakim
■■ Technical approach: a multiclass
classification scheme and an ensem-
ble of convolutional neural networks
and transformer-based architectures.
An extra class was introduced to
FIGURE 6. The winning team (FAU Erlangen-Nürnberg) during the award ceremony at ICIP 2022 in detect synthetic images coming
Bordeaux.
from unknown models. Knowledge
distillation and test time augmenta-
(third place). We will also present some Beetz, ChangGeng Drewes, and tion were also included in the pro-
details on their technical approach. Tobias Gessler posed solution. The training set
■■ Technical approach: an ensemble of included, beyond the five known
FAU Erlangen-Nürnberg vision transformers pretrained on techniques, additional images coming
■■ Affiliation: Friedrich-Alexander- Imagenet-21k and fine-tuned on a from the following generators:
Universität Erlangen-Nürnberg, large dataset of 400,000 images. To ProGAN [26], ProjectedGAN [27],
Germany extract generalizable features, a pro- CycleGAN [28], DDPM [29],
■■ Supervisor: Christian Riess cedure based on weighted random Diffusion-GAN [30], Stable Diffusion
■■ Tutor: Anatol Maier sampling was adopted during training [31], Denoising Diffusion GAN [32],
■■ Students: Vinzenz Dewor, Luca aimed at balancing the data distribu- and GauGAN [33].
2023
NOVEMBER
©SHUTTERSTOCK.COM/SAYAN URANAN
19th International Conference on Advanced
Video and Signal-Based Surveillance
(AVSS 2023)
6–9 November, Daegu, South Korea.
General Chairs: Jeng-Neng Hwang
and Michael S. Ryoo
URL: https://www.avss2023.org
The IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
DECEMBER will be held in Seoul, Korea, 14–19 April 2024.
IEEE International Workshop on Information
Forensics and Security (WIFS 2023) IEEE International Conference on
4–7 December, Nuremberg, Germany. MAY Multimedia and Expo (ICME 2024)
General Chairs: Marta Gomez-Barrero IEEE Conference on Computational Imaging 15–19 July, Niagara Falls, Canada.
and Christian Riess Using Synthetic Apertures (CISA 2024) General Chairs: Junsong Yuan, Jiebo Luo,
URL: https://wifs2023.fau.de/ 3–6 May, Boulder, CO, USA. and Xiao-Ping Zhang
General Chairs: Alexandra Artusio-Glimpse, URL: https://2024.ieeeicme.org/
Ninth IEEE International Workshop on Paritosh Manurkar, Sam Berweger, Kumar
Computational Advances in Multi-Sensor Vijay Mishra, and Peter Vouras SEPTEMBER
Adaptive Processing (CAMSAP 2023) URL: https://2024.ieeecisa.org/
10–13 December, Costa Rica. IEEE 25th International Workshop
General Chairs: M. Haardt and André de Almeida IEEE International Symposium on on Signal Processing Advances in
URL: https://www.tuwien.at/etit/tc/en/ Biomedical Imaging (ISBI 2024) Wireless Communications (SPAWC 2024)
camsap-2023/ 27–30 May, Athens, Greece. 10–13 September, Lucca, Italy.
General Chairs: Konstantina S. Nikita, General Chair: Luca Sanguinetti
Workshop on Automatic Speech Recognition and Christos Davatzikos URL: https://spawc2024.org/
and Understanding (ASRU 2023) URL: https://biomedicalimaging.org/2024/
16–20 December, Taipei, Taiwan. OCTOBER
General Chairs: Chi-Chun Lee, Yu Tsao,
JUNE IEEE International Conference
and Hsin-Min Wang
URL: http://www.asru2023.org IEEE Conference on Artificial on Image Processing (ICIP 2024)
Intelligence (CAI 2024) 27–30 October, Abu Dhabi, UAE.
25–27 June, Singapore. General Chairs: Mohammed Al-Mualla
2024 General Chairs: Ivor Tsang,
Yew Soon Ong and Hussein Abbass
and Moncef Gabbouj
URL: https://2024.ieeeicip.org/
URL: https://ieeecai.org/2024/ SP
APRIL
JULY
IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP 2024) IEEE 13th Sensor Array and Multichannel
14–19 April, Seoul, Korea. Signal Processing Workshop (SAM 2024)
General Chairs: Hanseok Ko and Monson Hayes 8–11 July, Corvallis, OR, USA.
URL: https://2024.ieeeicassp.org/ General Chairs: Yuejie Chi and Raviv Raich
URL: https://attend.ieee.org/sam-2024/
mathworks.com/machinelearning