research-article

DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks

Authors:

Sheng Li

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

Article No.: 128, Pages 1 - 12

https://doi.org/10.1145/3641519.3657493

Published: 13 July 2024

Abstract

Accurately estimating and simulating the physical properties of objects from real-world sound recordings is of great practical importance in vision, graphics, and robotics. However, progress in these directions has been limited: prior differentiable rigid- or soft-body simulation techniques cannot be applied directly to modal sound synthesis because of the high sampling rate of audio, while previous audio synthesizers often do not accurately model the physical properties of the sounding objects. We propose DiffSound, a differentiable sound rendering framework for physics-based modal sound synthesis, which combines an implicit shape representation, a new high-order finite element analysis module, and a differentiable audio synthesizer. Because the entire pipeline is differentiable, the framework can solve a wide range of inverse problems, including physical parameter estimation, geometric shape reasoning, and impact position prediction. Experimental results demonstrate the effectiveness of our approach, highlighting its ability to accurately reproduce the target sound in a physics-based manner. DiffSound serves as a valuable tool for various sound synthesis and analysis applications.
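The core idea in the abstract, a modal synthesizer whose output is differentiable with respect to physical parameters so that those parameters can be recovered from a target sound, can be illustrated with a small sketch. This is not DiffSound's implementation. It assumes three hypothetical "unit-material" eigenvalues in place of a finite element modal analysis, uses Rayleigh damping, ignores the small frequency shift caused by damping, and recovers only the mass-proportional damping coefficient `alpha` by gradient descent on a time-domain squared error. All names (`modal_params`, `synthesize`, `loss_and_grad`, `fit_alpha`, `UNIT_EIGS`, `AMPS`) and all numeric values are illustrative assumptions.

```python
import math

# Hypothetical unit-material eigenvalues lambda_hat = omega^2 (rad^2/s^2).
# In a full pipeline these would come from a finite element modal analysis of
# the object's shape; three made-up modes stand in for that step here.
UNIT_EIGS = [(2.0 * math.pi * f) ** 2 for f in (440.0, 1130.0, 2210.0)]
AMPS = [1.0, 0.6, 0.3]  # hypothetical modal gains (they depend on the impact position)
SR = 8000               # sample rate in Hz, kept low so the toy runs quickly
N = 800                 # number of samples (0.1 s of audio)


def modal_params(stiffness_ratio, alpha, beta):
    """Return (angular frequency, decay rate) for every mode.

    With Poisson's ratio fixed, K scales with E and M with rho, so every
    eigenvalue scales by the same factor E/rho (stiffness_ratio here).
    Rayleigh damping C = alpha*M + beta*K gives decay d = (alpha + beta*lam)/2.
    Simplification: the undamped frequency sqrt(lam) is used (light damping).
    """
    modes = []
    for lam_hat in UNIT_EIGS:
        lam = stiffness_ratio * lam_hat
        modes.append((math.sqrt(lam), 0.5 * (alpha + beta * lam)))
    return modes


def synthesize(modes):
    """Modal synthesis: a sum of exponentially damped sinusoids."""
    out = []
    for n in range(N):
        t = n / SR
        out.append(sum(a * math.exp(-d * t) * math.sin(w * t)
                       for a, (w, d) in zip(AMPS, modes)))
    return out


def loss_and_grad(alpha, beta, target):
    """Squared waveform error and its exact derivative with respect to alpha.

    Every decay rate has slope 1/2 in alpha and the frequencies do not depend
    on alpha, so ds/dalpha = -(t/2) * s(t) for this model.
    """
    s = synthesize(modal_params(1.0, alpha, beta))
    loss, grad = 0.0, 0.0
    for n, (s_n, y_n) in enumerate(zip(s, target)):
        t = n / SR
        r = s_n - y_n
        loss += r * r
        grad += 2.0 * r * (-0.5 * t * s_n)
    return loss, grad


def fit_alpha(target, beta, alpha0, iters=40, c=0.3):
    """Gradient descent on theta = log(alpha) with a backtracking line search."""
    theta = math.log(alpha0)  # optimizing the log keeps alpha positive
    for _ in range(iters):
        alpha = math.exp(theta)
        loss, grad_alpha = loss_and_grad(alpha, beta, target)
        g = grad_alpha * alpha  # chain rule: dL/dtheta = alpha * dL/dalpha
        if g == 0.0:
            break
        step = min(1.0, 1.0 / abs(g))  # first trial moves theta by at most 1
        while step > 1e-12:
            cand = theta - step * g
            # Armijo sufficient-decrease test
            if loss_and_grad(math.exp(cand), beta, target)[0] <= loss - c * step * g * g:
                theta = cand
                break
            step *= 0.5
        else:
            break  # no acceptable step found: treat as converged
    return math.exp(theta)


BETA = 2e-8
target = synthesize(modal_params(1.0, 20.0, BETA))  # stand-in "recording", alpha = 20
alpha_hat = fit_alpha(target, BETA, alpha0=5.0)
print(f"estimated alpha = {alpha_hat:.4f}")
```

In this toy every mode shares the factor exp(-alpha t / 2), so the waveform loss has a single minimum in `alpha` and plain gradient descent recovers it. Parameters that move the frequencies, such as stiffness or shape, generally give far less well-behaved waveform losses, which is one common motivation for comparing spectra or using optimal-transport style losses instead.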

Index Terms

Applied computing → Arts and humanities → Sound and music computing

Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

July 2024

1106 pages

ISBN:9798400705250

DOI:10.1145/3641519

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Key R&D Program of China
NSFC of China

Conference

SIGGRAPH '24

Sponsor:

SIGGRAPH

SIGGRAPH '24: Special Interest Group on Computer Graphics and Interactive Techniques Conference

July 27 - August 1, 2024

Denver, CO, USA

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 174
Downloads (last 12 months): 174
Downloads (last 6 weeks): 53

Reflects downloads up to 24 Sep 2024
