DOI: 10.1145/3641519.3657493

DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks

Published: 13 July 2024

Abstract

Accurately estimating and simulating the physical properties of objects from real-world sound recordings is of great practical importance in vision, graphics, and robotics. However, progress in these directions has been limited: prior differentiable rigid- or soft-body simulation techniques cannot be directly applied to modal sound synthesis because of the high sampling rate of audio, while previous audio synthesizers often do not fully model the physical properties of the sounding objects. We propose DiffSound, a differentiable sound rendering framework for physics-based modal sound synthesis, built on an implicit shape representation, a new high-order finite element analysis module, and a differentiable audio synthesizer. Because the entire pipeline is differentiable, our framework can solve a wide range of inverse problems, including physical parameter estimation, geometric shape reasoning, and impact position prediction. Experimental results demonstrate the effectiveness of our approach, highlighting its ability to accurately reproduce the target sound in a physics-based manner. DiffSound serves as a valuable tool for various sound synthesis and analysis applications.
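For readers unfamiliar with modal sound synthesis, the sketch below illustrates the general idea under simplifying assumptions: a struck object's sound is modeled as a sum of exponentially damped sinusoids, and an inverse problem compares a synthesized waveform against a target. This is not the DiffSound implementation. The function names (`synthesize_modes`, `waveform_loss`), the mode frequencies, damping values, and the plain mean-squared waveform loss are all hypothetical choices made for illustration.

```python
import math


def synthesize_modes(freqs_hz, dampings, amplitudes,
                     sample_rate=44100, duration_s=0.05):
    """Return audio samples as a sum of exponentially damped sinusoids.

    Each mode has a frequency (Hz), a damping coefficient (1/s), and an
    amplitude. All values used here are illustrative, not from the paper.
    """
    n = round(sample_rate * duration_s)
    samples = []
    for i in range(n):
        t = i / sample_rate
        value = 0.0
        for f, d, a in zip(freqs_hz, dampings, amplitudes):
            value += a * math.exp(-d * t) * math.sin(2.0 * math.pi * f * t)
        samples.append(value)
    return samples


def waveform_loss(predicted, target):
    """Mean squared error between two equally long waveforms (assumed loss)."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


# Hypothetical "recorded" sound with two modes.
target = synthesize_modes([440.0, 1130.0], [30.0, 60.0], [1.0, 0.5])

# A guess whose first mode frequency is slightly wrong.
guess = synthesize_modes([420.0, 1130.0], [30.0, 60.0], [1.0, 0.5])

loss_at_truth = waveform_loss(target, target)
loss_at_guess = waveform_loss(guess, target)
```

In a differentiable pipeline, a loss of this kind would be minimized by gradient descent with respect to the quantities that determine the modes (for example material parameters or shape), rather than evaluated at hand-picked guesses as above.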



    Published In

    SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
    July 2024
    1106 pages
    ISBN:9798400705250
    DOI:10.1145/3641519
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. audio
    2. differentiable simulation
    3. modal analysis
    4. sound synthesis
    5. vibration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China
    • NSFC of China

    Conference

    SIGGRAPH '24
    Acceptance Rates

    Overall Acceptance Rate 1,822 of 8,601 submissions, 21%
