DOI: 10.1145/3290605.3300851
research-article
Public Access

Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery

Published: 02 May 2019

Abstract

As 360° cameras and virtual reality headsets become more popular, panorama images are increasingly ubiquitous. Although sound is essential to delivering immersive and interactive user experiences, most panorama images do not come with native audio. In this paper, we propose an automatic algorithm that augments static panorama images by assigning realistic audio to them. We accomplish this through object detection, scene classification, object depth estimation, and audio source placement. To support this process, we built a database of over 500 audio files. We designed and conducted a user study to verify the efficacy of the various components in our pipeline, and we ran our method on a large variety of panorama images of indoor and outdoor scenes. By analyzing the resulting statistics, we learned the relative importance of these components, which can help decide which components to prioritize in power-sensitive, time-critical tasks such as mobile augmented reality (AR) applications.
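The abstract lists object depth estimation and audio source placement as pipeline stages without describing how they are computed. As an illustration only, the Python sketch below shows one generic way to turn a detected bounding box in a panorama into a positioned, distance-attenuated sound source. It assumes an equirectangular panorama, a hypothetical `Detection` record standing in for detector output, a typical real-world height per object class (used for a similar-triangles depth estimate), and a simple inverse-distance gain. None of these choices is taken from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: a class label and a pixel bounding box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def pixel_to_angles(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Map an equirectangular pixel to (azimuth, elevation) in radians.

    Assumed convention: the centre column faces forward (azimuth 0),
    azimuth grows to the right, and the top row is elevation +pi/2.
    """
    azimuth = (x / width) * 2.0 * math.pi - math.pi
    elevation = math.pi / 2.0 - (y / height) * math.pi
    return azimuth, elevation


def estimate_depth(det: Detection, height: int, real_height_m: float) -> float:
    """Similar-triangles distance estimate from the box's angular height.

    real_height_m is an assumed typical height for the object class.
    """
    # Each pixel row of an equirectangular image spans pi / height radians.
    angular_height = (det.y_max - det.y_min) / height * math.pi
    if not 0.0 < angular_height < math.pi:
        raise ValueError("bounding box height must be between 0 and the image height")
    return real_height_m / (2.0 * math.tan(angular_height / 2.0))


def source_position(det: Detection, width: int, height: int,
                    real_height_m: float) -> tuple[float, float, float]:
    """3D position in metres, listener at the origin (assumed: x right, y up, z forward)."""
    cx = (det.x_min + det.x_max) / 2.0
    cy = (det.y_min + det.y_max) / 2.0
    azimuth, elevation = pixel_to_angles(cx, cy, width, height)
    d = estimate_depth(det, height, real_height_m)
    # Spherical-to-Cartesian conversion using the convention above.
    x = d * math.cos(elevation) * math.sin(azimuth)
    y = d * math.sin(elevation)
    z = d * math.cos(elevation) * math.cos(azimuth)
    return x, y, z


def distance_gain(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Inverse-distance amplitude gain, clamped to 1 inside the reference distance."""
    return min(1.0, ref_distance_m / max(distance_m, 1e-6))


if __name__ == "__main__":
    # Hypothetical 4000x2000 panorama with a car detected to the listener's right.
    car = Detection("car", 2900, 950, 3100, 1050)
    pos = source_position(car, 4000, 2000, real_height_m=1.5)  # 1.5 m is an assumed car height
    print(pos, distance_gain(math.dist((0.0, 0.0, 0.0), pos)))
```

Under the assumed convention, the car's box centre lies three quarters of the way across the image, so it is placed directly to the listener's right. The fixed per-class height is the weakest assumption here; a learned monocular depth estimator could replace the hypothetical `estimate_depth` without changing the placement step.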

Supplementary Material

  • RAR file (paper621.rar): supplementary download (132 MB), https://www.dropbox.com/s/7pol4rgjt6h3zzp/Ambient_Supplementary.zip?dl=0
  • MP4 file (paper621.mp4): supplemental video
  • MP4 file (paper621p.mp4): preview video

Information

Published In

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
May 2019
9077 pages
ISBN:9781450359702
DOI:10.1145/3290605
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. augmented reality
  2. immersive media
  3. panorama images
  4. spatial audio
  5. virtual reality

Qualifiers

  • Research-article

Conference

CHI '19

Acceptance Rates

CHI '19 Paper Acceptance Rate 703 of 2,958 submissions, 24%;
Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)359
  • Downloads (Last 6 weeks)92
Reflects downloads up to 12 Nov 2024

Cited By

  • (2024) SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-19. DOI: 10.1145/3654777.3676384. Online publication date: 13-Oct-2024.
  • (2024) Modeling the Impact of Head-Body Rotations on Audio-Visual Spatial Perception for Virtual Reality Applications. IEEE Transactions on Visualization and Computer Graphics 30, 5, 2624-2632. DOI: 10.1109/TVCG.2024.3372112. Online publication date: May-2024.
  • (2024) Audio Description of Videos Using Machine Learning. 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), 1-6. DOI: 10.1109/I2CT61223.2024.10544216. Online publication date: 5-Apr-2024.
  • (2024) Modeling and Driving Human Body Soundfields Through Acoustic Primitives. Computer Vision – ECCV 2024, 1-17. DOI: 10.1007/978-3-031-72684-2_1. Online publication date: 3-Nov-2024.
  • (2023) RealityReplay. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 3, 1-25. DOI: 10.1145/3610888. Online publication date: 27-Sep-2023.
  • (2023) AI Generated Content in the Metaverse: Risks and Mitigation Strategies. 2023 International Symposium on Networks, Computers and Communications (ISNCC), 1-4. DOI: 10.1109/ISNCC58260.2023.10323860. Online publication date: 23-Oct-2023.
  • (2023) Empowering the Metaverse with Generative AI: Survey and Future Directions. 2023 IEEE 43rd International Conference on Distributed Computing Systems Workshops (ICDCSW), 85-90. DOI: 10.1109/ICDCSW60045.2023.00022. Online publication date: 18-Jul-2023.
  • (2023) The effect of audio on the experience in virtual reality: a scoping review. Behaviour & Information Technology 43, 1, 165-199. DOI: 10.1080/0144929X.2022.2158371. Online publication date: 2-Jan-2023.
  • (2022) Industrial Operation Training Technology Based on Panoramic Image and Augmented Reality. 2022 5th World Conference on Mechanical Engineering and Intelligent Manufacturing (WCMEIM), 1216-1220. DOI: 10.1109/WCMEIM56910.2022.10021482. Online publication date: 18-Nov-2022.
  • (2022) Implementation of Attention-Based Spatial Audio for 360° Environments. 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 491-494. DOI: 10.1109/ISMAR-Adjunct57072.2022.00103. Online publication date: Oct-2022.