Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery

Authors:

Lap-Fai Yu

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems

Paper No.: 621, Pages 1 - 11

https://doi.org/10.1145/3290605.3300851

Published: 02 May 2019

Abstract

As 360° cameras and virtual reality headsets become more popular, panorama images have become increasingly ubiquitous. Although sound is essential to immersive and interactive user experiences, most panorama images do not come with native audio. In this paper, we propose an automatic algorithm that augments static panorama images with realistic audio assignment. We accomplish this through object detection, scene classification, object depth estimation, and audio source placement. To facilitate this process, we built a database of over 500 audio files. We designed and conducted a user study to verify the efficacy of the various components of our pipeline, and we ran our method on a large variety of panorama images of indoor and outdoor scenes. By analyzing the statistics, we learned the relative importance of these components, which can be used to prioritize them in power-sensitive, time-critical tasks such as mobile augmented reality (AR) applications.
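
The abstract does not spell out how a detected object becomes a positioned audio source. As a rough illustration only, assuming an equirectangular panorama, a detector that returns pixel bounding boxes, and a depth estimate in meters, the placement step might look like the Python sketch below. The function names, inputs, and coordinate convention (x right, y up, z forward, listener at the origin) are hypothetical and are not taken from the paper.

```python
import math


def pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to (azimuth, elevation) in radians.

    Assumed convention (not from the paper): azimuth is 0 at the image
    center and grows to the right; elevation is 0 at the horizon and
    grows upward.
    """
    azimuth = (u / width - 0.5) * 2.0 * math.pi
    elevation = (0.5 - v / height) * math.pi
    return azimuth, elevation


def place_source(bbox, depth_m, width, height):
    """Return an (x, y, z) position for a detected object.

    bbox is (x_min, y_min, x_max, y_max) in pixels; depth_m is the
    estimated distance from the camera in meters. Both are assumed to
    come from upstream detection and depth-estimation steps.
    """
    x_min, y_min, x_max, y_max = bbox
    # Use the center of the bounding box as the object's direction.
    u = (x_min + x_max) / 2.0
    v = (y_min + y_max) / 2.0
    azimuth, elevation = pixel_to_direction(u, v, width, height)
    # Spherical to Cartesian: x right, y up, z forward.
    x = depth_m * math.cos(elevation) * math.sin(azimuth)
    y = depth_m * math.sin(elevation)
    z = depth_m * math.cos(elevation) * math.cos(azimuth)
    return (x, y, z)
```

Under these assumptions, an object whose bounding box is centered in a 2000 x 1000 panorama and estimated to be 2 m away would be placed directly in front of the listener, and the resulting position could then be handed to any spatial audio renderer.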

Supplementary Material

RAR File (paper621.rar): Supplementary download (132MB) https://www.dropbox.com/s/7pol4rgjt6h3zzp/Ambient_Supplementary.zip?dl=0

MP4 File (paper621.mp4): Supplemental video (93.50 MB)

MP4 File (paper621p.mp4): Preview video (2.05 MB)

Index Terms

Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery
1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Virtual reality

Published In

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems

May 2019

9077 pages

ISBN:9781450359702

DOI:10.1145/3290605

General Chairs: Stephen Brewster (University of Glasgow, Scotland, UK), Geraldine Fitzpatrick (TU Wien, Austria)

Program Chairs: Anna Cox (University College London, UK), Vassilis Kostakos (University of Melbourne, Australia)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

CHI '19

Sponsor:

SIGCHI

CHI '19: CHI Conference on Human Factors in Computing Systems

May 4 - 9, 2019

Glasgow, Scotland, UK

Acceptance Rates

CHI '19 Paper Acceptance Rate 703 of 2,958 submissions, 24%;

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
