

The Global Composition 2018 Conference on Sound, Ecology, and Media Culture
Darmstadt-Dieburg/Germany, October 04 - 07, 2018

In the search of the right format for immersive sound

Jorge Medina Victoria


(Jorge.medina.victoria@h-da.de)
University of Applied Sciences Darmstadt, Department of Media
Germany

ABSTRACT

Immersive experiences are becoming the state of the art for delivering an artistic message with
audio content. There are different technological approaches to achieving three-dimensional
sound. The right choice depends on the recording, distribution and playback format, in
addition to the technical characteristics of the format, its usability and its integration with
established technologies. In this paper, the three currently dominant technologies (channel-based
systems, ambisonics and object-based systems) are summarized and compared with regard to
human perception.

INTRODUCTION

Immersive audio is a concept widely used to describe techniques that deliver spatial audio
content in a setup around the listener [7] or through headphones. The majority of these
techniques and formats, which are also referred to as 3D audio, present audio information
not only in the horizontal plane but also with elevated sound sources and, in some cases,
with sources below the horizontal listening plane.

To achieve complete immersion in a virtual world, it is necessary to fool or trick the senses;
the role of vision is predominant. However, sound information reinforces and supports
everything that is not seen [8] and is a fundamental part of the creation of a virtual world.

This paper is a short review of the most relevant methods used to capture and reproduce
immersive sound. An overview of the advantages and drawbacks of the different techniques
may facilitate the choice of an audio setup, given the available resources and the intended
acoustic environment for the listener.

FORMATS FOR IMMERSIVE AUDIO

Immersive audio content can be delivered mainly in three different formats: channel-based,
scene-based and object-based [7]. There are also mixed approaches, which depend strongly on
the context and combine the advantages of the different methods.
The usual approach is to use the same method from capture to playback of the audio
material. However, it is possible to capture or record the sound with one method and present
or reproduce it with another [2].

Channel-based
This approach is based mainly on amplitude panning. A straightforward relationship between
audio streams and loudspeakers is assumed [2], [7]. It is the traditional way of producing
sound for music recordings, film, etc. Level or amplitude differences between loudspeakers
produce a phantom source, allowing the sound to be localized. This works well in the
horizontal plane [2]. Producing the amplitude differences, for example by panning or with
the faders of a mixer, can be extended to setups with vertical loudspeakers. However,
localization in the vertical plane does not work as well as in the horizontal plane and
depends strongly on the sound source [5].
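The amplitude-panning principle described above can be sketched with a constant-power pan law, a common choice for two loudspeakers. The function name and the [-1, 1] pan convention are illustrative, not taken from any particular tool:

```python
import math

def constant_power_pan(sample, pan):
    """Constant-power panning of a mono sample onto two loudspeakers.
    pan in [-1, 1]: -1 = fully left, 0 = centre, +1 = fully right."""
    theta = (pan + 1.0) * math.pi / 4.0  # map pan to [0, pi/2]
    return sample * math.cos(theta), sample * math.sin(theta)

# A centre-panned source feeds both loudspeakers equally (about -3 dB each),
# so the radiated power stays constant while the phantom source moves.
left, right = constant_power_pan(1.0, 0.0)
```

The constant-power property (rather than constant amplitude) keeps the perceived loudness of the phantom source roughly stable as it is panned across the loudspeaker base.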

Figure 1: Channel-based capture and reproduction.

In some setups, amplitude panning plays no role at all. The setup developed by Morten
Lindberg and known as the “2L-Cube” is an example of a 1:1 approach. The signals
recorded by two microphone arrays (omnidirectional pressure microphones) are routed directly
to the loudspeakers [4]. The first array, in the horizontal plane, consists of five microphones;
the second, placed above the horizontal array, consists of four. Both arrays match the number
of loudspeakers in the horizontal and height layers, respectively.

The diversity of recording techniques and the robustness of formats such as 5.1 are among the
advantages of the channel-based approach. A downside, especially for immersive sound, is the
reproduction setup: production and recording are tied to a specific, inflexible loudspeaker
configuration. In the worst case, a separate mix must be produced for each loudspeaker array.

Object-based
In this sound representation, the sound objects and the information about them are kept separate.
The ancillary information, which describes parameters such as spatial position, is known
as metadata [7]. The artistic intention is expressed through the metadata, which indicates how
the audio elements should be rendered by the reproduction system. Metadata can be generated
during mixing or automatically [8]. In the final stage, an audio rendering engine delivers a data
stream for the different formats, from mono to 3D audio.

Figure 2: Object-based capture and reproduction for different formats.

The object-based method is an alternative that is rapidly gaining acceptance because of its
flexibility. Distribution is possible to a range of end devices such as smartphones or tablets,
and reproduction can target loudspeakers in different configurations or binaural playback over
headphones [7]. The method works especially well when the sound sources are individual
elements, as in the case of monophonic audio sources.
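The separation of audio and metadata described above can be illustrated with a minimal sketch. The class and function names, the azimuth convention and the ±30° stereo base are all illustrative assumptions, not part of any standard object-audio API:

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    samples: list        # mono audio samples
    azimuth_deg: float   # metadata: source direction, positive to the right

def render_to_stereo(obj: AudioObject):
    """Render one object to two loudspeakers at +/-30 degrees using
    constant-power panning; azimuths beyond the base are clamped."""
    pan = max(-1.0, min(1.0, obj.azimuth_deg / 30.0))  # -1 = left, +1 = right
    theta = (pan + 1.0) * math.pi / 4.0
    gain_l, gain_r = math.cos(theta), math.sin(theta)
    left = [s * gain_l for s in obj.samples]
    right = [s * gain_r for s in obj.samples]
    return left, right
```

The same object, carrying the same metadata, could equally be handed to a renderer for a 5.1 array or a binaural renderer for headphones; only the rendering stage changes, not the stored content.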

Advantages
• Independence of the loudspeaker setup
• Personalization and interactivity

Drawbacks
• Demanding workflow
• Preferably monophonic sources

Ambisonics
The development of ambisonics started over four decades ago. The idea behind it is the control
and representation of a sound field at a specific position for a loudspeaker arrangement, based
on its decomposition into spherical harmonics [1]. The sound scene is encoded directionally.
The original mathematical framework enabled First Order Ambisonics (FOA); a further
development is known as Higher Order Ambisonics (HOA).
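At first order, the spherical-harmonic decomposition reduces to four components, so a mono source at a given direction can be encoded with simple trigonometry. This sketch assumes the traditional B-format convention with a 1/sqrt(2) weight on the omnidirectional W channel; the function name is illustrative:

```python
import math

def foa_encode(sample, azimuth, elevation):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).
    Azimuth counter-clockwise from the front and elevation upwards,
    both in radians; W carries the classic 1/sqrt(2) weighting."""
    w = sample / math.sqrt(2.0)                          # omnidirectional
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z
```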
Figure 3: Ambisonics spherical harmonics up to 2nd order.

Producing audio material for the ambisonics scene-based representation normally involves
two stages: encoding and decoding. The flexibility and manipulability of these stages make
ambisonics a very interesting method for the recording and reproduction of 3D audio. In the
encoding process, audio signals are captured with an A-format microphone. This four-channel
signal can easily be converted into B-format or even rendered binaurally, which makes
ambisonics a very practical tool for virtual reality applications.

Figure 4: A-Format microphone.
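The A-format to B-format conversion mentioned above amounts to sums and differences of the four tetrahedral capsule signals. This sketch omits the capsule-spacing correction filters a real converter applies, and the 0.5 scaling is one common but not universal choice:

```python
def a_to_b(flu, frd, bld, bru):
    """Convert tetrahedral A-format capsule samples (front-left-up,
    front-right-down, back-left-down, back-right-up) to B-format."""
    w = 0.5 * (flu + frd + bld + bru)  # omnidirectional sum
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z
```

Identical signals on all four capsules (a perfectly diffuse, direction-free input) end up entirely in W, with the three directional components cancelling to zero.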

Some advantages of ambisonics are:

• Few signals are required to encode a complete three-dimensional sound field
• Using First-Order Ambisonics (FOA), only four loudspeakers are necessary
• Diffuse sounds, spread sources and reverberant sources are well represented

Some of the drawbacks of the ambisonics sound field representation are:

• The sweet spot is narrow
• Localization accuracy is weak for pinpointing sources
• Better spatial resolution requires an increasing number of channels (HOA)

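The four-loudspeaker FOA case listed above can be sketched with the simplest decoding strategy, a projection ("sampling") decode onto a horizontal square. Production decoders add psychoacoustic weighting such as max-rE, omitted here; the function name and scaling are illustrative:

```python
import math

def foa_decode_square(w, x, y, azimuths_deg=(45, 135, 225, 315)):
    """Project horizontal B-format (W, X, Y) onto four loudspeakers
    placed on a square; Z is ignored for a horizontal-only layout."""
    feeds = []
    for az_deg in azimuths_deg:
        az = math.radians(az_deg)
        feeds.append(0.5 * (math.sqrt(2.0) * w
                            + x * math.cos(az) + y * math.sin(az)))
    return feeds

# A frontal source (W = 1/sqrt(2), X = 1, Y = 0) drives the two front
# loudspeakers harder than the rear pair.
feeds = foa_decode_square(1.0 / math.sqrt(2.0), 1.0, 0.0)
```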
HOW TO ACHIEVE IMMERSION

Producing immersive sound is possible with different approaches. However, it is important to
know which effect is intended for the listener. The available technical resources may also play
an important role when deciding which tool should be used, not only for capturing the sound
but also for its reproduction. The representation of audio material such as music, acoustical
environments (natural or artificial) and point sound sources should be planned before
beginning any project. Some studies based on listening tests [3], [6] have shown that FOA
represents diffuse audio information in a satisfactory way. On the other hand, reproducing
acoustically recorded music works better with channel-based systems, with the previously
mentioned disadvantages.

For virtual reality, object-based and scene-based methods are the most flexible options. In the
case of ambisonics, its mathematical formulation enables easy manipulation (e.g. rotation) of
the signal in three-dimensional space, which is very practical when using head-mounted
displays and headphones [8]. In addition to the advantages of scene-based methods, object-based
methods can produce complex dynamic soundscapes: the metadata, which defines
position, level, width and other attributes of a sound source, can be modified dynamically
according to the scene or the artistic intention [2], [8].
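The rotation mentioned above is a cheap matrix operation on the B-format channels. A yaw (head-turn) rotation at first order can be sketched as follows; the function name and the traditional B-format channel convention are assumptions of this sketch:

```python
import math

def rotate_foa_yaw(w, x, y, z, yaw):
    """Rotate a first-order B-format scene about the vertical axis by yaw
    radians (e.g. to counter the head rotation reported by an HMD tracker).
    W and Z are invariant under yaw; X and Y rotate like a 2-D vector."""
    x_r = math.cos(yaw) * x - math.sin(yaw) * y
    y_r = math.sin(yaw) * x + math.cos(yaw) * y
    return w, x_r, y_r, z
```

Because the whole scene is rotated with one small matrix, regardless of how many sources it contains, this operation can run comfortably at the low latencies head tracking demands.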

DISCUSSION

Among the most relevant attributes of immersive sound are envelopment, localization, a large
sweet spot, decorrelation of signals and timbre. No single technical approach delivers every
one of these features well enough on its own to produce the sense of immersion. The right
format is always a compromise between the availability of resources, time and the desired
results. This paper has given an overview of the most representative methods for producing
immersive audio and presents information that may be helpful when choosing a technical
format to develop an acoustic idea.

REFERENCES

[1] Ahrens, Jens, Analytic Methods of Sound Field Synthesis, Springer-Verlag, Berlin
Heidelberg, 2012

[2] Cengarle, Giulio, 3D audio technologies: applications to sound capture, post-production
and listener perception, Tesi Doctoral, Universitat Pompeu Fabra, Barcelona, 2012.

[3] Fischer, Cédric; Zingler, Dominik; Medina Victoria, Jorge Enrique, Extending the
Double-MS configuration for 3D audio requirements, VDT Tonmeistertagung, Cologne,
Germany, Nov. 2016.

[4] Kim, Sungyoung, Height Channels, in: Immersive Sound: The Art and Science of Binaural
and Multi-channel Audio, edited by Agnieszka Roginska and Paul Geluso, Focal Press,
New York, 2018.

[5] Lee, Hyunkook, The Relationship Between Interchannel Time and Level Differences in
Vertical Sound Localization and Masking, Audio Engineering Society 131st Convention,
New York, NY, USA, 2011.

[6] Riitano, Lucca; Medina Victoria, Jorge Enrique, Comparison between Different
Microphone Arrays for 3D-Audio, Audio Engineering Society 144th Convention,
Milan, Italy, 2018.

[7] Rumsey, Francis, Immersive Audio: Objects, Mixing, and Rendering, Journal of the
Audio Engineering Society, Volume 64, Number 7/8, 2016, pp. 584-588.

[8] Susal, Joel; Krauss, Kurt; Tsingos, Nicolas; Altman, Marcus, Immersive Audio for VR,
Audio Engineering Society Conference on Audio for Virtual and Augmented Reality,
Sep 30 - Oct 1, 2016, Los Angeles, CA, USA.

