In The Search of The Right Format For Immersive Sound
Jorge Medina Victoria
ABSTRACT
Immersive experiences are becoming the state of the art for delivering an artistic message
with audio content. There are different technological approaches to achieving three-dimensional
sound. The right choice depends on the recording, distribution and playback format, in
addition to the technical characteristics of the format, its usability and its integration with
established technologies. In this paper, a broad summary of the three currently dominant
technologies, channel-based systems, ambisonics and object-based systems, is presented,
and the technologies are compared with regard to human perception.
INTRODUCTION
Immersive audio is a concept widely used to describe techniques that deliver spatial audio
content in a setup around the listener [7] or through headphones. The majority of those
techniques and formats, which are also referred to as 3D–Audio, present audio information
not only in the horizontal plane but also with elevated sound sources and even in some cases
with sources under the horizontal listening plane.
To achieve complete immersion in a virtual world, it is necessary to fool or trick the senses;
the role of vision is predominant. However, sound information reinforces and supports
everything that is not seen [8] and is a fundamental part of the creation of a virtual world.
This paper is a short review of the most relevant methods used to capture and reproduce
immersive sound. An overview of the advantages and drawbacks relative to the different
techniques may facilitate the choice of an audio setup, regarding the available resources and
the expected acoustic environment for the listener.
Immersive audio content can be delivered mainly in three different formats: channel-based,
scene-based and object-based [7]. There are also mixed approaches, which depend strongly on
the context and combination of advantages of the different methods.
The usual approach is to use the same method from the capture to the playback of the audio
material. However, it is also possible to use one method for capturing or recording the sound
and a different one for its presentation or reproduction [2].
Channel-based
This approach is based mainly on amplitude panning. A straightforward relationship between
audio streams and loudspeakers is assumed [2], [7]. It is the traditional way of producing
sound for music recordings, film, etc. Level or amplitude differences between loudspeakers
create a phantom source, so that sound can be localized. This works well in the
horizontal plane [2]. The amplitude differences, produced through methods such as panning
or the faders of a mixing console, can also be applied to setups that include elevated
loudspeakers. However, localization in the vertical plane does not work as well as in the
horizontal plane and is strongly dependent on the sound source [5].
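As a simple illustration of the amplitude differences described above, the sketch below implements the common constant-power (sine/cosine) pan law for a loudspeaker pair; the function name and the [-1, 1] pan range are illustrative choices, not taken from the paper.

```python
import math

def constant_power_pan(pan: float) -> tuple[float, float]:
    """Constant-power stereo pan law.

    pan: -1.0 (hard left) .. 0.0 (centre) .. +1.0 (hard right).
    Returns (left_gain, right_gain); the summed power
    left**2 + right**2 is always 1.0, so perceived loudness
    stays roughly constant as the phantom source moves.
    """
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

# Centre position: both channels carry the signal at -3 dB (gain 1/sqrt(2)).
left, right = constant_power_pan(0.0)
```

The same principle, applied pairwise, underlies panning across larger loudspeaker layouts, including pairs in the height layer.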
In some setups, amplitude panning plays no role at all. The setup developed by Morten
Lindberg and known as the “2L-Cube” is an example of a 1:1 approach: the signals
recorded by two arrays of omnidirectional pressure microphones are routed directly to the
loudspeakers [4]. The first array, in the horizontal plane, consists of five microphones; the
second, placed above the horizontal array, consists of four. The two arrays match the number
of loudspeakers in the horizontal and height layers, respectively.
The diversity of available recording techniques and the robustness of formats such as 5.1 are
among the advantages of the channel-based approach. A downside, especially for immersive
sound, is the dependence on the reproduction setup: the production and recording of sound
sources are tied to a specific, inflexible loudspeaker layout. In the worst case, a separate mix
has to be produced for each target loudspeaker arrangement.
Object-based
In this sound representation, the sound objects and the information about them are kept
separate. The ancillary information, which describes parameters such as spatial position, is
known as metadata [7]. The artistic intention is expressed through the metadata, which
indicates how the audio elements should be translated to the reproduction system. Metadata
can be generated during mixing or automatically [8]. In the final stage, an audio rendering
engine delivers a data stream for the different formats, from mono to 3D-Audio.
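The separation of audio and metadata described above can be sketched as follows; the object fields and the minimal two-loudspeaker renderer are hypothetical simplifications of what a full rendering engine does, and all names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    """A monophonic sound object plus its rendering metadata."""
    azimuth: float      # degrees; 0 = front, positive toward the left
    gain: float = 1.0   # linear level

def render_gains(obj: AudioObject,
                 spk_azimuths: tuple[float, float]) -> tuple[float, float]:
    """Derive constant-power gains for one loudspeaker pair from the
    object's positional metadata. The renderer, not the delivered mix,
    decides the final channel signals, which is what makes the format
    independent of the loudspeaker setup."""
    left_az, right_az = spk_azimuths              # e.g. (+30.0, -30.0) for stereo
    frac = (obj.azimuth - right_az) / (left_az - right_az)
    frac = min(max(frac, 0.0), 1.0)               # clamp sources outside the pair
    theta = frac * math.pi / 2.0
    return math.sin(theta) * obj.gain, math.cos(theta) * obj.gain
```

Because only the metadata is fixed, the same object can be rendered to a different layout simply by calling the renderer with other loudspeaker azimuths.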
The object-based method is an alternative that is rapidly gaining acceptance because of its
flexibility. Distribution is possible to a variety of end devices such as smartphones or tablets,
and reproduction can target loudspeakers in different configurations or binaural playback
over headphones [7]. The method works especially well when the sound sources are discrete
elements, as in the case of monophonic audio sources.
Advantages
• Independence of the loudspeaker setup
• Personalization and interactivity
Drawbacks
• Demanding workflow
• Preferably monophonic sources
Ambisonics
The development of ambisonics started over four decades ago. The idea behind it is the
control and representation of a sound field at a specific position for a loudspeaker
arrangement, based on its decomposition into spherical harmonics [1]. The sound scene is
encoded directionally. The original mathematical framework enabled First Order Ambisonics
(FOA); a further development is known as Higher Order Ambisonics (HOA).
Figure 3: Ambisonics spherical harmonics up to 2nd order.
Producing audio material for the ambisonics scene-based representation normally involves
two stages: encoding and decoding. The flexibility and manipulability of these stages make
ambisonics a very interesting method for the recording and reproduction of 3D-Audio. In the
encoding process, audio signals are captured with an A-Format microphone. This four-channel
signal can easily be converted to B-Format or even rendered binaurally, which makes
ambisonics a very practical tool for virtual reality applications.
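A mono source can also be encoded directly into B-Format by evaluating the first-order spherical harmonics in the source direction. The sketch below uses the traditional FuMa convention (with the customary -3 dB weight on W); the function name is an illustrative choice.

```python
import math

def encode_foa(sample: float, azimuth_deg: float,
               elevation_deg: float) -> tuple[float, float, float, float]:
    """Encode one mono sample into first-order B-Format (FuMa convention).

    Returns (W, X, Y, Z): W is the omnidirectional component, while
    X, Y, Z are figure-of-eight components along the front-back,
    left-right and up-down axes.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omni, FuMa -3 dB weighting
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return w, x, y, z
```

Note that modern workflows often use the AmbiX convention (ACN ordering, SN3D normalization) instead, which changes the channel order and the weight on W.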
For virtual reality, object-based and scene-based methods are the most flexible options. In the
case of ambisonics, the mathematical framework enables easy manipulation (e.g. rotation) of
the signal in three-dimensional space, which is very practical when using head-mounted
displays and headphones [8]. In addition to offering the advantages of scene-based methods,
object-based methods can produce complex dynamic soundscapes: the metadata, which
defines position, level, width and other attributes of the sound source, can be modified
dynamically according to the scene or the artistic intention [2], [8].
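The rotation mentioned above is particularly cheap in first-order ambisonics: a yaw rotation about the vertical axis only mixes the X and Y components, while W and Z are untouched. A minimal sketch, assuming the common B-Format axis convention (X front-back, Y left-right, Z up-down); whether the angle is the scene rotation or the negated head yaw depends on the tracking convention.

```python
import math

def rotate_foa_yaw(w: float, x: float, y: float, z: float,
                   yaw_deg: float) -> tuple[float, float, float, float]:
    """Rotate a first-order B-Format sample about the vertical axis.

    Only a 2x2 rotation of (X, Y) is needed; the omnidirectional W
    and the vertical Z components are invariant under yaw. This is
    why head-tracked binaural rendering from ambisonics is so cheap.
    """
    a = math.radians(yaw_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z
```

Higher-order material rotates just as exactly, only with larger per-order rotation matrices.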
DISCUSSION
Among the most relevant attributes of immersive sound are envelopment, localization, a large
sweet spot, decorrelation of signals and timbre. No single technical approach delivers all of
these features well enough on its own to produce a full sense of immersion. The right format
is therefore always a compromise between the available resources, time and the desired
result. This paper has given an overview of the most representative methods for producing
immersive audio and presented information that may be helpful when choosing a technical
format to develop an acoustic idea.
REFERENCES
[1] Ahrens, Jens, Analytic Methods of Sound Field Synthesis, Springer-Verlag, Berlin
Heidelberg, 2012
[3] Fischer, Cédric; Zingler, Dominik; Medina Victoria, Jorge Enrique, Extending the
Double-MS Configuration for 3D Audio Requirements, VDT Tonmeistertagung, Cologne,
Germany, Nov. 2016.
[4] Kim, Sungyoung, Chapter 7 Height Channels, Immersive Sound, The art and science of
binaural and multi-channel audio / edited by Agnieszka Roginska and Paul Geluso, Focal
Press, New York, 2018.
[5] Lee, Hyunkook, The Relationship Between Interchannel Time and Level Differences in
Vertical Sound Localization and Masking, Audio Engineering Society 131st Convention,
New York, NY, USA, 2011.
[6] Riitano, Lucca; Medina Victoria, Jorge Enrique, Comparison between Different
Microphone Arrays for 3D-Audio, Audio Engineering Society 144th Convention,
Milan, Italy, 2018.
[7] Rumsey, Francis, Immersive Audio: Objects, Mixing, and Rendering, Audio Engineering
Society Journal, Volume 64, Number 7/8, 2016, pp. 584-588.
[8] Susal, Joel; Krauss, Kurt; Tsingos, Nicolas; Altman, Marcus, Immersive Audio for VR,
Audio Engineering Society, Audio for Virtual and Augmented Reality Conference Paper, Sep
30 – Oct 1, 2016, Los Angeles, CA, USA.