2012 Spatosc Icmc Final
Mike Wozniewski, Zack Settel†, Alexandre Quessy, Tristan Matthews, Luc Courchesne
La Société des arts technologiques [SAT], Montréal, Québec, Canada
† Université de Montréal, Faculté de Musique, Montréal, Québec, Canada
rather, each loudspeaker provides a localized rendering of a certain portion of the wall. In such a case, a listener would need to be defined for each loudspeaker.

Sound sources, on the other hand, are generative, and emit sound energy into the virtual scene at particular locations. Their signals may derive from pre-recorded sound files, remote streams, realtime sound synthesis, or live input signals from audio hardware or software busses.

The following properties may be associated with sound sources and listeners:

SPATIAL POSE: Both sources and listeners have 6 degrees of freedom: they are positionable in Cartesian (x, y, z) space, and they have an orientation which can be described by a direction vector, Euler angles (pitch, roll, yaw), or quaternions. Positions are always defined relative to a global coordinate system.

RADIUS: A radius parameter allows a sound source to occupy a volumetric region about its origin instead of just an infinitesimally small point. If a listener node enters within this radius, all spatial effects are disabled and the sound plays naturally, with unity gain and no filtering. A similar parameter is often found in sound panning systems under names like spread, diffusion, or blur. By default, the radius for all nodes is set to zero.

URI: For sound sources, the URI describes the type and location of the media emitted at that location. This could be a simple sound file reference (file://loop.wav), a live input channel on the soundcard (adc://1), a stream (http://stream.mp3), or a plugin (plugin://looper~.pd).

EXTENSIONS: Each node can also be extended with arbitrary key-value parameter pairs that describe additional features that may be required by specialized systems or SPEs (e.g., higher-level spatialization or synthesis parameters).

3.2. Connections

One of the biggest differences between SpatOSC and other spatial audio representations is that sound sources are not necessarily connected to every listener; rather, there is an explicit CONNECTION class that is used to describe both the logical transfer of sound from one location to another and the physical modelling effects involved in that transfer. The CONNECTION contains information about distance, the attenuation of gain resulting from sound field propagation, the time delay incurred for sound to travel that distance, and other physically-modelled properties.

One benefit of maintaining explicit connections is the ability to temporarily disable sections of the scene, or to provide unique sounds to some listeners without others hearing them (even if they are close by). However, the main benefit is the ability to customize spatial effects for some connections differently than for others.

3.3. Manipulation of connection effects

In order to provide artists with flexibility for experimentation and non-standard audio scene interaction, the CONNECTION object provides the ability to tune (enable, disable, scale) several spatial effects, and thus provides fine-grained control over spatialization that is not typically possible with other libraries. The DISTANCE FACTOR specifies the extent to which distance attenuation and absorption effects should be applied. The DIRECTIVITY FACTOR provides a parameter to scale the effect of sound directivity. And the DOPPLER FACTOR allows Doppler shift to be minimized or emphasized, which is particularly useful for preserving timing and intonation in musical contexts.

3.4. Internal Conversions & Computations

Given that SpatOSC needs to supply a number of different parameters to different types of spatializers, we often need to convert, scale, or apply transformations to the internal data. Consider, for example, the gain attenuation of a signal as a result of distance. Amplitude
panning systems (VBAP, etc.) may only do panning and
do not support simulation of distance effects. For con-
venience, SpatOSC computes a gain coefficient for each
connection using the following formula (where A and B
are the positions of the source and listener):
g = 1 / (1 + |B − A|)^(β/0.5)    (1)
The result, in the range [0, 1], represents the amount by which the amplitude of the signal should be scaled to approximate the decay of sound due to distance. The DISTANCE FACTOR, β, helps to control the steepness of the exponential decay. With this parameter, one may gradually transition from zero distance attenuation (β = 0) to the inverse square law that sound intensity observes in nature (β = 1), and even beyond, in order to create hyper-localized sounds (β > 1).

Figure 2. The directivity of a sound source is defined by a direction vector and sampled attenuation values at various angles (stored in a table).
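To make the distance-gain formula above concrete, here is a minimal sketch of the computation (the function name is ours and this is not the actual SpatOSC API; it assumes the β/0.5 exponent as printed):

```python
import math

def distance_gain(source, listener, beta=1.0):
    """Gain coefficient for one source->listener connection, following
    g = 1 / (1 + |B - A|)^(beta/0.5).  beta = 0 disables distance
    attenuation entirely; larger values steepen the decay.
    (Illustrative sketch only, not the SpatOSC library API.)"""
    distance = math.dist(source, listener)  # |B - A|
    return 1.0 / (1.0 + distance) ** (beta / 0.5)

# A listener 9 units away from a source:
distance_gain((0, 0, 0), (9, 0, 0), beta=0.0)  # -> 1.0 (no attenuation)
distance_gain((0, 0, 0), (9, 0, 0), beta=1.0)  # -> 0.01
```

Sweeping `beta` between 0 and 1 at runtime gives the gradual transition from unattenuated to naturally decaying sound described above.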
There are also helper functions available to convert between units. For instance, one might want to know the gain value in decibels rather than as a linear amplitude, or the position of a sound source in radial units (AED: azimuth, elevation, distance) instead of Cartesian units (x, y, z). One might also require sound source positions to be relative to a listener instead of the global coordinate system. In such a case, the CONNECTION object can be queried and the relative radial or Cartesian coordinates retrieved.

Conversions between coordinate systems are also supported. SpatOSC provides transformations for the entire scene that allow conversion between right- and left-handed coordinate systems, flipping axes, and rotating the global coordinate system.

3.5. Directivity

The representation of sound source directivity is potentially complex and, not surprisingly, one of the least standardized aspects of spatial audio scene description. In fact, most spatializers only support omnidirectional sound sources, while others represent directivity patterns with parametric functions or simplified models such as cones or ellipsoids. The goal of SpatOSC is to store directivity in a form that is convertible into any of these other formats.

For SpatOSC, we use axisymmetric attenuation tables that specify sound intensity at different angles from the sound source's direction vector. These tables are used for both the horizontal and vertical directions, meaning there are two attenuation tables representing orthogonal angles from 0 to 180 degrees. We use variable sampling instead of fixed sampling, allowing complex patterns to be described with more detail, while an omnidirectional source can be defined with a single table entry of 1.0 (instead of repeating the value many times).

Figure 2 shows an illustration of a source signal that radiates with unity gain along its direction vector and exhibits varying levels of gain attenuation at different angles of incidence, α, away from the direction vector. In this case, the directivity has a cardioid shape, which is easily defined with a mathematical function:⁴

f(α) = ((1 + cos α) / 2)^γ    (2)

We can sample this function (with γ = 1) every few degrees and store the values in an attenuation table for simple lookup at runtime. However, more abstract directivity patterns that are not easily represented by mathematical functions can also be stored.

During runtime, the DIRECTIVITY FACTOR parameter γ can be used to apply a scaled reading from the attenuation table. When γ = 0, we read only the first element from the table, reducing the effect of directivity. We then read exponentially faster through the table as γ increases, such that when γ = 1 the reading becomes perfectly linear, and it tends towards hyper-directivity with higher values. Of course, this only works when directivity functions are monotonically decreasing from the direction vector.

⁴ Note that γ = 1.0 produces a normal cardioid, γ = 0 flattens the shape, resulting in omnidirectional radiation, and γ > 1.0 results in a hypercardioid. This gives a single parameter for moving from an omnidirectional to a highly directional sound source.

3.6. Translators

SpatOSC's internal state can be converted for use in SPEs such as Pure Data, and in various third-party audio spatializers, via the use of spatializer-specific "translators." These are implemented as plugins with "lazy loading," so that they are only loaded at runtime when the user requests a particular translator. Developers can thus create new translators for new pieces of hardware and software, or even for a particularly esoteric performance or installation.

On the implementation level, the developer of a translator needs to define how the translator responds to notifications from the SpatOSC scheduler. The translator must be coded so that it extracts what it needs from the internal scene representation, using any of the conversion functions provided, and sends OSC messages to its remote spatializer.⁵
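As a sketch of the kind of conversion a translator might perform before sending its messages, relative AED coordinates can be derived from the scene's Cartesian positions (the function name and axis conventions here are ours, not SpatOSC's actual helper API; this sketch treats +x as "front" and +z as "up"):

```python
import math

def aed_from_cartesian(source, listener):
    """Return (azimuth, elevation, distance) of `source` relative to
    `listener`, with angles in degrees: the kind of values a
    panning-only spatializer typically expects. Illustrative only;
    axis conventions vary between spatializers."""
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dz = source[2] - listener[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.asin(dz / distance)) if distance > 0 else 0.0
    return azimuth, elevation, distance

# A source 1 unit in front of and 1 unit above the listener:
aed_from_cartesian((1.0, 0.0, 1.0), (0.0, 0.0, 0.0))
# -> azimuth 0.0, elevation 45.0, distance ~1.414
```

A translator would compute such values per connection on each scheduler notification and pack them into OSC messages addressed to the remote renderer.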
In some cases, only relative positions or angles will be sent, while in other cases, delay times and filter coefficients might be included as well. It all depends on the requirements and abilities of the remote technology.

In the event that the internal representation does not provide adequate information, the translator may wish to use SpatOSC's node extensions (see Section 3.1) to set extra key-value parameters for each node. To provide a concrete example, let us consider Zirkonium [7], a VBAP-style spatializer that only performs panning, with no distance effects or source directivity. Two parameters, "azimuth span" and "zenith span," allow the panning of a sound source to be stretched in either the horizontal or vertical direction. While this resembles our directivity attenuation tables, it is not analogous. For instance, the effect of fully open azimuth span and no zenith span creates a "ring" of sound on a horizontal plane, meaning that sound is emitted uniformly from all speakers at a certain height. This situation has no meaningful 3-D representation in SpatOSC, so it is useful to use extension properties to store the azimuth and zenith span values for every node. The Zirkonium translator thus knows how to deal with those specifically-named properties, while other translators simply ignore them.

It should also be noted that multiple translators can be assigned at once, making it possible to simultaneously render the scene on different spatializers. This is useful in situations where multiple listeners are exploring the environment.

3.7. The SpatOSC Audio Unit plugin

Given that the SpatOSC library is written in C++, it is generally quite easy to include the library within a plugin for a DAW. As of this writing, only an Audio Unit (for OS X) has been developed, but the intention is to also develop VST (Windows) and LADSPA (Linux) plugins.

The Audio Unit can be added to a track in a composition, and provides parameters and a graphical user interface (GUI) for moving sounds. In the most common setup, that track's signal output is sent directly to an input channel on the spatializer, while OSC messages send corresponding control signals describing how that input should be processed. SpatOSC has no methods for storing trajectories or sequencing spatial parameters; thus, a key benefit of using SpatOSC within a DAW is the built-in automation system, which can be used to record and play back trajectories for moving sounds.

4. CONCLUSION & DISCUSSION

It is a difficult task to create a standard spatial audio format. Instead, we have developed an open source software library with an extensible architecture based on translator plugins, which can accommodate a range of technologies. Networking via OpenSoundControl allows for a separation of tasks, letting spatializers do what they do best, while providing flexibility for the integration of spatial audio control into a number of different host applications.

SpatOSC currently has no way to model the environmental effects of a 3-D scene; further work in this area is needed in order to achieve higher-quality spatial audio effects. Further attention also needs to be paid to synchronization and the sequencing or timed playback of sonic material.

Nonetheless, the utility of SpatOSC for real-time interactive work is clear. We have already created audio installations which play easily on multiple types of spatialization hardware. As more translators are written, interchange between systems will increase, hopefully leading to more experimentation with 3-D sound throughout the artistic community.

5. REFERENCES

[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia. Academic Press Professional Inc., 1994.

[2] M. Geier, J. Ahrens, and S. Spors, "Object-based audio reproduction and the audio scene description format," Organised Sound, vol. 15, no. 3, pp. 219–227, 2010.

[3] M. Gerzon, "Periphony: With-height sound reproduction," J. Audio Eng. Soc., vol. 21, no. 1, pp. 2–10, 1973.

[4] K. Hamasaki, K. Hiyama, and R. Okumura, "The 22.2 multichannel sound system and its application," in AES Convention, 2005.

[5] N. Peters, "Proposing SpatDIF - the spatial sound description interchange format," in Panel session at ICMC, 2008.

[6] M. Pohja, "X3D and VRML sound components," HUT, Telecommunications Software & Multimedia Laboratory.

[7] C. Ramakrishnan, "Zirkonium: Non-invasive software for sound spatialisation," Organised Sound, vol. 14, pp. 268–276, 2009.

[8] E. Scheirer, R. Väänänen, and J. Huopaniemi, "AudioBIFS: Describing audio scenes with the MPEG-4 multimedia standard," IEEE Trans. Multimedia, vol. 1, no. 3, pp. 237–250, 1999.

[9] Transcripts and notes, "Towards an interchange format for spatial audio scenes," http://redmine.spatdif.org/projects/3/wiki/Belfast 2008, 2008.

[10] R. Väänänen and J. Huopaniemi, "Advanced AudioBIFS: Virtual acoustics modeling in MPEG-4 scene description," IEEE Multimedia, vol. 6, no. 5, pp. 661–675, 2004.