Progressive perceptual audio rendering of complex scenes
Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, 2007
Despite recent advances, including sound source clustering and perceptual auditory masking, high-quality rendering of complex virtual scenes with thousands of sound sources remains a challenge. Two major bottlenecks appear as the scene complexity increases: the cost of clustering itself, and the cost of pre-mixing source signals within each cluster.
In this paper, we first propose an improved hierarchical clustering algorithm that remains efficient for large numbers of sources and clusters while providing progressive refinement capabilities. We then present a lossy pre-mixing method based on a progressive representation of the input audio signals and the perceptual importance of each sound source. Our quality evaluation user tests indicate that the recently introduced audio saliency map is inappropriate for this task. Consequently, we propose a "pinnacle", loudness-based metric, which gives the best results for a variety of target computing budgets. We also performed a perceptual pilot study, which indicates that in audio-visual environments it is better to allocate more clusters to visible sound sources. We propose a new clustering metric using this result. As a result of these three solutions, our system can provide high-quality rendering of thousands of 3D sound sources on a "gamer-style" PC.
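To make the idea of a loudness-driven budget concrete, the following is a minimal sketch, not the paper's method. It assumes that each source contributes one frame of samples, that RMS level stands in for the loudness metric, and that the computing budget is a hypothetical integer count of "units" (for example, coefficients of a progressive signal representation) to be shared among sources. The names `rms_loudness` and `allocate_budget` are illustrative only.

```python
import math


def rms_loudness(frame):
    """Root-mean-square level of one frame (assumed stand-in for loudness)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def allocate_budget(frames, total_units):
    """Split an integer budget across sources in proportion to loudness.

    frames: list of sample lists, one per source (hypothetical input layout).
    total_units: total number of budget units to hand out.
    """
    loudness = [rms_loudness(f) for f in frames]
    total = sum(loudness)
    if total == 0.0 or total_units <= 0:
        return [0] * len(frames)

    # Ideal (fractional) share of the budget for each source.
    shares = [total_units * l / total for l in loudness]
    alloc = [int(s) for s in shares]  # round each share down

    # Give the units lost to rounding to the largest fractional remainders.
    leftover = total_units - sum(alloc)
    order = sorted(range(len(frames)),
                   key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    frames = [[1.0] * 4, [0.5] * 4, [0.0] * 4]
    print(allocate_budget(frames, 30))
```

In this sketch a silent source receives no budget at all, which loosely mirrors the intent of spending processing effort where it is most audible; a real system would use a perceptually grounded loudness model rather than plain RMS.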