
GB2413745A - Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system - Google Patents

Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system

Info

Publication number
GB2413745A
Authority
GB
United Kingdom
Prior art keywords
audio
signal
output
genre
processing module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0409663A
Other versions
GB0409663D0 (en
Inventor
Westley Dowdles
Stewart Chalmers
Christopher Kirkham
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Johnson Matthey Battery Systems Engineering Ltd
Original Assignee
Axeon Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Axeon Ltd
Priority to GB0409663A
Publication of GB0409663D0
Priority to PCT/GB2005/001637 (WO2005106843A1)
Publication of GB2413745A
Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/307 Frequency adjustment, e.g. tone control

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

A method and system for classifying audio content according to musical style or genre is described. In one aspect of the invention, a method and system for adjusting parameters of an audio system according to the classified musical style or genre is provided. Embodiments of the invention use a neural network device in the classification stage, and automatically adjust one or more of an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor. A further embodiment of the invention adjusts parameters of the audio system according to an ambient noise level.

Description

Improvements to audio systems
The present invention relates to the field of audio processing and playback. More particularly, the present invention in one of its aspects relates to a method and apparatus for classifying audio content. In one of its aspects, the invention relates to the control of audio processing subsystems.
Optimum sound quality or audibility of audio output is dependent to an extent on the audio content being played back. For example, an audio system configured for high quality playback of dance music may sound unsatisfactory when delivering an acoustic vocal track. Similarly, an audio system configured appropriately for a speech recording may not deliver high quality output of a jazz or classical track.
It is therefore desirable in some applications to configure an audio system in a manner particular to specified audio content. For example, audio systems are available that offer pre-set configurations selectable by a user, with each configuration designed to play back a particular style or genre of music in a favourable manner. These systems are inadequate in the sense that a user must select a different setting if the musical style or genre being played back changes, or make do with a setting that may be unsuitable. Such systems are particularly unsuitable when the music source is a radio tuner, television tuner, digital library music (e.g. mp3), or a compilation recording, all of which may play tracks of different styles over relatively short periods.
Other applications for classifying audio content include the monitoring of listening habits or radio playlists.
US Patent No. 6,542,869 in the name of Foote relates to an automated system for differentiating between musical audio content and speech. This allows the identification of changes in the audio content, and has applications in indexing, summarizing, beat tracking and retrieving. The system of US 6,542,869 looks for local self-similarity within the audio signal.
US Patent No. 6,647,366 in the name of Wang relates to a method of controlling the digital coding rate of an audio system. The technique determines which of a variety of coding rates is appropriate for the class of audio content being delivered. The system recognises five classes of audio, being voiced and unvoiced speech, silence, and transient or stationary music.
US Patent No. 6,658,383 in the name of Koishida concerns another system for differentiating between music and speech audio signals in order to identify the appropriate linear predictive coding/decoding strategies.
The above-referenced documents allow classification of audio content to a limited extent only. For particular applications, for example control of an audio processing subsystem or monitoring of musical tastes, it is desirable to classify audio content with greater precision. It is therefore one object of an aspect of the invention to enable improved differentiation and classification of audio content. It is a related aim of an aspect of the invention to provide classification of audio content by musical style or genre.
Automatic control of the output of an audio system is desirable in a number of applications. For example, where a listening space is subject to background noise, automated control systems for audio devices can be used to compensate for ambient noise variations.
US Patent Numbers 4,628,526, 5,550,922, and 5,666,426 in the names of Germer, Becker and Helmes respectively propose systems in which ambient noise levels are monitored using microphone sensors, and the overall volume of the audio system is adjusted in response to the signal received from the microphone sensor.
US Patent No. 4,868,881 in the name of Zwicker discloses a system for automatically adjusting an audio equaliser of an automobile sound system in response to noise signals derived from the extraneous noise in the passenger compartment of the automobile. This prior art system attempts to mask the ambient noise by boosting and/or attenuating appropriate bands of the equaliser.
US Patent No. 5,208,866 in the name of Kato discloses a system in which an audio compressor is controlled in response to a signal from an in-vehicle microphone sensor, which monitors the background noise level within the vehicle.
The above-referenced prior art documents disclose systems capable of adjusting individual audio components in response to background noise signals received from microphone sensors. However, each is limited in its ability to control audio output.
It is one aim of an aspect of the invention to provide an improved system for automatically adjusting the output of an audio system, capable of controlling a variety of audio components.
It is a further aim of an aspect of the invention to provide an automated system for automatically adjusting an audio system in a manner dependent on the classification of audio content.
Further aims and objects of the invention will become apparent from reading the following description.
According to the first aspect of the invention, there is provided a method of classifying audio content, the method comprising the steps of:
- Receiving an audio signal from an audio source into a processing module;
- Classifying, using a processing module, the audio content by musical style or genre;
- Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
The musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
The method may include the steps of:
- Extracting a plurality of audio frames from the audio signal, and;
- Computing one or more metrics characterizing at least some of the extracted audio frames.
Preferably, the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
The metrics may be other parameters characteristic of the audio content, but it has been recognized by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
Preferably, the method includes the additional steps of:
- Producing feature vectors based on the metrics computed;
- Presenting the feature vectors to the inputs of a neural network device.
The method may include the additional step of:
- Applying a smoothing algorithm to responses of the neural network device to provide a smoothed identification signal for the audio signal.
According to a second aspect of the invention, there is provided a method of controlling the output of an audio system, the method comprising the steps of:
- Receiving an audio signal;
- Classifying, using a neural network, the audio signal by musical style or genre;
- Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal;
- Adjusting the output of an audio system in response to the identification signal.
The musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
The method may include the steps of:
- Extracting a plurality of audio frames from the audio signal, and;
- Computing one or more metrics characterizing at least some of the extracted audio frames.
Preferably, the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
The metrics may be other parameters characteristic of the audio content, but it has been recognized by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
Preferably, the method includes the additional steps of:
- Producing feature vectors based on the metrics computed;
- Presenting the feature vectors to the inputs of a neural network device.
The method may include the additional step of:
- Applying a smoothing algorithm to responses of the neural network device to provide a smoothed identification signal for the audio signal.
Preferably, the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a third aspect of the invention, there is provided a method of classifying audio content, the method comprising the steps of:
- Receiving an audio signal from an audio source into a processing module;
- Extracting a plurality of audio frames from the audio signal, and;
- Computing one or more metrics characterizing at least some of the extracted audio frames, wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing;
- Classifying, using a processing module, the audio content by musical style or genre, based on the metrics computed;
- Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
The musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
According to a fourth aspect of the invention, there is provided a method for controlling the output of an audio system, the method comprising the steps of:
- Receiving an audio input signal;
- Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content;
- Adjusting the output of the audio system in response to the identification signal.
Preferably, the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a fifth aspect of the invention, there is provided a method for controlling the output of an audio system, the method comprising the steps of:
- Receiving an audio input signal;
- Receiving a noise signal indicative of ambient noise levels;
- Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content;
- Adjusting the output of the audio system in response to the identification signal and the noise signal.
Preferably, the method includes the additional steps of:
- Receiving, from an auxiliary audio input device, an ambient noise level signal;
- Scaling the audio input signal;
- Computing a transformed domain subtraction between the ambient noise level signal and the scaled audio input signal to provide a noise level estimation signal.
Preferably, the method includes the additional steps of:
- Quantising the noise level estimation signal into a plurality of levels by applying at least one threshold point, and;
- Adjusting the output of the audio system in response to the quantised noise signal.
Preferably, the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equalizer, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a sixth aspect of the invention there is provided a system for classifying audio content, the system comprising a processing module adapted to receive an audio signal from an audio source, the processing module including a neural network device adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
According to a seventh aspect of the invention there is provided a system for controlling the output of an audio system, the system comprising:
- a processing module adapted to receive an audio signal from an audio source, the processing module including a neural network device adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and;
- an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
According to an eighth aspect of the invention there is provided a system for classifying audio content, the system comprising a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
Preferably, the processing module comprises a neural network device.
According to an eighth aspect of the invention there is provided a system for controlling the output of an audio system, the system comprising:
- a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and;
- an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
Preferably, the processing module comprises a neural network device.
The audio control subsystem may comprise at least one of the following components: an audio equalizer, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a ninth aspect of the invention there is provided a system for controlling the output of an audio system, the system comprising an audio control subsystem adapted to receive an audio input signal and an identification signal from an automated audio content classification system, the identification signal corresponding to classification of the audio input signal by audio content, wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal.
Preferably, the audio control subsystem comprises at least one of the following components: an audio equalizer, a non-linear gain control, a surround sound processor and a spatial sound processor.
In one embodiment, the audio control subsystem is further adapted to receive a noise signal indicative of ambient noise levels from an auxiliary audio input device.
The system may be adapted to adjust the output of the audio system in response to the identification signal and the noise signal.
According to a tenth aspect of the invention there is provided a system for classifying audio content, the system being adapted to implement the method of the first or third aspects of the invention.
According to an eleventh aspect of the invention there is provided a system for controlling the output of an audio system, the system being adapted to implement the method of any of the second, fourth or fifth aspects of the invention.
There will now be described, by way of example only, an embodiment of the invention with reference to the following drawings, of which:
Figure 1 is a schematic overview of the system in accordance with an example embodiment of the invention;
Figure 2 is a block diagram representing the functionality of the system of Figure 1 as a series of method steps;
Figure 3 is a schematic overview of a system in accordance with an alternative embodiment of the invention, and;
Figure 4 is a block diagram representing the functionality of the system of Figure 3 as a series of method steps.
Referring firstly to Figure 1 of the drawings, there is shown, generally depicted at 10, a schematic representation of a system according to an example embodiment of the invention. The system comprises an audio source 11, which in this example is a Compact Disc Digital Audio (CDDA) source. The source 11 provides an audio input signal to a processing module 12, adapted to classify the audio content of the audio input signal and output an identification signal. The processing module 12 may be implemented in hardware, software, or a combination of hardware and software.
In this example, processing module 12 includes a trained neural network device 16, which may be for example a device of the type described in International Patent Publication Number WO 00/45333 A1 in the name of AXEON Limited. This type of neural processor, marketed under the VindAX brand, has been recognized as being particularly useful for classification-type tasks. The VindAX device has a continuous learning capability that offers additional functionality in the context of this invention, including user configuration functions. The processing module includes a pre-processing module 14, and a post-processing module 17.
The source 11 also inputs the audio input signal to audio control subsystem 18, which controls the output signal provided to audio output device 20. Typical components of the audio control subsystem 18 are an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor, although this list should not be considered as exhaustive. The audio output device 20 is, for example, a loudspeaker device or amplification circuitry.
Figure 2 is a block diagram representing the functioning of the system of Figure 1 as a series of steps of a method, generally depicted at 30.
At 33, the audio input signal is received in the processing module.
The audio input signal undergoes pre-processing 34, to provide suitable inputs for later analysis of the audio signal. Pre-processing 34 is conducted on a series of audio frames extracted from the audio signal, each audio frame consisting of a fixed number of samples of the audio signal. For each audio frame analysed, metrics characteristic of the audio signal are computed. The metrics may be centre point of the frequency spectrum (centroid), end point of the spectrum, low energy sections, including estimation of beats per minute, flux, or zero-crossing. Other parameters characteristic of the audio content may also be used, but it has been recognized by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
Indeed, one aspect of the invention
Feature vectors are produced 35 based on the metrics computed for the audio frames, which are then appropriately scaled for presentation to the inputs of the neural network device 16.
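The pre-processing and feature-vector steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the described system: the frame size, the naive DFT, the 85% roll-off used to stand in for the "end point of the spectrum", the below-average-RMS flag used for "low energy", the scaling of each metric to the range 0 to 1, and every function name are choices made for the example. The beats-per-minute estimate is omitted.

```python
import cmath
import math

def extract_frames(samples, frame_size):
    """Split a signal into consecutive frames of a fixed number of samples."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for a sketch, slow in practice)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def centroid(mags):
    """Magnitude-weighted mean bin index (centre point of the spectrum)."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0

def rolloff(mags, fraction=0.85):
    """Lowest bin at which the cumulative magnitude reaches `fraction` of the total (assumed reading of 'end point')."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def flux(mags, prev_mags):
    """Squared difference between successive normalised spectra."""
    return sum((m - p) ** 2 for m, p in zip(mags, prev_mags))

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def feature_vectors(samples, frame_size=64):
    """Return one scaled five-element feature vector per frame."""
    frames = extract_frames(samples, frame_size)
    if not frames:
        return []
    energies = [rms(f) for f in frames]
    mean_energy = sum(energies) / len(energies)
    vectors, prev = [], None
    for frame, energy in zip(frames, energies):
        mags = magnitude_spectrum(frame)
        total = sum(mags) or 1.0
        mags = [m / total for m in mags]       # normalise so flux is bounded
        top_bin = len(mags) - 1
        vectors.append([
            centroid(mags) / top_bin,
            rolloff(mags) / top_bin,
            zero_crossings(frame) / (len(frame) - 1),
            flux(mags, prev) / 2.0 if prev is not None else 0.0,
            1.0 if energy < mean_energy else 0.0,   # low-energy flag
        ])
        prev = mags
    return vectors
```

Each returned vector could then be presented to whatever classifier is used; the ordering of the five elements is arbitrary and only needs to match between training and use.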
The neural network device 16 classifies 36 the audio signal by audio content, processing the feature vectors and outputting appropriate responses indicative of the class to which the audio content has been assigned. By selecting an appropriate neural network device and metrics, the audio frames are classified by musical style or genre of the content within that audio frame.
The classes of musical style or genre are selected from, for example:
- Voice
- Easy Listening/Jazz
- Rock/Blues
- Orchestral/Classical
- Beat/Dance
- Urban/Hip-hop
However, this list of musical styles or genres is not exhaustive.
The identification signals are received 37 in the post-processing module 17, which generates an output function received 39 in the audio control subsystem 18. A smoothing algorithm is applied 38 in the post-processing module to provide a smoothed output function to the audio control subsystem 18.
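The description does not name a particular smoothing algorithm, so the following is only one plausible sketch: exponential averaging of per-frame class scores, followed by selection of the highest smoothed score. The score format (one value per genre per frame), the `alpha` constant and the function names are assumptions made for illustration.

```python
GENRES = ["Voice", "Easy Listening/Jazz", "Rock/Blues",
          "Orchestral/Classical", "Beat/Dance", "Urban/Hip-hop"]

def smooth_responses(responses, alpha=0.2):
    """Exponentially smooth a sequence of per-frame class-score lists."""
    smoothed, state = [], None
    for scores in responses:
        if state is None:
            state = list(scores)               # first frame seeds the average
        else:
            state = [alpha * s + (1 - alpha) * prev
                     for s, prev in zip(scores, state)]
        smoothed.append(list(state))
    return smoothed

def identify(scores):
    """Return the genre whose score is highest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return GENRES[best]
```

With this scheme a single misclassified frame in the middle of a rock track shifts the smoothed scores only slightly, so the reported genre stays at Rock/Blues.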
The audio control subsystem receives an input from the audio source 11, and adjusts 40 the settings of the audio control subsystem in accordance with parameters determined by the output function. The audio control subsystem 18 therefore acts in response to the output function, and the ultimate audio output is adjusted 41 in a manner dependent on the musical style or genre identified by the neural network 16.
In particular, the output function from the post-processing module 17 is capable of adjusting the settings of the audio equaliser and/or the non-linear gain. In alternative systems, the output function may adjust the settings of linear volume, a surround sound processor or spatial sound processor.
The smoothing 38 of the output function by the post-processing module 17 allows the system to avoid stepwise changes in the settings of the audio components, and thereby avoid harsh changes to the audio output itself. Optionally, the linear volume may also be adjusted when implementing new settings of the other audio components in order to smooth the audio output effect.
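One way to picture the avoidance of stepwise changes is a ramp that moves equaliser settings towards a genre preset in small increments. The sketch below is hypothetical throughout: the three-band presets, their dB values, the step size and the function names are invented for the example and are not taken from the description.

```python
import math

# Hypothetical three-band equaliser gains in dB (bass, mid, treble).
PRESETS = {
    "Voice": (-2.0, 3.0, 1.0),
    "Beat/Dance": (5.0, -1.0, 2.0),
}

def step_towards(current, target, max_step_db=0.5):
    """Move each band gain towards its target by at most max_step_db."""
    out = []
    for c, t in zip(current, target):
        if abs(t - c) <= max_step_db:
            out.append(t)                      # close enough: land exactly on target
        else:
            out.append(c + math.copysign(max_step_db, t - c))
    return tuple(out)

def ramp(current, genre, max_step_db=0.5):
    """Yield successive settings until the preset for `genre` is reached."""
    target = PRESETS[genre]
    while current != target:
        current = step_towards(current, target, max_step_db)
        yield current
```

Applying one yielded setting per control interval spreads a large preset change over several updates instead of switching in a single step.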
The system and method described above provide a means for automatically classifying audio content and adjusting audio output in real time, according to the musical style or genre of the audio signal being played back.
Figure 3 is a schematic representation of a system in accordance with an alternative embodiment of the invention. This system, generally depicted at 50, includes components accounting for environmental and ambient noise levels.
The system 50 comprises the components of Figure 1, with like components signified with like reference numerals. These like components function in the manner described with reference to Figure 2, with various differences described below.
The system includes an auxiliary audio input 21, in the form of an environment microphone sensor. The microphone sensor 21 functions to monitor background or ambient noise levels present within the listening space. The signal from the microphone sensor 21 is received in a noise level estimation module 23, along with the audio input signal from the main audio source 11. A noise quantisation module 26 is also provided, receiving an input from the noise level estimation module 23 and outputting to the post-processing module 17. As before, the post-processing module 17 provides an output function to the audio control subsystem 18.
Figure 4 is a block diagram representing the functioning of the system of Figure 3 as a series of steps of a method, generally depicted at 70.
At 51 and 52 the noise level estimation module 23 receives the inputs from the auxiliary audio input and the main audio source respectively. An appropriate scaling factor is applied 53 to the main audio signal. Subsequently, the noise level estimation module 23 computes 54 a transformed domain subtraction between the two signals.
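The noise estimation step can be sketched as below, assuming that the "transformed domain" is the frequency domain and that the subtraction is performed on magnitude spectra with negative results floored at zero. Those assumptions, the naive DFT, the averaging into a single level and the function names are illustrative only; the description does not specify them.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def estimate_noise_level(mic_frame, audio_frame, scale):
    """Subtract the scaled programme spectrum from the microphone spectrum.

    `scale` models the gain between the main audio signal and what the
    microphone picks up of it; how it is chosen is outside this sketch.
    """
    mic = magnitude_spectrum(mic_frame)
    ref = magnitude_spectrum([scale * x for x in audio_frame])
    residual = [max(m - r, 0.0) for m, r in zip(mic, ref)]
    return sum(residual) / len(residual)       # mean residual magnitude
```

When the microphone hears only the scaled programme material the residual is zero; any additional energy, such as road noise in another frequency band, remains after the subtraction and raises the estimate.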
The resulting signal is received by the noise quantisation module 26. The signal is quantized 56 into three levels by applying two threshold points and an appropriate guard space. It will be appreciated by one skilled in the art that the invention is not limited to a particular number of quantisation levels or threshold points. The quantised noise level signal is received 57a in the post-processing module 17.
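A three-level quantiser with two thresholds can be sketched as follows. The "guard space" is interpreted here as a hysteresis band around each threshold, which is an assumption; the threshold values, the guard width and the name `make_quantiser` are likewise hypothetical.

```python
def make_quantiser(thresholds=(0.3, 0.6), guard=0.05):
    """Return a stateful function mapping a noise estimate to level 0, 1 or 2.

    The level rises only once the input exceeds a threshold plus the guard,
    and falls only once it drops below a threshold minus the guard.
    """
    level = 0

    def quantise(x):
        nonlocal level
        while level < len(thresholds) and x > thresholds[level] + guard:
            level += 1
        while level > 0 and x < thresholds[level - 1] - guard:
            level -= 1
        return level

    return quantise
```

Because of the guard band, an estimate that hovers just around a threshold does not cause the output level to toggle on every frame.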
This embodiment of the invention also implements the method steps described with reference to Figure 2, and outputs an audio content identification signal to the post-processing module 17. The post-processing module 17 receives 57a a noise level signal and the audio content identification signal received 57b from the post-processing module 17. The audio output device 20 is adjusted 61 in a manner dependent on the musical style or genre identified by the neural network 16 and the estimated background noise levels.
As in the embodiment of Figure 2, the output function from the post-processing module 17 is capable of adjusting the settings of the audio equaliser and/or the non-linear gain. In alternative systems, the output function may adjust the settings of linear volume, a surround sound processor, or a spatial sound processor.
Also as before, the smoothing 58 of the output function by the post-processing module 17 allows the system to avoid stepwise changes in the settings of the audio components, and thereby avoid harsh changes to the audio output itself. Optionally, the linear volume may also be adjusted when implementing new settings of the other audio components in order to smooth the audio output effect.
It will be appreciated by one skilled in the art that variations to the above-described embodiments can be made within the scope of the invention herein intended.
For example, the description refers to CDDA sources, although it will be evident that other audio sources can be used including: digital library music such as mp3, radio tuners, television tuners, DVD-Audio, SACD, HDCC etc. The signals from the sources may be digital or analogue.
Furthermore, the signals may be single-channel and/or multi-channel. For a multi-channel input signal, the individual channels can be summed into a single channel and processed in the manner described above. Alternatively, or in addition, each channel of the multi-channel signal can be processed individually.
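Summing a multi-channel signal into a single channel before classification might look like the sketch below. Dividing by the channel count is an added assumption, made so that the summed signal stays within the range of its inputs; the description itself only says the channels are summed.

```python
def downmix(channels):
    """Combine equal-length channels sample by sample into one channel."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]
```

For a stereo input, `downmix([left, right])` yields the single list of samples that would then be framed and analysed like a mono source.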
The embodiments described include a neural processor of the type described in International Patent Publication Number WO 00/45333 A1. While it is considered to be beneficial to use such a system for the purposes of the present invention, alternative systems may achieve similar results. Alternative processing modules may be implemented in software or hardware, and may include alternative neural network devices, look-up tables, algorithms, or genetic algorithms.
The post-processing module may, in alternative embodiments, include threshold and history functions to add stability to the system. The threshold function uses the low-energy metrics to prevent the system from classifying periods of "silence" within the audio signal. The history function provides context for the audio control during start-up periods of the audio signal, and can prevent reclassification for slight variations in the audio signal.
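The threshold and history functions could be combined along the following lines. This is a hedged sketch: the RMS silence threshold, the fixed-length history, the strict-majority rule and the class name `StabilisedClassifier` are all assumptions introduced for the example.

```python
from collections import Counter, deque

class StabilisedClassifier:
    """Hold a genre label steady across silence and brief misclassifications."""

    def __init__(self, silence_rms=0.01, history_len=5):
        self.silence_rms = silence_rms
        self.history = deque(maxlen=history_len)
        self.current = None

    def update(self, frame_rms, frame_label):
        # Threshold function: frames quieter than silence_rms are not counted.
        if frame_rms >= self.silence_rms:
            self.history.append(frame_label)
        # History function: switch only when one label holds a strict majority.
        if self.history:
            label, count = Counter(self.history).most_common(1)[0]
            if count > len(self.history) // 2:
                self.current = label
        return self.current
```

Calling `update` once per frame returns the current stable label, which changes only after enough non-silent frames agree on a different genre.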
The present invention in one of its aspects provides an improved audio system by classifying, in real time, the audio content by musical style or genre. This has application, for example, in indexing or cataloguing of music, and monitoring listening habits.
In another aspect, the classification by musical style or genre is applied to control of an audio system. The settings of an audio control subsystem can be changed to suit the musical style or genre being played back. In one embodiment, ambient noise levels are also estimated and contribute to the control of the audio control subsystem.
These aspects of the invention have particular application to in-car audio systems, in which road noise and engine noise can be compensated for and the audio settings are configured optimally for the music being played back. This enhances the driver's listening experience and reduces the need for the driver to manually adjust audio controls. Applications to other audio systems are also envisaged.

Claims (1)

    1. A method of classifying audio content, the method comprising the steps of:
       - Receiving an audio signal from an audio source into a processing module;
       - Classifying, using a processing module, the audio content by musical style or genre;
       - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
    2. The method as claimed in Claim 1 wherein the step of classifying the audio content is carried out in a neural network device.
    3. The method as claimed in Claim 1 or Claim 2 wherein the musical style or genre is selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
    4. The method as claimed in any of Claims 1 to 3 including the additional steps of:
       - Extracting a plurality of audio frames from the audio signal, and;
       - Computing one or more metrics characterizing at least some of the extracted audio frames.
    5. The method as claimed in Claim 4 wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
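The metrics named in Claim 5 are not defined further in the claims. The following is a minimal sketch of one common way to compute such per-frame metrics, using only the Python standard library. The function names, the 85 % roll-off fraction used for the "end point of the spectrum", and the mean-RMS definition of a low-energy frame are assumptions for illustration; estimation of beats per minute is not attempted here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short example frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags):
    """Magnitude-weighted mean bin index: the 'centre point' of the spectrum."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0


def spectral_rolloff(mags, fraction=0.85):
    """Lowest bin at which the cumulative magnitude reaches `fraction` of the total."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1


def spectral_flux(mags, previous_mags):
    """Sum of squared differences between two consecutive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mags, previous_mags))


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def low_energy_fraction(frames):
    """Share of frames whose RMS energy is below the mean RMS of all frames."""
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    mean = sum(rms) / len(rms)
    return sum(1 for r in rms if r < mean) / len(rms)
```

A practical system would replace the naive DFT with an FFT; the naive form is used here only to keep the sketch self-contained.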
    6. The method as claimed in Claim 4 or Claim 5 including the additional steps of:
       - Producing feature vectors based on the metrics computed;
       - Presenting the feature vectors to the inputs of a neural network device comprised within the processing module.
    7. A method of controlling the output of an audio system, the method comprising the steps of:
       - Receiving an audio signal;
       - Classifying, using a processing module, the audio signal by musical style or genre;
       - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal;
       - Adjusting the output of an audio system in response to the identification signal.
    8. The method as claimed in Claim 7 wherein the step of classifying the audio content is carried out in a neural network device.
    9. The method as claimed in Claim 7 or Claim 8 wherein the musical style or genre is selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
    10. The method as claimed in any of Claims 7 to 9 including the steps of:
       - Extracting a plurality of audio frames from the audio signal, and;
       - Computing one or more metrics characterizing at least some of the extracted audio frames.
    11. The method as claimed in Claim 10 wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
    12. The method as claimed in Claim 11 including the additional steps of:
       - Producing feature vectors based on the metrics computed;
       - Presenting the feature vectors to the inputs of a neural network device.
    13. The method as claimed in any of Claims 7 to 12, comprising the additional steps of:
       - Generating an output function from the identification signal;
       - applying a smoothing algorithm to the output function to provide a smoothed output function, and;
       - adjusting the settings of an audio control subsystem according to the smoothed output function.
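The claims do not specify the smoothing algorithm. As one possible reading of Claim 13, the sketch below treats the output function as a set of target settings looked up from the identified genre and applies exponential smoothing so that the settings move gradually towards the target. The preset table `GENRE_PRESETS`, its setting names and decibel values, and the smoothing factor `alpha` are invented for illustration.

```python
# Hypothetical target settings per genre; the values are illustrative only.
GENRE_PRESETS = {
    "Rock/Blues": {"bass_db": 4.0, "treble_db": 2.0},
    "Voice": {"bass_db": -2.0, "treble_db": 1.0},
}


def output_function(genre):
    """Map the identification signal (a genre label) to target settings."""
    return GENRE_PRESETS[genre]


def smooth_settings(current, target, alpha=0.2):
    """Move each setting a fraction `alpha` of the way towards its target."""
    return {
        name: current.get(name, 0.0) + alpha * (value - current.get(name, 0.0))
        for name, value in target.items()
    }


# Usage: call once per control update so a genre change fades in gradually.
settings = {"bass_db": 0.0, "treble_db": 0.0}
for _ in range(3):
    settings = smooth_settings(settings, output_function("Rock/Blues"))
```

A smaller `alpha` gives slower, less audible transitions at the cost of a longer settling time.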
    14. The method as claimed in any of Claims 7 to 12, wherein the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
    15. A method of classifying audio content, the method comprising the steps of:
       - Receiving an audio signal from an audio source into a processing module;
       - Extracting a plurality of audio frames from the audio signal, and;
       - Computing one or more metrics characterizing at least some of the extracted audio frames, wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing;
       - Classifying, using a processing module, the audio content by musical style or genre, based on the metrics computed;
       - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
    16. The method as claimed in Claim 15 wherein the musical style or genre is selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
    17. A method for controlling the output of an audio system, the method comprising the steps of:
       - Receiving an audio input signal;
       - Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content;
       - Adjusting the output of the audio system in response to the identification signal.
    18. The method as claimed in Claim 17, wherein the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
    19. The method as claimed in Claim 17 or 18, comprising the additional steps of:
       - Generating an output function from the identification signal;
       - applying a smoothing algorithm to the output function to provide a smoothed output function, and;
       - adjusting the settings of an audio control subsystem according to the smoothed output function.
    20. The method as claimed in any of Claims 17 to 19, comprising the additional steps of:
       - Receiving a noise signal indicative of ambient noise levels;
       - Adjusting the output of the audio system in response to the noise signal.
    21. The method as claimed in Claim 20 including the additional steps of:
       - Receiving, from an auxiliary audio input device, an ambient noise level signal;
       - Scaling the audio input signal;
       - Computing a transformed domain subtraction between the ambient noise level signal and the scaled audio input signal to provide a noise level estimation signal.
    22. The method as claimed in Claim 20 or Claim 21 including the additional steps of:
       - Quantising the noise level estimation signal into a plurality of levels by applying at least one threshold point, and;
       - adjusting the output of the audio system in response to the quantized noise signal.
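The noise estimation and quantisation steps of Claims 21 and 22 can be sketched as follows, under stated assumptions. The "transformed domain" is read here as the frequency domain, with both signals already available as magnitude spectra; the function names, the scale factor, the flooring of negative residuals at zero, and the threshold values are illustrative assumptions rather than details taken from the claims.

```python
from bisect import bisect_right


def estimate_noise_level(mic_spectrum, audio_spectrum, scale):
    """Subtract the scaled playback spectrum from the microphone spectrum.

    The residual is floored at zero per bin and averaged to give one level.
    """
    residual = [max(m - scale * a, 0.0) for m, a in zip(mic_spectrum, audio_spectrum)]
    return sum(residual) / len(residual)


def quantise(level, thresholds):
    """Map a continuous level onto 0..len(thresholds) using sorted threshold points."""
    return bisect_right(thresholds, level)


# Usage with made-up numbers: two spectral bins and three threshold points.
noise = estimate_noise_level([3.0, 5.0], [2.0, 4.0], scale=0.5)
noise_step = quantise(noise, [1.0, 2.0, 4.0])
```

Quantising to a small number of levels means the audio settings change only when the ambient noise crosses a threshold, rather than tracking every fluctuation.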
    23. The method as claimed in any of Claims 20 to 22, comprising the additional steps of:
       - Generating an output function from the identification signal and the noise signal;
       - applying a smoothing algorithm to the output function to provide a smoothed output function, and;
       - adjusting the settings of an audio control subsystem according to the smoothed output function.
    24. A system for classifying audio content, the system comprising a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
    25. The system as claimed in Claim 24 wherein the processing module comprises a neural network device.
    26. A system for controlling the output of an audio system, the system comprising:
       - a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and;
       - an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
    27. The system as claimed in Claim 26 wherein the processing module comprises a neural network device.
    28. The system as claimed in Claim 26 or Claim 27 wherein the audio control subsystem comprises at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
    29. A system for controlling the output of an audio system, the system comprising an audio control subsystem adapted to receive an audio input signal and an identification signal from an automated audio content classification system, the identification signal corresponding to classification of the audio input signal by audio content, wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal.
    30. The system as claimed in Claim 29 wherein the audio control subsystem comprises at least one of the following components: an audio equalizer, a non-linear gain control, a surround sound processor and a spatial sound processor.
    31. The system as claimed in Claim 29 or Claim 30 wherein the audio control subsystem is further adapted to receive a noise signal indicative of ambient noise levels from an auxiliary audio input device.
    32. The system as claimed in Claim 31 wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal and the noise signal.
    33. A system for classifying audio content, the system being adapted to implement the method of any of Claims 1 to 6.
    34. A system for controlling the output of an audio system, the system being adapted to implement the method of any of Claims 7 to 23.
GB0409663A 2004-04-30 2004-04-30 Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system Withdrawn GB2413745A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0409663A GB2413745A (en) 2004-04-30 2004-04-30 Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system
PCT/GB2005/001637 WO2005106843A1 (en) 2004-04-30 2005-04-28 Reproduction control of an audio signal based on musical genre classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0409663A GB2413745A (en) 2004-04-30 2004-04-30 Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system

Publications (2)

Publication Number Publication Date
GB0409663D0 GB0409663D0 (en) 2004-06-02
GB2413745A true GB2413745A (en) 2005-11-02

Family

ID=32408308

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0409663A Withdrawn GB2413745A (en) 2004-04-30 2004-04-30 Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system

Country Status (2)

Country Link
GB (1) GB2413745A (en)
WO (1) WO2005106843A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3847542A4 (en) * 2018-09-07 2022-06-01 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification
US11775250B2 (en) 2018-09-07 2023-10-03 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007096792A1 (en) * 2006-02-22 2007-08-30 Koninklijke Philips Electronics N.V. Device for and a method of processing audio data
TR200701546A2 (en) * 2007-03-13 2008-10-21 Vestel Elektronik Sanayi Ve Ticaret A.Ş. Automatic equalizer adjustment method
CN104079247B (en) 2013-03-26 2018-02-09 杜比实验室特许公司 Balanced device controller and control method and audio reproducing system
CN107093991B (en) 2013-03-26 2020-10-09 杜比实验室特许公司 Loudness normalization method and equipment based on target loudness
CN104078050A (en) 2013-03-26 2014-10-01 杜比实验室特许公司 Device and method for audio classification and audio processing
KR20170030384A (en) * 2015-09-09 2017-03-17 삼성전자주식회사 Apparatus and Method for controlling sound, Apparatus and Method for learning genre recognition model
WO2017097321A1 (en) * 2015-12-07 2017-06-15 Arcelik Anonim Sirketi Image display device with automatic audio and video mode configuration
US9928025B2 (en) 2016-06-01 2018-03-27 Ford Global Technologies, Llc Dynamically equalizing receiver
US10014841B2 (en) 2016-09-19 2018-07-03 Nokia Technologies Oy Method and apparatus for controlling audio playback based upon the instrument
EP3688756B1 (en) * 2017-09-28 2022-11-09 Sony Europe B.V. Method and electronic device
KR102685051B1 (en) * 2018-01-04 2024-07-16 하만인터내셔날인더스트리스인코포레이티드 Biometric personalized audio processing system
US10855241B2 (en) 2018-11-29 2020-12-01 Sony Corporation Adjusting an equalizer based on audio characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10161654A (en) * 1996-11-27 1998-06-19 Sanyo Electric Co Ltd Musical classification determining device
JP2000066669A (en) * 1998-08-25 2000-03-03 Victor Co Of Japan Ltd Music creating device
US20030040904A1 (en) * 2001-08-27 2003-02-27 Nec Research Institute, Inc. Extracting classifying data in music from an audio bitstream

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5023913A (en) * 1988-05-27 1991-06-11 Matsushita Electric Industrial Co., Ltd. Apparatus for changing a sound field
US5434922A (en) * 1993-04-08 1995-07-18 Miller; Thomas E. Method and apparatus for dynamic sound optimization
JPH0837700A (en) * 1994-07-21 1996-02-06 Kenwood Corp Sound field correction circuit
US6570991B1 (en) * 1996-12-18 2003-05-27 Interval Research Corporation Multi-feature speech/music discrimination system
BR0212418A (en) * 2001-09-11 2004-08-03 Thomson Licensing Sa Method and apparatus for activating automatic equalization mode
FR2842014B1 (en) * 2002-07-08 2006-05-05 Lyon Ecole Centrale METHOD AND APPARATUS FOR AFFECTING A SOUND CLASS TO A SOUND SIGNAL

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3847542A4 (en) * 2018-09-07 2022-06-01 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification
US11775250B2 (en) 2018-09-07 2023-10-03 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification
US12061840B2 (en) 2018-09-07 2024-08-13 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification

Also Published As

Publication number Publication date
WO2005106843A1 (en) 2005-11-10
GB0409663D0 (en) 2004-06-02

Similar Documents

Publication Publication Date Title
WO2005106843A1 (en) Reproduction control of an audio signal based on musical genre classification
US9372251B2 (en) System for spatial extraction of audio signals
US9282417B2 (en) Spatial sound reproduction
EP2144242A1 (en) Playback apparatus and display method.
KR20110103339A (en) Automatic correction of loudness in audio signals
WO2015035492A1 (en) System and method for performing automatic multi-track audio mixing
US8027487B2 (en) Method of setting equalizer for audio file and method of reproducing audio file
CN103580632B (en) automatic loudness control system and method
EP1826900A1 (en) Vehicle-mounted sound control system
CN103050126A (en) Audio signal processing apparatus, audio signal processing method and a program
CN1981433A (en) Method of and system for automatically adjusting the loudness of an audio signal
GB2415116A (en) Delivering more apparent bass through the psychoacoustic perception of bass frequencies
US20120155658A1 (en) Content reproduction device and method, and program
KR0129429B1 (en) Audio signal processing unit
WO2018066383A1 (en) Information processing device and method, and program
JP2010021627A (en) Device, method, and program for volume control
CN108768330B (en) Automatic loudness control
US8718298B2 (en) NVH dependent parallel compression processing for automotive audio systems
JP6902049B2 (en) Automatic correction of loudness level of audio signals including utterance signals
US9219455B2 (en) Peak detection when adapting a signal gain based on signal loudness
CN112511966B (en) Self-adaptive active frequency division method for vehicle-mounted stereo playback
JP2002369281A (en) Sound quality and sound volume controller
KR20070070728A (en) Automatic equalizing system of audio and method thereof
US9240208B2 (en) Recording apparatus with mastering function
CN106533379B (en) Method and apparatus for processing audio signal

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)