WO2004090870A1 - Method and apparatus for encoding or decoding wideband speech - Google Patents
- Publication number
- WO2004090870A1 (PCT/JP2004/004913)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- audio signal
- identification information
- band
- wideband
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
Definitions
- the present invention relates to a method and apparatus for encoding or decoding not only a wideband audio signal but also a narrowband audio signal with high quality.
- the audio signals are sampled at a sampling frequency (or sampling rate) of 8 kHz.
- the sampled data is encoded by an encoding method suitable for the sample rate and transmitted.
- a signal sampled at a sampling rate of 8 kHz cannot contain frequencies at or above 4 kHz, which corresponds to half the sampling frequency.
- speech signals that do not include frequencies above 4 kHz are called narrowband speech (or telephone band speech).
- For encoding/decoding of narrowband speech, a method suitable for narrowband speech is used.
- G.729, an international standard of the ITU-T
- the AMR-NB (Adaptive Multi-Rate Narrowband) method
- the sampling rate of the input audio signal is specified as 8 kHz.
- the sampling frequency is much higher than 8 kHz (usually about 16 kHz, and in some cases 12.8 kHz).
- Speech signals expressed using such higher sampling frequencies are called wideband speech.
- a wideband speech encoding scheme suitable for wideband speech, which is different from a normal narrowband speech encoding scheme, is used.
- G.722.2, an international standard of the ITU-T, is an encoding/decoding method for wideband speech; both the sampling frequency of the speech signal input to the encoder and the sampling frequency of the audio signal output from the decoder are specified as 16 kHz.
- the wideband speech coding method described in G.722.2 is called the AMR-WB (Adaptive Multi-Rate Wideband) method, and aims to encode/decode a wideband speech signal with a sampling frequency of 16 kHz at high quality.
- AMR-WB has nine bit rates available. In general, the audio generated by encoding and decoding at a high bit rate is of relatively good quality, whereas the audio generated by encoding and decoding at a low bit rate suffers coding distortion and its sound quality tends to deteriorate.
- a narrowband voice signal is used in a wideband voice communication system.
- the encoded data generated by encoding the narrowband audio signal by the wideband audio encoding is decoded by the wideband audio decoding corresponding to the wideband audio encoding.
- the audio signal to be decoded is decoded by exactly the same processing as a normal wideband audio signal.
- because the sampling frequency is that of a wideband signal, the transmitting side uses wideband audio coding and the receiving side uses normal wideband audio decoding to decode the audio signal.
- the wideband speech decoding unit uses a Low-Band section (which generates the low-frequency speech signal of about 6 kHz or less) and a High-Band section (which generates the audio signal on the high-frequency side of the band, about 6 kHz to 7 kHz).
- the Low-Band section is based on a CELP speech coding system.
- the output signal of the wideband audio decoding unit is generated by always adding the high-frequency audio signal generated by the High-Band section to the low-frequency audio signal decoded by the Low-Band section.
- the decoding unit of the AMR-WB system is specialized for wideband speech. Therefore, even when coded data representing narrowband speech is input, the unnecessary high-frequency signal generated by the High-Band section is added to the speech output from the speech decoding section, which is a problem.
- Japanese Patent Application Laid-Open No. H11-259099 (No. 25, page 6, FIG. 1) describes a method of switching the configuration of an encoding and decoding apparatus by discriminating between voice and non-voice in the input signal.
- the method provides a configuration optimized for processing speech signals and a configuration optimized for processing non-speech signals for some of the functional blocks of the encoder and decoder. Then, these configurations are switched based on voice / non-voice identification information.
- none of the above methods takes into account the problem of mismatch between the audio coding method and the bandwidth of the input signal. For this reason, they cannot improve the sound quality degradation that occurs when coded data of narrowband speech that has been wideband-encoded at a low bit rate, as described above, is decoded by wideband speech decoding.
- An object of the present invention is to provide a method and an apparatus for encoding or decoding wideband speech capable of obtaining good sound quality not only for a wideband speech signal but also for a narrowband speech signal.
- an embodiment of a wideband speech encoding method and apparatus identifies whether an input speech signal is a narrowband signal or a wideband speech signal.
- when the input audio signal is a narrowband audio signal, the input audio signal is subjected to spectrum analysis to generate a target signal, and the generated target signal is encoded on the basis of its being a narrowband signal.
- One embodiment of the wideband speech decoding method and apparatus is configured to generate a sound source signal and a synthesis filter from encoded data, and perform a decoding process of decoding a speech signal from the sound source signal and the synthesis filter.
- the identification information for identifying that the audio signal to be decoded has a narrow band is obtained, and the decoding process is controlled based on the obtained identification information.
- FIG. 1 is a block diagram showing a configuration of a wideband audio encoding device according to a first embodiment of the present invention.
- FIG. 2 is a block diagram showing a configuration of a wideband speech encoding unit of the wideband speech encoding device shown in FIG.
- FIG. 3B is a diagram showing a pulse position candidate setting unit and a first example of pulse position candidates of the speech encoding unit shown in FIG.
- FIG. 4 is a diagram showing pulse position candidates for the integer sample positions shown in FIG.
- FIG. 5 is a diagram showing pulse position candidates at even-numbered sample positions shown in FIG.
- FIG. 6B is a diagram showing a pulse position candidate setting unit and a second example of pulse position candidates of the speech coding unit shown in FIG.
- FIG. 7 is a diagram showing pulse position candidates at the odd-numbered sample positions shown in FIG.
- FIG. 8 is a flowchart showing the control procedure and contents executed by the control unit of the wideband speech coding apparatus shown in FIG. 1.
- FIG. 9 is a block diagram showing a configuration of a speech coding unit according to a second embodiment of the present invention.
- FIG. 10 is a block diagram showing another configuration example of the wideband speech coding apparatus according to the present invention.
- FIG. 11 is a block diagram showing a configuration of a wideband speech decoding device according to a third embodiment of the present invention.
- FIG. 12 is a block diagram showing an example of a wideband speech encoding apparatus for generating encoded data according to the third embodiment of the present invention.
- FIG. 13 is a block diagram showing a configuration of a speech decoding unit and a control unit of the wideband speech decoding device shown in FIG. 11.
- FIG. 14 is a block diagram showing a first example of an audio decoding unit and a control unit according to a fourth embodiment of the present invention.
- FIG. 15 is a block diagram showing a first example of a speech decoding unit and a control unit according to a fifth embodiment of the present invention.
- FIG. 16 is a flowchart showing the procedure and contents of the audio decoding process according to the third embodiment of the present invention.
- FIG. 17 is a flowchart showing a processing procedure and contents when the audio decoding process according to the third embodiment of the present invention and the audio decoding process according to the seventh embodiment are used together.
- FIG. 18 is a flowchart showing the procedure and contents of the audio decoding process according to the seventh embodiment of the present invention.
- FIG. 19 is a block diagram showing a configuration of a wideband speech decoding apparatus according to another embodiment of the present invention.
- FIG. 20 is a block diagram showing a configuration of a wideband speech coding apparatus according to another embodiment of the present invention.
- FIG. 21 is a block diagram showing a second example of the speech decoding unit and the control unit according to the fourth embodiment of the present invention.
- FIG. 22 is a block diagram showing a third example of the speech decoding unit and the control unit according to the fourth embodiment of the present invention.
- FIG. 23 is a block diagram illustrating a configuration example of a post-processing filter unit according to the fifth embodiment of the present invention.
- FIG. 24 is a block diagram showing a first example of a speech decoding unit and a control unit according to a sixth embodiment of the present invention.
- FIG. 25 is a block diagram showing a configuration of a sampling rate conversion unit and a control unit according to a seventh embodiment of the present invention.
- FIG. 26 is a block diagram showing a second example of the speech decoding unit and the control unit according to the sixth embodiment of the present invention.
- FIG. 27 is a block diagram showing a third example of the speech decoding unit and the control unit according to the sixth embodiment of the present invention.
- FIG. 28 is a block diagram showing a fourth example of the speech decoding unit and the control unit according to the sixth embodiment of the present invention.
- FIG. 1 is a block diagram showing a configuration of a wideband audio encoding device according to the first embodiment of the present invention.
- This device is composed of a band detection unit 11, a sampling conversion unit 12, a voice encoding unit 14, and a control unit 15 that controls the entire device. It encodes the input audio signal 10 and outputs the encoded output code 19.
- the band detecting unit 11 detects the sampling rate of the input audio signal 10 and notifies the control unit 15 of the detected sampling rate.
- Sampling rate detection methods include:
- the band detection unit 11a obtains sampling rate information and wideband/narrowband identification information from the input audio signal 10.
- This method can be used when sampling rate information, wideband/narrowband identification information, attribute information of the input audio signal, or identification information of the codec that generated the input audio signal 10 is embedded in bits of a predetermined portion of the input audio signal sequence.
- As an embedding method, for example, embedding in the least significant bit of the PCM samples of the input audio signal sequence can be considered. In this way, the sampling rate information, the wideband/narrowband identification information, the attribute information of the input audio signal, or the identification information of the codec that generated the input audio signal 10 can be embedded without affecting the upper bits of the PCM, that is, without affecting the sound quality of the input audio signal.
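As a minimal sketch of this embedding idea (the function names, the 16-bit PCM assumption, and the bit layout are illustrative assumptions, not the patent's notation):

```python
# Hedged sketch: hiding band-identification bits in the least significant
# bit (LSB) of 16-bit PCM samples, as described above. Names and layout
# are illustrative, not from the specification.

def embed_flag(pcm_samples, flag_bits):
    """Overwrite the LSB of the first len(flag_bits) samples with the flag."""
    out = list(pcm_samples)
    for i, bit in enumerate(flag_bits):
        out[i] = (out[i] & ~1) | (bit & 1)  # upper 15 bits stay untouched
    return out

def extract_flag(pcm_samples, n_bits):
    """Read the identification bits back from the LSBs."""
    return [s & 1 for s in pcm_samples[:n_bits]]
```

Since only the LSB changes, each sample moves by at most one quantization step, which is consistent with the claim that the sound quality of the input signal is not affected.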
- Various embodiments can be considered for the band detection unit.
- any configuration may be used as long as it can identify sampling information, wideband / narrowband, or codec.
- As for the sampling rate information, the wideband/narrowband identification information, and the codec identification information, any representation may be used.
- the sample rate conversion unit 12 converts the input audio signal 10 into a predetermined sample rate, and transmits the converted signal of the predetermined sample rate to the audio encoding unit 14. For example, when an 8 kHz sampling signal is input, an up-sampled 16 kHz sampling signal is generated and output using an interpolation filter. Also, when a 16 kHz sampling signal is input, it is output without converting the sample rate.
- the configuration of the sample rate conversion unit 12 is not limited to this.
- the method of converting a sample rate is not limited to an interpolation filter; for example, it can be realized by using a frequency conversion method such as FFT, DFT, or MDCT.
- the input signal is transformed into the frequency domain by FFT, DFT, MDCT, or the like. The data is then extended by adding zero data to the high-frequency side of the frequency-domain data obtained by this transform (the zeros may also be treated as virtually added). Next, an upsampled input signal is obtained by inverse-transforming the extended data.
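A minimal sketch of this frequency-domain upsampling, using a naive DFT for clarity (a real codec would use an FFT; the 2x factor, the scaling, and the function names are our assumptions):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def upsample_2x(x):
    """Upsample by 2: zero-pad the high-frequency side of the spectrum,
    then inverse-transform the extended data."""
    N = len(x)
    X = dft(x)
    # insert N zeros between the positive- and negative-frequency halves
    X_ext = X[:N // 2] + [0.0] * N + X[N // 2:]
    # factor 2 compensates for the longer inverse transform
    return [2.0 * v.real for v in idft(X_ext)]
```

For even N the Nyquist bin would strictly need to be split between the two halves; the sketch ignores that detail. The original samples reappear at the even output positions, i.e. the operation interpolates rather than alters the input.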
- the audio encoding unit 14 receives a signal sampled at 16 kHz from the sampling conversion unit 12. Then, the received signal is encoded, and an encoded signal 19 is output.
- the speech encoding method used by the speech encoding unit 14 will be described taking a CELP (Code Excited Linear Prediction) method as an example, but the speech encoding method is not limited to this.
- FIG. 2 is a block diagram showing a configuration of the speech encoding unit 14.
- the speech coding unit 14 includes a spectral parameter encoding unit 21, a target signal generation unit 22, an impulse response calculation unit 23, an adaptive codebook search unit 24, a random codebook search unit 25, a gain codebook search unit 26, a pulse position candidate setting unit 27 (holding wideband pulse position candidates 27a and narrowband pulse position candidates 27b), and an excitation signal generation unit 28.
- the audio encoding unit 14 is a device that encodes the input audio signal 20 and outputs an encoded signal 19, and operates as follows.
- the spectral parameter encoding unit 21 analyzes the input speech signal 20 to extract the spectral parameters.
- the spectral parameter codebook stored in advance in the spectral parameter encoding unit 21 is searched.
- an index of the codebook entry that best represents the spectrum envelope of the input audio signal is selected, and the selected index is output as the spectral parameter code (A).
- the spectral parameter code (A) forms part of the output code 19; in addition, the spectral parameter encoding unit 21 outputs the unquantized LPC coefficients and the quantized LPC coefficients corresponding to the extracted spectral parameters.
- the LPC coefficients that are not quantized and the LPC coefficients that have been quantized are simply referred to as spectral parameters.
- the LSP (Line Spectrum Pair) parameter is used as the spectrum parameter used when encoding the spectrum envelope.
- the parameter is not limited to this; any parameter that can express the spectrum envelope may be used, such as LPC (Linear Predictive Coding) coefficients, K parameters, or the ISF parameters used in G.722.2.
- the audio signal 20, the spectral parameters output from the spectral parameter encoding unit 21, and the excitation signal from the excitation signal generation unit 28 are input to the target signal generation unit 22.
- the target signal generation unit 22 uses these input signals to calculate the target signal x(n).
- as the target signal, a signal obtained by passing an ideal excitation signal, excluding the influence of past coding, through a perceptually weighted synthesis filter is used, but the present invention is not limited to this; it is known that such a filter can be constructed using the spectral parameters.
- the impulse response calculator 23 calculates and outputs an impulse response h (n) from the spectrum parameters output from the spectrum parameter encoder 21.
- This impulse response can typically be calculated using a perceptually weighted synthesis filter H(z), which combines a synthesis filter built from the LPC coefficients with a perceptual weighting filter.
- the means for calculating the impulse response is not limited to one using the above perceptually weighted synthesis filter H(z).
- here, W(z) denotes the perceptual weighting filter, which is composed of the LPC coefficients.
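As a sketch, h(n) can be obtained by passing a unit impulse through the filter cascade. The patent text does not spell out the filter, so we assume the common CELP form H(z) = A(z/g1) / [A(z) A(z/g2)]; the constants g1, g2 and the helper names are our assumptions:

```python
def _fir(x, b):
    """FIR filtering: y[n] = sum_k b[k] * x[n-k]."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def _all_pole(x, a):
    """All-pole filtering by 1/A(z), with A(z) = 1 + a[0] z^-1 + ... ."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y

def weighted_impulse_response(a, n_samples, g1=0.92, g2=0.6):
    """Impulse response of the assumed H(z) = A(z/g1) / (A(z) * A(z/g2)).
    a: coefficients a1..ap of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    delta = [1.0] + [0.0] * (n_samples - 1)
    num = [1.0] + [ak * g1 ** (k + 1) for k, ak in enumerate(a)]
    h = _fir(delta, num)                                              # A(z/g1)
    h = _all_pole(h, a)                                               # 1 / A(z)
    h = _all_pole(h, [ak * g2 ** (k + 1) for k, ak in enumerate(a)])  # 1 / A(z/g2)
    return h
```

A quick sanity check: with g1 == g2 the weighting cancels one pole stage and H(z) reduces to the plain synthesis filter 1/A(z).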
- Adaptive codebook search section 24 receives the spectral parameters output from spectral parameter encoding section 21 and the target signal x(n) output from target signal generation section 22. Using these inputs and the adaptive codebook stored in adaptive codebook search section 24, it extracts the pitch period contained in the speech signal, obtains through an encoding process an index corresponding to the extracted pitch period, and outputs an adaptive code (L). The adaptive code (L) forms part of the output code 19.
- before searching the adaptive codebook, the adaptive codebook search section 24 receives the excitation signal generated by the excitation signal generation section 28, and the adaptive codebook is updated with this signal.
- past excitation signals are thus stored in the adaptive codebook.
- adaptive codebook searching section 24 searches the adaptive codebook for an adaptive code vector corresponding to the pitch period, and outputs it to excitation signal generating section 28. Furthermore, using this adaptive code vector and the perceptually weighted synthesis filter, a filtered adaptive code vector is generated and output to gain codebook search section 26. In addition, a second target signal x2(n) (hereinafter, target vector x2) is generated by subtracting the signal component corresponding to the contribution of the adaptive codebook from the target signal x(n), and the generated target vector x2 is output to random codebook search section 25.
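The subtraction of the adaptive-codebook contribution can be sketched as follows. The text does not state how the gain is chosen, so we assume the standard CELP least-squares gain g = <x, y> / <y, y>; the function name is illustrative:

```python
def second_target(x, y):
    """Form x2(n) by removing the adaptive-codebook contribution from the
    target x(n). y(n) is the filtered adaptive code vector; the gain g is
    the usual least-squares value (an assumption, not the patent's text)."""
    g = sum(a * b for a, b in zip(x, y)) / sum(b * b for b in y)
    return [a - g * b for a, b in zip(x, y)]
```

By construction x2 is orthogonal to y, so the subsequent random codebook search only has to model the residual that the adaptive codebook could not represent.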
- the pulse position candidate setting unit 27 specifies the position of the pulse searched by the random codebook search unit 25 based on the notification from the control unit 15.
- the pulse position candidate setting section 27 receives from the control unit 15 a notification of whether the sampling rate of the input audio signal is 16 kHz or 8 kHz (or whether the input signal is a wideband signal or a narrowband signal). In response to the received notification, it selects one of the wideband pulse position candidates 27a and the narrowband pulse position candidates 27b, and outputs the selected pulse position candidates.
- the pulse position candidate setting unit 27 selects the wideband pulse position candidate 27 a.
- a narrow-band pulse position candidate 27 b is selected.
- in this way, the operation of the voice coding unit 14 is controlled so that the random codebook search unit 25 searches the narrowband pulse position candidates 27b, which differ from those used in normal wideband speech coding processing.
- the conventional wideband speech coding method assumes only a 16 kHz sampling rate for the input speech signal. For this reason, if the input audio signal contains only the narrowband information of an 8 kHz sampling rate, the only way to encode it is to first upsample the input signal to 16 kHz and encode it as a normal wideband speech signal.
- pulse position candidates for representing a sound source signal are prepared at a high sampling rate corresponding to a wideband.
- when the coding bit rate is low, for example below 10 kbit/s, it becomes impossible to allocate many bits to the pulses representing the excitation signal.
- this inefficient use of bits for the pulse positions makes it difficult to set up enough pulses to sufficiently represent the excitation signal. As a result, the sound quality of the encoded and reproduced audio signal tends to deteriorate.
- in contrast, this embodiment has a function to identify whether the input audio signal is a wideband signal or a narrowband signal before encoding, even when the input audio signal has been converted from an 8 kHz sampling rate to a 16 kHz sampling rate before being input to the speech encoding unit 14.
- the speech encoding unit 14 can thus adapt to either a wide band or a narrow band.
- when the input is identified as narrowband, the pulse position candidates for representing the excitation signal correspond to a reduced sampling rate of, for example, 8 kHz. This prevents bits from being spent on pulse position candidates with unnecessarily fine resolution.
- the surplus bits can be used for other information. For example, the number of pulses can be increased, which makes it possible to express the excitation signal more efficiently. Therefore, even at a low bit rate of about 6 to 10 kbit/s, a higher-quality encoding of an input signal at an 8 kHz sampling rate is possible.
- this shows a configuration in which pulse position candidates 27c at integer sample positions are used as the wideband pulse position candidates 27a, and pulse position candidates 27d at even sample positions are used as the narrowband pulse position candidates 27b.
- each pulse has an amplitude of "+1" or "-1".
- the section for encoding the excitation signal is called a subframe, and here the subframe length is 64 samples. Each pulse is selected from sample positions 0 to 63 in the subframe.
- the integer sample positions 0 to 63 in the subframe are divided into four tracks.
- Each track contains only one pulse.
- pulse i0 is selected from one of the pulse position candidates {0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60} included in track 1.
- since each track has 16 pulse position candidates, encoding one pulse requires 4 bits for the position and 1 bit for the amplitude (polarity).
- the configuration of the algebraic codebook shown in FIG. 4 is an example, and the present invention is not limited to this.
- the four pulses are selected from the candidate integer sample positions in the subframe.
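The four-track layout above can be sketched as follows. The track contents follow the text; the 5-bit packing (4 position bits plus 1 sign bit) is an illustrative assumption, not the patent's or the standard's exact bitstream format:

```python
# Sketch of the 4-track algebraic codebook described above: a 64-sample
# subframe, track t holding positions t, t+4, ..., t+60 (16 candidates),
# and one pulse per track.

TRACKS = [list(range(t, 64, 4)) for t in range(4)]

def encode_pulse(track, position, sign):
    """Pack one pulse as 5 bits: 4 for the in-track index, 1 for polarity."""
    idx = TRACKS[track].index(position)
    return (idx << 1) | (1 if sign > 0 else 0)

def decode_pulse(track, code):
    """Recover (position, sign) from the 5-bit code."""
    return TRACKS[track][code >> 1], (1 if code & 1 else -1)
```

Under this packing, the four pulses of a subframe cost 4 x 5 = 20 bits in total.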
- each pulse is selected from candidate pulse positions located only at even-numbered sample positions among the sample positions 0 to 63 in the subframe. However, even if there are some odd sample positions in addition to the even sample positions as pulse position candidates, the essence is not impaired.
- the sound source signal is represented by five pulses, and each pulse has an amplitude of +1 or -1.
- the pulse position candidates at which each pulse can be placed are located only at even-numbered sample positions among sample positions 0 to 63 in the subframe.
- the even-numbered sample positions are divided into five tracks.
- Each track contains only one pulse.
- pulse i0 is selected from any one of the pulse position candidates {0, 8, 16, 24, 32, 40, 48, 56} included in track 1.
- the configuration of the pulse position candidate 27 d of the even-numbered sample position is only an example, and various configurations of the track can be considered.
- the narrow-band pulse is selected from position candidates consisting of even-numbered sample positions in the subframe.
- this shows a configuration in which the pulse position candidates 27c at integer sample positions are used as the wideband pulse position candidates 27a, and the pulse position candidates 27e composed of odd sample positions are used as the narrowband pulse position candidates 27b.
- FIG. 7 shows a pulse position candidate 27 e at an odd-numbered sample position.
- The pulse position candidates 27e for odd sample positions form a configuration in which a pulse is selected from candidates arranged only at odd-numbered sample positions; a similar effect can be obtained with this configuration.
- With the pulse position candidates 27e at odd sample positions, the excitation signal is represented by five pulses, and each pulse has an amplitude of "+1" or "-1".
- pulse position candidates where each pulse can be set are arranged only at odd-numbered sample positions among 0 to 63 sample positions in the subframe. In the subframe, the odd-numbered sample positions are divided into five tracks, and each track contains only one pulse.
- pulse i0 is selected from any one of the candidate pulse positions {1, 9, 17, 25, 33, 41, 49, 57} included in track 1.
- the pulse for the narrow band is selected from the position candidates of the odd-numbered sample positions.
- the narrow-band pulse position candidate 27 b may have another configuration.
- it is also possible to switch between even and odd sample positions every subframe, or to switch between the two sets of sample positions every several subframes.
- in short, as long as the narrowband pulse position candidates are located at sample positions thinned out relative to the wideband pulse position candidates, at a thinning rate according to the ratio of the narrowband bandwidth to the wideband bandwidth, they will work satisfactorily as pulse position candidates for a narrowband excitation.
- the bandwidth of the narrowband audio signal is about 4 kHz (or a signal obtained by upsampling an input signal of 8 kHz originally to 16 kHz).
- the bandwidth of the wideband audio signal is about 8 kHz (in the case of a normal 16 kHz sampled signal).
- here the narrowband sample positions are determined by thinning the positions to 1/2 (of course, a rate greater than 1/2, such as 2/3, may also be used); it is sufficient that the candidates are placed in such a configuration. Therefore, the narrowband pulse position candidates 27b are thinned to 1/2 of the wideband pulse position candidates 27a.
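The thinning rule itself can be sketched with one small helper (an illustrative function, not from the patent):

```python
def thin_candidates(positions, ratio=2, offset=0):
    """Keep only positions congruent to `offset` modulo `ratio`:
    offset=0 yields the even-position set, offset=1 the odd-position set."""
    return [p for p in positions if p % ratio == offset]
```

Applied to a 64-sample subframe, it yields the 32 even (or odd) candidate positions used by the narrowband configurations above; applied to one wideband track with ratio=8, it reproduces the thinned track {0, 8, ..., 56}.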
- for the narrowband speech signal, it is also conceivable to use pulse position candidates with the same high temporal resolution as for a normal wideband signal, such as the wideband pulse position candidates 27a shown in FIG. 4.
- however, with position candidates of high temporal resolution, only a few pulses can be generated from the limited bit budget, and because of the unnecessarily fine resolution those few pulses may concentrate excessively on adjacent integer samples. In that case no pulses are allocated to other positions, the excitation signal is insufficiently represented, and the quality of the reproduced sound deteriorates.
- it is identified whether the input audio signal is a wideband signal or a narrowband signal. If the input audio signal is a narrowband signal, low-resolution pulse position candidates suited to the narrowband signal are used. This prevents bits for representing the pulse positions from being wasted on temporal resolution the signal does not contain. Furthermore, since pulses can only rise at positions of low temporal resolution, the pulses representing the sound source signal do not concentrate unnecessarily, and more pulses can be generated. The device on the decoding side can therefore reproduce higher-quality audio.
- the random codebook search section 25 uses an algebraic codebook composed of the pulse position candidates output from the pulse position candidate setting section 27 to search for the code of the code vector with minimum distortion, that is, the random code (K).
- the algebraic codebook limits the possible amplitudes of the predetermined Np pulses to "+1" and "-1", and the pulses generated according to the pulse position information and amplitude information (i.e., polarity information) are output as a code vector.
- the feature of the algebraic codebook is that instead of storing the code vector itself, it is sufficient to store only the information on the pulse position candidates and the pulse polarity. Therefore, the amount of memory representing the codebook can be reduced. Also, the noise component included in the sound source information can be represented with relatively high quality, despite the small amount of calculation for selecting the code vector.
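A minimal illustration of this property (the frame length and pulse placements are made up for the example): only positions and polarities are stored, and the code vector is reconstructed on demand.

```python
# Build a code vector from stored (position, polarity) pairs; the code
# vectors themselves never need to be kept in memory.
def build_code_vector(length, pulses):
    vector = [0] * length
    for position, polarity in pulses:      # polarity is +1 or -1
        vector[position] += polarity
    return vector

code_vector = build_code_vector(64, [(3, +1), (18, -1), (40, +1), (57, -1)])
```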
- Such a method using an algebraic codebook for encoding the sound source signal is called ACELP (Algebraic Code Excited Linear Prediction), and it is known to yield synthesized speech with relatively little distortion.
- the random codebook search unit 25 receives the pulse position candidates output from the pulse position candidate setting unit 27, the second target signal X2 output from the adaptive codebook search unit 24, and the impulse response h(n) output from the impulse response calculator 23.
- the noise codebook search unit 25 evaluates the distortion between the perceptually weighted synthesized code vector and the second target vector X2, and searches for the index that minimizes this distortion, that is, the random code (K).
- the perceptually weighted synthesized code vector is generated using the code vector output from the algebraic codebook according to the pulse position candidates.
- the noise codebook search section 25 outputs the searched noise code (K), the code vector corresponding to the noise code (K), and the perceptually weighted synthesized code vector.
- the noise code (K) forms part of the output code 19.
- the noise code (K) is selected so as to maximize the search measure
  C(k) = (x2^t H c(k))^2 / (c(k)^t H^t H c(k)),
  where c(k) is the code vector corresponding to code k and H is the impulse-response matrix built from h(n). For a code vector consisting of Np pulses, the numerator factor expands to
  x2^t H c(k) = Σ θ_i f(m_i)   (sum over i = 0, ..., Np-1)
- m_i is the position of the i-th pulse
- θ_i is the amplitude of the i-th pulse
- f(n) is an element of the correlation vector x2^t H, and the denominator likewise expands into elements φ(i, j) of the correlation matrix H^t H
- when the pulse combination maximizing this measure has been found, the selection is completed.
- the pulse positions m_i to be searched are limited to the pulse position candidates set in the pulse position candidate setting section 27. By doing so, the algebraic codebook can be searched even when it is composed of the pulse position candidates output from the pulse position candidate setting section 27.
- the necessary values of f(n) and φ(i, j) used for the code search are calculated in advance, so the amount of calculation required for the code search becomes very small.
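As a hedged sketch of why the precomputation helps (the criterion is the standard ACELP measure; the toy values below are not from the patent): once f(n) and φ(i, j) are tabulated, evaluating one pulse combination needs only table look-ups.

```python
# Score one pulse combination using precomputed f(n) (correlation vector
# x2'H) and phi(i, j) (correlation matrix H'H); the search keeps the
# combination that maximizes this measure.
def acelp_score(positions, amplitudes, f, phi):
    num = sum(a * f[m] for a, m in zip(amplitudes, positions))
    den = sum(ai * aj * phi[mi][mj]
              for ai, mi in zip(amplitudes, positions)
              for aj, mj in zip(amplitudes, positions))
    return (num * num) / den

f = [1.0, 2.0, 3.0, 4.0]                      # toy correlation vector
phi = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
score = acelp_score([1, 3], [1, -1], f, phi)  # two pulses, positions 1 and 3
```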
- the pulse position information selected in this way is output as the noise code (K) together with the pulse amplitude information. The random codebook search unit 25 also outputs the code vector corresponding to the noise code and the perceptually weighted synthesized code vector.
- the gain codebook search unit 26 receives the perceptually weighted synthesized adaptive code vector output from the adaptive codebook search unit 24 and the perceptually weighted synthesized code vector output from the noise codebook search unit 25.
- the gain codebook search unit 26 encodes two types of gain, the gain applied to the adaptive code vector and the gain applied to the code vector, to represent the gain component of the sound source. For simplicity, these two types of gain are below simply called the gain.
- the gain codebook search unit 26 searches for the gain code (G), the index for which the distortion between the perceptually weighted synthesized speech signal and the target signal (X(n) in this embodiment) is small, and outputs the searched gain code (G) and the corresponding gain. The gain code (G) forms a part of the output code 19.
- the perceptually weighted synthesized speech signal is reproduced using a gain candidate selected from a gain codebook.
- Excitation signal generation section 28 generates the sound source signal using the adaptive code vector output from adaptive codebook search section 24, the code vector output from noise codebook search section 25, and the gain output from gain codebook search section 26.
- the sound source signal is obtained by multiplying the adaptive code vector by the gain for the adaptive code vector, multiplying the code vector by the gain for the code vector, and adding the two gain-scaled vectors.
- the method of generating the sound source signal is not limited to this.
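The standard construction in the preceding lines can be sketched as (the gains and vectors are toy values):

```python
# excitation = g_a * adaptive_code_vector + g_c * code_vector, element-wise.
def make_excitation(adaptive, code, g_a, g_c):
    return [g_a * a + g_c * c for a, c in zip(adaptive, code)]

excitation = make_excitation([1.0, 0.5, -0.25], [0.0, 1.0, -1.0], 0.8, 1.2)
```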
- the obtained excitation signal is stored in the adaptive codebook in adaptive codebook search section 24 for use in adaptive codebook search section 24 in the next coding section. Furthermore, the generated excitation signal is also used by the target signal generation unit 22 to calculate a target signal for encoding in the next encoding section.
- FIG. 8 is a flowchart showing the speech encoding processing procedure and contents.
- the detector 110 determines whether the input voice signal is a wideband signal (step S10). If it is a wideband signal, coded data is generated by the predetermined wideband coding (step S50), and the process ends. If it is identified as a narrowband signal, the sampling rate of the input signal is converted to match the sampling rate assumed by the wideband speech encoder (usually 16 kHz) (step S20). Next, coded data is generated by performing wideband speech encoding whose processing content is modified for the narrow band, using a narrowband parameter set (step S40), and the process ends.
- the part whose processing content is modified for the narrow band is at least a part of the wideband speech coding processing.
- One example is to correct pulse position candidates used in the random code search unit.
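The branch structure of FIG. 8 can be sketched as follows; the function names and the stand-in encoder are assumptions, not the patent's blocks:

```python
# Steps S10 / S20 / S40 / S50 of FIG. 8 as plain control flow.
def encode(signal, rate_hz, wideband_encode, upsample_to_16k):
    if rate_hz == 16000:                        # S10: wideband input
        return wideband_encode(signal)          # S50: normal wideband coding
    signal = upsample_to_16k(signal)            # S20: match the assumed rate
    return wideband_encode(signal, narrowband=True)  # S40: modified coding

def fake_encoder(signal, narrowband=False):     # stand-in for the real coder
    return ("coded", narrowband)

coded_wb = encode([0.0], 16000, fake_encoder, lambda s: s)
coded_nb = encode([0.0], 8000, fake_encoder, lambda s: s)
```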
- FIG. 9 is a block diagram showing a configuration of the speech coder 14 according to the second embodiment of the present invention.
- the same parts as those in FIG. 2 are denoted by the same reference numerals, and detailed description thereof will be omitted.
- the speech coding unit 14 includes a parameter order setting unit 31.
- the parameter order setting unit 31 outputs the parameter order.
- the spectrum parameter encoding unit 21a performs the same operation as the spectrum parameter encoding unit 21 of the first embodiment, except that the parameter order is variable: the parameter order output from the parameter order setting unit 31 is input and used.
- the parameter order setting unit 31 sets the order of the LSP parameters used by the spectrum parameter encoding unit 21a based on the notification from the control unit 15. That is, upon receiving notification that the sampling rate of the input audio signal is 16 kHz, the parameter order setting unit 31 selects and outputs the LSP order for the wide band; upon receiving notification that it is 8 kHz, it selects and outputs the LSP order for the narrow band.
- the LSP order can thus be limited to a level appropriate for a narrowband signal, and accordingly the number of bits required for encoding the spectrum parameters can be reduced.
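A small sketch of the switch (the concrete orders 16 and 10 are common choices but assumptions here; the patent only says the LSP order is switched by band):

```python
def select_lsp_order(sampling_rate_hz):
    # Wideband (16 kHz) input uses a higher LSP order than narrowband (8 kHz).
    return 16 if sampling_rate_hz == 16000 else 10
```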
- the control operation of the control unit 15 in the second embodiment is substantially the same as the control operation of the control unit 15 according to the first embodiment (illustrated in the flowchart of FIG. 8).
- the wideband encoding process in step S50 is realized by setting the LSP order for the wide band in the parameter order setting unit 31 and causing the speech encoding unit 14 to perform the wideband speech encoding process.
- the narrowband case is achieved by setting the LSP order for the narrow band in the parameter order setting unit 31 so that the speech encoding unit 14 performs the encoding process for narrowband speech.
- the wideband speech encoding method and apparatus are not limited to the first and second embodiments.
- whether to perform sampling rate conversion of the input audio signal is determined according to whether the input audio signal is a wideband signal or a narrowband signal.
- using this identification information, it is possible to adaptively control the number of parameters, the number of coding candidates, and the like used in the preprocessing section, adaptive codebook search section, pitch analysis section, or gain codebook search section.
- the present invention can be applied to bit rate control of variable-rate wideband speech coding. That is, the bit rate of the wideband speech encoding means can be controlled efficiently by identifying whether the input speech signal is a wideband signal or a narrowband signal. For example, if the input speech signal is a wideband signal, it is an input well suited to the wideband speech encoding unit, so the encoding bit rate can be reduced to some extent. On the other hand, when the input speech signal is a narrowband signal, coding efficiency tends to be poor because, as described above, the signal is not the one normally assumed by the wideband speech coding unit. In such a case, the encoding bit rate is controlled to become high. However, it is not necessary to increase the bit rate in sections where the input audio signal is silent.
- the control to increase the encoding bit rate is therefore made to operate in conjunction with a voice activity judgment section. The bit rate can then be kept low in sections where voice activity is low, so the average bit rate can be reduced.
- the wideband speech coding apparatus can stably provide at least a certain level of quality whether the input speech signal is a wideband signal or a narrowband signal.
- FIG. 11 is a block diagram showing an example of a wideband speech decoding apparatus according to the third embodiment of the present invention.
- FIG. 12 is a block diagram illustrating an example of a wideband audio encoding device that generates the encoded audio data input to the above wideband audio decoding device.
- the wideband speech decoding apparatus is used in the receiving system, and the wideband speech encoding apparatus is used in the transmitting system. The wideband audio decoding device is also used for reproducing encoded data recorded as content.
- the wideband speech encoding apparatus 120 comprises a speech input section 122, a band detection section 123, a control section 125, a sampling rate conversion section 124, an audio encoding unit 126, and an encoded data output unit 127.
- the audio input unit 122 receives the input audio signal and acquires identification information relating to the band of the input audio signal.
- the identification information can be obtained from the input audio signal itself, its acquisition route, its acquisition history, and so on; here, the explanation uses the example of acquiring it from the sampling information of the input audio signal.
- the audio input unit 122 sends the acquired sampling information to the band detection unit 123 and supplies the input audio signal to the sampling rate conversion unit 124.
- the audio input section 122 is not limited to one for real-time communication that inputs audio from a microphone and performs A/D conversion; audio data may instead be read from a file and input.
- the identification information relating to the band can be obtained by, for example, reading attribute information attached to the audio information file from a header portion or the like.
- the band detecting section 123 receives the sampling information of the input audio signal output from the audio input section 122, and outputs the band information detected based on the received sampling information to the control section 125.
- the band information may be the sampling rate information itself, or may be sampling rate mode information set in advance in correspondence with the sampling rate information. For example, if the sampling information of the audio signal assumed in the audio input unit 122 is one of the two types "16 kHz" or "8 kHz", mode "0" is associated with "16 kHz" and mode "1" is associated with "8 kHz".
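The mode correspondence above, as a trivial sketch:

```python
# Sampling-rate-to-mode mapping: "16 kHz" -> mode 0, "8 kHz" -> mode 1.
def band_mode(sampling_rate_hz):
    return {16000: 0, 8000: 1}[sampling_rate_hz]
```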
- the control unit 125 controls the sampling rate conversion unit 124 and the voice encoding unit 126 based on the band information from the band detection unit 123. Specifically, if the input audio signal does not match the sampling rate assumed by the audio encoding unit 126, the sampling rate of the input audio signal is converted to match it, and the converted input audio signal is input to the audio encoder 126. On the other hand, if the input audio signal matches the sampling rate assumed by the audio encoding unit 126, no sampling rate conversion is performed and the input audio signal is input directly to the audio encoding unit 126.
- suppose the sampling rate assumed by the audio encoding unit 126 is 16 kHz and the sampling rate of the input audio signal output from the audio input unit 122 is 8 kHz. Since the 8 kHz input audio signal does not match the sampling rate assumed by the audio encoding unit 126, it is up-sampled to a sampling rate of 16 kHz and then input to the speech coding unit 126.
- if, instead, the sampling rate of the input audio signal output from the audio input unit 122 is the same 16 kHz as that assumed by the audio encoding unit 126, the sampling rates match, so the input audio signal is input directly to the audio encoder 126 without sampling rate conversion.
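A minimal 1:2 up-sampling sketch for the 8 kHz to 16 kHz case; a real converter would use a proper low-pass interpolation filter, so linear interpolation here only illustrates the rate match:

```python
# Double the sampling rate by inserting linearly interpolated midpoints.
def upsample_2x(samples):
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) / 2.0)
    return out
```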
- the audio encoding unit 126 encodes the input audio signal by a predetermined wideband audio encoding, and outputs corresponding encoded data to the encoded data output unit 127.
- an example of the encoding algorithm used in the audio encoding unit 126 is CELP-based wideband speech coding such as AMR-WB, specified in ITU-T Recommendation G.722.2.
- control unit 125 selects and reads out the coding parameters for the wide band or the narrow band from the built-in coding parameter memory based on the band identification information, and the speech encoding unit 126 performs encoding using the selected coding parameters.
- the band identification information is incorporated into a part of the encoded data by the encoded data output unit 127 and output. How to incorporate them is a matter of design.
- the band identification information can also be output as side information separate from the encoded data. This too is a matter of appropriate design; in some cases it may not be included at all.
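One hypothetical layout (purely illustrative; the patent leaves the embedding format as a design matter) is a one-byte header carrying the band mode:

```python
# Embed the band mode in a header byte; the decoder strips it back off.
def pack_frame(band_mode, parameter_bytes):
    return bytes([band_mode & 0x01]) + parameter_bytes

def unpack_frame(frame):
    return frame[0] & 0x01, frame[1:]
```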
- the wideband speech decoding apparatus 110 includes an encoded data input unit 117, a band detection unit 113, a control unit 115, and a speech decoding unit 116. It comprises a sample rate conversion section 114 and an audio output section 112.
- the coded data input unit 117 separates the input coded data into speech parameter data and band identification information, sends the speech parameter data to the speech decoding unit 116, and sends the band identification information to the band detector 113.
- Band detecting section 113 outputs band information detected based on band identification information to control section 115.
- the band information may be the sampling rate information itself, or mode information preset in correspondence with it. For example, if the sampling information of the audio signal assumed by the audio input unit 122 is one of the two types "16 kHz" or "8 kHz", mode "0" corresponds to "16 kHz" and mode "1" corresponds to "8 kHz".
- another mode, for example mode "unknown", may also be prepared.
- the band identification information included in a part of the coded data, or transmitted as data accompanying the coded data, is extracted by the coded data input unit 117 and sent to the band detection unit 113.
- the format of the encoded data may be, for example, one in which the band identification information is received as a part of the encoded data, or one in which it is received along with the encoded data. In other embodiments, the band identification information may not be incorporated into the encoded data at all.
- band identification information can also be input from outside the apparatus by input means (not shown).
- it is also possible to identify the band of the audio signal reproduced by decoding, by analyzing a signal (for example, the audio signal or the sound source signal) or a spectral parameter representing the outline of the spectrum of the audio signal.
- FIG. 19 is an example of the configuration.
- the audio decoding unit 116 can identify the bandwidth of the audio signal reproduced by decoding, for example by analyzing the frequency range represented by the spectral parameters, which represent the outline of the spectrum of the audio signal.
- the band identification information thus extracted is sent to the band detector 113. In this way, control using the band identification information can be performed without transmitting the band identification information itself. As a result, the bits for incorporating the band identification information into the encoded data can be eliminated.
- the band identification information may also be extracted from data transmitted separately from the encoded data, as side information from the encoding device side.
- the decoding device side can compare the received band identification information SA with band identification information SB obtained by analyzing the audio signal or the spectral parameters representing the outline of its spectrum. When the identification information SA and SB differ, an error in the received data can be detected.
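A hedged sketch of such a check: SB is estimated from the decoded magnitude spectrum (the 4 kHz boundary and the 1% energy threshold are assumptions) and compared with the transmitted SA.

```python
# Estimate the band mode from spectral energy above 4 kHz, then compare
# with the transmitted identification to flag a possible channel error.
def estimate_band_mode(spectrum, bin_hz):
    high = sum(m * m for i, m in enumerate(spectrum) if i * bin_hz >= 4000.0)
    total = sum(m * m for m in spectrum) or 1.0
    return 1 if high / total < 0.01 else 0     # 1 = narrow, 0 = wide

def band_mismatch(sa_mode, spectrum, bin_hz=62.5):
    return sa_mode != estimate_band_mode(spectrum, bin_hz)
```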
- the control unit 115 controls the audio decoding unit 116, the sampling rate conversion unit 114, and the audio output unit 112 based on the band information from the band detecting unit 113. The specific control method is described below for each of the speech decoding unit 116, the sampling rate conversion unit 114, and the speech output unit 112.
- the audio decoding unit 116 receives the speech parameter codes from the encoded data input unit 117 and reproduces the audio signal using them. At this time, the audio decoding unit 116 is controlled based on the band information from the control unit 115.
- speech decoding section 136 includes adaptive codebook 131, excitation signal generating section 132, synthesis filter section 133, pulse position setting section 134, and post-processing filter section 138.
- the control unit 135 has a built-in memory for decoding parameters.
- the speech decoding unit 136 is described using an example of speech decoding corresponding to a CELP-based wideband speech encoding scheme such as AMR-WB.
- the information of the input speech parameter code is composed of a spectrum parameter code A, an adaptive code L, a gain code G, and a noise code K.
- Adaptive codebook 131 stores the excitation signal output from excitation signal generation section 132, described later, as the past excitation signal in the codebook. Based on the adaptive code L, the past excitation signal delayed by the corresponding pitch period is output as the adaptive code vector.
- the pulse position setting unit 134 generates a noise code vector corresponding to the noise code K.
- the random code vector can be generated using a predetermined noise codebook (also called an algebraic codebook).
- the noise code vector is composed of a small number of pulses.
- the pulse amplitude, polarity, and pulse position of each pulse constituting the random code vector are generated based on the random code K.
- conventionally, the structure of the algebraic codebook is uniquely determined for each bit rate.
- in this embodiment, the structure of the algebraic codebook changes according to the band information even at the same bit rate.
- the control unit 135 holds two types of pulse position candidates in the built-in memory for decoding parameters, and provides the pulse position candidate corresponding to the band information to the pulse position setting unit 134, thereby controlling the pulse position setting of the algebraic codebook in the pulse position setting unit 134. Using the pulse position candidates set in this way, the pulse position setting unit 134 raises pulses at the pulse positions corresponding to the noise code K, and generates and outputs the noise code vector.
- a configuration is shown in which the two types of pulse position candidates, "candidate pulse positions at even sample positions" and "candidate pulse positions at integer sample positions", are switched. If the band information indicates a wide band, pulse position candidates at integer sample positions are set, as in the past.
- when the band information indicates a narrow band, the band of the reproduced audio signal is a narrowband signal having no high-frequency content. In this case, the noise code vector from which the sound source signal is generated can be represented sufficiently at a sampling rate lower than the one corresponding to the wideband signal. Therefore, when the band information indicates a narrow band, pulse position candidates at thinned sample positions (pulse position candidates at even sample positions in the example of FIG. 13) are set.
- the pulse position candidates at thinned sample positions may instead be, for example, pulse position candidates at odd sample positions; needless to say, the candidates are not limited to these.
- when the band information indicates a narrow band, the number of bits required to represent the pulse position information can be reduced, and the number of bits transmitted from the encoding side can be reduced.
- the bits saved on the pulse position information can be used to transmit other information, improving the sound quality or the bit-error resistance.
- alternatively, the bits saved on the pulse position information can be used to generate more pulses or to increase the resolution of pulse amplitude quantization. This makes it possible to improve sound quality even when a narrowband signal is decoded and reproduced by wideband decoding at a low bit rate.
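A rough bit-count illustration (64 candidate positions and 4 pulses per subframe are assumed numbers, not the patent's): halving the candidate set saves one bit per pulse.

```python
import math

# Bits needed to index pulse positions: num_pulses * ceil(log2(candidates)).
def position_bits(num_candidates, num_pulses):
    return num_pulses * math.ceil(math.log2(num_candidates))

wide_bits = position_bits(64, 4)       # integer sample positions
narrow_bits = position_bits(32, 4)     # even sample positions only
saved_bits = wide_bits - narrow_bits   # reusable for pulses or amplitudes
```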
- the sound source signal generation unit 132 uses the gain code G to obtain the gain applied to the adaptive code vector from the adaptive codebook 131 and the gain applied to the noise code vector from the pulse position setting unit 134. It then generates the excitation signal by adding the gain-scaled adaptive code vector and the gain-scaled noise code vector.
- the sound source signal is input to the synthesis filter section 133 and the adaptive codebook 131.
- the synthesis filter 133 decodes, from the spectral parameter code A, the spectral parameters representing the outline of the spectrum of the audio signal, and computes the filter coefficients of the synthesis filter from the decoded parameters. The sound source signal from the sound source signal generation unit 132 is input to the synthesis filter constructed with these filter coefficients, and the synthesis filter 133 thereby generates the audio signal as output.
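A minimal all-pole synthesis sketch (the coefficient below is a toy value, not one decoded from an actual spectral parameter code A): each output sample is the excitation minus a weighted sum of past outputs.

```python
# 1/A(z) synthesis with A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
def synthesis_filter(excitation, lpc):
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]   # subtract weighted past outputs
        out.append(acc)
    return out

decoded = synthesis_filter([1.0, 0.0, 0.0], [-0.5])   # single-pole example
```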
- the post-processing filter section 138 shapes the spectrum of the audio signal generated by the synthesis filter 133. As a result, a speech signal with improved subjective sound quality can be output from the speech decoding unit.
- typically, the post-processing filter unit 138 shapes the outline of the spectrum of the voice signal using the spectral parameters or the filter coefficients of the synthesis filter. Among the irregularities in the shape of the spectrum based on its outline, the coding noise at valley frequencies is suppressed, while the coding noise at peak frequencies is allowed to some extent. In this way, shaping is performed such that the coding noise is masked by the audio signal and is hardly audible to human ears.
- the reproduced audio signal is output from the audio decoding unit 136.
- the sampling rate converter 114 receives the audio signal output from the audio decoder. If the band information from the control unit 115 indicates a wide band, the audio signal from the audio decoding unit 116 is output to the audio output section 112 without sampling rate conversion.
- if the band information indicates a narrow band, the audio signal from the audio decoding unit input to the sampling rate conversion unit 114 is known to be a narrowband signal having no high-frequency content.
- in that case, the sampling rate conversion section 114 converts the audio signal input from the audio decoding section at the sampling rate corresponding to a wideband signal (typically 16 kHz sampling) to the lower sampling rate for a narrowband signal (typically 8 kHz sampling) and outputs it.
- the sampling rate of the audio signal from the audio decoding unit is thus converted (down-sampled in the above example) according to the detected band information. As a result, audio data can be obtained at a sampling rate sufficient for the frequency band the audio signal actually contains.
- otherwise, wideband audio decoding would yield an excessively high sampling rate for the actual content, which increases the audio signal data rate. This can be avoided by using the present invention.
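The 16 kHz to 8 kHz conversion above can be sketched as 2:1 decimation; a real converter low-pass filters first, but a narrowband decoded signal is already band-limited to about 4 kHz:

```python
# Keep every other sample to halve the sampling rate (16 kHz -> 8 kHz).
def downsample_2x(samples):
    return samples[::2]
```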
- the audio output unit 112 receives the audio signal from the sampling rate conversion unit 114 and outputs audio 111 sample by sample, at the timing corresponding to the sampling rate indicated by the band information from the control unit 115.
- the audio output unit 112 includes, for example, a D / A conversion unit and a driver.
- the audio output unit 112 converts the audio signal from the sampling rate conversion unit 114 into an analog electric signal based on the wide/narrow band identification information from the control unit 115, and drives a speaker (not shown), as shown in FIG. 11, to output sound.
- FIG. 16 is a flowchart showing the operation of the wideband speech decoding apparatus according to the third embodiment of the present invention.
- the band detecting unit 113 acquires band information incorporated in the encoded data and transmitted (step S61). Then, based on the acquired band information, it is determined whether to perform the processing for the wide band or the narrow band (step S62).
- if the narrow band is determined, control unit 115 corrects, for the narrow band, a predetermined parameter used for decoding in the audio decoding unit 116. Then a voice signal is generated by the voice decoding unit 116 from the input encoded data (step S63), and the process ends.
- if the wide band is determined, the control unit 115 sets, for the wide band, the predetermined parameter used for decoding in the speech decoding unit 116. Then an audio signal is generated by the audio decoding unit 116 from the input encoded data (step S64), and the process ends.
- since an appropriate decoding parameter is selected based on the band information and a wideband or narrowband audio signal is generated by the wideband audio decoding process, a high-quality audio signal corresponding to the band information can be decoded.
- the fourth embodiment of the present invention is characterized in that a sound source signal generated in decoding is corrected in accordance with the detected band information of a wide band or a narrow band.
- FIG. 14 is a block diagram showing the configuration of a speech decoding unit 146 and a control unit 145 used to correct a sound source signal generated in decoding.
- the configuration of the speech decoding unit 146 in FIG. 14 is characterized in that a sound source correction unit 147 is provided between the sound source signal generation unit 142 and the synthesis filter unit 143.
- the pulse position setting section 144 sets the pulse position candidates according to the conventional method.
- the sound source correction unit 147 adjusts the strength, or the presence or absence, of emphasis of the pitch periodicity or formants of the sound source signal generated by the sound source signal generation unit 142, in order to reduce the perceived noise caused by quantization.
- the decoding parameter memory 145a built into the control unit 145 stores the "sound source correction parameters (for wideband)" used for decoding wideband audio signals and the "sound source correction parameters (for narrowband)" used for decoding narrowband audio signals, so that either can be selectively read out. Based on the band identification information, the control unit 145 reads the "sound source correction parameters (for wideband)" or the "sound source correction parameters (for narrowband)" from the built-in decoding parameter memory 145a and sends them to the sound source correction section 147.
- the sound source correction section 147 can thus set the corresponding strength of pitch-periodicity and formant emphasis, or the presence or absence of each. As a result, the effects of quantization noise can be appropriately reduced.
- when a narrowband signal is identified from the band identification information, the degradation of the sound source signal generated in wideband audio decoding is presumed to be larger than when a wideband audio signal is decoded, so it is preferable to correct the sound source signal relatively strongly.
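A sketch of the band-dependent selection (the parameter names and numeric strengths are assumptions; the patent only states that the narrowband case is corrected relatively strongly):

```python
# Two selectable parameter sets, the narrowband one applying stronger
# pitch/formant emphasis, as suggested above.
CORRECTION_PARAMS = {
    0: {"pitch_emphasis": 0.3, "formant_emphasis": 0.4},   # wideband
    1: {"pitch_emphasis": 0.6, "formant_emphasis": 0.7},   # narrowband
}

def select_correction(band_mode):
    return CORRECTION_PARAMS[band_mode]
```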
- the method of modifying the excitation signal generated in decoding is not limited to the configuration of FIG. 14; for example, the following configurations may also be used.
- in one such configuration, the sound source correcting section 47a corrects the adaptive code vector from the adaptive codebook 41, and the corrected adaptive code vector is used to generate the sound source signal.
- the adaptive code vector that forms the excitation signal is modified according to whether the band information indicates a wide band or a narrow band.
- the sound source signal is modified according to whether the band information is broadband or narrowband.
- FIG. 12 shows a configuration in which the sound source correction unit 47b corrects the noise code vector (in this example, the code vector generated from the algebraic codebook) from the pulse position setting unit 44, and a corrected sound source signal is generated by using the corrected noise code vector.
- in this configuration, the noise code vector that forms the sound source signal is modified according to whether the band information indicates a wide band or a narrow band. As a result, the sound source signal is corrected according to whether the band information indicates a wide band or a narrow band.
- any configuration is included in the present invention as long as the sound source signal is modified according to whether the band information indicates a wide band or a narrow band.
- in this way, the sound source signal can be adaptively modified according to the bandwidth of the reproduced audio signal, so the effect of quantization noise can be appropriately reduced.
- the speech decoding unit is configured so that the pitch periodicity or the formant emphasis strength applied by the post-processing filter to the synthesized speech signal can be selected according to the distinction between wide band and narrow band obtained from the band identification information.
- FIG. 15 is a block diagram illustrating a configuration of the speech decoding unit 156 and a control unit 155 incorporating a decoding parameter memory 155a related thereto.
- speech decoding section 156 includes an adaptive codebook 151, an excitation signal generation section 152, a synthesis filter section 153, a pulse position setting section 154, and a post-processing filter section 158.
- the pulse position setting section 154 is the same as the pulse position setting section 144 of FIG. 14.
- the adaptive codebook 151, excitation signal generation section 152, and synthesis filter section 153 are the same as the adaptive codebook 131, excitation signal generation section 132, and synthesis filter section 133 of FIG. 13.
- the decoding parameter memory 155a built into the control unit 155 stores the "post-processing parameters (for wideband)" used for decoding wideband audio signals and the "post-processing parameters (for narrowband)" used for decoding narrowband audio signals, so that either can be selectively read out.
- the control unit 155 reads either the "post-processing parameters (for wideband)" or the "post-processing parameters (for narrowband)" from the built-in decoding parameter memory 155a according to the band identification information and sends them to the post-processing filter section 158.
- the post-processing filter section 158 can thus set the corresponding pitch periodicity or the presence of formant emphasis when processing the wideband audio signal or the narrowband audio signal from the synthesis filter section 153. As a result, whether the decoded speech signal is a wideband speech signal or a narrowband speech signal, the effect of quantization noise can be appropriately reduced. As a specific example, when the band identification information indicates that a narrowband audio signal is being decoded, the degradation of the audio signal output from the synthesis filter during wideband speech decoding is estimated to be larger than when a wideband audio signal is decoded. For this reason, it is preferable to control the parameters used in the post-processing filter so that the audio signal is corrected relatively strongly.
- the adaptive post filter is composed of, for example, a formant post filter 190, a tilt compensation filter 191, and a gain adjustment section 192, but is not limited to this.
- the configuration of the adaptive post filter may further include a pitch emphasis filter.
- the processing of the adaptive post filter is performed, for example, as follows. First, the audio signal from the synthesis filter is passed through the formant post filter 190, and its output is passed through the tilt compensation filter 191. The output of the tilt compensation filter is then input to the gain adjustment section 192 to perform gain adjustment. As a result, the audio signal output from the adaptive post filter is obtained.
- the processing order inside the adaptive post filter is not limited to this; various configurations can be adopted, such as passing the audio signal from the synthesis filter through the tilt compensation filter first, or performing the gain adjustment at the first or middle stage of the adaptive post filter processing.
- this shows a configuration in which the control unit 155 controls the parameters used in the formant post filter 190 according to the band identification information, thereby controlling the degree to which the outline of the audio spectrum is emphasized.
- the post filter is updated for each subframe obtained by dividing the frame. For example, when the frame for speech decoding is 20 ms, a subframe length of 5 ms or 10 ms is often used.
- the formant post filter 190 (Hf(z)) is given by, for example, the following equation:

  Hf(z) = A(z/rn) / A(z/rd)

- here, 1/A(z) represents the approximate shape of the spectrum of the reproduced audio signal (also called the spectral envelope), and the parameters rn and rd determine the characteristic of the formant post filter Hf(z). Normally, 0 ≤ rn ≤ 1 and 0 ≤ rd ≤ 1.
- the formant post filter Hf(z) has a characteristic that emphasizes the outline of the spectrum of the audio signal, and the degree of emphasis can be changed according to the values of rn and rd.
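As a concrete illustration, a formant post filter of the standard form Hf(z) = A(z/rn)/A(z/rd) can be applied with a direct-form difference equation. This is a sketch under the assumption that lpc holds the coefficients of the LPC polynomial A(z) whose inverse gives the spectral envelope; the function name and default parameter values are illustrative, not values from the document.

```python
def formant_postfilter(x, lpc, rn=0.55, rd=0.7):
    """Apply Hf(z) = A(z/rn) / A(z/rd) to the signal x.

    lpc = [1, a1, ..., aM] holds the coefficients of the LPC polynomial
    A(z) = 1 + a1*z^-1 + ... + aM*z^-M; 1/A(z) is the spectral envelope.
    Scaling the k-th coefficient by r**k moves the roots of A(z) toward
    the origin, so choosing rd > rn gives mild formant emphasis.
    """
    m = len(lpc)
    b = [lpc[k] * rn ** k for k in range(m)]  # numerator   A(z/rn)
    a = [lpc[k] * rd ** k for k in range(m)]  # denominator A(z/rd)
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(m) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, m) if n - k >= 0)
        y.append(acc)
    return y
```

Switching between a wideband and a narrowband (rn, rd) pair, as the embodiment describes, then amounts only to calling the filter with a different parameter set.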
- the second parameter set is a formant post filter parameter set having a greater degree of emphasis (correction) of the outline of the spectrum of the audio signal than the first parameter set. By switching the parameter set in this way, the characteristics of the adaptive post filter are corrected.
- that is, the parameter set is switched so that the degree of enhancement (correction) by the adaptive post filter is increased.
- (for example, rd = 0.7 is used.)
- (for example, rd = 0.55 is used.)
- the sound quality can be improved by enhancing the outline of the spectrum with an appropriate strength.
- the degradation of a wideband speech signal tends to be small, so it is not necessary to emphasize the outline of the spectrum much, and a parameter set with a small degree of emphasis is used. In this way, the spectral shape can be appropriately enhanced depending on whether narrowband or wideband speech is generated, so that high-quality audio can be provided stably even at low bit rates.
- the numerical values of the first and second parameter sets described above are not limited to these.
- the coefficient k1' is obtained from the impulse response truncated at the length Lh (for example, about 20), but the present invention is not limited to this.
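A minimal sketch of how k1' can be computed from a truncated impulse response, assuming (as in common post filter designs) that k1' is the normalized first autocorrelation lag of the response and that the tilt compensation filter has the first-order form Ht(z) = 1 - mu*z^-1; the exact mapping from k1' to mu is not specified in the text and is left as a parameter here.

```python
def first_reflection_coefficient(h, lh=20):
    """Estimate k1' from the impulse response h truncated to length lh,
    as the normalized first autocorrelation lag (an assumption about
    the exact definition, which the text does not spell out)."""
    h = h[:lh]
    r0 = sum(v * v for v in h)
    r1 = sum(h[i] * h[i + 1] for i in range(len(h) - 1))
    return r1 / r0 if r0 > 0.0 else 0.0

def tilt_compensate(x, mu):
    """Apply the first-order tilt compensation filter Ht(z) = 1 - mu*z^-1."""
    return [x[n] - mu * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
```

For a slowly decaying impulse response, k1' approaches 1 and the compensation removes more of the spectral tilt introduced by the formant post filter.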
- the gain adjustment section 192 receives the output signal from the tilt compensation filter and performs gain adjustment.
- the gain adjustment section 192 calculates a value for compensating the gain difference between the audio signal from the synthesis filter, which is the input signal of the post filter, and the output signal processed by the post filter. Based on this calculation result, the gain of the post filter output is adjusted. In this way, the audio signal input to the post filter and the audio signal output from the post filter can be adjusted to be approximately the same.
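The gain adjustment just described can be sketched as an energy-matching scale factor. The exact rule used by the gain adjustment section is not given in the text, so the square-root energy ratio below is an assumed realization.

```python
import math

def gain_adjust(synth_in, postfiltered):
    """Scale the post filter output so its energy matches that of the
    synthesis filter output that entered the post filter.  The
    square-root energy ratio is an assumption, not the document's
    specified rule."""
    e_in = sum(v * v for v in synth_in)
    e_out = sum(v * v for v in postfiltered)
    if e_out == 0.0:
        return list(postfiltered)
    g = math.sqrt(e_in / e_out)
    return [g * v for v in postfiltered]
```

In practice such a gain is usually smoothed sample by sample or per subframe rather than applied as a single block factor.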
- in the above description, a formant post filter was used for the correction of the audio signal using the post-processing filter, but the present invention is not limited to this.
- adaptation is also possible with a configuration in which the audio signal is modified by changing the parameters of a pitch emphasis filter for enhancing the pitch periodicity of the audio signal, the tilt compensation filter, or the gain adjustment process according to whether the band information indicates a wide band or a narrow band.
- the gist of the present invention is that the audio signal is adaptively modified according to whether the band information indicates a wide band or a narrow band. Needless to say, any adaptive post-processing configuration according to this gist is included in the invention.
- as described above, the outline of the spectrum of the audio signal is adaptively shaped by the post-processing filter according to whether the detected band information of the audio signal indicates a wide band or a narrow band, so that the effect of quantization noise included in the audio signal can be appropriately reduced.
- the feature of the present invention in the sixth embodiment is that the speech decoding unit 166 includes a Lower-Band generation section 166a (which generates the low-frequency side audio signal, typically an audio signal in the band of about 6 kHz or less) and a Higher-Band generation section 166b (which generates the high-frequency signal, typically an audio signal on the high-frequency side in the band of about 6 kHz to 7 kHz), and that the high-frequency signal in the speech decoding unit, or the high-frequency signal generation process, is corrected by controlling the Higher-Band generation section 166b according to the distinction between wide band and narrow band in the detected band information.
- specifically, the main point is to make a correction so that the high-frequency signal from the Higher-Band generation section 166b is not added to the signal from the Lower-Band generation section 166a.
- the Lower-Band generation section 166a comprises an adaptive codebook 161, a pulse position setting section 164, an excitation signal generation section 162, a synthesis filter section 163, a post-processing filter section 168, and an upsampling section 169.
- the Lower-Band generation section 166a generates an audio signal using the adaptive codebook 161, the pulse position setting section 164, the excitation signal generation section 162, and the synthesis filter section 163.
- the generated audio signal is processed by the post-processing filter section 168, whereby a low-frequency audio signal in which the coding noise contained in the audio signal is shaped is generated.
- about 12.8 kHz is typically used as the sampling rate of the audio signal.
- the generated audio signal is input to the upsampling section 169 and is upsampled to the same sampling rate as the Higher-Band signal (typically 16 kHz).
- the low-frequency side audio signal upsampled to 16 kHz in this way is output from the Lower-Band generation section 166a and input to the Higher-Band generation section 166b.
- the Higher-Band generation section 166b is composed of a Higher-Band signal generation section 166b1 and a Higher-Band signal addition section 166b2.
- the Higher-Band signal generation section 166b1 uses the information of the synthesis filter, which represents the outline of the spectral shape of the low-frequency side audio signal used in the synthesis filter section 163, to generate a high-frequency synthesis filter representing the spectral shape of the high-frequency signal. A high-frequency sound source signal whose gain has been adjusted is input to this generated synthesis filter, and the synthesized signal is passed through a predetermined band-pass filter to generate the high-frequency signal. The gain of the high-frequency sound source signal is adjusted based on the energy of the low-frequency sound source signal and the slope of the spectrum of the low-frequency audio signal.
- the Higher-Band signal addition section 166b2 generates a signal in which the high-frequency signal generated by the Higher-Band signal generation section 166b1 is added to the low-frequency side audio signal input from the Lower-Band generation section 166a. The generated signal is then input to the sampling rate conversion section 1104 as the output of the speech decoding unit 166.
- the sampling rate conversion section 1104 has the same function as the sampling rate conversion section 1104 described above.
- the sampling rate conversion section 1104 receives the audio signal output from the speech decoding unit 166. If the band information output from the control unit 165 indicates a wide band, the audio signal from the speech decoding unit is output to the audio output section as it is, without sampling rate conversion.
- when the band information indicates a narrow band, the audio signal from the speech decoding unit input to the sampling rate conversion section 1104 does not have high-frequency components, and it can be seen that this is a narrowband signal.
- the sampling rate conversion section 1104 then converts the audio signal (typically 16 kHz sampling) input from the speech decoding unit to the low sampling rate for a narrowband signal (typically 8 kHz sampling) and outputs it.
- the control unit 165 controls the Higher-Band generation section 166b to prevent the high-frequency signal from the Higher-Band generation section from being added to the signal from the Lower-Band generation section.
- for example, the processing for generating the Higher-Band signal is not performed in the Higher-Band signal generation section 166b1, or the generated Higher-Band signal is corrected and output so that it becomes zero or a small value.
- alternatively, a method may be used in which the signal from the Lower-Band generation section is output as it is, without the Higher-Band signal being added to it.
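A minimal sketch of this control, with the Higher-Band contribution simply zeroed when the band information indicates a narrow band; the function name and signature are illustrative, not from the document.

```python
def combine_bands(lower, higher, band_is_wideband):
    """Sketch of the Higher-Band signal addition under the control
    described above: when the band information indicates a narrow band,
    the high-frequency signal is suppressed (here, simply not added)
    and the Lower-Band signal is output as is."""
    if band_is_wideband:
        return [lo + hi for lo, hi in zip(lower, higher)]
    return list(lower)  # narrow band: output the Lower-Band signal only
```

The "small value" variant in the text would correspond to scaling `higher` by a small gain instead of dropping it entirely.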
- it goes without saying that each of the inventions described in the third, fourth, and fifth embodiments can also be used for the low-frequency side speech decoding section (the Lower-Band generation section 166a in FIG. 24).
- in this case, the control signal from the control unit 165 (indicated by a dotted arrow in FIG. 24) is input to the Lower-Band generation section 166a.
- examples of the control signals (shown by dotted arrows) input to the Lower-Band generation section 166a are shown in FIG. 26 (controlling the pulse position setting section), FIG. 27 (controlling the sound source signal), and FIG. 28 (controlling the post-processing filter). These correspond to FIG. 13 in the third embodiment, FIG. 14 in the fourth embodiment, and FIG. 15 in the fifth embodiment, respectively, so detailed description is omitted.
- when the wideband speech decoding unit is composed of a Lower-Band generation section (generating the low-frequency side audio signal) and a Higher-Band generation section (generating the high-frequency signal), a method may be employed in which any of the inventions described in the third, fourth, and fifth embodiments is applied to the Lower-Band generation section without controlling the Higher-Band generation section. Also in this case, the same effects as those of the inventions described in the third, fourth, and fifth embodiments can be obtained.
- FIG. 27 and FIG. 28 show configuration examples of the invention in which the control signal indicated by the dotted arrow output from the control unit 165 (control for the Lower-Band generation section) is used and the control signal indicated by the solid arrow (control for the Higher-Band generation section) is not used.
- the seventh embodiment of the present invention is the same as the configuration described above in that the processing in the sampling rate conversion unit is controlled based on the band information.
- the seventh embodiment of the present invention is characterized by the downsampling processing in the sampling rate conversion unit.
- the band information used is from the band detection unit.
- downsampling can be performed by controlling the sampling rate conversion unit based on the band information. When the band information indicates a narrow band, the fact that the audio signal input to the sampling rate conversion unit is guaranteed to be a narrowband signal is used: the signal can be downsampled by simply thinning it out, without band limitation. As a result, no band-limiting filter is needed, so the output signal is not delayed by the downsampling process. In addition, since the band-limiting filter is not used, the amount of calculation is reduced.
- alternatively, the audio signal input to the sampling rate conversion unit may be band-limited to a narrow band and then decimated and downsampled; this has the effect of minimizing the effect of frequency aliasing due to noise.
- FIG. 25 shows the configuration of the control unit 165 and the sampling rate conversion unit 1104.
- Band information from the band detector is input to the controller 165. This band information indicates whether the audio signal (typically an audio signal of 16 kHz sampling) generated by the decoding unit is a narrowband signal or a wideband signal.
- the band information used is obtained from the band identification information in the band detection unit.
- as the band identification information, for example, information transmitted from the transmitting side as side information, separately from the coded data, is used as shown in FIG. 20, but the invention is not limited to this.
- for example, the band identification information may be transmitted as data accompanying the encoded data. Alternatively, as shown in FIG. 19, it is also one method to obtain the band identification information based on a signal reproduced inside the speech decoding unit (for example, an audio signal or a sound source signal), or based on spectral parameters representing the outline of the spectrum of the audio signal.
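One hedged way to realize such signal-based band detection is to compare the low-band and high-band spectral energy of the reproduced signal. The DFT-based classifier below is purely illustrative: the split frequency, threshold, and function name are assumptions, not values from the document.

```python
import cmath
import math

def estimate_band_is_wideband(x, fs=16000, split_hz=4000, thresh=0.01):
    """Heuristic sketch: classify the decoded signal as wideband when
    the fraction of spectral energy above split_hz exceeds thresh.
    The DFT is computed directly (O(n^2)), which is fine for a short
    analysis frame; all constants are illustrative assumptions."""
    n = len(x)
    total = 0.0
    high = 0.0
    for k in range(n // 2):
        c = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(c) ** 2
        total += p
        if k * fs / n >= split_hz:
            high += p
    return total > 0.0 and high / total > thresh
```

In a real decoder this decision would more likely be derived from the decoded spectral parameters themselves, as the text suggests, rather than from an extra transform of the output signal.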
- the control unit 165 controls the switching unit 1107 to connect the switch in the switching unit to the downsampling unit 1106 side. As a result, the audio signal input to the sampling rate conversion unit 1104 is input to the downsampling unit 1106.
- the downsampling unit 1106 decimates the input audio signal (typically a 16 kHz sampling audio signal) to generate a downsampled audio signal (typically an 8 kHz sampling audio signal) and outputs it to the audio output unit. At this time, the signal decimation in the downsampling unit 1106 simply thins out the signal, without using any band-limiting filter processing.
- specifically, an 8 kHz sampling audio signal can be generated from the input 16 kHz sampling audio signal by regularly thinning out the signal at a ratio of 2:1. In other words, only the odd-numbered samples (or only the even-numbered samples) of the 16 kHz sampling audio signal are used as they are and output as the 8 kHz sampling audio signal.
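The 2:1 decimation described above reduces, in code, to keeping every other sample; the function name is illustrative.

```python
def downsample_by_decimation(x16k):
    """2:1 downsampling by keeping every other sample, with no
    band-limiting filter, hence no filter delay and almost no
    computation.  This is only safe when the input is already
    guaranteed to be a narrowband signal, as the text notes."""
    return x16k[::2]
```

Choosing `x16k[1::2]` instead would keep the other polyphase branch (the odd-numbered samples), which is equally valid.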
- the control unit 165 controls the switch of the switching unit 1107 so that the audio signal (typically a 16 kHz sampling audio signal) input to the sampling rate conversion unit 1104 is output to the audio output unit as it is.
- FIG. 18 is a flow chart illustrating a processing example of the invention according to the seventh embodiment.
- in step S81, band information is obtained.
- wideband speech decoding is performed in step S82. Before or after this, it is determined in step S83 whether the band information indicates a narrow band. If it is determined that the band is narrow, the audio signal generated by the wideband speech decoding process is downsampled in step S84 by thinning out the signal without using a band-limiting filter, and the generated signal is output. On the other hand, if it is determined in step S83 that the band is not narrow, the audio signal generated by the wideband speech decoding process is output as it is.
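The flow of steps S81 to S84 can be sketched as follows, with the decoder and the decimator passed in as callables; both are hypothetical stand-ins, since the actual decoding is the wideband process described in the earlier embodiments.

```python
def decode_and_output(bitstream, band_is_narrow, wideband_decode, decimate):
    """Sketch of steps S81-S84: run wideband speech decoding, then, if
    the band information indicates a narrow band, decimate the 16 kHz
    output to 8 kHz without a band-limiting filter; otherwise output
    the decoded signal as is.  The callables are assumed stand-ins."""
    speech_16k = wideband_decode(bitstream)   # step S82
    if band_is_narrow:                        # step S83
        return decimate(speech_16k)           # step S84
    return speech_16k
```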
- the seventh embodiment can be used together with the respective methods shown in the third, fourth, fifth, and sixth embodiments described above. That is, the methods shown in the respective embodiments can be used alone, or a plurality of methods can be used in combination.
- FIG. 17 is a flowchart illustrating a processing example when the method according to the seventh embodiment and the method according to the third embodiment are used together.
- in step S71, band information is obtained.
- in step S72, it is determined whether the band information indicates a narrow band. If it is determined that the band is not narrow, the first wideband speech decoding process (the normal wideband speech decoding process using the parameters for wideband) is performed in step S73.
- otherwise, the second wideband speech decoding process (the wideband speech decoding process in which the parameters are corrected for a narrow band) is performed in step S74. Then, in step S75, a downsampled audio signal is generated from the audio signal produced by this process, by the thinning process without a band-limiting filter, and output.
- the method according to the seventh embodiment is more effective when used in combination with the method according to the sixth embodiment. That is, using the method of the sixth embodiment, if it is found from the detected band information that the audio signal generated by the decoding unit is a narrowband signal, the control unit suppresses the high-frequency signal from the Higher-Band generation section 166b contained in the audio signal output by the decoding unit 166 (such a component is not completely zero even when a narrowband audio signal is generated). For this reason, a narrowband audio signal containing even fewer high-frequency signal components can be generated as the output of the decoding unit.
- as a result, the frequency aliasing that occurs when downsampling is performed by thinning out without band-limiting filtering is smaller than when the method according to the seventh embodiment is used alone, which has the effect of improving sound quality.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/240,495 US7788105B2 (en) | 2003-04-04 | 2005-10-03 | Method and apparatus for coding or decoding wideband speech |
US12/751,292 US8160871B2 (en) | 2003-04-04 | 2010-03-31 | Speech coding method and apparatus which codes spectrum parameters and an excitation signal |
US12/751,421 US8260621B2 (en) | 2003-04-04 | 2010-03-31 | Speech coding method and apparatus for coding an input speech signal based on whether the input speech signal is wideband or narrowband |
US12/751,191 US8249866B2 (en) | 2003-04-04 | 2010-03-31 | Speech decoding method and apparatus which generates an excitation signal and a synthesis filter |
US13/417,906 US8315861B2 (en) | 2003-04-04 | 2012-03-12 | Wideband speech decoding apparatus for producing excitation signal, synthesis filter, lower-band speech signal, and higher-band speech signal, and for decoding coded narrowband speech |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003101422A JP4580622B2 (ja) | 2003-04-04 | 2003-04-04 | 広帯域音声符号化方法及び広帯域音声符号化装置 |
JP2003-101422 | 2003-04-04 | ||
JP2004-071740 | 2004-03-12 | ||
JP2004071740A JP4047296B2 (ja) | 2004-03-12 | 2004-03-12 | 音声復号化方法及び音声復号化装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/240,495 Continuation US7788105B2 (en) | 2003-04-04 | 2005-10-03 | Method and apparatus for coding or decoding wideband speech |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004090870A1 true WO2004090870A1 (ja) | 2004-10-21 |
Family
ID=33161508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/004913 WO2004090870A1 (ja) | 2003-04-04 | 2004-04-05 | 広帯域音声を符号化または復号化するための方法及び装置 |
Country Status (2)
Country | Link |
---|---|
US (5) | US7788105B2 (ja) |
WO (1) | WO2004090870A1 (ja) |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7987095B2 (en) * | 2002-09-27 | 2011-07-26 | Broadcom Corporation | Method and system for dual mode subband acoustic echo canceller with integrated noise suppression |
WO2004090870A1 (ja) * | 2003-04-04 | 2004-10-21 | Kabushiki Kaisha Toshiba | 広帯域音声を符号化または復号化するための方法及び装置 |
JP4887282B2 (ja) * | 2005-02-10 | 2012-02-29 | パナソニック株式会社 | 音声符号化におけるパルス割当方法 |
US8326614B2 (en) * | 2005-09-02 | 2012-12-04 | Qnx Software Systems Limited | Speech enhancement system |
JP5159318B2 (ja) * | 2005-12-09 | 2013-03-06 | パナソニック株式会社 | 固定符号帳探索装置および固定符号帳探索方法 |
US7590523B2 (en) * | 2006-03-20 | 2009-09-15 | Mindspeed Technologies, Inc. | Speech post-processing using MDCT coefficients |
JP4976381B2 (ja) * | 2006-03-31 | 2012-07-18 | パナソニック株式会社 | 音声符号化装置、音声復号化装置、およびこれらの方法 |
US20090240494A1 (en) * | 2006-06-29 | 2009-09-24 | Panasonic Corporation | Voice encoding device and voice encoding method |
US8135047B2 (en) * | 2006-07-31 | 2012-03-13 | Qualcomm Incorporated | Systems and methods for including an identifier with a packet associated with a speech signal |
KR100922897B1 (ko) * | 2007-12-11 | 2009-10-20 | 한국전자통신연구원 | Mdct 영역에서 음질 향상을 위한 후처리 필터장치 및필터방법 |
JP5519230B2 (ja) | 2009-09-30 | 2014-06-11 | パナソニック株式会社 | オーディオエンコーダ及び音信号処理システム |
US10288617B2 (en) * | 2009-10-26 | 2019-05-14 | Externautics Spa | Ovary tumor markers and methods of use thereof |
ES2501840T3 (es) * | 2010-05-11 | 2014-10-02 | Telefonaktiebolaget Lm Ericsson (Publ) | Procedimiento y disposición para el procesamiento de señales de audio |
JP5325340B2 (ja) * | 2010-07-05 | 2013-10-23 | 日本電信電話株式会社 | 符号化方法、復号方法、符号化装置、復号装置、プログラム、及び記録媒体 |
JP5589631B2 (ja) * | 2010-07-15 | 2014-09-17 | 富士通株式会社 | 音声処理装置、音声処理方法および電話装置 |
EP2626856B1 (en) | 2010-10-06 | 2020-07-29 | Panasonic Corporation | Encoding device, decoding device, encoding method, and decoding method |
US9767822B2 (en) * | 2011-02-07 | 2017-09-19 | Qualcomm Incorporated | Devices for encoding and decoding a watermarked signal |
RU2580924C2 (ru) | 2011-02-14 | 2016-04-10 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Представление информационного сигнала с использованием преобразования с перекрытием |
PT2676267T (pt) * | 2011-02-14 | 2017-09-26 | Fraunhofer Ges Forschung | Codificação e descodificação de posições de pulso de faixas de um sinal de áudio |
AU2012217269B2 (en) | 2011-02-14 | 2015-10-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for processing a decoded audio signal in a spectral domain |
EP2676264B1 (en) | 2011-02-14 | 2015-01-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder estimating background noise during active phases |
CA2920964C (en) | 2011-02-14 | 2017-08-29 | Christian Helmrich | Apparatus and method for coding a portion of an audio signal using a transient detection and a quality result |
CN103620672B (zh) | 2011-02-14 | 2016-04-27 | 弗劳恩霍夫应用研究促进协会 | 用于低延迟联合语音及音频编码(usac)中的错误隐藏的装置和方法 |
RU2575993C2 (ru) | 2011-02-14 | 2016-02-27 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Основанная на линейном предсказании схема кодирования, использующая формирование шума в спектральной области |
TR201910075T4 (tr) | 2011-03-04 | 2019-08-21 | Ericsson Telefon Ab L M | Nicemleme sonrası kazanım düzeltmeli ses dekoderi. |
KR102105044B1 (ko) * | 2011-11-03 | 2020-04-27 | 보이세지 코포레이션 | 낮은 레이트의 씨이엘피 디코더의 비 음성 콘텐츠의 개선 |
US9437213B2 (en) * | 2012-03-05 | 2016-09-06 | Malaspina Labs (Barbados) Inc. | Voice signal enhancement |
US9026065B2 (en) * | 2012-03-21 | 2015-05-05 | Raytheon Company | Methods and apparatus for resource sharing for voice and data interlacing |
US9349383B2 (en) | 2013-01-29 | 2016-05-24 | 2236008 Ontario Inc. | Audio bandwidth dependent noise suppression |
EP2760022B1 (en) * | 2013-01-29 | 2017-11-01 | 2236008 Ontario Inc. | Audio bandwidth dependent noise suppression |
CN108364657B (zh) | 2013-07-16 | 2020-10-30 | 超清编解码有限公司 | 处理丢失帧的方法和解码器 |
US9801115B2 (en) * | 2013-09-04 | 2017-10-24 | Qualcomm Incorporated | Robust inter-radio access technology operations in unlicensed spectrum |
HRP20240674T1 (hr) | 2014-04-17 | 2024-08-16 | Voiceage Evs Llc | Postupci, koder i dekoder za linearno prediktivno kodiranje i dekodiranje zvučnih signala pri prijelazu između okvira koji imaju različitu brzinu uzorkovanja |
KR102244612B1 (ko) | 2014-04-21 | 2021-04-26 | 삼성전자주식회사 | 무선 통신 시스템에서 음성 데이터를 송신 및 수신하기 위한 장치 및 방법 |
CN105225666B (zh) * | 2014-06-25 | 2016-12-28 | 华为技术有限公司 | 处理丢失帧的方法和装置 |
US9979831B2 (en) * | 2015-03-30 | 2018-05-22 | Mediatek Inc. | Method for cellular text telephone modem operation with sampling rate conversion and machine readable medium |
US10049684B2 (en) * | 2015-04-05 | 2018-08-14 | Qualcomm Incorporated | Audio bandwidth selection |
JP6734394B2 (ja) * | 2016-04-12 | 2020-08-05 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 高位周波数帯域における検出されたピークスペクトル領域を考慮してオーディオ信号を符号化するオーディオ符号器、オーディオ信号を符号化する方法、及びコンピュータプログラム |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6143796A (ja) * | 1984-08-08 | 1986-03-03 | カシオ計算機株式会社 | 音声記録装置 |
JPH0537674A (ja) * | 1991-07-29 | 1993-02-12 | Fujitsu Ltd | 電話音声・低速モデム・フアクシミリ信号用符号・復号器 |
JPH07212320A (ja) * | 1994-01-14 | 1995-08-11 | Oki Electric Ind Co Ltd | 音声帯域信号パケット化装置 |
JPH09127994A (ja) * | 1995-10-26 | 1997-05-16 | Sony Corp | 信号符号化方法及び装置 |
JPH09127985A (ja) * | 1995-10-26 | 1997-05-16 | Sony Corp | 信号符号化方法及び装置 |
JP2000181494A (ja) * | 1998-12-11 | 2000-06-30 | Sony Corp | 受信装置及び方法、通信装置及び方法 |
JP2000206995A (ja) * | 1999-01-11 | 2000-07-28 | Sony Corp | 受信装置及び方法、通信装置及び方法 |
JP2001215999A (ja) * | 1999-12-21 | 2001-08-10 | Texas Instr Inc <Ti> | サブバンド音声コーディングシステム |
JP2001337700A (ja) * | 2000-05-22 | 2001-12-07 | Texas Instr Inc <Ti> | 広帯域音声符号化システムおよびその方法 |
Family Cites Families (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4330689A (en) * | 1980-01-28 | 1982-05-18 | The United States Of America As Represented By The Secretary Of The Navy | Multirate digital voice communication processor |
NL8500843A (nl) * | 1985-03-22 | 1986-10-16 | Koninkl Philips Electronics Nv | Multipuls-excitatie lineair-predictieve spraakcoder. |
NL9000338A (nl) * | 1989-06-02 | 1991-01-02 | Koninkl Philips Electronics Nv | Digitaal transmissiesysteem, zender en ontvanger te gebruiken in het transmissiesysteem en registratiedrager verkregen met de zender in de vorm van een optekeninrichting. |
CA2010830C (en) * | 1990-02-23 | 1996-06-25 | Jean-Pierre Adoul | Dynamic codebook for efficient speech coding based on algebraic codes |
US5754976A (en) * | 1990-02-23 | 1998-05-19 | Universite De Sherbrooke | Algebraic codebook with signal-selected pulse amplitude/position combinations for fast coding of speech |
US5701392A (en) | 1990-02-23 | 1997-12-23 | Universite De Sherbrooke | Depth-first algebraic-codebook search for fast coding of speech |
US5455888A (en) * | 1992-12-04 | 1995-10-03 | Northern Telecom Limited | Speech bandwidth extension method and apparatus |
JP3328080B2 (ja) * | 1994-11-22 | 2002-09-24 | Oki Electric Industry Co., Ltd. | Code-excited linear prediction decoder |
BR9611050A (pt) * | 1995-10-20 | 1999-07-06 | America Online Inc | Repetitive sound compression system |
US6067517A (en) * | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
FI964975A (fi) * | 1996-12-12 | 1998-06-13 | Nokia Mobile Phones Ltd | Method and device for coding speech |
US6385576B2 (en) * | 1997-12-24 | 2002-05-07 | Kabushiki Kaisha Toshiba | Speech encoding/decoding method using reduced subframe pulse positions having density related to pitch |
JPH11202900A (ja) | 1998-01-13 | 1999-07-30 | Nec Corp | Speech data compression method and speech data compression system applying the same |
JP3475772B2 (ja) | 1998-03-16 | 2003-12-08 | Mitsubishi Electric Corp | Speech coding device and speech decoding device |
US6480822B2 (en) * | 1998-08-24 | 2002-11-12 | Conexant Systems, Inc. | Low complexity random codebook structure |
US6600741B1 (en) * | 1999-03-25 | 2003-07-29 | Lucent Technologies Inc. | Large combined broadband and narrowband switch |
US6260009B1 (en) * | 1999-02-12 | 2001-07-10 | Qualcomm Incorporated | CELP-based to CELP-based vocoder packet translation |
JP2000305599A (ja) | 1999-04-22 | 2000-11-02 | Sony Corp | Speech synthesis device and method, telephone device, and program providing medium |
US7315815B1 (en) * | 1999-09-22 | 2008-01-01 | Microsoft Corporation | LPC-harmonic vocoder with superframe structure |
US6581032B1 (en) | 1999-09-22 | 2003-06-17 | Conexant Systems, Inc. | Bitstream protocol for transmission of encoded voice signals |
FI115329B (fi) * | 2000-05-08 | 2005-04-15 | Nokia Corp | Method and arrangement for changing the bandwidth of a source signal in a telecommunication connection that supports several bandwidths |
JP2001318698A (ja) | 2000-05-10 | 2001-11-16 | Nec Corp | Speech coding device and speech decoding device |
US7136810B2 (en) * | 2000-05-22 | 2006-11-14 | Texas Instruments Incorporated | Wideband speech coding system and method |
FI109393B (fi) * | 2000-07-14 | 2002-07-15 | Nokia Corp | Method for scalably encoding a media stream, scalable encoder, and terminal |
US6847929B2 (en) * | 2000-10-12 | 2005-01-25 | Texas Instruments Incorporated | Algebraic codebook system and method |
JP3467469B2 (ja) | 2000-10-31 | 2003-11-17 | NEC Electronics Corp | Speech decoding device and recording medium storing a speech decoding program |
JP2004513399A (ja) * | 2000-11-09 | 2004-04-30 | Koninklijke Philips Electronics N.V. | Wideband extension of telephone speech for higher perceptual quality |
CA2327041A1 (en) | 2000-11-22 | 2002-05-22 | Voiceage Corporation | A method for indexing pulse positions and signs in algebraic codebooks for efficient coding of wideband signals |
US7289461B2 (en) * | 2001-03-15 | 2007-10-30 | Qualcomm Incorporated | Communications using wideband terminals |
EP1400139B1 (en) * | 2001-06-26 | 2006-06-07 | Nokia Corporation | Method for transcoding audio signals, network element, wireless communications network and communications system |
JP3957589B2 (ja) | 2001-08-23 | 2007-08-15 | Matsushita Electric Industrial Co., Ltd. | Speech processing device |
US20040243400A1 (en) * | 2001-09-28 | 2004-12-02 | Klinke Stefano Ambrosius | Speech extender and method for estimating a wideband speech signal using a narrowband speech signal |
US6988066B2 (en) * | 2001-10-04 | 2006-01-17 | At&T Corp. | Method of bandwidth extension for narrow-band speech |
EP1374230B1 (en) * | 2001-11-14 | 2006-06-21 | Matsushita Electric Industrial Co., Ltd. | Audio coding and decoding |
US6662154B2 (en) * | 2001-12-12 | 2003-12-09 | Motorola, Inc. | Method and system for information signal coding using combinatorial and huffman codes |
US7409056B2 (en) * | 2002-12-16 | 2008-08-05 | Broadcom Corporation | Switchboard for dual-rate single-band communication system |
US7657427B2 (en) * | 2002-10-11 | 2010-02-02 | Nokia Corporation | Methods and devices for source controlled variable bit-rate wideband speech coding |
KR100711280B1 (ko) * | 2002-10-11 | 2007-04-25 | Nokia Corporation | Source-controlled variable bit-rate wideband speech coding method and device |
US20040083090A1 (en) * | 2002-10-17 | 2004-04-29 | Daniel Kiecza | Manager for integrating language technology components |
US7698132B2 (en) * | 2002-12-17 | 2010-04-13 | Qualcomm Incorporated | Sub-sampled excitation waveform codebooks |
US7301902B2 (en) * | 2003-03-03 | 2007-11-27 | Broadcom Corporation | Generic on-chip homing and resident, real-time bit exact tests |
WO2004090870A1 (ja) * | 2003-04-04 | 2004-10-21 | Kabushiki Kaisha Toshiba | Method and apparatus for encoding or decoding wideband speech |
- 2004
  - 2004-04-05 WO PCT/JP2004/004913 patent/WO2004090870A1/ja active Application Filing
- 2005
  - 2005-10-03 US US11/240,495 patent/US7788105B2/en not_active Expired - Lifetime
- 2010
  - 2010-03-31 US US12/751,421 patent/US8260621B2/en not_active Expired - Lifetime
  - 2010-03-31 US US12/751,292 patent/US8160871B2/en not_active Expired - Lifetime
  - 2010-03-31 US US12/751,191 patent/US8249866B2/en not_active Expired - Fee Related
- 2012
  - 2012-03-12 US US13/417,906 patent/US8315861B2/en not_active Expired - Lifetime
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6143796A (ja) * | 1984-08-08 | 1986-03-03 | Casio Computer Co., Ltd. | Voice recording device |
JPH0537674A (ja) * | 1991-07-29 | 1993-02-12 | Fujitsu Ltd | Coder/decoder for telephone voice, low-speed modem, and facsimile signals |
JPH07212320A (ja) * | 1994-01-14 | 1995-08-11 | Oki Electric Ind Co Ltd | Voice-band signal packetizing device |
JPH09127994A (ja) * | 1995-10-26 | 1997-05-16 | Sony Corp | Signal encoding method and device |
JPH09127985A (ja) * | 1995-10-26 | 1997-05-16 | Sony Corp | Signal encoding method and device |
JP2000181494A (ja) * | 1998-12-11 | 2000-06-30 | Sony Corp | Receiving device and method, and communication device and method |
JP2000206995A (ja) * | 1999-01-11 | 2000-07-28 | Sony Corp | Receiving device and method, and communication device and method |
JP2001215999A (ja) * | 1999-12-21 | 2001-08-10 | Texas Instr Inc <Ti> | Subband speech coding system |
JP2001337700A (ja) * | 2000-05-22 | 2001-12-07 | Texas Instr Inc <Ti> | Wideband speech coding system and method |
Also Published As
Publication number | Publication date |
---|---|
US8315861B2 (en) | 2012-11-20 |
US8160871B2 (en) | 2012-04-17 |
US20120173230A1 (en) | 2012-07-05 |
US20100250263A1 (en) | 2010-09-30 |
US20100250245A1 (en) | 2010-09-30 |
US20100250262A1 (en) | 2010-09-30 |
US7788105B2 (en) | 2010-08-31 |
US8260621B2 (en) | 2012-09-04 |
US8249866B2 (en) | 2012-08-21 |
US20060020450A1 (en) | 2006-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2004090870A1 (ja) | Method and apparatus for encoding or decoding wideband speech | |
JP5374418B2 (ja) | Control of adaptive codebook gain for speech coding |
JP4658596B2 (ja) | Method and device for efficient frame erasure concealment in linear-prediction-based speech codecs |
JP4390803B2 (ja) | Gain quantization method and device in variable bit-rate wideband speech coding |
JP2006525533A5 (ja) | ||
JP2005513539A (ja) | Signal modification method for efficient coding of speech signals |
US20040111257A1 | Transcoding apparatus and method between CELP-based codecs using bandwidth extension |
JP4047296B2 (ja) | Speech decoding method and speech decoding device |
Jelinek et al. | On the architecture of the cdma2000® variable-rate multimode wideband (VMR-WB) speech coding standard |
JP4438280B2 (ja) | Transcoder and transcoding method |
JP5002642B2 (ja) | Wideband speech coding method and wideband speech coding device |
JP2004309686A (ja) | Wideband speech coding method and wideband speech coding device |
JP5084360B2 (ja) | Speech coding device and speech decoding device |
JP2004020676A (ja) | Speech coding/decoding method and speech coding/decoding device |
JP2005062410A (ja) | Method for coding a speech signal |
JP2004020675A (ja) | Speech coding/decoding method and speech coding/decoding device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 11240495 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 11240495 Country of ref document: US |
|
122 | Ep: pct application non-entry in european phase |