WO2004097796A1 - Speech coding apparatus, speech decoding apparatus, and methods thereof - Google Patents
Speech coding apparatus, speech decoding apparatus, and methods thereof
- Publication number
- WO2004097796A1 (PCT/JP2004/006294)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
Definitions
- Speech coding apparatus, speech decoding apparatus, and methods thereof
- the present invention relates to a speech encoding device, a speech decoding device, and methods thereof for use in a communication system that encodes and transmits speech and musical-tone signals.
- speech signal encoding and decoding technology is indispensable for making effective use of transmission channel capacity, such as radio bandwidth, and of storage media, and many speech encoding/decoding methods have been developed to date. Among them, the CELP speech encoding/decoding scheme has been put into practical use as the mainstream scheme.
- the CELP speech encoding device encodes input speech based on speech models stored in advance. Specifically, the digitized speech signal is divided into frames of about 20 ms, linear prediction analysis of the speech signal is performed for each frame to obtain linear prediction coefficients and a linear prediction residual vector, and the linear prediction coefficients and the linear prediction residual vector are encoded individually.
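The per-frame linear prediction analysis described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame length (160 samples, i.e. 20 ms at an assumed 8 kHz sampling rate) and the prediction order of 10 are assumptions.

```python
import numpy as np

FRAME_LEN = 160  # 20 ms at an assumed 8 kHz sampling rate

def lpc_coefficients(frame, order=10):
    """Linear prediction analysis of one frame via the Levinson-Durbin
    recursion on the frame's autocorrelation."""
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")
    r = full[n - 1 : n + order]              # autocorrelation lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for order i
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        new_a = a.copy()
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                   # prediction error shrinks
    return a                                 # a[0] = 1, a[1:] = LPC

# split a signal into frames and analyze each one
signal = np.sin(2 * np.pi * 0.01 * np.arange(480))
frames = [signal[i : i + FRAME_LEN] for i in range(0, len(signal), FRAME_LEN)]
coeffs = [lpc_coefficients(f) for f in frames]
```

The returned `a[1:]` are the linear prediction coefficients; the residual vector is obtained by filtering the frame with them.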
- the conventional CELP speech encoding/decoding scheme mainly stores models of speech sounds.
- packet loss occurs depending on network conditions; it is therefore desirable that speech and musical tones can be decoded from the remaining encoded information even when part of the encoded information is lost.
- similarly, in a variable-rate communication system in which the bit rate is changed according to the communication capacity, it is desirable that the load on the communication capacity can easily be reduced by transmitting only part of the encoded information when the capacity decreases.
- scalable coding technology, which can decode speech and musical tones from all of the encoded information or from only a part of it, has recently attracted attention, and several scalable encoding schemes have been disclosed.
- a scalable encoding scheme generally consists of a base layer and enhancement layers, which form a hierarchical structure with the base layer as the lowest layer. In each layer, the residual signal, which is the difference between the input signal and the output signal of the lower layer, is encoded. With this configuration, the speech/musical-tone signal can be decoded using the encoded information of all layers or only the encoded information of the lower layers.
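The layered structure described above can be sketched as follows. The uniform quantizers standing in for the per-layer coders are purely illustrative assumptions; in the patent the layers are a CELP coder and a long-term-prediction coder.

```python
import numpy as np

def make_uniform_layer(step):
    """A toy (encode, decode) pair: a uniform scalar quantizer.
    Stands in for a real per-layer coder."""
    return (lambda r: np.round(r / step), lambda c: c * step)

def scalable_encode(x, layers):
    """Each layer encodes the residual between the input and the
    decoded output of the layers below it."""
    residual = np.asarray(x, dtype=float)
    codes = []
    for enc, dec in layers:
        code = enc(residual)
        codes.append(code)
        residual = residual - dec(code)      # pass the residual upward
    return codes

def scalable_decode(codes, layers):
    """Summing any prefix of the decoded layers yields a coarser but
    valid reconstruction: the scalable property."""
    out = 0.0
    for code, (_enc, dec) in zip(codes, layers):
        out = out + dec(code)
    return out

layers = [make_uniform_layer(1.0), make_uniform_layer(0.1)]  # base, enhancement
x = np.array([0.33, -1.27, 2.5])
codes = scalable_encode(x, layers)
```

Decoding `codes[:1]` uses only the base layer and gives a coarse signal; decoding both codes refines it.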
- however, when a CELP speech encoding/decoding method is used for both the base layer and the enhancement layer, a correspondingly large amount of both computation and encoded information is required.
- Disclosure of the Invention
- An object of the present invention is to provide a speech encoding device, a speech decoding device, and a method thereof that can realize scalable encoding with a small amount of calculation and a small amount of encoded information.
- this object is achieved by providing an enhancement layer for long-term prediction, improving the quality of the decoded signal by performing long-term prediction of the residual signal in the enhancement layer using the long-term correlation of speech and musical tones, and reducing the amount of computation by obtaining the long-term prediction lag from the long-term prediction information of the base layer.
- FIG. 1 is a block diagram showing the configuration of a speech encoding apparatus and speech decoding apparatus according to Embodiment 1 of the present invention
- FIG. 2 is a block diagram showing an internal configuration of the base layer coding section according to the above-described embodiment
- FIG. 3 is a diagram for explaining the process by which the parameter determination section of the base layer encoding section according to the above embodiment determines the signal generated from the adaptive excitation codebook
- FIG. 4 is a block diagram showing the internal configuration of the base layer decoding section according to the above embodiment
- FIG. 5 is a block diagram showing an internal configuration of the enhancement layer coding section according to the above embodiment
- FIG. 6 is a block diagram showing the internal configuration of the enhancement layer decoding section according to the above embodiment
- FIG. 7 is a block diagram showing an internal configuration of an extended layer encoding unit according to Embodiment 2 of the present invention.
- FIG. 8 is a block diagram showing the internal configuration of the enhancement layer decoding section according to the above embodiment.
- FIG. 9 is a block diagram showing a configuration of an audio signal transmitting device and an audio signal receiving device according to Embodiment 3 of the present invention.
- Hierarchical speech encoding is a method in which a plurality of speech encoding methods, each of which encodes by long-term prediction the residual signal (the difference between the input signal and the decoded signal of the lower layer) and outputs encoded information, exist in the upper layers, forming a hierarchical structure.
- Hierarchical speech decoding is a method in which a plurality of speech decoding methods that decode those residual signals exist in the upper layers, forming a hierarchical structure.
- the base layer is the speech/musical-tone encoding/decoding method in the lowest layer.
- a speech/musical-tone encoding/decoding method in a layer higher than the base layer is defined as an enhancement layer.
- FIG. 1 is a block diagram showing a configuration of a speech coding apparatus / speech decoding apparatus according to Embodiment 1 of the present invention.
- speech encoding apparatus 100 mainly includes a base layer encoding section 101, a base layer decoding section 102, an adding section 103, an enhancement layer encoding section 104, and a multiplexing section 105. Speech decoding apparatus 150 mainly includes a demultiplexing section 151, a base layer decoding section 152, an enhancement layer decoding section 153, and an adding section 154.
- base layer encoding section 101 receives the speech signal, encodes the input signal using a CELP speech encoding method, and outputs the base layer encoded information obtained by encoding to base layer decoding section 102 and to multiplexing section 105.
- Base layer decoding section 102 decodes the base layer encoded information using a CELP speech decoding method and outputs the base layer decoded signal obtained by decoding to adding section 103. Base layer decoding section 102 also outputs the pitch lag to enhancement layer encoding section 104 as the long-term prediction information of the base layer.
- the “long-term prediction information” is information representing a long-term correlation of a voice / sound signal.
- the “pitch lag” is position information specified by the base layer, and will be described later in detail.
- adding section 103 inverts the polarity of the base layer decoded signal output from base layer decoding section 102, adds it to the input signal, and outputs the residual signal resulting from the addition to enhancement layer encoding section 104.
- enhancement layer encoding section 104 calculates a long-term prediction coefficient using the long-term prediction information output from base layer decoding section 102 and the residual signal output from adding section 103, encodes the long-term prediction coefficient, and outputs the enhancement layer encoded information obtained by encoding to multiplexing section 105.
- multiplexing section 105 multiplexes the base layer encoded information output from base layer encoding section 101 and the enhancement layer encoded information output from enhancement layer encoding section 104, and outputs the multiplexed information to demultiplexing section 151 via the transmission path.
- demultiplexing section 151 separates the multiplexed information transmitted from speech encoding apparatus 100 into base layer encoded information and enhancement layer encoded information, outputs the base layer encoded information to base layer decoding section 152, and outputs the enhancement layer encoded information to enhancement layer decoding section 153.
- base layer decoding section 152 decodes the base layer encoded information using a CELP speech decoding method and outputs the base layer decoded signal obtained by decoding to adding section 154. Base layer decoding section 152 also outputs the pitch lag to enhancement layer decoding section 153 as the long-term prediction information of the base layer. Enhancement layer decoding section 153 decodes the enhancement layer encoded information using the long-term prediction information and outputs the enhancement layer decoded signal obtained by decoding to adding section 154.
- adding section 154 adds the base layer decoded signal output from base layer decoding section 152 and the enhancement layer decoded signal output from enhancement layer decoding section 153, and outputs the resulting speech/musical-tone signal to a post-processing device.
- the input signal of base layer coding section 101 is input to preprocessing section 200.
- pre-processing section 200 performs high-pass filtering to remove DC components, together with waveform shaping and pre-emphasis to improve the performance of the subsequent encoding, and outputs the processed signal (Xin) to LPC analysis section 201 and adder 204.
- the LPC analysis unit 201 performs a linear prediction analysis using Xin, and outputs an analysis result (linear prediction coefficient) to the LPC quantization unit 202.
- LPC quantization section 202 quantizes the linear prediction coefficients (LPC) output from LPC analysis section 201, outputs the quantized LPC to synthesis filter 203, and outputs the code (L) representing the quantized LPC to multiplexing section 213.
- synthesis filter 203 generates a synthesized signal by filtering the driving excitation output from adder 210 (described later), using filter coefficients based on the quantized LPC, and outputs the synthesized signal to adder 204.
- adder 204 calculates an error signal by inverting the polarity of the synthesized signal and adding it to Xin, and outputs the error signal to auditory weighting section 211.
- adaptive excitation codebook 205 stores in a buffer the driving excitations previously output by adder 210, cuts out one frame of samples from the past excitation at the position specified by the signal output from parameter determination section 212 as an adaptive excitation vector, and outputs it to multiplier 208.
- Quantization gain generation section 206 outputs adaptive excitation gain and fixed excitation gain specified by the signal output from parameter determination section 212 to multipliers 208 and 209, respectively.
- fixed excitation codebook 207 multiplies a pulse excitation vector having the shape specified by the signal output from parameter determination section 212 by a spreading vector, and outputs the resulting fixed excitation vector to multiplier 209.
- multiplier 208 multiplies the quantized adaptive excitation gain output from quantization gain generation section 206 by the adaptive excitation vector output from adaptive excitation codebook 205, and outputs the result to adder 210.
- multiplier 209 multiplies the quantized fixed excitation gain output from quantization gain generation section 206 by the fixed excitation vector output from fixed excitation codebook 207, and outputs the result to adder 210.
- adder 210 receives the gain-multiplied adaptive excitation vector and fixed excitation vector from multiplier 208 and multiplier 209, respectively, adds them as vectors, and outputs the resulting driving excitation to synthesis filter 203 and adaptive excitation codebook 205. The driving excitation input to adaptive excitation codebook 205 is stored in its buffer.
- auditory weighting section 211 applies auditory weighting to the error signal output from adder 204, calculates the distortion between Xin and the synthesized signal in the auditorily weighted domain, and outputs it to parameter determination section 212.
- parameter determination section 212 selects, from adaptive excitation codebook 205, fixed excitation codebook 207, and quantization gain generation section 206, the adaptive excitation vector, the fixed excitation vector, and the quantization gains that minimize the coding distortion output from auditory weighting section 211, and outputs the adaptive excitation vector code (A), the excitation gain code (G), and the fixed excitation vector code (F) indicating the selection results to multiplexing section 213.
- the adaptive excitation vector code (A) is a code corresponding to the pitch lag.
- multiplexing section 213 receives the code (L) representing the quantized LPC from LPC quantization section 202, and the code (A) representing the adaptive excitation vector, the code (F) representing the fixed excitation vector, and the code (G) representing the quantization gains from parameter determination section 212, multiplexes this information, and outputs it as the base layer encoded information.
- in FIG. 3, buffer 301 is the buffer provided in adaptive excitation codebook 205, position 302 is the cutout position of the adaptive excitation vector, and vector 303 is the cut-out adaptive excitation vector.
- the numerical values "41" and "296" correspond to the lower and upper limits of the range over which cutout position 302 is moved.
- the range over which cutout position 302 is moved can be set to a length of "256" (for example, 41 to 296), determined by the number of bits assigned to the code (A) representing the adaptive excitation vector, and this range can be set arbitrarily.
- parameter determination section 212 moves cutout position 302 within the set range and cuts out adaptive excitation vector 303 with the frame length. Parameter determination section 212 then finds the cutout position 302 that minimizes the coding distortion output from auditory weighting section 211.
- the buffer cutout position 302 found by parameter determination section 212 is the "pitch lag".
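The search over cutout positions can be sketched as follows. The lag range 41 to 296 comes from the example above; the least-squares criterion with an optimal gain is a simplified stand-in for the perceptually weighted distortion minimized by parameter determination section 212, and the repetition rule for lags shorter than the frame is included for completeness.

```python
import numpy as np

def find_pitch_lag(past_exc, target, lag_min=41, lag_max=296):
    """For each candidate lag, cut one frame from the past excitation,
    scale it by the optimal gain, and keep the lag whose scaled frame
    is closest (least squares) to the target frame."""
    n = len(target)
    best_lag, best_err = lag_min, np.inf
    for lag in range(lag_min, lag_max + 1):
        v = past_exc[len(past_exc) - lag : len(past_exc) - lag + n]
        if len(v) < n:                       # lag shorter than the frame:
            reps = -(-n // lag)              # repeat the lag-long segment
            v = np.tile(past_exc[-lag:], reps)[:n]
        g = np.dot(target, v) / max(np.dot(v, v), 1e-12)  # optimal gain
        err = np.sum((target - g * v) ** 2)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag
```

On a signal that is periodic with period 60, the search recovers a lag of 60 for a frame that continues the pattern.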
- the base layer encoded information input to base layer decoding section 102 (152) is separated into the individual codes (L, A, G, F) by demultiplexing section 401.
- the separated LPC code (L) is output to LPC decoding section 402, the separated adaptive excitation vector code (A) is output to adaptive excitation codebook 405, and the separated excitation gain Code (G) is output to quantization gain generating section 406, and the separated fixed excitation vector code (F) is output to fixed excitation codebook 407.
- LPC decoding section 402 decodes the LPC from the code (L) output from demultiplexing section 401 and outputs it to synthesis filter 403.
- adaptive excitation codebook 405 cuts out one frame of samples from the past driving excitation, at the position specified by the code (A) output from demultiplexing section 401, as an adaptive excitation vector, and outputs it to multiplier 408. Adaptive excitation codebook 405 also outputs the pitch lag to enhancement layer encoding section 104 (enhancement layer decoding section 153) as long-term prediction information.
- quantization gain generation section 406 decodes the adaptive excitation vector gain and the fixed excitation vector gain specified by the excitation gain code (G) output from demultiplexing section 401, and outputs them to multiplier 408 and multiplier 409, respectively.
- Fixed excitation codebook 407 generates the fixed excitation vector specified by the code (F) output from demultiplexing section 401 and outputs it to multiplier 409.
- the multiplier 408 multiplies the adaptive sound source vector by the adaptive sound source vector gain and outputs the result to the adder 410.
- the multiplier 409 multiplies the fixed sound source vector by the fixed sound source vector gain and outputs the result to the adder 410.
- the adder 410 adds the adaptive sound source vector after the gain multiplication output from the multipliers 408 and 409 and the fixed sound source vector to generate a driving sound source vector, and generates the driving sound source vector. To the synthesis filter 403 and the adaptive excitation codebook 405.
- synthesis filter 403 performs filter synthesis using the driving excitation vector output from adder 410 as the driving signal and the filter coefficients decoded by LPC decoding section 402, and outputs the synthesized signal to post-processing section 404.
- post-processing section 404 applies to the signal output from synthesis filter 403 processing that improves the subjective quality of speech, such as formant emphasis and pitch emphasis, and processing that improves the subjective quality of stationary noise, and outputs the result as the base layer decoded signal.
- enhancement layer coding section 104 of FIG. 1 will be described using the block diagram of FIG.
- enhancement layer encoding section 104 divides the residual signal into units of N samples (N is a natural number) and encodes each unit as one frame of N samples.
- the residual signal is represented as e(0) to e(X−1)
- the frame to be encoded is represented as e(n) to e(n+N−1).
- X is the length of the residual signal
- N is equivalent to the length of the frame.
- n is the sample at the head of each frame, and n is an integer multiple of N.
- the method of predicting and generating the signal of a frame from signals generated in the past is called long-term prediction. A filter that performs long-term prediction is called a pitch filter or a comb filter.
- in FIG. 5, long-term prediction lag instruction section 501 receives the long-term prediction information t obtained by base layer decoding section 102, obtains the long-term prediction lag T of the enhancement layer based on this information, and outputs it to long-term prediction signal storage section 502.
- the long-term prediction lag T is obtained by the following equation (1), where D is the sampling frequency of the enhancement layer and d is the sampling frequency of the base layer:
- T = (D / d) × t … (1)
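Assuming equation (1) is the sampling-rate conversion T = (D/d) × t, the lag conversion can be sketched as follows; the 8 kHz and 16 kHz default rates are illustrative assumptions, not values fixed by the text.

```python
def enhancement_lag(t_base, d_base=8000, d_enh=16000):
    """Rescale the base-layer pitch lag t to the enhancement layer's
    sampling frequency: T = (D / d) * t, rounded to a whole sample."""
    return int(round(t_base * d_enh / d_base))
```

For example, a base-layer lag of 60 samples at 8 kHz corresponds to a lag of 120 samples at 16 kHz.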
- the long-term prediction signal storage unit 502 includes a buffer that stores a long-term prediction signal generated in the past. If the length of the buffer is M, the buffer consists of the long-term predicted signal sequences s (n-M-1) to s (n-1) generated in the past.
- when long-term prediction signal storage section 502 receives the long-term prediction lag T from long-term prediction lag instruction section 501, it cuts out the long-term prediction signals s(n−T) to s(n−T+N−1), going back by T from the sequence of past long-term prediction signals stored in the buffer, and outputs them to long-term prediction coefficient calculation section 503 and long-term prediction signal generation section 506.
- long-term prediction signal storage section 502 then receives the long-term prediction signals s(n) to s(n+N−1) from long-term prediction signal generation section 506 and updates the buffer by the following equation (2), discarding the oldest N samples and appending the new ones:
- s(i) ← s(i+N) (i = n−M−1, …, n−1) … (2)
- when the long-term prediction lag T is shorter than the frame length N, the long-term prediction signal can be extracted after multiplying T by an integer until it exceeds the frame length N.
- alternatively, the signal can be extracted by repeating the samples going back by the long-term prediction lag T until the frame length N is filled.
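The cutout rule for a lag shorter than the frame length can be sketched as follows; the buffer contents and lengths in the usage below are arbitrary illustrations.

```python
import numpy as np

def cut_long_term_signal(buf, T, N):
    """Cut s(n-T)..s(n-T+N-1) from the history buffer buf; when the
    lag T is shorter than the frame length N, repeat the T-sample
    segment at the end of the buffer until N samples are filled."""
    if T >= N:
        return buf[len(buf) - T : len(buf) - T + N]
    seg = buf[len(buf) - T :]
    reps = -(-N // T)                        # ceil(N / T)
    return np.tile(seg, reps)[:N]
```

With a buffer of 0..9, a lag of 3 and a frame of 5 yields the last three samples repeated: 7, 8, 9, 7, 8.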
- long-term prediction coefficient calculation section 503 receives the residual signals e(n) to e(n+N−1) and the long-term prediction signals s(n−T) to s(n−T+N−1), uses them to calculate the long-term prediction coefficient β by the following equation (3), and outputs β to long-term prediction coefficient encoding section 504:
- β = Σ_{i=0}^{N−1} e(n+i)·s(n−T+i) / Σ_{i=0}^{N−1} s(n−T+i)² … (3)
- long-term prediction coefficient encoding section 504 encodes the long-term prediction coefficient β, outputs the enhancement layer encoded information obtained by encoding to long-term prediction coefficient decoding section 505, and outputs it to enhancement layer decoding section 153 via the transmission path.
- as the encoding method for the long-term prediction coefficient β, methods such as scalar quantization are known.
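Assuming equation (3) is the normalized cross-correlation between the residual and the lagged prediction signal, the coefficient calculation and a scalar quantizer for it can be sketched as follows; the 4-bit width and [0, 2] range of the quantizer are illustrative assumptions, not values given in the text.

```python
import numpy as np

def long_term_coeff(e, s_past):
    """beta = <e, s_past> / <s_past, s_past>: the least-squares gain
    relating the residual frame to the lagged prediction signal."""
    return float(np.dot(e, s_past) / max(np.dot(s_past, s_past), 1e-12))

def scalar_quantize(beta, bits=4, lo=0.0, hi=2.0):
    """Uniform scalar quantizer: returns the index to transmit and
    the reconstructed value the decoder would recover."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    idx = int(round((min(max(beta, lo), hi) - lo) / step))
    return idx, lo + idx * step
```

A residual that is exactly half the lagged signal yields β = 0.5, which the 4-bit quantizer maps to the nearest of 16 levels.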
- Long-term prediction coefficient decoding section 505 decodes the enhancement layer encoded information and outputs the decoded long-term prediction coefficient βq obtained by decoding to long-term prediction signal generation section 506.
- long-term prediction signal generation section 506 receives the decoded long-term prediction coefficient βq and the long-term prediction signals s(n−T) to s(n−T+N−1), uses them to calculate the long-term prediction signals s(n) to s(n+N−1), and outputs these to long-term prediction signal storage section 502.
- enhancement layer decoding section 153 in FIG. 1 will be described using the block diagram in FIG.
- long-term prediction lag instruction section 601 obtains the long-term prediction lag T of the enhancement layer using the long-term prediction information output from base layer decoding section 152, and outputs it to long-term prediction signal storage section 602.
- long-term prediction signal storage section 602 has a buffer that stores long-term prediction signals generated in the past. If the length of the buffer is M, the buffer consists of the long-term prediction signals s(n−M−1) to s(n−1) generated in the past.
- when long-term prediction signal storage section 602 receives the long-term prediction lag T from long-term prediction lag instruction section 601, it cuts out the long-term prediction signals s(n−T) to s(n−T+N−1), going back by T from the sequence of past long-term prediction signals stored in the buffer, and outputs them to long-term prediction signal generation section 604.
- the long-term predicted signal storage section 602 receives the long-term predicted signals s (n) to s (n + N-1) from the long-term predicted signal generation section 604 and updates the buffer according to the above equation (2).
- Long-term prediction coefficient decoding section 603 decodes the enhancement layer encoded information and outputs the decoded long-term prediction coefficient βq obtained by decoding to long-term prediction signal generation section 604.
- long-term prediction signal generation section 604 receives the decoded long-term prediction coefficient βq and the long-term prediction signals s(n−T) to s(n−T+N−1), uses them to calculate the long-term prediction signals s(n) to s(n+N−1), and outputs these to long-term prediction signal storage section 602 and, as the enhancement layer decoded signal, to adding section 154.
- the above is the description of the internal configuration of enhancement layer decoding section 153 in FIG.
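The decoder-side generation and buffer update can be sketched as follows, assuming the generated frame is the lagged signal scaled by the decoded coefficient βq and that T ≥ N (the repetition rule for shorter lags is omitted); the M, T, and N values in the usage are arbitrary.

```python
import numpy as np

class LongTermPredictor:
    """Decoder-side long-term prediction: each frame is the signal T
    samples back scaled by the decoded coefficient beta_q; the buffer
    then drops its oldest N samples and appends the new frame."""
    def __init__(self, M):
        self.buf = np.zeros(M)               # history s(n-M)..s(n-1)

    def decode_frame(self, beta_q, T, N):
        assert N <= T <= len(self.buf)       # sketch assumes T >= N
        s_past = self.buf[len(self.buf) - T : len(self.buf) - T + N]
        s_new = beta_q * s_past
        self.buf = np.concatenate([self.buf[N:], s_new])  # buffer update
        return s_new
```

With a buffer of 1..8, βq = 0.5, T = 4, and N = 2, the frame is 0.5 × [5, 6] = [2.5, 3.0], and the buffer shifts by two samples to absorb it.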
- as described above, by providing an enhancement layer that performs long-term prediction and exploiting the long-term correlation of speech and musical tones to perform long-term prediction of the residual signal in the enhancement layer, speech/musical-tone signals can be encoded and decoded effectively with a reduced amount of calculation.
- the amount of encoded information can also be reduced, because the long-term prediction lag is obtained from the long-term prediction information of the base layer instead of being encoded and transmitted itself.
- furthermore, since the base layer decoded signal can be obtained by decoding the base layer encoded information alone, it is possible to realize with the CELP speech encoding/decoding method the function of decoding speech and musical tones from only part of the encoded information (scalable encoding).
- long-term prediction exploits the long-term correlation of speech: the frame most highly correlated with the current frame is cut out from the buffer, and the signal of the current frame is expressed using the cut-out frame signal.
- when there is no information, such as a pitch lag, indicating the long-term correlation of speech and musical tones, cutting out the frame most highly correlated with the current frame requires searching for it by computing the autocorrelation function between the current frame and candidate frames while varying the cutout position, so the amount of calculation required for the search becomes very large. By determining the cutout position uniquely from the pitch lag obtained by base layer encoding section 101, the amount of calculation required for ordinary long-term prediction can be reduced significantly.
- the long-term prediction information output from the base layer decoding unit is a pitch lag.
- however, the present invention is not limited to this; any information representing the long-term correlation of speech and musical tones can be used as the long-term prediction information.
- in the present embodiment, the position at which long-term prediction signal storage section 502 cuts out the long-term prediction signal from the buffer has been described as given by the long-term prediction lag T. The cutout position may instead be T+Δ, where Δ is a small number around T that can be set arbitrarily.
- the present invention can also be applied when a small error occurs in the long-term prediction lag T, and the same operation and effect as in the present embodiment can be obtained.
- in that case, long-term prediction signal storage section 502 receives the long-term prediction lag T from long-term prediction lag instruction section 501, obtains the long-term prediction signal by going back T+Δ in the past long-term prediction signal sequence stored in the buffer, and Δ is encoded and transmitted.
- long-term prediction signal storage section 602 decodes the encoded information of Δ to obtain Δ and, using the long-term prediction lag T, cuts out the long-term prediction signals s(n−T−Δ) to s(n−T−Δ+N−1) by equation (5).
- in a further variation, long-term prediction coefficient calculation section 503 is newly provided with a function of transforming the long-term prediction signals s(n−T) to s(n−T+N−1) from the time domain into the frequency domain and a function of converting the residual signal into frequency-domain parameters,
- long-term prediction signal generation section 506 is newly provided with a function of inversely transforming the long-term prediction signals s(n) to s(n+N−1) from the frequency domain into the time domain,
- and long-term prediction signal generation section 604 is newly provided with a function of inversely transforming the long-term prediction signals s(n) to s(n+N−1) from the frequency domain into the time domain.
- the distribution of redundant bits allocated to the encoded information (A) output from base layer encoding section 101 and to the encoded information (B) output from enhancement layer encoding section 104 can be weighted toward the encoded information (A).
- Embodiment 2 describes a case where the difference between the residual signal and the long-term prediction signal (the long-term prediction residual signal) is encoded and decoded.
- The speech coding apparatus according to the present embodiment has the same configuration as that in FIG. 1, and differs only in the internal configuration of enhancement layer coding section 104 and enhancement layer decoding section 153.
- FIG. 7 is a block diagram showing an internal configuration of enhancement layer coding section 104 according to the present embodiment. Note that, in FIG. 7, the same components as those in FIG. 5 are denoted by the same reference numerals as in FIG. 5, and description thereof will be omitted.
- Compared to FIG. 5, enhancement layer coding section 104 in FIG. 7 adopts a configuration in which an adding section 701, a long-term prediction residual signal coding section 702, a coded information multiplexing section 703, a long-term prediction residual signal decoding section 704, and an adding section 705 are added.
- the long-term prediction signal generation section 506 outputs the calculated long-term prediction signals s (n) to s (n + N-1) to the addition sections 701 and 705.
- The adding section 701 inverts the polarity of the long-term prediction signals s(n) to s(n + N − 1), adds them to the residual signals e(n) to e(n + N − 1), and outputs the resulting long-term prediction residual signals p(n) to p(n + N − 1) to the long-term prediction residual signal encoding unit 702.
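The operation of the adding section, which adds the polarity-inverted long-term prediction signal to the residual signal, amounts to an element-wise subtraction. A minimal sketch (function name assumed for illustration):

```python
def long_term_prediction_residual(e, s):
    """p(n) = e(n) - s(n): add the polarity-inverted long-term
    prediction signal s to the residual signal e, as in adding
    section 701 (a sketch, not the patent's implementation)."""
    return [ei - si for ei, si in zip(e, s)]


# Example: residual [3.0, 1.0] minus prediction [1.0, 2.0]
p = long_term_prediction_residual([3.0, 1.0], [1.0, 2.0])
```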
- The long-term prediction residual signal encoding unit 702 encodes the long-term prediction residual signals p(n) to p(n + N − 1), and outputs the resulting coded information (hereinafter, "long-term prediction residual coded information") to the coded information multiplexing unit 703 and the long-term prediction residual signal decoding unit 704.
- Here, the long-term prediction residual signal is quantized by vector quantization.
- The following describes the encoding method for p(n) to p(n + N − 1) using an example in which vector quantization is performed with 8 bits.
- A codebook storing 256 types of code vectors created in advance is prepared inside the long-term prediction residual signal encoding unit 702.
- These code vectors CODE(k)(0) to CODE(k)(N − 1) are vectors of length N.
- k is the index of the code vector, and takes a value from 0 to 255.
- The long-term prediction residual signal encoding unit 702 calculates the square error er between the long-term prediction residual signals p(n) to p(n + N − 1) and each code vector CODE(k)(0) to CODE(k)(N − 1) by the following equation:

er = Σ_{i=0}^{N−1} ( p(n + i) − CODE(k)(i) )² ... Equation (7)
- The long-term prediction residual signal encoding unit 702 determines the value of k that minimizes the square error er as the long-term prediction residual coded information.
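The full-codebook search over all 256 code vectors can be sketched as follows. The function name and the codebook layout (a list of length-N lists) are illustrative assumptions, not the patent's implementation.

```python
def vq_encode(p, codebook):
    """Full-search vector quantization: return the index k of the
    code vector minimizing the square error er of equation (7)."""
    best_k, best_er = 0, float("inf")
    for k, code in enumerate(codebook):
        # er = sum over i of (p(n+i) - CODE(k)(i))^2
        er = sum((pi - ci) ** 2 for pi, ci in zip(p, code))
        if er < best_er:
            best_k, best_er = k, er
    return best_k


# Example with a toy 3-entry codebook of length-2 vectors
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
k = vq_encode([1.1, 0.9], codebook)
```

With 256 code vectors, the resulting index k fits in the 8 bits described above.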
- The coded information multiplexing section 703 multiplexes the enhancement layer coded information input from the long-term prediction coefficient coding section 504 and the long-term prediction residual coded information input from the long-term prediction residual signal coding section 702, and outputs the multiplexed information to enhancement layer decoding section 153 via the transmission path.
- The long-term prediction residual signal decoding section 704 decodes the long-term prediction residual coded information, and outputs the decoded long-term prediction residual signals pq(n) to pq(n + N − 1) to the adding section 705.
- The adding section 705 adds the long-term prediction signals s(n) to s(n + N − 1) input from the long-term prediction signal generation unit 506 and the decoded long-term prediction residual signals pq(n) to pq(n + N − 1) input from the long-term prediction residual signal decoding unit 704, and outputs the addition result to the long-term prediction signal storage unit 502.
- the long-term prediction signal storage unit 502 updates the buffer according to the following equation (8).
- Next, the internal configuration of enhancement layer decoding section 153 according to the present embodiment will be described using the block diagram of FIG. 8. In FIG. 8, the same components as those in FIG. 6 are denoted by the same reference numerals as in FIG. 6, and description thereof will be omitted.
- Compared to FIG. 6, enhancement layer decoding section 153 in FIG. 8 has a configuration in which a coded information separating section 801, a long-term prediction residual signal decoding section 802, and an adding section 803 are added.
- The coded information separating section 801 separates the multiplexed coded information received via the transmission path into the enhancement layer coded information and the long-term prediction residual coded information, outputs the enhancement layer coded information to long-term prediction coefficient decoding section 603, and outputs the long-term prediction residual coded information to long-term prediction residual signal decoding section 802.
- The long-term prediction residual signal decoding unit 802 decodes the long-term prediction residual coded information to obtain the decoded long-term prediction residual signals pq(n) to pq(n + N − 1), and outputs them to the adding section 803.
- The adding section 803 adds the long-term prediction signals s(n) to s(n + N − 1) input from the long-term prediction signal generation section 604 and the decoded long-term prediction residual signals pq(n) to pq(n + N − 1) input from the long-term prediction residual signal decoding section 802, outputs the addition result to the long-term prediction signal storage section 602, and also outputs the addition result as the enhancement layer decoded signal.
- The above is the description of the internal configuration of enhancement layer decoding section 153 according to the present embodiment.
- The present invention is not limited to a particular encoding method; encoding may be performed by, for example, shape-gain VQ, split VQ, transform VQ, or multi-stage VQ.
- The following describes a case where shape-gain VQ is performed with 13 bits, consisting of an 8-bit shape and a 5-bit gain.
- In this case, two types of codebooks, a shape codebook and a gain codebook, are prepared.
- The shape codebook consists of 256 types of shape code vectors, and the shape code vectors SCODE(k1)(0) to SCODE(k1)(N − 1) are vectors of length N.
- k 1 is the index of the shape code vector and takes a value from 0 to 255.
- The gain codebook consists of 32 types of gain codes, and the gain code GCODE(k2) takes a scalar value.
- k 2 is an index of the gain code, and takes a value from 0 to 31.
- The long-term prediction residual signal encoding unit 702 calculates the gain gain and the shape vector shape(0) to shape(N − 1) of the long-term prediction residual signals p(n) to p(n + N − 1) by the following equation:

gain = √( Σ_{i=0}^{N−1} p(n + i)² ),  shape(i) = p(n + i) / gain  (i = 0, ..., N − 1) ... Equation (9)

- It then finds the gain error gainer between the gain gain and each gain code GCODE(k2), and the square error shapeer between the shape vector and each shape code vector SCODE(k1)(0) to SCODE(k1)(N − 1):

gainer = ( gain − GCODE(k2) )²,  shapeer = Σ_{i=0}^{N−1} ( shape(i) − SCODE(k1)(i) )² ... Equation (10)
- The long-term prediction residual signal encoding unit 702 obtains the value of k2 that minimizes the gain error gainer and the value of k1 that minimizes the square error shapeer, and uses these values as the long-term prediction residual coded information.
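Shape-gain VQ as described above factors the vector into a scalar gain (root energy) and a unit-energy shape, each quantized independently. The sketch below assumes illustrative codebook layouts (a list of length-N shape vectors and a list of scalar gains) and function names; it is not the patent's implementation.

```python
import math

def shape_gain_vq_encode(p, shape_codebook, gain_codebook):
    """Shape-gain VQ per equations (9)-(10): split p into gain and
    shape, then search each codebook for the minimum-error entry."""
    gain = math.sqrt(sum(x * x for x in p))      # root energy of p
    shape = [x / gain for x in p]                # unit-energy shape
    # k1 minimizes shapeer = sum (shape(i) - SCODE(k1)(i))^2
    k1 = min(range(len(shape_codebook)),
             key=lambda k: sum((s - c) ** 2
                               for s, c in zip(shape, shape_codebook[k])))
    # k2 minimizes gainer = (gain - GCODE(k2))^2
    k2 = min(range(len(gain_codebook)),
             key=lambda k: (gain - gain_codebook[k]) ** 2)
    return k1, k2


# Toy example: 2-entry shape codebook, 2-entry gain codebook
k1, k2 = shape_gain_vq_encode([0.0, 2.1],
                              [[1.0, 0.0], [0.0, 1.0]],
                              [1.0, 2.0])
```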
- The first split codebook consists of 16 types of first split code vectors SPCODE(k3)(0) to SPCODE(k3)(N/2 − 1), the second split codebook consists of 16 types of second split code vectors SPCODE(k4)(0) to SPCODE(k4)(N/2 − 1), and each code vector is a vector of length N/2.
- k3 is the index of the first split code vector, and takes a value from 0 to 15.
- k4 is the index of the second split code vector, and takes a value from 0 to 15.
- The long-term prediction residual signal encoding unit 702 divides the long-term prediction residual signals p(n) to p(n + N − 1) into the first split vector sp1(0) to sp1(N/2 − 1) and the second split vector sp2(0) to sp2(N/2 − 1) by the following equation:

sp1(i) = p(n + i),  sp2(i) = p(n + N/2 + i)  (i = 0, ..., N/2 − 1) ... Equation (11)
- The long-term prediction residual signal encoding unit 702 then finds the value of k3 that minimizes the square error spliter1 between the first split vector and the first split code vectors, and the value of k4 that minimizes the square error spliter2 between the second split vector and the second split code vectors, and uses these values as the long-term prediction residual coded information.
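The split-VQ search above can be sketched as follows, assuming an even half-and-half split and illustrative codebook layouts (lists of length-N/2 lists):

```python
def split_vq_encode(p, cb1, cb2):
    """Split VQ: divide p into halves sp1, sp2 (equation (11)) and
    full-search each half against its own codebook."""
    half = len(p) // 2
    sp1, sp2 = p[:half], p[half:]

    def nearest(v, cb):
        # index minimizing the square error between v and cb[k]
        return min(range(len(cb)),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(v, cb[k])))

    return nearest(sp1, cb1), nearest(sp2, cb2)


# Toy example with 2-entry codebooks of length-2 split vectors
k3, k4 = split_vq_encode([5.0, 5.0, 1.0, 1.0],
                         [[0.0, 0.0], [5.0, 5.0]],
                         [[1.0, 1.0], [9.0, 9.0]])
```

With two 16-entry codebooks, k3 and k4 together occupy the 8 bits of the example.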
- A transform codebook consisting of 256 types of transform code vectors is prepared, and the transform code vectors TCODE(k5)(0) to TCODE(k5)(N − 1) are vectors of length N.
- k 5 is an index of the transform code vector, and takes a value from 0 to 255.
- The long-term prediction residual signal coding unit 702 performs a discrete Fourier transform on the long-term prediction residual signals p(n) to p(n + N − 1) according to the following equation to obtain the transform vector tp(0) to tp(N − 1), and finds the square error transer between the transform vector tp(0) to tp(N − 1) and each transform code vector TCODE(k5)(0) to TCODE(k5)(N − 1):

tp(k) = Σ_{i=0}^{N−1} p(n + i) · e^(−j2πik/N)  (k = 0, ..., N − 1) ... Equation (13)
- The long-term prediction residual signal encoding unit 702 obtains the value of k5 that minimizes the square error transer, and uses this value as the long-term prediction residual coded information.
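Transform VQ can be sketched as below. Taking the magnitude of each DFT coefficient is an assumption made here so that the squared-error search stays real-valued; the patent does not specify how complex coefficients are compared, and the codebook layout is likewise illustrative.

```python
import cmath

def transform_vq_encode(p, transform_codebook):
    """Transform VQ sketch: DFT of p (one reading of equation (13)),
    then full-search the codebook on the magnitude spectrum."""
    N = len(p)
    # |tp(k)| for k = 0 .. N-1, computed by the direct DFT sum
    tp = [abs(sum(p[i] * cmath.exp(-2j * cmath.pi * i * k / N)
                  for i in range(N)))
          for k in range(N)]
    return min(range(len(transform_codebook)),
               key=lambda k5: sum((t - c) ** 2
                                  for t, c in zip(tp,
                                                  transform_codebook[k5])))


# A constant signal concentrates its energy in tp(0) = 4
k5 = transform_vq_encode([1.0, 1.0, 1.0, 1.0],
                         [[0.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0]])
```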
- The first-stage codebook consists of 32 types of first-stage code vectors PHCODE1(k6)(0) to PHCODE1(k6)(N − 1), the second-stage codebook consists of 256 types of second-stage code vectors PHCODE2(k7)(0) to PHCODE2(k7)(N − 1), and each code vector is a vector of length N.
- k 6 is the index of the first-stage code vector, and takes a value from 0 to 31.
- k7 is the index of the second-stage code vector, and takes a value from 0 to 255.
- The long-term prediction residual signal encoding unit 702 finds the square error phaseer1 between the long-term prediction residual signals p(n) to p(n + N − 1) and the first-stage code vectors PHCODE1(k6)(0) to PHCODE1(k6)(N − 1), finds the value of k6 that minimizes phaseer1, and calls this value kmax.
- The long-term prediction residual signal encoding unit 702 then calculates the error vectors ep(0) to ep(N − 1) between the long-term prediction residual signals p(n) to p(n + N − 1) and the first-stage code vector PHCODE1(kmax)(0) to PHCODE1(kmax)(N − 1), finds the square error phaseer2 between the error vectors ep(0) to ep(N − 1) and the second-stage code vectors PHCODE2(k7)(0) to PHCODE2(k7)(N − 1), finds the value of k7 that minimizes phaseer2, and uses this value and kmax as the long-term prediction residual coded information.
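The two-stage search above (coarse first stage, then quantization of the remaining error vector) can be sketched as follows, with illustrative codebook layouts and function names:

```python
def multistage_vq_encode(p, cb1, cb2):
    """Two-stage VQ: pick kmax minimizing phaseer1 against the
    first-stage codebook, form ep = p - PHCODE1(kmax), then pick
    k7 minimizing phaseer2 against the second-stage codebook."""
    def nearest(v, cb):
        return min(range(len(cb)),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(v, cb[k])))

    kmax = nearest(p, cb1)                       # first stage
    ep = [pi - ci for pi, ci in zip(p, cb1[kmax])]  # error vector
    k7 = nearest(ep, cb2)                        # second stage
    return kmax, k7


# Toy example: first stage captures the bulk, second stage the error
kmax, k7 = multistage_vq_encode([5.0, 3.0],
                                [[0.0, 0.0], [4.0, 4.0]],
                                [[0.0, 0.0], [1.0, -1.0]])
```

The second stage refines the first, so the combined index pair describes p more accurately than either codebook alone.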
- FIG. 9 is a block diagram showing a configuration of an audio signal transmitting device and an audio signal receiving device including the speech encoding device and the speech decoding device described in Embodiments 1 and 2 above.
- An audio signal 901 is converted into an electric signal by an input device 902 and output to an A/D conversion device 903.
- The A/D conversion device 903 converts the (analog) signal output from the input device 902 into a digital signal, and outputs the digital signal to the speech coding device 904.
- The speech encoding device 904 implements the speech encoding device 100 shown in FIG. 1, encodes the digital speech signal output from the A/D conversion device 903, and outputs the encoded information to the RF modulation device 905.
- the RF modulator 905 converts the speech coded information output from the speech coder 904 into a signal to be transmitted on a propagation medium such as radio waves and outputs the signal to the transmission antenna 906.
- the transmitting antenna 906 transmits the output signal output from the RF modulator 905 as a radio wave (RF signal).
- the RF signal 907 in the figure represents a radio wave (RF signal) transmitted from the transmitting antenna 906.
- the above is the configuration and operation of the audio signal transmitting device.
- the RF signal 908 is received by the receiving antenna 909 and output to the RF demodulator 910. Note that the RF signal 908 in the figure represents a radio wave received by the receiving antenna 909, and becomes exactly the same as the RF signal 907 unless signal attenuation or noise superposition occurs in the propagation path.
- The RF demodulation device 910 demodulates the speech coded information from the RF signal output from the receiving antenna 909, and outputs the coded information to the speech decoding device 911.
- The speech decoding device 911 implements the speech decoding device 150 shown in FIG. 1, decodes the speech signal from the speech coded information output from the RF demodulation device 910, and outputs it to the D/A conversion device 912.
- The D/A conversion device 912 converts the digital speech signal output from the speech decoding device 911 into an analog electric signal and outputs it to the output device 913.
- the output device 913 converts the electrical signal into air vibration and outputs it as sound waves so that it can be heard by the human ear.
- reference numeral 914 represents the output sound wave. The above is the configuration and operation of the audio signal receiving device.
- As described above, according to the present invention, it is possible to efficiently encode and decode a speech/musical tone signal having a wide frequency band with a small amount of coded information, and to reduce the amount of computation.
- Furthermore, the amount of coded information can be reduced.
- In addition, by decoding only the base layer coded information, it is possible to obtain a decoded signal of the base layer alone, so that a function of decoding a speech or musical tone signal even from part of the coded information (scalable coding) can be realized.
- the present invention is suitable for use in a speech encoding device and a speech decoding device used in a communication system that encodes and transmits speech / musical tone signals.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/554,619 US7299174B2 (en) | 2003-04-30 | 2004-04-30 | Speech coding apparatus including enhancement layer performing long term prediction |
EP04730659A EP1619664B1 (en) | 2003-04-30 | 2004-04-30 | Speech coding apparatus, speech decoding apparatus and methods thereof |
CA2524243A CA2524243C (en) | 2003-04-30 | 2004-04-30 | Speech coding apparatus including enhancement layer performing long term prediction |
US11/872,359 US7729905B2 (en) | 2003-04-30 | 2007-10-15 | Speech coding apparatus and speech decoding apparatus each having a scalable configuration |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003125665 | 2003-04-30 | ||
JP2003-125665 | 2003-04-30 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/554,619 A-371-Of-International US7299174B2 (en) | 2003-04-30 | 2004-04-30 | Speech coding apparatus including enhancement layer performing long term prediction |
US11/872,359 Continuation US7729905B2 (en) | 2003-04-30 | 2007-10-15 | Speech coding apparatus and speech decoding apparatus each having a scalable configuration |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004097796A1 true WO2004097796A1 (ja) | 2004-11-11 |
Family
ID=33410232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/006294 WO2004097796A1 (ja) | 2003-04-30 | 2004-04-30 | 音声符号化装置、音声復号化装置及びこれらの方法 |
Country Status (6)
Country | Link |
---|---|
US (2) | US7299174B2 (ja) |
EP (1) | EP1619664B1 (ja) |
KR (1) | KR101000345B1 (ja) |
CN (2) | CN100583241C (ja) |
CA (1) | CA2524243C (ja) |
WO (1) | WO2004097796A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007043811A1 (en) * | 2005-10-12 | 2007-04-19 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding/decoding audio data and extension data |
Families Citing this family (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1496500B1 (en) * | 2003-07-09 | 2007-02-28 | Samsung Electronics Co., Ltd. | Bitrate scalable speech coding and decoding apparatus and method |
EP1688917A1 (en) * | 2003-12-26 | 2006-08-09 | Matsushita Electric Industries Co. Ltd. | Voice/musical sound encoding device and voice/musical sound encoding method |
JP4733939B2 (ja) * | 2004-01-08 | 2011-07-27 | パナソニック株式会社 | 信号復号化装置及び信号復号化方法 |
US7701886B2 (en) * | 2004-05-28 | 2010-04-20 | Alcatel-Lucent Usa Inc. | Packet loss concealment based on statistical n-gram predictive models for use in voice-over-IP speech transmission |
JP4771674B2 (ja) * | 2004-09-02 | 2011-09-14 | パナソニック株式会社 | 音声符号化装置、音声復号化装置及びこれらの方法 |
US7783480B2 (en) * | 2004-09-17 | 2010-08-24 | Panasonic Corporation | Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method |
BRPI0516201A (pt) * | 2004-09-28 | 2008-08-26 | Matsushita Electric Ind Co Ltd | aparelho de codificação escalonável e método de codificação escalonável |
EP1881488B1 (en) * | 2005-05-11 | 2010-11-10 | Panasonic Corporation | Encoder, decoder, and their methods |
KR100754389B1 (ko) * | 2005-09-29 | 2007-08-31 | 삼성전자주식회사 | 음성 및 오디오 신호 부호화 장치 및 방법 |
US8069035B2 (en) * | 2005-10-14 | 2011-11-29 | Panasonic Corporation | Scalable encoding apparatus, scalable decoding apparatus, and methods of them |
EP1991986B1 (en) * | 2006-03-07 | 2019-07-31 | Telefonaktiebolaget LM Ericsson (publ) | Methods and arrangements for audio coding |
JP5058152B2 (ja) * | 2006-03-10 | 2012-10-24 | パナソニック株式会社 | 符号化装置および符号化方法 |
US20090276210A1 (en) * | 2006-03-31 | 2009-11-05 | Panasonic Corporation | Stereo audio encoding apparatus, stereo audio decoding apparatus, and method thereof |
US20090164211A1 (en) * | 2006-05-10 | 2009-06-25 | Panasonic Corporation | Speech encoding apparatus and speech encoding method |
EP2040251B1 (en) | 2006-07-12 | 2019-10-09 | III Holdings 12, LLC | Audio decoding device and audio encoding device |
US7461106B2 (en) | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals |
US20100010810A1 (en) * | 2006-12-13 | 2010-01-14 | Panasonic Corporation | Post filter and filtering method |
CN101206860A (zh) * | 2006-12-20 | 2008-06-25 | 华为技术有限公司 | 一种可分层音频编解码方法及装置 |
CN101246688B (zh) | 2007-02-14 | 2011-01-12 | 华为技术有限公司 | 一种对背景噪声信号进行编解码的方法、系统和装置 |
JP4871894B2 (ja) * | 2007-03-02 | 2012-02-08 | パナソニック株式会社 | 符号化装置、復号装置、符号化方法および復号方法 |
WO2008120438A1 (ja) * | 2007-03-02 | 2008-10-09 | Panasonic Corporation | ポストフィルタ、復号装置およびポストフィルタ処理方法 |
US8160872B2 (en) * | 2007-04-05 | 2012-04-17 | Texas Instruments Incorporated | Method and apparatus for layered code-excited linear prediction speech utilizing linear prediction excitation corresponding to optimal gains |
WO2008151755A1 (en) * | 2007-06-11 | 2008-12-18 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder for encoding an audio signal having an impulse- like portion and stationary portion, encoding methods, decoder, decoding method; and encoded audio signal |
CN101075436B (zh) * | 2007-06-26 | 2011-07-13 | 北京中星微电子有限公司 | 带补偿的音频编、解码方法及装置 |
US8576096B2 (en) | 2007-10-11 | 2013-11-05 | Motorola Mobility Llc | Apparatus and method for low complexity combinatorial coding of signals |
US8527265B2 (en) * | 2007-10-22 | 2013-09-03 | Qualcomm Incorporated | Low-complexity encoding/decoding of quantized MDCT spectrum in scalable speech and audio codecs |
US8209190B2 (en) | 2007-10-25 | 2012-06-26 | Motorola Mobility, Inc. | Method and apparatus for generating an enhancement layer within an audio coding system |
EP2224432B1 (en) * | 2007-12-21 | 2017-03-15 | Panasonic Intellectual Property Corporation of America | Encoder, decoder, and encoding method |
US7889103B2 (en) | 2008-03-13 | 2011-02-15 | Motorola Mobility, Inc. | Method and apparatus for low complexity combinatorial coding of signals |
US8639519B2 (en) | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance |
US8249142B2 (en) * | 2008-04-24 | 2012-08-21 | Motorola Mobility Llc | Method and apparatus for encoding and decoding video using redundant encoding and decoding techniques |
KR20090122143A (ko) * | 2008-05-23 | 2009-11-26 | 엘지전자 주식회사 | 오디오 신호 처리 방법 및 장치 |
FR2938688A1 (fr) * | 2008-11-18 | 2010-05-21 | France Telecom | Codage avec mise en forme du bruit dans un codeur hierarchique |
US8219408B2 (en) | 2008-12-29 | 2012-07-10 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8140342B2 (en) | 2008-12-29 | 2012-03-20 | Motorola Mobility, Inc. | Selective scaling mask computation based on peak detection |
US8200496B2 (en) | 2008-12-29 | 2012-06-12 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8175888B2 (en) | 2008-12-29 | 2012-05-08 | Motorola Mobility, Inc. | Enhanced layered gain factor balancing within a multiple-channel audio coding system |
CN101771417B (zh) * | 2008-12-30 | 2012-04-18 | 华为技术有限公司 | 信号编码、解码方法及装置、系统 |
JPWO2010103854A1 (ja) * | 2009-03-13 | 2012-09-13 | パナソニック株式会社 | 音声符号化装置、音声復号装置、音声符号化方法及び音声復号方法 |
EP2348504B1 (en) | 2009-03-27 | 2014-01-08 | Huawei Technologies Co., Ltd. | Encoding and decoding method and device |
JP5269195B2 (ja) * | 2009-05-29 | 2013-08-21 | 日本電信電話株式会社 | 符号化装置、復号装置、符号化方法、復号方法及びそのプログラム |
CN102081927B (zh) * | 2009-11-27 | 2012-07-18 | 中兴通讯股份有限公司 | 一种可分层音频编码、解码方法及系统 |
US8442837B2 (en) | 2009-12-31 | 2013-05-14 | Motorola Mobility Llc | Embedded speech and audio coding using a switchable model core |
US8423355B2 (en) | 2010-03-05 | 2013-04-16 | Motorola Mobility Llc | Encoder for audio signal including generic audio and speech frames |
US8428936B2 (en) | 2010-03-05 | 2013-04-23 | Motorola Mobility Llc | Decoder for audio signal including generic audio and speech frames |
US9767822B2 (en) | 2011-02-07 | 2017-09-19 | Qualcomm Incorporated | Devices for encoding and decoding a watermarked signal |
US9767823B2 (en) | 2011-02-07 | 2017-09-19 | Qualcomm Incorporated | Devices for encoding and detecting a watermarked signal |
NO2669468T3 (ja) * | 2011-05-11 | 2018-06-02 | ||
CN103124346B (zh) * | 2011-11-18 | 2016-01-20 | 北京大学 | 一种残差预测的确定方法及系统 |
US9947331B2 (en) * | 2012-05-23 | 2018-04-17 | Nippon Telegraph And Telephone Corporation | Encoding method, decoding method, encoder, decoder, program and recording medium |
US9129600B2 (en) | 2012-09-26 | 2015-09-08 | Google Technology Holdings LLC | Method and apparatus for encoding an audio signal |
KR101632599B1 (ko) | 2013-04-05 | 2016-06-22 | 돌비 레버러토리즈 라이쎈싱 코오포레이션 | 향상된 스펙트럼 확장을 사용하여 양자화 잡음을 감소시키기 위한 압신 장치 및 방법 |
CN109712633B (zh) * | 2013-04-05 | 2023-07-07 | 杜比国际公司 | 音频编码器和解码器 |
MY187944A (en) | 2013-10-18 | 2021-10-30 | Fraunhofer Ges Forschung | Concept for encoding an audio signal and decoding an audio signal using deterministic and noise like information |
CN105745705B (zh) | 2013-10-18 | 2020-03-20 | 弗朗霍夫应用科学研究促进协会 | 编码和解码音频信号的编码器、解码器及相关方法 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0573099A (ja) * | 1991-09-17 | 1993-03-26 | Oki Electric Ind Co Ltd | コード励振線形予測符号化方式 |
JPH05249999A (ja) * | 1991-10-21 | 1993-09-28 | Toshiba Corp | 学習型音声符号化装置 |
JPH06102900A (ja) * | 1992-09-18 | 1994-04-15 | Fujitsu Ltd | 音声符号化方式および音声復号化方式 |
JPH0854900A (ja) * | 1994-08-09 | 1996-02-27 | Yamaha Corp | ベクトル量子化による符号化復号化方式 |
JPH08147000A (ja) * | 1994-11-18 | 1996-06-07 | Yamaha Corp | ベクトル量子化による符号化復号方式 |
JPH08211895A (ja) * | 1994-11-21 | 1996-08-20 | Rockwell Internatl Corp | ピッチラグを評価するためのシステムおよび方法、ならびに音声符号化装置および方法 |
JPH08328595A (ja) * | 1995-05-30 | 1996-12-13 | Sanyo Electric Co Ltd | 音声符号化装置 |
JPH10177399A (ja) * | 1996-10-18 | 1998-06-30 | Mitsubishi Electric Corp | 音声符号化方法、音声復号化方法及び音声符号化復号化方法 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US197833A (en) * | 1877-12-04 | Improvement in sound-deadening cases for type-writers | ||
US171771A (en) * | 1876-01-04 | Improvement in corn-planters | ||
JPS62234435A (ja) * | 1986-04-04 | 1987-10-14 | Kokusai Denshin Denwa Co Ltd <Kdd> | 符号化音声の復号化方式 |
EP0331858B1 (en) * | 1988-03-08 | 1993-08-25 | International Business Machines Corporation | Multi-rate voice encoding method and device |
US5671327A (en) * | 1991-10-21 | 1997-09-23 | Kabushiki Kaisha Toshiba | Speech encoding apparatus utilizing stored code data |
US5797118A (en) * | 1994-08-09 | 1998-08-18 | Yamaha Corporation | Learning vector quantization and a temporary memory such that the codebook contents are renewed when a first speaker returns |
US5781880A (en) * | 1994-11-21 | 1998-07-14 | Rockwell International Corporation | Pitch lag estimation using frequency-domain lowpass filtering of the linear predictive coding (LPC) residual |
US5864797A (en) | 1995-05-30 | 1999-01-26 | Sanyo Electric Co., Ltd. | Pitch-synchronous speech coding by applying multiple analysis to select and align a plurality of types of code vectors |
US5751901A (en) * | 1996-07-31 | 1998-05-12 | Qualcomm Incorporated | Method for searching an excitation codebook in a code excited linear prediction (CELP) coder |
JP3134817B2 (ja) * | 1997-07-11 | 2001-02-13 | 日本電気株式会社 | 音声符号化復号装置 |
KR100335611B1 (ko) * | 1997-11-20 | 2002-10-09 | 삼성전자 주식회사 | 비트율 조절이 가능한 스테레오 오디오 부호화/복호화 방법 및 장치 |
EP1132892B1 (en) | 1999-08-23 | 2011-07-27 | Panasonic Corporation | Speech encoding and decoding system |
US6604070B1 (en) * | 1999-09-22 | 2003-08-05 | Conexant Systems, Inc. | System of encoding and decoding speech signals |
US7020605B2 (en) * | 2000-09-15 | 2006-03-28 | Mindspeed Technologies, Inc. | Speech coding system with time-domain noise attenuation |
US6856961B2 (en) * | 2001-02-13 | 2005-02-15 | Mindspeed Technologies, Inc. | Speech coding system with input signal transformation |
CN1272911C (zh) * | 2001-07-13 | 2006-08-30 | 松下电器产业株式会社 | 音频信号解码装置及音频信号编码装置 |
FR2840070B1 (fr) * | 2002-05-23 | 2005-02-11 | Cie Ind De Filtration Et D Equ | Procede et dispositif permettant d'effectuer une detection securisee de la pollution de l'eau |
-
2004
- 2004-04-30 US US10/554,619 patent/US7299174B2/en not_active Expired - Lifetime
- 2004-04-30 CN CN200480014149A patent/CN100583241C/zh not_active Expired - Fee Related
- 2004-04-30 EP EP04730659A patent/EP1619664B1/en not_active Expired - Lifetime
- 2004-04-30 CN CN2009101575912A patent/CN101615396B/zh not_active Expired - Fee Related
- 2004-04-30 CA CA2524243A patent/CA2524243C/en not_active Expired - Fee Related
- 2004-04-30 KR KR1020057020680A patent/KR101000345B1/ko active IP Right Grant
- 2004-04-30 WO PCT/JP2004/006294 patent/WO2004097796A1/ja active Application Filing
-
2007
- 2007-10-15 US US11/872,359 patent/US7729905B2/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0573099A (ja) * | 1991-09-17 | 1993-03-26 | Oki Electric Ind Co Ltd | コード励振線形予測符号化方式 |
JPH05249999A (ja) * | 1991-10-21 | 1993-09-28 | Toshiba Corp | 学習型音声符号化装置 |
JPH06102900A (ja) * | 1992-09-18 | 1994-04-15 | Fujitsu Ltd | 音声符号化方式および音声復号化方式 |
JPH0854900A (ja) * | 1994-08-09 | 1996-02-27 | Yamaha Corp | ベクトル量子化による符号化復号化方式 |
JPH08147000A (ja) * | 1994-11-18 | 1996-06-07 | Yamaha Corp | ベクトル量子化による符号化復号方式 |
JPH08211895A (ja) * | 1994-11-21 | 1996-08-20 | Rockwell Internatl Corp | ピッチラグを評価するためのシステムおよび方法、ならびに音声符号化装置および方法 |
JPH08328595A (ja) * | 1995-05-30 | 1996-12-13 | Sanyo Electric Co Ltd | 音声符号化装置 |
JPH10177399A (ja) * | 1996-10-18 | 1998-06-30 | Mitsubishi Electric Corp | 音声符号化方法、音声復号化方法及び音声符号化復号化方法 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007043811A1 (en) * | 2005-10-12 | 2007-04-19 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding/decoding audio data and extension data |
KR100851972B1 (ko) * | 2005-10-12 | 2008-08-12 | 삼성전자주식회사 | 오디오 데이터 및 확장 데이터 부호화/복호화 방법 및 장치 |
US8055500B2 (en) | 2005-10-12 | 2011-11-08 | Samsung Electronics Co., Ltd. | Method, medium, and apparatus encoding/decoding audio data with extension data |
Also Published As
Publication number | Publication date |
---|---|
US7299174B2 (en) | 2007-11-20 |
CA2524243C (en) | 2013-02-19 |
EP1619664A4 (en) | 2010-07-07 |
CN1795495A (zh) | 2006-06-28 |
EP1619664A1 (en) | 2006-01-25 |
CN101615396A (zh) | 2009-12-30 |
KR20060022236A (ko) | 2006-03-09 |
KR101000345B1 (ko) | 2010-12-13 |
US20080033717A1 (en) | 2008-02-07 |
EP1619664B1 (en) | 2012-01-25 |
CA2524243A1 (en) | 2004-11-11 |
US7729905B2 (en) | 2010-06-01 |
CN100583241C (zh) | 2010-01-20 |
CN101615396B (zh) | 2012-05-09 |
US20060173677A1 (en) | 2006-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2004097796A1 (ja) | 音声符号化装置、音声復号化装置及びこれらの方法 | |
JP3747492B2 (ja) | 音声信号の再生方法及び再生装置 | |
JP4958780B2 (ja) | 符号化装置、復号化装置及びこれらの方法 | |
WO2003091989A1 (en) | Coding device, decoding device, coding method, and decoding method | |
JP4445328B2 (ja) | 音声・楽音復号化装置および音声・楽音復号化方法 | |
JP2003223189A (ja) | 音声符号変換方法及び装置 | |
JP4789430B2 (ja) | 音声符号化装置、音声復号化装置、およびこれらの方法 | |
US20040111257A1 (en) | Transcoding apparatus and method between CELP-based codecs using bandwidth extension | |
JP3144009B2 (ja) | 音声符号復号化装置 | |
JP4578145B2 (ja) | 音声符号化装置、音声復号化装置及びこれらの方法 | |
JP3888097B2 (ja) | ピッチ周期探索範囲設定装置、ピッチ周期探索装置、復号化適応音源ベクトル生成装置、音声符号化装置、音声復号化装置、音声信号送信装置、音声信号受信装置、移動局装置、及び基地局装置 | |
JP2004302259A (ja) | 音響信号の階層符号化方法および階層復号化方法 | |
JP4373693B2 (ja) | 音響信号の階層符号化方法および階層復号化方法 | |
JP3576485B2 (ja) | 固定音源ベクトル生成装置及び音声符号化/復号化装置 | |
JP2005215502A (ja) | 符号化装置、復号化装置、およびこれらの方法 | |
JPH11259098A (ja) | 音声符号化/復号化方法 | |
JP2002073097A (ja) | Celp型音声符号化装置とcelp型音声復号化装置及び音声符号化方法と音声復号化方法 | |
JP3063087B2 (ja) | 音声符号化復号化装置及び音声符号化装置ならびに音声復号化装置 | |
JP4287840B2 (ja) | 符号化装置 | |
JP3232728B2 (ja) | 音声符号化方法 | |
JP3017747B2 (ja) | 音声符号化装置 | |
JP2003015699A (ja) | 固定音源符号帳並びにそれを用いた音声符号化装置及び音声復号化装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
ENP | Entry into the national phase |
Ref document number: 2524243 Country of ref document: CA |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2004730659 Country of ref document: EP Ref document number: 1219/MUMNP/2005 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1020057020680 Country of ref document: KR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2004814149X Country of ref document: CN |
|
WWP | Wipo information: published in national office |
Ref document number: 2004730659 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1020057020680 Country of ref document: KR |
|
ENP | Entry into the national phase |
Ref document number: 2006173677 Country of ref document: US Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10554619 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 10554619 Country of ref document: US |