WO2012158159A1 - Packet loss concealment for audio codec - Google Patents
- Publication number
- WO2012158159A1 (application PCT/US2011/036662)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- pitch
- signal
- residual signals
- stored
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
Definitions
- the technical field relates to packet loss concealment in communication systems (such as Voice over IP, also referred to as VoIP), having an audio codec (coder/decoder).
- One such codec may be iSAC.
- Real-time communication refers to communication where the delay between one user speaking and another user hearing the speech is so short that it is imperceptible or nearly imperceptible.
- VoIP is one audio communication approach enabling real-time communication over packet-switched networks.
- an audio signal is broken up into short time segments by an audio coder, and the time segments are transmitted individually as audio frames in packets.
- The packets are received by the receiver, the audio frames are extracted, and the short time segments are reassembled by an audio decoder into the original audio signal, enabling the receiver to hear the transmitted audio signal.
- Real time audio communication over packet-switched networks has brought with it unique challenges.
- The available bandwidth of the network may be limited, and may change over time. Packets may also get lost or corrupted. A packet is considered lost when it fails to arrive at the intended receiver within a limited time interval, even if the packet does eventually arrive at the receiver.
- BEC Backward Error Correction
- Another approach for dealing with lost packets is to use information from received packets to recreate lost packet or packets.
- the received packets may contain information specifically for this purpose, such as redundant information about audio data from preceding time segments.
- Such an approach will result in reduced effective bandwidth available for communication, because the available bandwidth is used for transmitting redundant data, which may not be needed at all if packets are not lost.
- the present invention recognizes the problem posed by lost packets in real-time audio communication over packet switched networks, and provides a solution that avoids the disadvantages of the above examples.
- the loss of packets is concealed by simulating the audio information that would have likely been contained in the lost packets based on previously received packets.
- The invention utilizes packets that were previously received to reconstruct dropped packets in a particular way, without the use of a jitter buffer. Specifically, information from a previously received packet is used to reconstruct a lost packet, but the information is not merely copied. If it were simply copied, the resulting audio would sound unnatural and "robotic." Instead, the information from the previously received packet is modified so that the reconstructed packet results in natural sounding audio.
- A method of decoding an audio signal having been encoded as a sequence of consecutive frames may include receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, storing the residual signals contained in the first frame, decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, determining that a second frame subsequent to the first frame in time has been lost, modifying the stored residual signals, and reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
- the modifying the stored residual signals may include generating a periodic signal, generating a colored pseudo-random signal based on the stored residual signals, multiplying the periodic signal and the colored pseudo-random signal with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals, and summing the weighted periodic signal and the weighted colored pseudo-random signal.
- the generating the periodic signal may include retrieving at least two most recently stored pitch cycles, altering periodicity of each pitch cycle, weighting each pitch cycle, and summing the two weighted pitch cycles.
- the altering the periodicity may include resampling pitch pulses of the pitch cycles.
- The generating the colored pseudo-random signal may include generating a pseudo-random sequence, and filtering the pseudo-random sequence with an Nth-order all-zero filter with coefficients given by the N latest samples of a previously decoded lower-band residual signal of a previously received frame.
- the stored residual signals may include input of a pitch synthesis filter, and input of an LPC synthesis filter.
- the decoding parameters may include pitch gains, pitch lags, and LPC parameters.
- the frames may contain encoded information for a first frequency band and distinct second frequency band higher than the first frequency band, and only a residual signal of the first frequency band is pitch post filtered, but not a residual signal of the second frequency band.
- A decoding apparatus for decoding an audio signal having been encoded as a sequence of consecutive frames includes a receiver configured to receive a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, a storage unit storing the residual signals contained in the first frame, a decoding unit configured to decode the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, a loss detector configured to determine that a second frame subsequent to the first frame in time has been lost, a modification unit configured to modify the stored residual signals, and a reconstruction unit configured to reconstruct an estimate of the audio signal encoded by the second frame based on the stored residual signals modified by the modification unit.
- the modification unit may include a first signal generator configured to generate a periodic signal, a second signal generator configured to generate a colored pseudo-random signal based on the stored residual signals, a multiplier multiplying the periodic signal generated in the first signal generator and the colored pseudo-random signal generated in the second signal generator with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals, and an adder summing the weighted periodic signal and the weighted colored pseudo-random signal output from the multiplier.
- the first signal generator may be configured to retrieve at least two most recently stored pitch cycles, alter periodicity of each pitch cycle, weight each pitch cycle, and sum the two weighted pitch cycles.
- the first signal generator may be configured to alter the periodicity by resampling pitch pulses of the pitch cycles.
- The second signal generator may be configured to generate a pseudo-random sequence, and filter the pseudo-random sequence with an Nth-order all-zero filter with coefficients given by the N latest samples of a previously decoded lower-band residual signal of a previously received frame.
- the stored residual signals may include input of a pitch synthesis filter, and input of an LPC synthesis filter.
- the decoding parameters may include pitch gains, pitch lags, and LPC parameters.
- A computer readable tangible recording medium is encoded with instructions, wherein the instructions, when executed on a processor, cause the processor to perform a method including receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, storing the residual signals contained in the first frame, decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, determining that a second frame subsequent to the first frame in time has been lost, modifying the stored residual signals, and reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
- FIG. 1 is a block diagram illustrating an example of a communication system according to an embodiment of the present invention.
- FIG. 2 illustrates an example of a stream of packets with a lost packet according to an embodiment of the present invention.
- FIG. 3 illustrates an example of a process flow of receiving packets according to an embodiment of the present invention.
- FIG. 4 illustrates an example of a process flow of decoding received packets according to an embodiment of the present invention.
- FIGS. 5A and 5B illustrate an example of a process flow of an algorithm for concealing packet loss according to an embodiment of the present invention.
- FIGS. 6A and 6B illustrate an example of a process flow of an algorithm for generating a quasi-periodic pulse train according to an embodiment of the present invention.
- FIG. 7 illustrates an example of a processing system for implementing the packet loss algorithm according to an embodiment of the present invention.
- Fig. 1 illustrates a communication system. Audio input is passed into one end of the system, and is ultimately output at the other end of the system. The communication can be concurrently bi-directional, as in a telephone conversation between two callers. The audio input can be generated by a user speaking, by a recording, or any other audio source. The audio input is supplied to encoder 102.
- Encoder 102 encodes the audio input into multiple packets, which are transmitted over packet network 104 to decoder 106.
- Packet network 104 can be any packet-switched network, whether using physical link connection and/or wireless link connections. Packet network 104 may also be a wireless communication network, and/or an optical link network. Packet network 104 conveys packets from encoder 102 to decoder 106. Some of the packets sent from the encoder 102 may get lost, as illustrated in Fig. 2.
- The encoder 102 may be the iSAC coder, and produces packets (also referred to as frames) as output.
- An embodiment of the invention relies on pitch information, and assumes that pitch parameters are available at the decoder. But even if pitch parameters are not embedded in the payload, they could be estimated at the decoder based on the previously decoded audio.
- Each frame corresponds to a short segment of time, for example 30 or 60 milliseconds for iSAC. Other segment lengths may also be used with other encoders.
- One-way delay is at least as large as one frame size, so frame sizes longer than 60 ms may create unacceptably long delays.
- Longer frames are harder to conceal in the event of a lost packet. Shorter frames, on the other hand, may introduce too much packet overhead, reducing the effective bandwidth. If delay were not a concern (for instance in streaming), high quality could be achieved by allowing long frame sizes for stationary segments.
- the encoder 102 may separate the incoming audio signal into two frequency bands, referred to as the lower band (LB) and the upper band (UB).
- The LB may be 0-4 kHz.
- The UB may be 4-8 kHz.
- a single frequency band (e.g., 0-8kHz) may also be used, without separating the incoming audio signal into separate bands.
- each frame contains at least pitch gain, pitch lag, LPC parameters, and DFT coefficients of a residual signal during the corresponding time segment.
- each of the bands will have respective information in the frame, the information for each band can be individually selected from the frame, and there are no pitch parameters associated with the UB band.
- the encoder used is iSAC
- the pitch lag can be thought of as the "optimal" delay of a long-term predictor
- The pitch gain can be thought of as the prediction gain.
- LPC coefficients are optimal short-term prediction coefficients.
- Decoder 106 receives packets conveyed by network 104 and decodes the packets into audio data, which is output from decoder 106. Details of the processing performed by the decoder 106 are illustrated in Figs. 3-6. Decoder 106 may be implemented on a processor, such as illustrated in Fig. 7, or on other hardware platforms, such as mobile telecommunication devices. The processing performed by decoder 106 is advantageous for mobile devices that lack sufficient processing power to perform alternate types of packet loss concealment, as the approach according to the present invention is of a relatively low computational complexity.
- Fig. 3 illustrates a high level processing flow of the PLC approach according to an embodiment of the present invention.
- In step S 306, a determination is made whether frame N has been received, i.e., not lost. If frame N has been received, the processing continues to step S 320, where frame N is decoded.
- Fig. 4 illustrates additional details of the processing in step S 320.
- After frame N is decoded in step S 320, the processing increments index N in step S 340, and continues with step S 306 to determine if frame N+1 has been received. So long as frames are not lost, the processing continues along the loop of steps S 306, S 320, and S 340.
- If it is determined in step S 306 that a frame has been lost, the processing continues to step S 350, where the loss of the frame is concealed.
- Figs. 5A-B illustrate additional details of the processing in step S 350.
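- The loop of Fig. 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `run_decoder`, `decode`, and `conceal` are hypothetical, a lost frame is modeled as `None`, and the sketch assumes that the frame index also advances after a concealed frame, which the text above does not state explicitly.

```python
from typing import Callable, List, Optional


def run_decoder(frames: List[Optional[bytes]],
                decode: Callable[[bytes], str],
                conceal: Callable[[], str]) -> List[str]:
    """Walk the frame sequence, decoding received frames and concealing lost ones."""
    output = []
    n = 0
    while n < len(frames):
        frame = frames[n]
        if frame is not None:              # S 306: frame N was received
            output.append(decode(frame))   # S 320: normal decoding
        else:
            output.append(conceal())       # S 350: packet loss concealment
        n += 1                             # S 340 (assumed to apply after S 350 too)
    return output
```

- For example, `run_decoder([b"a", None, b"b"], decode, conceal)` would call `decode` for the first and third frames and `conceal` for the second.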
- Fig. 4 illustrates an example of the process of decoding frames that are received by decoder 106.
- frame size and bandwidth information are decoded from the frame in step S 410.
- the frame size represents the size of the time segment represented by the frame, and can be represented in milliseconds, or count of samples at a particular sampling rate.
- the sampling rate may also be encoded in the frame. Sampling rate may be negotiated before a call takes place and is not supposed to change during a call.
- the bandwidth information reflects the bandwidth of the audio data encoded in the frame, and may be LB, UB, or both.
- In step S 415, the pitch lags and the pitch gains are decoded from the frame.
- Pitch lags and gains may be updated every 7.5 ms, thus resulting in 4 pitch lags and gains per 30 ms frame.
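- The count of pitch parameter sets per frame follows directly from the update interval. The helper below is a hypothetical illustration of that arithmetic; applying the same 7.5 ms interval to a 60 ms frame is an assumption, not something the text states.

```python
def pitch_updates_per_frame(frame_ms: float, update_ms: float = 7.5) -> int:
    """Number of pitch lag/gain pairs carried by one frame."""
    count = frame_ms / update_ms
    if count != int(count):
        raise ValueError("frame length is not a whole number of update intervals")
    return int(count)
```

- With the default interval, a 30 ms frame yields 4 pairs.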
- the pitch lag represents the lag of a long-term predictor for the current signal.
- the pitch gain represents the long-term linear prediction coefficient.
- The decoded pitch lags and pitch gains are stored in step S 420, as they may be needed for packet loss concealment if subsequent frames are lost.
- In step S 425, the LPC parameters (LPC shape and gain) are decoded.
- the LPC parameters represent short-term linear prediction coefficients, describing the spectral envelope of the signal.
- the LPC shape and gain are stored in step S 430, as they may be needed for packet loss concealment, if subsequent frames are lost.
- In step S 435, the DFT coefficients of the residual signal encoded in the frame are decoded.
- the residual signal is the result of filtering out the short term and the long term linear dependencies.
- the DFT coefficients are the result of transforming the residual signal into the frequency domain by an operation such as the FFT.
- the DFT coefficients may include separate information for the LB signal and separate information for the UB signal.
- In step S 440, the DFT coefficients which were decoded in step S 435 are transformed from the frequency domain into the time domain, by an operation such as an inverse FFT, resulting in the residual signal.
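- The frequency-to-time transform can be illustrated with a naive inverse DFT. This is a sketch only: a real decoder would use a fast transform and the codec's own coefficient layout, neither of which is described here, and the function names are hypothetical.

```python
import cmath
from typing import List


def forward_dft(samples: List[float]) -> List[complex]:
    """Naive O(n^2) DFT, included only so the round trip can be checked."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def inverse_dft(coeffs: List[complex]) -> List[float]:
    """Naive O(n^2) inverse DFT returning the real time-domain residual."""
    n = len(coeffs)
    out = []
    for t in range(n):
        acc = 0j
        for k, c in enumerate(coeffs):
            acc += c * cmath.exp(2j * cmath.pi * k * t / n)
        out.append((acc / n).real)  # imaginary part is ~0 for a real signal
    return out
```

- Transforming a short residual with `forward_dft` and back with `inverse_dft` should recover the original samples up to rounding error.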
- A separate residual signal is created for LB (referred to as LB_Res) and a separate residual signal is created for UB (referred to as UB_Res).
- In step S 445, the residual signals (LB_Res and UB_Res) are stored, as they may be needed for packet loss concealment.
- In step S 450, the lower band residual signal (LB_Res) is filtered by a pitch post-filter.
- The pitch post-filter is a pole-zero filter whose coefficients are given by the pitch gain and lag. It is the inverse of the pitch pre-filter and therefore reintroduces the long-term structure that was removed by the pitch pre-filter. Even when both LB_Res and UB_Res are available, only LB_Res is pitch post-filtered.
- the output of the pitch post-filter (the filtered residual signal) is stored, as it may be needed for packet loss concealment.
- In step S 455, the LPC parameters decoded in step S 425 are used to synthesize the lower band and the upper band signals.
- LPC synthesis is an all-pole filter with coefficients derived from the LPC parameters. This filter is the inverse of LPC analysis (at the encoder) and therefore reintroduces the short-term structure of the signal.
- the output of LPC synthesis is the time domain representation of the original encoded signal. In the case where LB and UB are used at the same time, the output is a separate LB signal and UB signal.
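- An all-pole synthesis filter of the kind described for step S 455 can be sketched as below. The sign convention (analysis filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p), the scalar `gain`, and the function name are assumptions made for illustration; the text does not specify how the coefficients are derived from the LPC shape and gain.

```python
from typing import List, Sequence


def lpc_synthesis(residual: Sequence[float],
                  a: Sequence[float],
                  gain: float = 1.0) -> List[float]:
    """All-pole filter: y[n] = gain * x[n] - sum_k a[k] * y[n - k], k = 1..p."""
    out: List[float] = []
    for n, x in enumerate(residual):
        acc = gain * x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:          # samples before the start are taken as zero
                acc -= ak * out[n - k]
        out.append(acc)
    return out
```

- Feeding a unit impulse through a one-coefficient filter with `a = [-0.5]` produces a geometrically decaying response, which is the short-term structure the filter adds back.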
- When LB and UB are used together, in step S 460 the LB signal and the UB signal are combined, creating a representation of the original audio input. The output can thus be the audio output for a receiver, as illustrated in Fig. 1. In an implementation where LB and UB are not treated separately, and only a single frequency band is used, step S 460 may be skipped.
- The re-creation of the audio depends on the availability of the residual signal, pitch gain and lag, and LPC parameters from the received frame. In case of packet loss, however, that information is not available. As each frame represents a time segment on the order of 30 milliseconds, it is possible to simply copy the information from a preceding frame to represent the lost frame. With that approach, however, the audio would sound artificial and robotic. Thus, the inventors have derived an approach to reconstruct the data from the lost frame based on previously received frames which creates natural sounding audio.
- When it is determined in step S 306 that a frame has been lost, decoder 106 performs packet loss concealment in step S 350. As shown in Fig. 5A, the stored pitch lag and pitch gain are retrieved in step S 510. The pitch lag and pitch gain were stored in step S 420 for the previous received frame.
- In step S 515, the residual signal is retrieved for the previous received frame.
- the residual signal was stored in step S 445.
- In step S 516, the decoder determines whether the current lost frame is one of consecutive lost frames. If the lost frame is not one of multiple consecutive lost frames, the processing proceeds to step S 520.
- In step S 520, the latest two pitch pulses are computed.
- the pitch pulses used are closest in time to the lost frame. The computation is based on the pitch lag and the residual signal retrieved in steps S 510 and S 515.
- the two latest pitch pulses are only computed for the LB signal, even when both LB and UB signals are used.
- the two pitch pulses may be computed for both the LB and UB signals.
- the choice of using two pitch pulses is a design parameter determined by the inventors for optimal performance, but other number of pitch pulses could also be used.
- In step S 525, the pitch pulses obtained in step S 520 are stored.
- The pitch pulses will be referred to as LB_P1 and LB_P2.
- step S 530 the pitch post-filter output stored in step S 450 is retrieved, and in step S 535 the pitch post-filter output is used to compute a long-term similarity measure. More specifically the long-term similarity measure is a ratio computed based on the energy of pitch pulses before and after the post-filtering of the previous frame. It is a measure of how periodic the previous frame was.
- a voice indicator is computed based on the long-term similarity measure and the frequency of the computed pitch pulses.
- the voice indicator may be calculated as log2( sigma2_out / sigma2_in ) + 2 * pitch_gain + pitch_gain / 256, where log2(x) is logarithm of x in base 2, sigma2_out is the variance of the latest pitch pulse at the output of pitch post-filter and sigma2_in is the variance of the corresponding pulse at the input.
- the voice indicator is an indication of how periodic the last decoded frame was.
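- The voice indicator formula above can be written out directly. In this sketch the function names are hypothetical, the variance is taken as the population variance, and the pulses are passed in as plain sample lists; none of these choices is specified by the text.

```python
import math
from typing import Sequence


def variance(samples: Sequence[float]) -> float:
    """Population variance of a sample sequence."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def voice_indicator(pulse_in: Sequence[float],
                    pulse_out: Sequence[float],
                    pitch_gain: float) -> float:
    """log2(sigma2_out / sigma2_in) + 2 * pitch_gain + pitch_gain / 256."""
    sigma2_in = variance(pulse_in)     # latest pitch pulse at the post-filter input
    sigma2_out = variance(pulse_out)   # same pulse at the post-filter output
    return math.log2(sigma2_out / sigma2_in) + 2 * pitch_gain + pitch_gain / 256
```

- A post-filter that raises the pulse energy (a strongly periodic frame) and a large pitch gain both push the indicator upward.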
- In step S 545, weight factors are computed for voiced and unvoiced segments.
- the weight factor for voiced segments is w_v, while the weight factor for un-voiced segments is w_u.
- the weights are stored in step S 550.
- the description of steps S 520 through S 550 is based on non-consecutive lost frames.
- the processing differs for multiple consecutive lost frames as compared to a single lost frame. In the case of multiple consecutive lost frames, there is no immediately preceding frame that has been received. However, the first lost frame of a sequence of multiple lost frames will have been processed through steps S 520 to S 550. Any subsequent lost frames follow the processing through S 517 and S 547.
- In step S 517, a decay rate is increased.
- The decay rate is the rate at which the synthesized residual signal is decayed to zero, and is applied in step S 590.
- In step S 547, the weight factors w_v and w_u calculated during the previous PLC call (stored in step S 550) are retrieved.
- In step S 556, the weight factors w_v and w_u are analyzed to determine what kind of utterance is contained in the most recently received frame. Voiced utterances have a strong periodic nature, while unvoiced utterances do not. If the most recently received frame contains voiced utterances, w_v will be greater than zero. If the frame also contains unvoiced utterances, w_u will also be greater than zero. The weights reflect the relative mix of voiced to unvoiced utterances in the frame. A frame with only voiced utterances will have w_u equal to zero, while a frame with only unvoiced utterances will have w_v equal to zero. If both w_v and w_u are non-zero, the utterance is considered a mixed utterance.
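- The three-way decision of step S 556 can be sketched as a small function. The name `classify_utterance` and the string labels are hypothetical; the case where both weights are zero is not discussed in the text and is treated here as unvoiced purely as an assumption.

```python
def classify_utterance(w_v: float, w_u: float) -> str:
    """Classify the last received frame from its voiced/unvoiced weights."""
    if w_v == 0:
        return "unvoiced"   # leads to S 560: pseudo-random vector only
    if w_u == 0:
        return "voiced"     # leads to S 580: quasi-periodic pulse train only
    return "mixed"          # leads to S 570: both components, then weighted sum
```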
- If it is determined that the utterance is unvoiced (i.e., w_v is zero), the processing proceeds to step S 560, where a pseudo random vector is generated.
- a pseudo random vector may be generated for LB and a separate one for UB, when both LB and UB are used.
- In step S 562, the pseudo random vector is filtered by an Nth-order all-zero filter with coefficients given by the N latest samples of the recently decoded residual signal.
- N may be a fixed number equal to 30. This filtering will color the generated pseudo random vectors to have a spectral envelope similar to that of the previous received packet.
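- Steps S 560 and S 562 can be sketched as noise generation followed by an all-zero (FIR) filter whose taps are the last N residual samples. The uniform noise source, the seed parameter, the tap ordering (oldest of the N samples used as the first tap), and the function name are all assumptions; the text fixes only N = 30 and the source of the coefficients.

```python
import random
from typing import List, Sequence


def colored_noise(residual: Sequence[float], length: int,
                  order: int = 30, seed: int = 0) -> List[float]:
    """Color pseudo-random noise with taps taken from the latest residual samples."""
    taps = list(residual[-order:])          # N latest samples of the stored residual
    rng = random.Random(seed)               # seeded so the sketch is repeatable
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    out: List[float] = []
    for n in range(length):
        acc = 0.0
        for k, b in enumerate(taps):
            if n - k >= 0:                  # noise before the start is taken as zero
                acc += b * noise[n - k]
        out.append(acc)
    return out
```

- Because the taps come from the previously decoded residual, the filtered noise inherits a spectral shape related to that residual rather than staying white.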
- If it is determined in step S 556 that the utterance is voiced (i.e., w_u is zero), the processing proceeds to step S 580.
- In step S 580, a quasi-periodic pulse train is constructed.
- the quasi periodic pulse train is a weighted sum of the two latest pitch cycles.
- the output is the residual signal. In case both LB and UB are used, the output is the LB residual and the UB residual. Details of the process of generating the quasi periodic pulse train are illustrated in Figs. 6A-B.
- If it is determined in step S 556 that the utterance is mixed, the processing proceeds to step S 570.
- Step S 570 is functionally the same as step S 580. The details of the processing in step S 570 are illustrated in Figs. 6A-B.
- The output of step S 570 is a lower band pulse train (referred to as LB_P) and an upper band pulse train (referred to as UB_P).
- In step S 572, two pseudo random vectors are generated, one for LB and one for UB.
- the process of generating the pseudo random vectors is the same as in step S 560.
- The LB pseudo random vector will be referred to as LB_N and the UB pseudo random vector will be referred to as UB_N.
- In step S 574, weight factors w_v and w_u are applied to the quasi-periodic pulse train and to the pseudo random vectors as follows.
- the LB residual is LB_P*w_v + LB_N*w_u.
- the UB residual is UB_P*w_v + UB_N*w_u.
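- The weighted sum of step S 574 applies sample by sample to each band. The helper below is a hypothetical illustration that works for either the LB pair (LB_P, LB_N) or the UB pair (UB_P, UB_N); requiring equal lengths is an assumption of the sketch.

```python
from typing import List, Sequence


def mix_residual(pulse_train: Sequence[float], noise: Sequence[float],
                 w_v: float, w_u: float) -> List[float]:
    """Per-sample P * w_v + N * w_u for one band."""
    if len(pulse_train) != len(noise):
        raise ValueError("pulse train and noise must have the same length")
    return [w_v * p + w_u * n for p, n in zip(pulse_train, noise)]
```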
- In step S 590, the residual signal is decayed.
- The decay is linear and applied sample by sample. If K is the size of the reconstructed residual signal, the following pseudo code illustrates an exemplary algorithm for decaying the signal, where d is a number less than 1 and the role of decay_rate is apparent:
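- The pseudo code itself is not reproduced in this text. One plausible reading of a linear, sample-by-sample decay is sketched below; the update rule (multiply by d, then lower d by decay_rate, clamped at zero), the starting value of d, and the function name are assumptions rather than the patent's actual algorithm.

```python
from typing import List, Sequence


def decay_residual(residual: Sequence[float], d: float,
                   decay_rate: float) -> List[float]:
    """Scale each of the K samples by d, reducing d linearly after every sample."""
    out: List[float] = []
    for sample in residual:                 # K = len(residual)
        out.append(sample * d)
        d = max(d - decay_rate, 0.0)        # assumed: never goes below zero
    return out
```

- Under this reading, a larger decay_rate (as set in step S 517 for consecutive losses) drives the synthesized residual to zero sooner.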
- In step S 592, the LB residual is pitch post-filtered, similar to step S 450.
- the pitch post-filtering uses filter coefficients derived from pitch lag and pitch gain stored in step S 420.
- the UB residual can skip pitch post-filtering.
- In step S 594, LPC parameters stored in step S 430 are retrieved, and LPC synthesis of the LB and UB signal is performed based on the retrieved parameters.
- In step S 596, the LB and UB signals are combined to create a synthesized representation of the audio of the lost frame.
- Figs. 6A-B illustrate a detailed description of the process of constructing a quasi-periodic pulse train according to an embodiment of the present invention.
- a quasi-periodic pulse train is constructed in steps S 570 and S 580.
- In step S 610, the pitch lag of a previous frame, LB_P1, LB_P2, and UB_Res are retrieved. These values were previously stored when the previous frame was received.
- In step S 615, loop counters j and p_cntr are initialized to zero.
- In step S 616, the decoder determines whether the current frame is one of consecutive lost frames. If the lost frame is not one of multiple consecutive lost frames, the processing proceeds to step S 617, where the value of variable L is set equal to the pitch lag retrieved in step S 610. It can be appreciated that the first lost frame will cause L to be initialized to the value of the pitch lag, but subsequent lost frames will bypass step S 617, and the processing will continue to step S 620.
- In step S 620, LB_P1 is resampled to L samples and assigned to R1. Thus, the length of R1 is L samples.
- In step S 625, the last L samples of UB_Res are selected, and referred to as Q1.
- In step S 630, loop counter i is initialized to zero.
- In step S 636, the decoder determines whether j is less than the frame size (extracted in step S 410). As long as j remains less than the frame size, the loop continues. When j reaches the frame size, LB_P and UB_P are returned as the quasi-periodic pulse trains.
- In step S 638, the decoder determines whether i is less than L. If i is less than L, the process returns to step S 635 and continues the loop. Once i reaches L, the process continues to step S 640, shown in Fig. 6B.
- In step S 640, p_cntr is incremented by one.
- In step S 642, the decoder determines whether L is greater than pitch_lag. If L is not greater, L is set to pitch_lag+1 in step S 644. If L is greater than pitch_lag, L is set to pitch_lag in step S 646.
- This processing is an example of resampling of pitch pulses to avoid too much of periodicity in the reconstructed signal.
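- The alternation of the cycle length and the resampling of a pitch pulse can be sketched as follows. The text does not say which resampling method is used, so linear interpolation is an assumption here, as are the function names and the treatment of the pitch lag as an integer sample count.

```python
from typing import List, Sequence


def next_cycle_length(L: int, pitch_lag: int) -> int:
    """S 642 to S 646: toggle the cycle length between pitch_lag and pitch_lag + 1."""
    return pitch_lag if L > pitch_lag else pitch_lag + 1


def resample_linear(pulse: Sequence[float], length: int) -> List[float]:
    """Stretch or shrink a pitch pulse to `length` samples (assumed linear interpolation)."""
    if not pulse or length <= 0:
        raise ValueError("need a non-empty pulse and a positive target length")
    if length == 1 or len(pulse) == 1:
        return [pulse[0]] * length
    scale = (len(pulse) - 1) / (length - 1)
    out: List[float] = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(pulse) - 1)
        frac = pos - lo
        out.append(pulse[lo] * (1.0 - frac) + pulse[hi] * frac)
    return out
```

- Toggling the length by one sample from cycle to cycle keeps successive pitch cycles from being exact copies, which is what limits the periodicity of the reconstructed signal.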
- In step S 650, LB_P1 is resampled to L samples and assigned to R1.
- The length of R1 is L samples.
- In step S 655, LB_P2 is resampled to L samples and assigned to R2.
- The length of R2 is L samples.
- In step S 656, the decoder determines whether the value of p_cntr is equal to 1, 2, or 3.
- If the value of p_cntr is 1, R1 is set to (3*R1+R2)/4 in step S 661.
- If the value of p_cntr is 2, R1 is set to (R1+R2)/2 in step S 662.
- If the value of p_cntr is 3, R1 is set to (R1+3*R2)/4 in step S 663, and p_cntr is set to 0 in step S 673.
- At the conclusion of any of steps S 661, S 662, and S 673, the processing returns to step S 630 in Fig. 6A.
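- The blending of the two resampled pitch cycles can be sketched as a lookup on p_cntr. The pairing of p_cntr values 1 and 2 with steps S 661 and S 662 is inferred from the order of the steps, and the function name and return convention are hypothetical.

```python
from typing import List, Sequence, Tuple


def blend_cycles(r1: Sequence[float], r2: Sequence[float],
                 p_cntr: int) -> Tuple[List[float], int]:
    """Return the blended cycle and the updated p_cntr (reset to 0 after 3)."""
    weights = {1: (3, 1),   # S 661: (3*R1 + R2) / 4
               2: (1, 1),   # S 662: (R1 + R2) / 2
               3: (1, 3)}   # S 663: (R1 + 3*R2) / 4
    a, b = weights[p_cntr]
    total = a + b
    blended = [(a * x + b * y) / total for x, y in zip(r1, r2)]
    next_cntr = 0 if p_cntr == 3 else p_cntr   # S 673 resets the counter
    return blended, next_cntr
```

- Over three successive cycles the mix moves from mostly R1 toward mostly R2, after which the counter starts over.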
- FIG. 7 is a block diagram illustrating an example of a computing device 700 that is arranged for packet loss concealment in accordance with the present disclosure.
- computing device 700 typically includes one or more processors 710 and system memory 720.
- A memory bus 730 can be used for communicating between the processor 710 and the system memory 720.
- Processor 710 can be of any type including but not limited to a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof.
- Processor 710 can include one or more levels of caching, such as a level one cache 711 and a level two cache 712, a processor core 713, and registers 714.
- the processor core 713 can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
- a memory controller 715 can also be used with the processor 710, or in some implementations the memory controller 715 can be an internal part of the processor 710.
- system memory 720 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
- System memory 720 typically includes an operating system 721, one or more applications 722, and program data 724.
- Application 722 includes a decoding processing algorithm with packet loss concealment 723 that is arranged to decode incoming packets, and to conceal lost packets according to the present disclosure.
- Program Data 724 includes service data 725 that is useful for performing decoding of received packets and concealing lost packets, as will be further described below.
- application 722 can be arranged to operate with program data 724 on an operating system 721 such as Android, Chrome, Windows, etc. This described basic configuration is illustrated in FIG. 7 by those components within dashed line 701.
- Computing device 700 can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration 701 and any required devices and interfaces.
- a bus/interface controller 740 can be used to facilitate communications between the basic configuration 701 and one or more data storage devices 750 via a storage interface bus 741.
- the data storage devices 750 can be removable storage devices 751, non-removable storage devices 752, or a combination thereof.
- removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few.
- Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
- System memory 720, removable storage 751 and non-removable storage 752 are all examples of computer readable storage media, and store information as described in various steps of the processing algorithms described in this disclosure.
- Computer readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 700. Any such computer storage media can be part of device 700, and can store instructions that are executed by processor 710, and cause the computing device 700 to perform a method of decoding packets and concealing lost packets as described in this disclosure.
- Computing device 700 can also include an interface bus 742 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, and communication interfaces) to the basic configuration 701 via the bus/interface controller 740.
- Example output devices 760 include a graphics processing unit 761 and an audio processing unit 762, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 763.
- Example peripheral interfaces 770 include a serial interface controller 771 or a parallel interface controller 772, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 773.
- An example communication device 780 includes a network controller 781, which can be arranged to facilitate communications with one or more other computing devices 790 over a network communication via one or more communication ports 782.
- the communication connection is one example of a communication media.
- Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
- a "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media.
- the term computer readable media as used herein can include both storage media and communication media.
- Computing device 700 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions.
- Computing device 700 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
- the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
- Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
- a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and nonvolatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
- a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing communication systems.
Abstract
A speech signal is encoded as a sequence of consecutive frames. When a frame is lost, the loss is concealed at a receiver by reconstructing audio that would be contained in the lost frame based on other previously received frames. The frames contain a residual signal and linear predictive coding parameters representing a segment of audio data. For a lost frame the content of a previous frame is not copied, but is modified to make the reconstructed audio sound natural. The modification includes creating a weighted sum of a quasi-periodic signal derived from the latest two pitch cycles and a pseudo random sequence. The weights are selected based on a determination of whether the previous frame contains voiced or unvoiced utterances.
Description
PACKET LOSS CONCEALMENT FOR AUDIO CODEC
TECHNICAL FIELD
[0001] The technical field relates to packet loss concealment in communication systems (such as Voice over IP, also referred to as VoIP), having an audio codec (coder/decoder). One such codec may be iSAC.
BACKGROUND
[0002] Telephone communication originally relied on dedicated connections between callers. Thus, every ongoing telephone conversation required a physical, real-time, connection to enable real-time communication. Real-time communication refers to communication where the delay between one user speaking and another user hearing the speech is so short that it is imperceptible or nearly imperceptible. In recent years, advances in communication technology have allowed packet-switched networks, such as the Internet, to support real-time communication.
[0003] VoIP is one audio communication approach enabling real-time communication over packet-switched networks. Instead of a dedicated connection between callers, an audio signal is broken up into short time segments by an audio coder, and the time segments are transmitted individually as audio frames in packets. The packets are received by the receiver, the audio frames are extracted, and the short time segments are reassembled by an audio decoder into the original audio signal, enabling the receiver to hear the transmitted audio signal.
[0004] Real time audio communication over packet-switched networks has brought with it unique challenges. The available bandwidth of the network may be limited, and may change over time. Packets may also get lost or corrupted. A packet is considered lost when it fails to arrive at the intended receiver within a limited time interval, even if the packet does eventually arrive at the receiver.
[0005] One approach for dealing with lost packets is Backward Error Correction (BEC), where the receiver notifies the transmitter that an expected packet was not received, causing the transmitter to re-transmit the expected packet. While viable for tasks such as file transmission, BEC is not desirable for a real-time communication system. In real-time audio communication re-transmission is not a viable option because it typically results in a large delay before the missing packet is received by the receiver. Waiting for re-transmission of a packet would result in the loss of the real-time nature of the communication.
[0006] Another approach for dealing with lost packets is to use information from received packets to recreate the lost packet or packets. The received packets may contain information specifically for this purpose, such as redundant information about audio data from preceding time segments. Such an approach, however, will result in reduced effective bandwidth available for communication, because the available bandwidth is used for transmitting redundant data, which may not be needed at all if packets are not lost.
[0007] The present invention recognizes the problem posed by lost packets in real-time audio communication over packet switched networks, and provides a solution that avoids the disadvantages of the above examples.
[0008] According to an embodiment of the present invention, the loss of packets is concealed by simulating the audio information that would have likely been contained in the lost packets based on previously received packets. The invention utilizes packets that were previously received to reconstruct dropped packets in a particular way, without the use of a jitter buffer. Specifically, information from a previously received packet is used to reconstruct a lost packet, but the information is not merely copied. If it were simply copied, the resulting audio would sound unnatural and "robotic." Instead, the information from the previously received packet is modified in a special way to make the reconstructed packet result in natural sounding audio.
SUMMARY
[0009] In an embodiment, a method of decoding an audio signal having been encoded as a sequence of consecutive frames may include receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, storing the residual signals contained in the first frame, decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, determining that a second frame subsequent to the first frame in time has been lost, modifying the stored residual signals, and reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
[0010] In an embodiment, the modifying the stored residual signals may include generating a periodic signal, generating a colored pseudo-random signal based on the stored residual signals, multiplying the periodic signal and the colored pseudo-random signal with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals, and summing the weighted periodic signal and the weighted colored pseudo-random signal.
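As a minimal sketch of the final weighting and summing operation only, the following Python fragment combines a periodic signal and a colored pseudo-random signal sample by sample. The function name mix_excitation is hypothetical. The sketch assumes that the voiced weight (called w_v in the detailed description) is applied to the periodic signal and the unvoiced weight (w_u) to the pseudo-random signal.

```python
def mix_excitation(periodic, noise, w_v, w_u):
    """Weighted sum of a periodic signal and a colored pseudo-random signal.

    Assumption: w_v scales the periodic part and w_u scales the noise part.
    """
    if len(periodic) != len(noise):
        raise ValueError("both signals must have the same length")
    return [w_v * p + w_u * n for p, n in zip(periodic, noise)]
```

With w_v equal to 1 and w_u equal to 0 the output is purely periodic, and with the weights reversed it is purely noise-like.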
[0011] In an embodiment, the generating the periodic signal may include retrieving at least two most recently stored pitch cycles, altering periodicity of each pitch cycle, weighting each pitch cycle, and summing the two weighted pitch cycles.
[0012] In an embodiment, the altering the periodicity may include resampling pitch pulses of the pitch cycles.
[0013] In an embodiment, the generating the colored pseudo-random signal may include generating a pseudo-random sequence, and filtering the pseudo-random sequence with an Nth-order all-zero filter with coefficients given by the N latest samples of a previously decoded lower-band residual signal of a previously received frame.
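The coloring operation may be sketched in Python as follows, for illustration only. The names fir_filter and colored_noise are hypothetical. The uniform distribution of the pseudo-random sequence, the seed parameter, and the order in which the residual samples are mapped to filter taps are all assumptions.

```python
import random


def fir_filter(coeffs, x):
    """All-zero (FIR) filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * x[n - k]
        out.append(acc)
    return out


def colored_noise(residual, order, length, seed=0):
    """Color a pseudo-random sequence with taps taken from the residual."""
    if order < 1 or order > len(residual):
        raise ValueError("order must be between 1 and len(residual)")
    coeffs = residual[-order:]          # the N latest residual samples
    rng = random.Random(seed)           # seeded for reproducibility
    white = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    return fir_filter(coeffs, white)
```

Since the taps come from the decoded residual itself, the filtered noise takes on a spectral shape related to that of the residual instead of being spectrally flat.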
[0014] In an embodiment, the stored residual signals may include input of a pitch synthesis filter, and input of an LPC synthesis filter. The decoding parameters may include pitch gains, pitch lags, and LPC parameters.
[0015] In an embodiment, the frames may contain encoded information for a first frequency band and a distinct second frequency band higher than the first frequency band, and only a residual signal of the first frequency band, but not a residual signal of the second frequency band, is pitch post filtered.
[0016] In another embodiment, a decoding apparatus for decoding an audio signal having been encoded as a sequence of consecutive frames includes a receiver configured to receive a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, a storage unit storing the residual signals contained in the first frame, a decoding unit configured to decode the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, a loss detector configured to determine that a second frame subsequent to the first frame in time has been lost, a modification unit configured to modify the stored residual signals, and a reconstruction unit configured to reconstruct an estimate of the audio signal encoded by the second frame based on the stored residual signals modified by the modification unit.
[0017] In an embodiment, the modification unit may include a first signal generator configured to generate a periodic signal, a second signal generator configured to generate a colored pseudo-random signal based on the stored residual signals, a multiplier multiplying the periodic signal generated in the first signal generator and the colored pseudo-random signal generated in the second signal generator with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals, and an adder summing the weighted periodic signal and the weighted colored pseudo-random signal output from the multiplier.
[0018] In an embodiment, the first signal generator may be configured to retrieve at least two most recently stored pitch cycles, alter periodicity of each pitch cycle, weight each pitch cycle, and sum the two weighted pitch cycles.
[0019] In an embodiment, the first signal generator may be configured to alter the periodicity by resampling pitch pulses of the pitch cycles.
[0020] In an embodiment, the second signal generator may be configured to generate a pseudo-random sequence, and filter the pseudo-random sequence with an Nth-order all-zero filter with coefficients given by the N latest samples of a previously decoded lower-band residual signal of a previously received frame.
[0021] In an embodiment, the stored residual signals may include input of a pitch synthesis filter, and input of an LPC synthesis filter. The decoding parameters may include pitch gains, pitch lags, and LPC parameters.
[0022] In yet another embodiment, a computer readable tangible recording medium is encoded with instructions, wherein the instructions, when executed on a processor, cause the processor to perform a method including receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, storing the residual signals contained in the first frame, decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, determining that a second frame subsequent to the first frame in time has been lost, modifying the stored residual signals, and reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
BRIEF DESCRIPTION OF DRAWINGS
[0023] The present invention will become more fully understood from the detailed description given herein below and the accompanying drawings which are given by way of illustration only, and thus do not limit the present invention.
[0024] FIG. 1 is a block diagram illustrating an example of a communication system according to an embodiment of the present invention.
[0025] FIG. 2 illustrates an example of a stream of packets with a lost packet according to an embodiment of the present invention.
[0026] FIG. 3 illustrates an example of a process flow of receiving packets according to an embodiment of the present invention.
[0027] FIG. 4 illustrates an example of a process flow of decoding received packets according to an embodiment of the present invention.
[0028] FIGS. 5A and 5B illustrate an example of a process flow of an algorithm for concealing packet loss according to an embodiment of the present invention.
[0029] FIGS. 6A and 6B illustrate an example of a process flow of an algorithm for generating a quasi-periodic pulse train according to an embodiment of the present invention.
[0030] FIG. 7 illustrates an example of a processing system for implementing the packet loss algorithm according to an embodiment of the present invention.
DETAILED DESCRIPTION
[0031] Fig. 1 illustrates a communication system. Audio input is passed into one end of the system, and is ultimately output at the other end of the system. The communication can be concurrently bi-directional, as in a telephone conversation between two callers. The audio input can be generated by a user speaking, by a recording, or any other audio source. The audio input is supplied to encoder 102.
[0032] Encoder 102 encodes the audio input into multiple packets, which are transmitted over packet network 104 to decoder 106. Packet network 104 can be any packet-switched network, whether using physical link connection and/or wireless link connections. Packet network 104 may also be a wireless communication network, and/or an optical link network. Packet network 104 conveys packets from encoder 102 to decoder 106. Some of the packets sent from the encoder 102 may get lost, as illustrated in Fig. 2.
[0033] The encoder 102 may be the iSAC coder, and produces packets (also referred to as frames) as output. An embodiment of the invention relies on pitch information, and assumes that pitch parameters are available at the decoder. But even if pitch parameters are not embedded in the payload, they could be estimated at the decoder based on the previously decoded audio. Each frame corresponds to a short segment of time, for example 30 or 60 milliseconds for iSAC. Other segment lengths may also be used with other encoders. One-way delay is at least as large as one frame size, so frame sizes longer than 60 ms may create unacceptably long delays. Furthermore, longer frames are harder to conceal in the event of a lost packet. Shorter frames on the other hand may introduce too much packet overhead, reducing the effective bandwidth. If delay were not a concern (for instance in streaming), high quality could be achieved by allowing long frame sizes for stationary segments.
[0034] When the encoder 102 is the iSAC coder, it may separate the incoming audio signal into two frequency bands, referred to as the lower band (LB) and the upper band (UB). For example, the LB may be 0-4 kHz, and the UB may be 4-8 kHz. Other selections of the bands are possible and may be used (e.g., LB=0-8 kHz, UB=8-16 kHz). A single frequency band (e.g., 0-8 kHz) may also be used, without separating the incoming audio signal into separate bands.
[0035] As illustrated in Fig. 2, each frame contains at least pitch gain, pitch lag, LPC parameters, and DFT coefficients of a residual signal during the corresponding time segment. In the case where the incoming audio signal was separated into the LB and UB bands, each of the bands will have respective information in the frame, the information for each band can be individually selected from the frame, and there are no pitch parameters associated with the UB band. When the encoder used is iSAC, there are 4 sets of pitch parameters in a frame and 6 sets of LPC parameters in a frame, to capture the evolution of the signal within the frame. The pitch lag can be thought of as the "optimal" delay of a long-term predictor, and pitch gain can be thought of as the prediction gain, while LPC coefficients are optimal short-term prediction coefficients.
[0036] Decoder 106 receives packets conveyed by network 104 and decodes the packets into audio data, which is output from decoder 106. Details of the processing performed by the decoder 106 are illustrated in Figs. 3-6. Decoder 106 may be implemented on a processor, such as illustrated in Fig. 7, or on other hardware platforms, such as mobile telecommunication devices. The processing performed by decoder 106 is advantageous for mobile devices that lack sufficient processing power to perform alternate types of packet loss concealment, as the approach according to the present invention is of a relatively low computational complexity.
[0037] Fig. 3 illustrates a high level processing flow of the PLC approach according to an embodiment of the present invention. In step S 306, a determination is made whether frame N has been received, i.e., not lost. If frame N has been received, the processing continues to step S 320, where frame N is decoded. Fig. 4 illustrates additional details of the processing in step S 320.
[0038] After frame N is decoded in step S 320, the processing increments index N in step S 340, and continues with step S 306 to determine if frame N+1 has been received. So long as frames are not lost, the processing continues along the loop of steps S 306, S 320, and S 340.
[0039] If it is determined in step S 306 that a frame has been lost, the processing continues to step S 350, where the loss of the frame is concealed. Figs. 5A-B illustrate additional details of the processing in step S 350.
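For illustration only, the flow of Fig. 3 may be sketched in Python as follows. The function name run_decoder, the callbacks decode and conceal, and the use of None to mark a lost frame are assumptions made for this sketch. The sketch also assumes that the frame index advances after a lost frame has been concealed.

```python
def run_decoder(frames, decode, conceal):
    """Decode received frames and conceal lost ones.

    `frames` is a list in which a lost frame is represented by None
    (an assumption of this sketch).
    """
    output = []
    for frame in frames:                   # moving to the next frame mirrors S 340
        if frame is not None:              # S 306: was frame N received?
            output.append(decode(frame))   # S 320: normal decoding
        else:
            output.append(conceal())       # S 350: packet loss concealment
    return output
```

A usage example with stand-in callbacks: run_decoder(["a", None, "b"], str.upper, lambda: "?") yields one output item per frame, with the middle item produced by the concealment callback.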
[0040] Fig. 4 illustrates an example of the process of decoding frames that are received by decoder 106. When a frame is received, frame size and bandwidth information are decoded from the frame in step S 410. The frame size represents the size of the time segment represented by the frame, and can be represented in milliseconds, or count of samples at a particular sampling rate. The sampling rate may also be encoded in the frame. Sampling rate may be negotiated before a call takes place and is not supposed to change during a call. The bandwidth information reflects the bandwidth of the audio data encoded in the frame, and may be LB, UB, or both.
[0041] In step S 415 the pitch lags and the pitch gains are decoded from the frame. Pitch lags and gains may be updated every 7.5 ms, thus resulting in 4 pitch lags and gains per 30 ms frame. The pitch lag represents the lag of a long-term predictor for the current signal. The pitch gain represents the long-term linear prediction coefficient.
[0042] The decoded pitch lags and pitch gains are stored in step S 420, as they may be needed for packet loss concealment, if subsequent frames are lost.
[0043] In step S 425 the LPC parameters (LPC shape and gain) are decoded. The LPC parameters represent short-term linear prediction coefficients, describing the spectral envelope of the signal.
[0044] The LPC shape and gain are stored in step S 430, as they may be needed for packet loss concealment, if subsequent frames are lost.
[0045] In step S 435 the DFT coefficients of the residual signal encoded in the frame are decoded. The residual signal is the result of filtering out the short term and the long term linear dependencies. The DFT coefficients are the result of transforming the residual signal into the frequency domain by an operation such as the FFT. The DFT coefficients may include separate information for the LB signal and separate information for the UB signal.
[0046] In step S 440 the DFT coefficients which were decoded in step S 435 are transformed from the frequency domain into the time domain, by an operation such as an inverse FFT, resulting in the residual signal. In the case of using both LB and UB signals, a separate residual signal is created for LB (referred to as LB_Res) and a separate residual signal is created for UB (referred to as UB_Res).
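The frequency-to-time transformation may be illustrated by the following Python sketch. The layout of the DFT coefficients within a frame is not described in this passage, so the sketch assumes a plain list of complex coefficients. The helper names dft and inverse_dft are hypothetical, and a practical decoder would use an inverse FFT in place of this direct summation.

```python
import cmath


def dft(x):
    """Direct DFT, included only to demonstrate the round trip."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(coeffs):
    """Inverse DFT returning the real time-domain residual."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]
```

Applying inverse_dft to the output of dft recovers the original samples up to rounding error, which corresponds to the decoder recovering the residual signal from the transmitted coefficients.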
[0047] In step S 445 the residual signals (LB_Res and UB_Res) are stored, as they may be needed for packet loss concealment.
[0048] In step S 450 the lower band residual signal (LB_Res) is filtered by a pitch post-filter. The pitch post-filter is a pole-zero filter whose coefficients are given by the pitch gain and lag. It is the inverse of the pitch pre-filter, and therefore reintroduces the long-term structure that was removed by the pitch pre-filter. Even when both LB_Res and UB_Res are available, only LB_Res is pitch post-filtered. The output of the pitch post-filter (the filtered residual signal) is stored, as it may be needed for packet loss concealment.
[0049] In step S 455 the LPC parameters decoded in step S 425 are used to synthesize the lower band and the upper band signals. LPC synthesis is an all-pole filter with coefficients derived from LPC parameters. This filter is the inverse of LPC analysis (at the encoder), and therefore reintroduces the short-term structure of the signal.
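The inverse relationship between LPC analysis and LPC synthesis may be illustrated by the following Python sketch. The function names lpc_analysis and lpc_synthesis are hypothetical, and the sign convention A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ... is an assumption. The conversion from the transmitted LPC shape and gain to the coefficients a is not shown.

```python
def lpc_analysis(x, a):
    """All-zero analysis filter: e[n] = x[n] + sum over k of a[k-1] * x[n-k]."""
    out = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        out.append(acc)
    return out


def lpc_synthesis(e, a):
    """All-pole synthesis filter: y[n] = e[n] - sum over k of a[k-1] * y[n-k]."""
    out = []
    for n in range(len(e)):
        acc = e[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]   # feedback from past outputs
        out.append(acc)
    return out
```

Passing a signal through lpc_analysis and then through lpc_synthesis with the same coefficients returns the original signal, which is the sense in which synthesis restores the short-term structure removed at the encoder.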
[0050] The output of LPC synthesis is the time domain representation of the original encoded signal. In the case where LB and UB are used at the same time, the output is a separate LB signal and UB signal.
[0051] When LB and UB are used together, in step S 460 the LB signal and the UB signal are combined, thus creating a representation of the original audio input. The output can thereby serve as the audio input for a receiver, as illustrated in Fig. 1. In an implementation where LB and UB are not treated separately, and only a single frequency band is used, step S 460 may be skipped.
[0052] As illustrated in Fig. 4, the re-creation of the audio depends on the availability of the residual signal, pitch gain and lag, and LPC parameters from the received frame. In case of packet loss, however, that information is not available. As each frame represents a time segment on the order of 30 milliseconds, it is possible to simply copy the information from a preceding frame to represent the lost frame. With that approach, however, the audio would sound artificial and robotic. Thus, the inventors have derived an approach to reconstruct the data from the lost frame based on previously received frames which creates natural sounding audio. The idea is to reconstruct residual signals, namely the input to pitch synthesis (low-band residual) and the input to the upper-band LPC synthesis (upper-band residual), that are similar to those of the previous packet, but not exactly the same. The details are illustrated in Figs. 5A-B.
[0053] When it is determined in step S 306 that a frame has been lost, decoder 106 performs packet loss concealment in step S 350. As shown in Fig. 5A, stored pitch lag and pitch gain are retrieved in step S 510. The pitch lag and pitch gain were stored in step S 420 for the previous received frame.
[0054] In step S 515 the residual signal is retrieved for the previous received frame. The residual signal was stored in step S 445.
[0055] In step S 516, the decoder determines whether the current lost frame is one of consecutive lost frames. If the lost frame is not one of multiple consecutive lost frames, the processing proceeds to step S 520.
[0056] In step S 520 the latest two pitch pulses are computed. The pitch pulses used are closest in time to the lost frame. The computation is based on the pitch lag and the residual signal retrieved in steps S 510 and S 515. In an embodiment the two latest pitch pulses are only computed for the LB signal, even when both LB and UB signals are used. In other embodiments the two pitch pulses may be computed for both the LB and UB signals. The choice of using two pitch pulses is a design parameter determined by the inventors for optimal performance, but other numbers of pitch pulses could also be used.
[0057] In step S 525 the pitch pulses obtained in step S 520 are stored. For the LB signal the pitch pulses will be referred to as LB_P1 and LB_P2.
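One possible way of obtaining the two latest pitch pulses is sketched below in Python, for illustration only. The function name latest_two_pitch_pulses is hypothetical. The sketch assumes that a pitch pulse is one pitch cycle of the stored residual, that the pitch lag is rounded to a whole number of samples, and that the most recent cycle is the one labelled LB_P1. None of these details is stated in this passage.

```python
def latest_two_pitch_pulses(residual, pitch_lag):
    """Return the two most recent pitch cycles of `residual`.

    Assumptions: the lag is rounded to an integer sample count and the
    first returned cycle is the most recent one.
    """
    lag = int(round(pitch_lag))
    if lag < 1 or len(residual) < 2 * lag:
        raise ValueError("residual must hold at least two pitch cycles")
    newest = residual[-lag:]              # cycle closest to the lost frame
    previous = residual[-2 * lag:-lag]    # cycle immediately before it
    return newest, previous
```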
[0058] In step S 530 the pitch post-filter output stored in step S 450 is retrieved, and in step S 535 the pitch post-filter output is used to compute a long-term similarity measure. More specifically the long-term similarity measure is a ratio computed based on the energy of pitch pulses before and after the post-filtering of the previous frame. It is a measure of how periodic the previous frame was.
[0059] In step S 540 a voice indicator is computed based on the long-term similarity measure and the frequency of the computed pitch pulses. For example, the voice indicator may be calculated as log2( sigma2_out / sigma2_in ) + 2 * pitch_gain + pitch_gain / 256, where log2(x) is the logarithm of x in base 2, sigma2_out is the variance of the latest pitch pulse at the output of the pitch post-filter, and sigma2_in is the variance of the corresponding pulse at the input. The voice indicator is an indication of how periodic the last decoded frame was.
[0060] In step S 545 weight factors are computed for voiced and un-voiced segments. The weight factor for voiced segments is w_v, while the weight factor for un-voiced segments is w_u. The following pseudo code is an example of an algorithm for calculating the weight factors:
limLow = 0.0214;
limUp = 0.1526;
M = ( limLow + limUp ) / 2;
if( voiceIndicator < limLow )
{
    w_u = 1;
    w_v = 0;
} else {
    if( voiceIndicator > limUp ){
        w_u = 0;
        w_v = 1;
    } else{
        if( voiceIndicator < M ){
            s = ( voiceIndicator - limLow ) / (double)( 1.41421356237310 * M );
            b = s * s;
            a = 1 - b;
        } else{
            s = ( limUp - voiceIndicator ) / (double)( 1.41421356237310 * M );
            a = s * s;
            b = 1 - a;
        }
        w_u = a;
        w_v = b;
    }
}
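The voice indicator formula and the weight-factor pseudo code can be restated as a small runnable Python sketch. The function names are illustrative, and the scaling of pitch_gain is not specified in the text, so it is passed through unchanged here as an assumption.

```python
import math

LIM_LOW = 0.0214
LIM_UP = 0.1526
SQRT2 = 1.41421356237310


def voice_indicator(sigma2_out, sigma2_in, pitch_gain):
    """log2(sigma2_out / sigma2_in) + 2 * pitch_gain + pitch_gain / 256."""
    return math.log2(sigma2_out / sigma2_in) + 2.0 * pitch_gain + pitch_gain / 256.0


def weight_factors(indicator):
    """Return (w_u, w_v) following the pseudo code for step S 545."""
    m = (LIM_LOW + LIM_UP) / 2.0
    if indicator < LIM_LOW:
        return 1.0, 0.0          # treated as purely un-voiced
    if indicator > LIM_UP:
        return 0.0, 1.0          # treated as purely voiced
    if indicator < m:
        s = (indicator - LIM_LOW) / (SQRT2 * m)
        w_v = s * s
        w_u = 1.0 - w_v
    else:
        s = (LIM_UP - indicator) / (SQRT2 * m)
        w_u = s * s
        w_v = 1.0 - w_u
    return w_u, w_v
```

In every branch the two weights sum to one, so the mixed residual keeps roughly the same overall scale as its two components.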
[0061] The weights are stored in step S 550. The description of steps S 520 through S 550 is based on non-consecutive lost frames. The processing differs for multiple consecutive lost frames as compared to a single lost frame. In the case of multiple consecutive lost frames, there is no immediately preceding frame that has been received. However, the first lost frame of a sequence of multiple lost frames will have been processed through steps S 520 to S 550. Any subsequent lost frames follow the processing through S 517 and S 547.
[0062] A voiced segment reproduced by simply repeating a pitch pulse would sound very artificial and unpleasant to human ears (an effect known as robotic sound). Thus, the weighting changes over the number of reconstructed pitch cycles to avoid artifacts. In step S 517 a decay rate is increased. The decay rate is the rate at which the synthesized residual signal is decayed to zero, and is applied in step S 590.
[0063] In step S 547 the weight factors w_v and w_u calculated during the previous PLC call (stored in step S 550) are retrieved.
[0064] The processing flow continues in Fig. 5B, where in step S 556 the weight factors w_v and w_u are analyzed to determine what kind of utterance is contained in the most recently received frame. Voiced utterances have a strong periodic nature, while unvoiced utterances do not. If the most recently received frame contains voiced utterances, w_v will be greater than zero. If the frame also contains unvoiced utterances, w_u will also be greater than zero. The weights reflect the relative mix of voiced to unvoiced utterances in the frame. A frame with only voiced utterances will have w_u equal to zero, while a frame with only unvoiced utterances will have w_v equal to zero. If both w_v and w_u are non-zero, the utterance is considered a mixed utterance.
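The three-way decision of step S 556 can be sketched in Python as follows. The function name and the returned labels are illustrative; the step numbers are those the description assigns to each branch.

```python
def classify_utterance(w_v, w_u):
    """Map the stored weights to the branch taken after step S 556."""
    if w_v == 0.0:
        return "unvoiced"   # continues at step S 560 (pseudo random vector only)
    if w_u == 0.0:
        return "voiced"     # continues at step S 580 (pulse train only)
    return "mixed"          # continues at step S 570 (pulse train plus noise)
```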
[0065] If it is determined that the utterance is unvoiced (i.e., w_v is zero), the processing proceeds to step S 560, where a pseudo random vector is generated. A pseudo random vector may be generated for LB and a separate one for UB, when both LB and UB are used.
[0066] In step S 562 the pseudo random vector is filtered by an Nth-order all-zero filter with coefficients given by the N latest samples of the recently decoded residual signal. In an exemplary embodiment, N may be a fixed number equal to 30. This filtering will color the generated pseudo random vectors to have a spectrum envelope similar to that of the previous received packet.
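The coloring of steps S 560 and S 562 amounts to running white noise through an FIR (all-zero) filter whose taps are taken from the stored residual. The Python sketch below is a minimal illustration under several assumptions that the text does not fix: the noise is uniform in [-1, 1], the taps are used in their stored order, and no gain normalization is applied. The function name and the seed parameter are hypothetical.

```python
import random


def colored_noise(residual, length, order=30, seed=0):
    """Filter white noise with an all-zero filter built from the residual tail."""
    rng = random.Random(seed)
    coeffs = list(residual[-order:])      # N latest residual samples as filter taps
    taps = len(coeffs)
    # Extra leading samples give every output sample a full filter history.
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length + max(taps - 1, 0))]
    out = []
    for n in range(length):
        acc = 0.0
        for k in range(taps):
            acc += coeffs[k] * noise[n + taps - 1 - k]
        out.append(acc)
    return out
```

Because the output is a convolution of white noise with the residual tail, its magnitude spectrum follows that of the tail, which is the coloring effect the paragraph describes.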
[0067] If it is determined in step S 556 that the utterance is voiced (i.e., w_u is zero), the processing proceeds to step S 580. In step S 580 a quasi-periodic pulse train is constructed. The quasi-periodic pulse train is a weighted sum of the two latest pitch cycles. The output is the residual signal. In case both LB and UB are used, the output is the LB residual and the UB residual. Details of the process of generating the quasi-periodic pulse train are illustrated in Figs. 6A-B.
[0068] If it is determined in step S 556 that the utterance is mixed, the processing proceeds to step S 570. Step S 570 is functionally the same as step S 580. The details of the processing in step S 570 are illustrated in Figs. 6A-B. The output of step S 570 is a lower band pulse train (referred to as LB_P) and an upper band pulse train (referred to as UB_P).
[0069] In step S 572 two pseudo random vectors are generated, one for LB and one for UB. The process of generating the pseudo random vectors is the same as in step S 560. The LB pseudo random vector will be referred to as LB_N and the UB pseudo random vector will be referred to as UB_N.
[0070] In step S 574 weight factors w_v and w_u are applied to the quasi-periodic pulse train and to the pseudo random vectors as follows. The LB residual is LB_P*w_v + LB_N*w_u. The UB residual is UB_P*w_v + UB_N*w_u.
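The weighting of step S 574 is a sample-wise linear combination. A minimal Python sketch, with an illustrative function name, that applies to either band:

```python
def mix_residual(pulse_train, noise, w_v, w_u):
    """Return pulse_train * w_v + noise * w_u, sample by sample."""
    if len(pulse_train) != len(noise):
        raise ValueError("pulse train and noise must have the same length")
    return [p * w_v + n * w_u for p, n in zip(pulse_train, noise)]
```

The same call would be made once with LB_P and LB_N and once with UB_P and UB_N.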
[0071] At this stage the residual signals have been calculated and weighted appropriately. In step S 590 the residual signal is decayed. The decay is linear and applied sample-by-sample. If K is the size of the reconstructed residual signal, the following pseudo code illustrates an exemplary algorithm for decaying the signal, where d is a number less than 1 and the role of decay_rate is apparent:
for n = 1 to K do
{
    s(n) = s(n) * d;
    d = d - decay_rate;
    if ( d < 0 ) then d = 0
}
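The decay loop translates directly into Python. In this sketch the function name is illustrative and the starting value of d is supplied by the caller, since the text only states that it is less than 1.

```python
def decay_residual(samples, d, decay_rate):
    """Scale each sample by a gain that falls linearly and is clamped at zero."""
    out = []
    for x in samples:
        out.append(x * d)
        d = max(d - decay_rate, 0.0)
    return out
```

A larger decay_rate, as set in step S 517 for consecutive lost frames, drives the gain to zero in fewer samples.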
[0072] In step S 592 the LB residual is pitch post-filtered, similar to step S 450. The pitch post-filtering uses filter coefficients derived from pitch lag and pitch gain stored in step S 420. The UB residual can skip pitch post-filtering.
[0073] In step S 594 LPC parameters stored in step S 430 are retrieved, and LPC synthesis of the LB and UB signal is performed based on the retrieved parameters.
[0074] In step S 596 the LB and UB signals are combined to create a synthesized representation of the audio of the lost frame.
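LPC synthesis as in step S 594 passes a residual through an all-pole filter. The Python sketch below assumes the common direct-form convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so that s(n) = e(n) - sum of a_k * s(n-k). The text does not state the coefficient convention, the representation of the stored LPC parameters, or how filter state is carried between frames, so those details, along with the function and parameter names, are assumptions.

```python
def lpc_synthesis(residual, a, history=None):
    """All-pole synthesis: s[n] = e[n] - sum_{k=1..p} a[k-1] * s[n-k]."""
    p = len(a)
    if history is None:
        state = [0.0] * p                 # assumed zero initial state
    else:
        if len(history) != p:
            raise ValueError("history must hold exactly len(a) past outputs")
        state = list(history)             # most recent output sample last
    out = []
    for e in residual:
        acc = e
        for k in range(1, p + 1):
            acc -= a[k - 1] * state[-k]
        out.append(acc)
        state.append(acc)
    return out
```

In a decoder the history would normally be the tail of the previously synthesized frame, so that the concealed frame continues smoothly from the last received one.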
[0075] Figs. 6A-B illustrate a detailed description of the process of constructing a quasi-periodic pulse train according to an embodiment of the present invention. A quasi-periodic pulse train is constructed in steps S 570 and S 580.
[0076] In step S 610 the pitch lag of a previous frame, LB_P1, LB_P2, and UB_Res are retrieved. These values were previously stored when the previous frame was received.
[0077] In step S 615 loop counters j and p_cntr are initialized to zero. In step S 616 the decoder determines whether the current frame is one of consecutive lost frames. If the lost frame is not one of multiple consecutive lost frames, the processing proceeds to step S 617, where the value of variable L is set equal to the retrieved pitch lag from step S 610. It can be appreciated that the first lost frame will cause L to be initialized to the value of pitch lag, but subsequent lost frames will bypass step S 617, and the processing will continue to step S 620.
[0078] In step S 620 LB_P1 is resampled to L samples and assigned to R1. Thus, the length of R1 is L samples.
[0079] In step S 625 the last L samples of UB_Res are selected, and referred to as Q1.
[0080] In step S 630 loop counter i is initialized to zero.
[0081] In step S 635 the quasi-periodic pulse trains LB_P (for the lower band) and UB_P (for the upper band) are constructed. At each iteration of a loop over i and j, LB_P(j)=R1(i) and UB_P(j)=Q1(i), and i and j are incremented by one.
[0082] In step S 636 the decoder determines whether j is less than the frame size (extracted in step S 410). As long as j remains less than the frame size, the loop continues. When j reaches the frame size, LB_P and UB_P are returned as the quasi-periodic pulse trains.
[0083] In step S 638 the decoder determines whether i is less than L. If i is less than L, the process returns to step S 635 and continues the loop. Once i reaches L, the process continues to step S 640, shown in Fig. 6B.
[0084] In step S 640 p_cntr is incremented by one.
[0085] In step S 642 the decoder determines whether L is greater than pitch_lag. If L is not greater, L is set to pitch_lag+1 in step S 644. If L is greater than pitch_lag, L is set to pitch_lag in step S 646. This processing is an example of resampling of pitch pulses to avoid excessive periodicity in the reconstructed signal.
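Steps S 642 through S 646 make the cycle length alternate between pitch_lag and pitch_lag+1. A one-line Python sketch with an illustrative function name:

```python
def next_cycle_length(current_length, pitch_lag):
    """Toggle the cycle length between pitch_lag and pitch_lag + 1."""
    return pitch_lag if current_length > pitch_lag else pitch_lag + 1
```

Starting from L equal to a pitch lag of 80, successive calls give 81, 80, 81, and so on, so consecutive reconstructed cycles never have exactly the same length.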
[0086] In step S 650 LB_P1 is resampled to L samples and assigned to R1. Thus, the length of R1 is L samples.
[0087] In step S 655 LB_P2 is resampled to L samples and assigned to R2. Thus, the length of R2 is L samples.
[0088] In step S 656 the decoder determines whether the value of p_cntr is equal to 1, 2, or 3.
[0089] If the value of p_cntr is 1, R1 is set to (3*R1+R2)/4 in step S 661.
[0090] If the value of p_cntr is 2, R1 is set to (R1+R2)/2 in step S 662.
[0091] If the value of p_cntr is 3, R1 is set to (R1+3*R2)/4 in step S 663, and p_cntr is set to 0 in step S 673.
[0092] At the conclusion of any of steps S 661, S 662, and S 673 the processing returns to step S 630 in Fig. 6A.
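The flow of Figs. 6A-B for the lower band, for a first lost frame, can be sketched in Python as below. The text does not specify the resampling method, so linear interpolation is used here purely as an assumption; the function names are illustrative. The upper band is omitted: per the description, UB_P is filled in the same loop from the last L samples of UB_Res.

```python
def resample(pulse, length):
    """Linearly interpolate `pulse` onto `length` evenly spaced points (assumed method)."""
    if not pulse or length < 1:
        raise ValueError("pulse must be non-empty and length must be positive")
    if length == 1 or len(pulse) == 1:
        return [pulse[0]] * length
    step = (len(pulse) - 1) / (length - 1)
    out = []
    for n in range(length):
        pos = n * step
        lo = min(int(pos), len(pulse) - 2)
        frac = pos - lo
        out.append(pulse[lo] * (1.0 - frac) + pulse[lo + 1] * frac)
    return out


def build_pulse_train(lb_p1, lb_p2, pitch_lag, frame_size):
    """Construct the lower-band quasi-periodic pulse train LB_P."""
    if frame_size <= 0:
        return []
    L = pitch_lag                                   # S 617
    r1 = resample(lb_p1, L)                         # S 620
    p_cntr = 0                                      # S 615
    out = []
    while True:
        for i in range(L):                          # S 630 to S 638
            out.append(r1[i])                       # S 635
            if len(out) == frame_size:              # S 636
                return out
        p_cntr += 1                                 # S 640
        L = pitch_lag if L > pitch_lag else pitch_lag + 1   # S 642 to S 646
        r1 = resample(lb_p1, L)                     # S 650
        r2 = resample(lb_p2, L)                     # S 655
        if p_cntr == 1:                             # S 661
            r1 = [(3.0 * a + b) / 4.0 for a, b in zip(r1, r2)]
        elif p_cntr == 2:                           # S 662
            r1 = [(a + b) / 2.0 for a, b in zip(r1, r2)]
        else:                                       # S 663 and S 673
            r1 = [(a + 3.0 * b) / 4.0 for a, b in zip(r1, r2)]
            p_cntr = 0
```

With constant test pulses of value 1 and 5 and a pitch lag of 4, the cycles come out with lengths 4, 5, 4, 5 and values 1, 2, 3, 4, which makes both the length toggling and the changing blend between the two stored pulses visible.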
[0093] FIG. 7 is a block diagram illustrating an example of a computing device 700 that is arranged for packet loss concealment in accordance with the present disclosure. In a very basic configuration 701, computing device 700 typically includes one or more processors 710 and system memory 720. A memory bus 730 can be used for communicating between the processor 710 and the system memory 720.
[0094] Depending on the desired configuration, processor 710 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 710 can include one or more levels of caching, such as a level one cache 711 and a level two cache 712, a processor core 713, and registers 714. The processor core 713 can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. A memory controller 715 can also be used with the processor 710, or in some implementations the memory controller 715 can be an internal part of the processor 710.
[0095] Depending on the desired configuration, the system memory 720 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. System memory 720 typically includes an operating system 721, one or more applications 722, and program data 724. Application 722 includes a decoding processing algorithm with packet loss concealment 723 that is arranged to decode incoming packets, and to conceal lost packets according to the present disclosure. Program Data 724 includes service data 725 that is useful for performing decoding of received packets and concealing lost packets, as will be further described below. In some embodiments, application 722 can be arranged to operate with program data 724 on an operating system 721 such as Android, Chrome, Windows, etc. This described basic configuration is illustrated in FIG. 7 by those components within dashed line 701.
[0096] Computing device 700 can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration 701 and any required devices and interfaces. For example, a bus/interface controller 740 can be used to facilitate communications between the basic configuration 701 and one or more data storage devices 750 via a storage interface bus 741. The data storage devices 750 can be removable storage devices 751, non-removable storage devices 752, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
[0097] System memory 720, removable storage 751 and non-removable storage 752 are all examples of computer readable storage media, and store information as described in various steps of the processing algorithms described in this disclosure. Computer readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 700. Any such computer storage media can be part of device 700, and can store instructions that are executed by processor 710, and cause the computing device 700 to perform a method of decoding packets and concealing lost packets as described in this disclosure.
[0098] Computing device 700 can also include an interface bus 742 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, and communication interfaces) to the basic configuration 701 via the bus/interface controller 740. Example output devices 760 include a graphics processing unit 761 and an audio processing unit 762, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 763. Example peripheral interfaces 770 include a serial interface controller 771 or a parallel interface controller 772, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 773. An example communication device 780 includes a network controller 781, which can be arranged to facilitate communications with one or more other computing devices 790 over a network communication via one or more communication ports 782. The communication connection is one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. A "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media. The term computer readable media as used herein can include both storage media and communication media.
[0099] Computing device 700 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 700 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
[00100] There is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing cost vs. efficiency tradeoffs. There are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
[00101] The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of skill in the art in light of this disclosure. In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing medium used to actually carry out the distribution. Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
[00102] Those skilled in the art will recognize that it is common within the art to describe devices and/or processes in the fashion set forth herein, and thereafter use engineering practices to integrate such described devices and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein can be integrated into a data processing system via a reasonable amount of experimentation. Those having skill in the art will recognize that a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and nonvolatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
[00103] With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
[00104] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Claims
1. A method of decoding an audio signal having been encoded as a sequence of consecutive frames, the method comprising:
receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame;
storing the residual signals contained in the first frame;
decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame;
determining that a second frame subsequent to the first frame in time has been lost;
modifying the stored residual signals; and
reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
2. The method according to claim 1, wherein the modifying the stored residual signals includes:
generating a periodic signal;
generating a colored pseudo-random signal based on the stored residual signals;
multiplying the periodic signal and the colored pseudo-random signal with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals; and
summing the weighted periodic signal and the weighted colored pseudo-random signal.
3. The method according to claim 2, wherein the generating the periodic signal includes:
retrieving at least two most recently stored pitch cycles;
altering periodicity of each pitch cycle;
weighting each pitch cycle; and
summing the two weighted pitch cycles.
4. The method according to claim 3, wherein the altering the periodicity includes:
resampling pitch pulses of the pitch cycles.
5. The method according to claim 2, wherein the generating the colored pseudo-random signal includes:
generating a pseudo-random sequence; and
filtering the pseudo-random sequence with Nth-order all-zero filter with coefficients given by N latest samples of a previously decoded lower-band residual signals of a previously received frame.
6. The method according to claim 1, wherein
the stored residual signals include
input of a pitch synthesis filter, and
input of an LPC synthesis filter; and
the decoding parameters include
pitch gains,
pitch lags, and
LPC parameters.
7. The method according to claim 1, wherein
the frames contain encoded information for a first frequency band and distinct second frequency band higher than the first frequency band, and
only a residual signal of the first frequency band is pitch post filtered, but not a residual signal of the second frequency band.
8. A decoding apparatus for decoding an audio signal having been encoded as a sequence of consecutive frames, comprising:
a receiver configured to receive a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame;
a storage unit storing the residual signals contained in the first frame;
a decoding unit configured to decode the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame;
a loss detector configured to determine that a second frame subsequent to the first frame in time has been lost;
a modification unit configured to modify the stored residual signals; and
a reconstruction unit configured to reconstruct an estimate of the audio signal encoded by the second frame based on the stored residual signals modified by the modification unit.
9. The decoding apparatus according to claim 8, wherein the modification unit comprises:
a first signal generator configured to generate a periodic signal;
a second signal generator configured to generate a colored pseudo-random signal based on the stored residual signals;
a multiplier multiplying the periodic signal generated in the first signal generator and the colored pseudo-random signal generated in the second signal generator with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals; and
an adder summing the weighted periodic signal and the weighted colored pseudo-random signal output from the multiplier.
10. The decoding apparatus according to claim 9, wherein the first signal generator is configured to:
retrieve at least two most recently stored pitch cycles;
alter periodicity of each pitch cycle;
weight each pitch cycle; and
sum the two weighted pitch cycles.
11. The decoding apparatus according to claim 10, wherein
the first signal generator is configured to alter the periodicity by resampling pitch pulses of the pitch cycles.
12. The decoding apparatus according to claim 9, wherein the second signal generator is configured to:
generate a pseudo-random sequence; and
filter the pseudo-random sequence with Nth-order all-zero filter with coefficients given by N latest samples of a previously decoded lower-band residual signals of a previously received frame.
13. The decoding apparatus according to claim 8, wherein
the stored residual signals include
input of a pitch synthesis filter, and
input of an LPC synthesis filter; and
the decoding parameters include
pitch gains,
pitch lags, and
LPC parameters.
14. A computer readable tangible recording medium encoded with instructions, wherein the instructions, when executed on a processor, cause the processor to perform a method comprising:
receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame;
storing the residual signals contained in the first frame;
decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame;
determining that a second frame subsequent to the first frame in time has been lost;
modifying the stored residual signals; and
reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180072349.0A CN103688306B (en) | 2011-05-16 | 2011-05-16 | Method and device for decoding audio signals encoded in continuous frame sequence |
PCT/US2011/036662 WO2012158159A1 (en) | 2011-05-16 | 2011-05-16 | Packet loss concealment for audio codec |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2011/036662 WO2012158159A1 (en) | 2011-05-16 | 2011-05-16 | Packet loss concealment for audio codec |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012158159A1 true WO2012158159A1 (en) | 2012-11-22 |
Family
ID=44626536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2011/036662 WO2012158159A1 (en) | 2011-05-16 | 2011-05-16 | Packet loss concealment for audio codec |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103688306B (en) |
WO (1) | WO2012158159A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104347076A (en) * | 2013-08-09 | 2015-02-11 | 中国电信股份有限公司 | Network audio packet loss concealment method and device |
WO2015100999A1 (en) * | 2013-12-31 | 2015-07-09 | 华为技术有限公司 | Method and device for decoding speech and audio streams |
CN105453173A (en) * | 2013-06-21 | 2016-03-30 | 弗朗霍夫应用科学研究促进协会 | Apparatus and method for improved concealment of the adaptive codebook in acelp-like concealment employing improved pulse resynchronization |
CN106133827A (en) * | 2014-03-19 | 2016-11-16 | 弗朗霍夫应用科学研究促进协会 | Use and represent the device producing error concealing signal, method and corresponding computer program for the LPC of replacement out of the ordinary of codebook information out of the ordinary |
CN106788876A (en) * | 2015-11-19 | 2017-05-31 | 电信科学技术研究院 | A kind of method and system of voice Discarded Packets compensation |
US10269357B2 (en) | 2014-03-21 | 2019-04-23 | Huawei Technologies Co., Ltd. | Speech/audio bitstream decoding method and apparatus |
US10381011B2 (en) | 2013-06-21 | 2019-08-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for improved concealment of the adaptive codebook in a CELP-like concealment employing improved pitch lag estimation |
US10621993B2 (en) | 2014-03-19 | 2020-04-14 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using an adaptive noise estimation |
US10733997B2 (en) | 2014-03-19 | 2020-08-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using power compensation |
US11437047B2 (en) | 2013-02-05 | 2022-09-06 | Telefonaktiebolaget L M Ericsson (Publ) | Method and apparatus for controlling audio frame loss concealment |
US11857615B2 (en) | 2014-11-13 | 2024-01-02 | Evaxion Biotech A/S | Peptides derived from Acinetobacter baumannii and their use in vaccination |
US12009002B2 (en) | 2019-02-13 | 2024-06-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio transmitter processor, audio receiver processor and related methods and computer programs |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NO2780522T3 (en) | 2014-05-15 | 2018-06-09 | ||
CN104021792B (en) * | 2014-06-10 | 2016-10-26 | 中国电子科技集团公司第三十研究所 | A kind of voice bag-losing hide method and system thereof |
CN111402905B (en) * | 2018-12-28 | 2023-05-26 | 南京中感微电子有限公司 | Audio data recovery method and device and Bluetooth device |
CN111383643B (en) * | 2018-12-28 | 2023-07-04 | 南京中感微电子有限公司 | Audio packet loss hiding method and device and Bluetooth receiver |
CN112908346B (en) * | 2019-11-19 | 2023-04-25 | 中国移动通信集团山东有限公司 | Packet loss recovery method and device, electronic equipment and computer readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1724756A2 (en) * | 2005-05-20 | 2006-11-22 | Broadcom Corporation | Packet loss concealment for block-independent speech codecs |
WO2008022176A2 (en) * | 2006-08-15 | 2008-02-21 | Broadcom Corporation | Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE451685T1 (en) * | 2005-09-01 | 2009-12-15 | Ericsson Telefon Ab L M | PROCESSING REAL-TIME ENCODED DATA |
JP2008058667A (en) * | 2006-08-31 | 2008-03-13 | Sony Corp | Signal processing apparatus and method, recording medium, and program |
CN101261833B (en) * | 2008-01-24 | 2011-04-27 | 清华大学 | A method for hiding audio error based on sine model |
- 2011-05-16 WO PCT/US2011/036662 patent/WO2012158159A1/en active Application Filing
- 2011-05-16 CN CN201180072349.0A patent/CN103688306B/en active Active
Non-Patent Citations (1)
Title |
---|
BALAZS KOVESI ET AL: "A low complexity packet loss concealment algorithm for ITU-T G.722", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2008. ICASSP 2008. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 31 March 2008 (2008-03-31), pages 4769 - 4772, XP031251665, ISBN: 978-1-4244-1483-3 * |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11437047B2 (en) | 2013-02-05 | 2022-09-06 | Telefonaktiebolaget L M Ericsson (Publ) | Method and apparatus for controlling audio frame loss concealment |
CN105453173A (en) * | 2013-06-21 | 2016-03-30 | 弗朗霍夫应用科学研究促进协会 | Apparatus and method for improved concealment of the adaptive codebook in acelp-like concealment employing improved pulse resynchronization |
US11410663B2 (en) | 2013-06-21 | 2022-08-09 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pitch lag estimation |
US10643624B2 (en) | 2013-06-21 | 2020-05-05 | Fraunhofer-Gesellschaft zur Föerderung der Angewandten Forschung E.V. | Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization |
CN105453173B (en) * | 2013-06-21 | 2019-08-06 | 弗朗霍夫应用科学研究促进协会 | Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization |
US10381011B2 (en) | 2013-06-21 | 2019-08-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for improved concealment of the adaptive codebook in a CELP-like concealment employing improved pitch lag estimation |
CN104347076A (en) * | 2013-08-09 | 2015-02-11 | 中国电信股份有限公司 | Network audio packet loss concealment method and device |
CN104347076B (en) * | 2013-08-09 | 2017-07-14 | 中国电信股份有限公司 | Network audio packet loss concealment method and device |
WO2015100999A1 (en) * | 2013-12-31 | 2015-07-09 | 华为技术有限公司 | Method and device for decoding speech and audio streams |
US9734836B2 (en) | 2013-12-31 | 2017-08-15 | Huawei Technologies Co., Ltd. | Method and apparatus for decoding speech/audio bitstream |
US10121484B2 (en) | 2013-12-31 | 2018-11-06 | Huawei Technologies Co., Ltd. | Method and apparatus for decoding speech/audio bitstream |
US10621993B2 (en) | 2014-03-19 | 2020-04-14 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using an adaptive noise estimation |
CN106133827A (en) * | 2014-03-19 | 2016-11-16 | 弗朗霍夫应用科学研究促进协会 | Apparatus, method and corresponding computer program for generating an error concealment signal using individual replacement LPC representations for individual codebook information |
US10614818B2 (en) | 2014-03-19 | 2020-04-07 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using individual replacement LPC representations for individual codebook information |
CN106133827B (en) * | 2014-03-19 | 2020-01-03 | 弗朗霍夫应用科学研究促进协会 | Apparatus, method and computer storage medium for generating error concealment signal |
US10733997B2 (en) | 2014-03-19 | 2020-08-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using power compensation |
US11367453B2 (en) | 2014-03-19 | 2022-06-21 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using power compensation |
US11393479B2 (en) | 2014-03-19 | 2022-07-19 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using individual replacement LPC representations for individual codebook information |
US11423913B2 (en) | 2014-03-19 | 2022-08-23 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an error concealment signal using an adaptive noise estimation |
US10269357B2 (en) | 2014-03-21 | 2019-04-23 | Huawei Technologies Co., Ltd. | Speech/audio bitstream decoding method and apparatus |
US11031020B2 (en) | 2014-03-21 | 2021-06-08 | Huawei Technologies Co., Ltd. | Speech/audio bitstream decoding method and apparatus |
US11857615B2 (en) | 2014-11-13 | 2024-01-02 | Evaxion Biotech A/S | Peptides derived from Acinetobacter baumannii and their use in vaccination |
CN106788876A (en) * | 2015-11-19 | 2017-05-31 | 电信科学技术研究院 | Method and system for compensating voice packet loss |
CN106788876B (en) * | 2015-11-19 | 2020-01-21 | 电信科学技术研究院 | Method and system for compensating voice packet loss |
US12009002B2 (en) | 2019-02-13 | 2024-06-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio transmitter processor, audio receiver processor and related methods and computer programs |
US12039986B2 (en) | 2019-02-13 | 2024-07-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Decoder and decoding method for LC3 concealment including full frame loss concealment and partial frame loss concealment |
US12057133B2 (en) | 2019-02-13 | 2024-08-06 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Multi-mode channel coding |
US12080304B2 (en) | 2019-02-13 | 2024-09-03 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio transmitter processor, audio receiver processor and related methods and computer programs for processing an error protected frame |
Also Published As
Publication number | Publication date |
---|---|
CN103688306A (en) | 2014-03-26 |
CN103688306B (en) | 2017-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2012158159A1 (en) | Packet loss concealment for audio codec | |
JP5186054B2 (en) | Subband speech codec with multi-stage codebook and redundant coding | |
JP4658596B2 (en) | Method and apparatus for efficient frame loss concealment in speech codec based on linear prediction | |
JP4824167B2 (en) | Periodic speech coding | |
US11721349B2 (en) | Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates | |
KR101290425B1 (en) | Systems and methods for reconstructing an erased speech frame | |
KR101513184B1 (en) | Concealment of transmission error in a digital audio signal in a hierarchical decoding structure | |
WO2006130226A2 (en) | Audio codec post-filter | |
JP2009175693A (en) | Method and apparatus for obtaining attenuation factor | |
WO2015088919A1 (en) | Bandwidth extension mode selection | |
JP5289319B2 (en) | Method, program, and apparatus for generating concealment frame (packet) | |
AU2015241092B2 (en) | Apparatus and methods of switching coding technologies at a device | |
JP2013076871A (en) | Speech encoding device and program, speech decoding device and program, and speech encoding system | |
JP5604572B2 (en) | Transmission error concealment of digital signals by complexity distribution | |
Chibani | | Increasing the robustness of CELP speech codecs against packet losses |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11721213; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 11721213; Country of ref document: EP; Kind code of ref document: A1 |