EP0167364A1 - Speech-silence detection with subband coding (Sprachpausenbestimmung mit Teilbandkodierung) - Google Patents
- Publication number
- EP0167364A1 (application EP85304627A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- value
- statistic
- detection
- step size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Definitions
- the invention relates to signal processing generally, and more particularly to means for detecting intervals of silence in encoded speech.
- When speech is transmitted electronically, such as in a communications network, speech silence occupies a significant portion of the total transmission time. This leads to inefficient use of the communications network, since the only information transmitted during an entire speech-silence interval, no matter how long, is the existence of the interval and its duration.
- TASI: time assignment and speech interpolation
- Speech silence may be detected even in voice signals which have already been digitally encoded into a pulse code modulated (PCM) format.
- PCM: pulse code modulated
- speech-silence boundaries are detected in the digitally encoded data of at least two subbands of the speech signal. Energy estimates are made for each of the frequency subbands for generating a detection statistic to estimate short-term speech energy. A threshold which is adapted to the long-term speech level is computed. This threshold is compared to the detection statistic to make a decision as to the presence of a silence interval. The resulting detection has significantly improved accuracy over detection using only one frequency band.
- The two-band subband encoder 10 with speech detection shown in FIG. 1 includes a lower frequency subband, or low band, encoding circuit 12 made up of a low pass quadrature mirror filter 14, a by-two decimator 16, and an ADPCM (adaptive differential pulse code modulation) encoder 18.
- a higher frequency subband, or high band encoding circuit 20 made up of a high pass quadrature mirror filter 22, a by-two decimator 24, and an ADPCM encoder 26.
- Both of the encoding circuits 12, 20 operate with a sampling rate of 12 kHz (kilohertz) and receive the same 5.5 kHz analog speech input signal. They send their outputs to a multiplexer 28 for transmission.
- Subband encoding circuits such as the circuits 12, 20 and the multiplexer 28 are known to those in the art and are described, for example, in U.S. Pat. 4,048,443; in "Sub-band Coding," by R. E. Crochiere, Bell System Technical Journal, vol. 60, no. 7, part 2, pp. 1633-1653, Sept. 1981; and in "Digital Voice Storage in a Microprocessor," by J. L. Flanagan, J. D. Johnston, and J. W. Upton, IEEE Transactions on Communications, vol. COM-30, no. 2, pp. 336-345, Feb. 1982.
- a speech detector 30, which includes a speech threshold computing subunit 32, a speech statistic computing subunit 34, and a determining subunit 36 is adapted to provide an output to the multiplexer 28 which will result in the insertion of a speech presence indicator, or speech flag, in the transmitted output.
- the input to the speech threshold computing subunit 32 is the step size information from the low band encoder 12.
- the input to the speech statistic computing subunit 34 is the sample step size information from both the low band encoder 12 and the high band encoder 20. Both the threshold subunit 32 and the statistic subunit 34 give their output to the speech determining subunit 36.
- the statistic computing subunit 34 is shown in greater detail in FIG. 2.
- Speech detection is accomplished by deriving information from the encoders 12, 20 and using it to determine whether speech is present or absent.
- Each of the encoders 12, 20 in the course of its normal encoding function makes a separate determination of the quantizer step size, based on the signal amplitude in its respective subband.
- the log of the step size is determined and used as a pointer to a step-size table.
- the log step-size parameters are used as estimates of the speech in each band at a given time.
- Let the speech sampling period be represented by $\tau_0$.
- Let the log of the step size in the low band be represented by $d_L(i\tau_0)$, and that in the high band by $d_H(i\tau_0)$.
- Let $T(i\tau_0)$ be the speech detection statistic used to determine the speech level.
- Let $\alpha_L$ and $\alpha_H$ be fixed weights associated with $d_L(i\tau_0)$ and $d_H(i\tau_0)$.
- Let $\beta_{DS}$ be a fixed weight such that $0 < \beta_{DS} < 1$.
- A detection statistic $T(i\tau_0)$ can be computed as follows:
- The detection statistic $T(i\tau_0)$ is smoothed to become a low-pass filtered sum of speech information taken from each subband.
- The weight $\beta_{DS}$ is chosen to give $T(i\tau_0)$ a specific time constant which controls the necessary smoothing of the information.
- A time constant of 16 milliseconds has been found to be suitable.
- The constants $\alpha_L$ and $\alpha_H$ determine the relative weight given to each subband. It has been found to be particularly advantageous to set $\alpha_H$ at a value of about 1.5 to 2 times the value of $\alpha_L$. This accentuates discrimination in the high subband, which contains more information for the detection of fricatives and other consonants.
- the values of these constants for a particular application may be readily determined by means of laboratory tests by one skilled in the art.
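As a rough illustration of the smoothing described above, the following Python sketch assumes a first-order leaky recursion, `T_i = beta_ds * T_(i-1) + alpha_l * d_L + alpha_h * d_H`. The recursion itself, the exponential mapping from time constant to leak weight, the function names, and the default values are assumptions made for illustration and are not taken from the application.

```python
import math


def leak_from_time_constant(sample_period_s, time_constant_s):
    """Leak weight of a first-order smoother (assumed exponential form)."""
    return math.exp(-sample_period_s / time_constant_s)


def detection_statistic(d_low, d_high, alpha_l=1.0, alpha_h=2.0,
                        beta_ds=0.99, t_start=0.0):
    """Return the smoothed statistic after each sample.

    d_low, d_high: per-sample log step sizes of the low and high band.
    alpha_h defaults to twice alpha_l, inside the 1.5x to 2x range
    mentioned in the text; the absolute values are hypothetical.
    """
    t = t_start
    history = []
    for dl, dh in zip(d_low, d_high):
        # Leak the previous value, then add the weighted band contributions.
        t = beta_ds * t + alpha_l * dl + alpha_h * dh
        history.append(t)
    return history
```

For example, `leak_from_time_constant(1 / 12000, 0.016)` gives a leak weight for a 16 ms time constant if the statistic is updated at the 12 kHz sampling rate; whether the update runs at that rate or at the decimated subband rate is left open here.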
- FIG. 3 shows the method of computing a speech presence energy threshold $\theta_{ON}$ and a speech silence energy threshold $\theta_{OFF}$.
- This method is very similar to that used in ADPCM speech detection, using the log step size $d_L(i\tau_0)$ from the lower subband only.
- $M(i\tau_0)$ is the maximum of the values $\alpha_M d_L(i\tau_0)$, where $\alpha_M$ is a constant weight. Therefore, when $\alpha_M d_L(i\tau_0)$ increases, $M(i\tau_0)$ increases; when $\alpha_M d_L(i\tau_0)$ decreases, $M(i\tau_0)$ decreases only very slowly, according to the leak factor $\beta_M$.
- $M(i\tau_0)$ is restrained from decreasing to less than its lower limit $M_0$, so $M(i\tau_0)$ measures the maximum speech energy in the lower subband.
- A variable $d'_L$ can be defined as $d_L$ plus a bias; the bias of 32 is used to ensure that $d'_L$ and $M$ are always positive.
- the value of $M$ at time $i\tau_0$ is
- The thresholds are fixed distances below $M$. The threshold $\theta_{ON}$, used to determine when speech changes from OFF to ON, lies $C_{ON}$ below $M$; the threshold $\theta_{OFF}$, used to determine when speech changes from ON to OFF, lies $C_{OFF}$ below $M$. The values of $C_{ON}$ and $C_{OFF}$ are constants, with $C_{OFF} > C_{ON}$.
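The maximum-level tracking and the two thresholds can be sketched in Python as below. This is a minimal sketch that assumes the tracker takes the largest of the weighted low-band log step size, the leaked previous maximum, and the lower limit, and that each threshold is the maximum minus its constant. The function names and all numeric defaults are hypothetical.

```python
def update_max_level(m_prev, d_low_biased, alpha_m=1.0, beta_m=0.999,
                     m_floor=32.0):
    """One update of the slowly leaking maximum level.

    Rises immediately with the weighted low-band value, otherwise decays
    by the leak factor, and never drops below the floor.
    """
    return max(alpha_m * d_low_biased, beta_m * m_prev, m_floor)


def thresholds(m, c_on=8.0, c_off=12.0):
    """Return (theta_on, theta_off) as fixed distances below the maximum."""
    if not c_off > c_on:
        raise ValueError("c_off must exceed c_on")
    return m - c_on, m - c_off
```

Because `c_off` exceeds `c_on`, the OFF threshold sits below the ON threshold, which gives the ON/OFF decision some hysteresis.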
- FIG. 4 shows how the comparison is done.
- The speech samples are divided into blocks of some convenient length. (In this case 24 samples per block are used.) Once per block, a decision is made concerning whether speech is ON or OFF. If speech was OFF in the previous block, the ON threshold is used, since it governs the change from OFF to ON; if speech was ON, the OFF threshold is used. The switch in FIG. 4 chooses the correct threshold, which is then compared to the detection statistic. The speech flag is set ON or OFF depending on whether the detection statistic is above or below the threshold.
- Let S denote the speech state, which takes one of two possible values, ON or OFF.
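The per-block decision can be sketched as a two-state machine. The Python below assumes that the ON threshold governs OFF-to-ON changes and the OFF threshold governs ON-to-OFF changes; the function names and the handling of exact equality are illustrative choices, not details from the application.

```python
def next_speech_state(speech_on, statistic, theta_on, theta_off):
    """Return the new speech state (True = ON) for one block."""
    if speech_on:
        # Stay ON unless the statistic falls below the OFF threshold.
        return statistic >= theta_off
    # Stay OFF unless the statistic rises above the ON threshold.
    return statistic > theta_on


def classify_blocks(statistics, theta_on, theta_off, initial_on=False):
    """Apply the decision once per block and return the list of states."""
    state = initial_on
    states = []
    for value in statistics:
        state = next_speech_state(state, value, theta_on, theta_off)
        states.append(state)
    return states
```

A statistic that sits between the two thresholds leaves the state unchanged, so brief dips during speech do not toggle the flag.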
- the system 10 can be effectively implemented by a person of ordinary skill in the art of subband encoding by appropriately adapting two or more digital signal processor microcomputers.
- microcomputers are presently in use and may include a memory unit, an arithmetic unit, a control unit, an input-output unit, and a machine language storage unit in a single VLSI circuit. Their function may alternately be provided by a combination of a number of different VLSI circuits interconnected.
- One such microcomputer which is suitable for implementing the system 10 is a DSP (Digital Signal Processor) manufactured by AT&T Technologies, Inc., a corporation of New York, U.S.A. and described, for example, in the above-mentioned Bell System Technical Journal volume.
- DSP: Digital Signal Processor
- one DSP is used for the encoding and transmission of speech, while the other DSP is used for the reception and decoding of speech.
- External logic is used to interface the PCM (pulse code modulation) bit streams of each DSP to both analog-to-digital and digital-to-analog converters for speech input and output.
- The DSP microcomputers also perform speech-silence detection on the speech signal, so that the silence intervals can be used to transmit user-supplied data.
- the DSP microcomputers determine the speech state every two milliseconds.
- the transmitting DSP provides the speech-state status for external circuitry and generates a 112-bit frame for transmission.
- The frame consists of a 3-bit framing pattern, a 1-bit speech flag, and 24 samples of subband encoded speech. This speech is sampled at a 12 kHz rate and encoded with 5-bit accuracy in the low band and 4-bit accuracy in the high band.
- If the speech flag is on, external line interface circuitry will send the DSP-generated frame intact.
- If the speech flag is off, the 24 samples of speech are replaced by 108 bits of user-supplied data. After construction, the frame is sent over a 56 kbps (kilobits per second) digital channel to another terminal for decoding.
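The frame arithmetic can be checked with a short Python sketch. It assumes that the 24 samples of a block split evenly into 12 low-band and 12 high-band samples, as the by-two decimators suggest; under that assumption the payload comes to 108 bits and the frame to 112 bits. The framing pattern value `(1, 0, 1)` and the function names are hypothetical.

```python
from fractions import Fraction

FRAMING_BITS = 3
FLAG_BITS = 1
SAMPLES_PER_FRAME = 24
LOW_BAND_BITS = 5
HIGH_BAND_BITS = 4
SAMPLE_RATE_HZ = 12_000
CHANNEL_RATE_BPS = 56_000


def payload_size():
    """Payload bits per frame, assuming an even split between the bands."""
    per_band = SAMPLES_PER_FRAME // 2  # by-two decimation in each band
    return per_band * LOW_BAND_BITS + per_band * HIGH_BAND_BITS


def frame_size():
    """Total bits per frame: framing pattern, speech flag, payload."""
    return FRAMING_BITS + FLAG_BITS + payload_size()


def build_frame(speech_on, payload, framing_pattern=(1, 0, 1)):
    """Assemble one frame as a list of bits.

    `payload` carries encoded speech when speech_on is True and
    user-supplied data otherwise; the pattern value is a placeholder.
    """
    if len(framing_pattern) != FRAMING_BITS:
        raise ValueError("framing pattern must have 3 bits")
    if len(payload) != payload_size():
        raise ValueError("payload must have 108 bits")
    return list(framing_pattern) + [1 if speech_on else 0] + list(payload)
```

With these numbers, `Fraction(frame_size(), CHANNEL_RATE_BPS)` and `Fraction(SAMPLES_PER_FRAME, SAMPLE_RATE_HZ)` both equal 1/500 of a second, matching the two-millisecond decision interval.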
- a simple framing algorithm is implemented with a combination of DSP firmware and external line interface circuitry.
- the framing algorithm searches the incoming 56 Kbps signal to find the orientation of the 3-bit framing pattern.
- Once the receiving DSP synchronizes itself with the framing pattern, it reads the speech state flag. If the speech state flag is set, the DSP begins decoding the incoming speech signal for listening, but if the flag is not set, the DSP signals external circuitry to remove the data and send it to a user interface. This pattern is repeated every two milliseconds, as long as a valid framing pattern is detected.
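One simple way to find the frame orientation is to test each candidate offset and accept the first one at which the framing pattern recurs once per frame. The Python sketch below illustrates that idea only; the pattern value, the number of frames required, and the function name are assumptions, and the actual division of work between DSP firmware and line interface circuitry is not described in enough detail to reproduce.

```python
def find_frame_offset(bits, pattern=(1, 0, 1), frame_len=112, min_frames=4):
    """Return the offset (0..frame_len-1) at which `pattern` recurs every
    `frame_len` bits, or None if no offset qualifies."""
    n = len(pattern)
    target = list(pattern)
    for offset in range(frame_len):
        # Candidate frame starts for this offset within the received bits.
        starts = range(offset, len(bits) - n + 1, frame_len)
        if len(starts) < min_frames:
            continue  # not enough frames observed at this offset
        if all(list(bits[s:s + n]) == target for s in starts):
            return offset
    return None
```

Payload bits can imitate the pattern by chance, so a practical search would typically require agreement over more frames before declaring synchronization.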
- the equations above describe the general concepts involved in determining the quantities needed by the speech detector. Due to finite bit length and timing considerations in the DSP, some of these equations are preferably slightly modified. For example, the system 10 is based on a 24-sample frame, so every 24 samples a decision is made as to whether speech is present.
- the speech detection statistic is computed in this framework by the DSP as follows:
- $T$ is updated each sample period by adding $\alpha_L d'_L + \alpha_H d'_H$ to it, and it is leaked once per block of 24 samples.
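The block-oriented update can be sketched as follows. The sketch assumes the leak is applied to the value carried over from the previous block before the new contributions are accumulated; the order of those two steps, the function name, and the default weights are assumptions.

```python
def block_statistic(t_prev, d_low_biased, d_high_biased,
                    alpha_l=1.0, alpha_h=2.0, beta_ds=0.5):
    """Update the detection statistic over one block.

    The carried-over value is leaked once, then each sample adds its
    weighted low-band and high-band (biased) log step sizes.
    """
    if len(d_low_biased) != len(d_high_biased):
        raise ValueError("both bands must supply the same number of values")
    t = beta_ds * t_prev  # single leak per block
    for dl, dh in zip(d_low_biased, d_high_biased):
        t += alpha_l * dl + alpha_h * dh  # per-sample accumulation
    return t
```

Leaking once per block replaces a multiplication on every sample with one multiplication per block, which suits a fixed-point processor with a tight per-sample time budget.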
- the value of the maximum level M must also be computed slightly differently to obtain accurate results with the DSP.
- the equation for M that may be implemented in the
- the thresholds only need to be computed once per 24 samples, so that they can be used to detect the presence or absence of speech.
- the speech state is determined in the same way as described in Section 11.2 by equations (6-8).
- This invention is not limited to two-band subband coding.
- equations (12)-(13) can be slightly altered to conform to a specific hardware implementation, such as an implementation using a DSP microprocessor. It is also necessary to choose specific values of the parameters in equations (12)-(13).
- the maximum level depends on the energy in the low-frequency bands, giving a smooth long-term average.
- equations (12) and (13) can be extended to any number of bands.
- the time delay associated with computing the detection statistic and maximum level also increases. Therefore there is a practical limit to the number of bands that can be used in this system.
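The extension to more bands can be illustrated by replacing the two weighted terms with a weighted sum over all bands. This Python sketch assumes the same leaky recursion as a two-band smoother and uses hypothetical names; it is not the form of equations (12) and (13), which are not reproduced in this text.

```python
def multiband_statistic(t_prev, log_steps, weights, beta_ds):
    """One update of the detection statistic for any number of bands.

    log_steps: one log step size per band for the current sample.
    weights:   one fixed weight per band.
    """
    if len(log_steps) != len(weights):
        raise ValueError("need exactly one weight per band")
    return beta_ds * t_prev + sum(w * d for w, d in zip(weights, log_steps))
```

With two bands and weights `[alpha_l, alpha_h]` this reduces to the two-band case.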
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US62858384A | 1984-07-06 | 1984-07-06 | |
US628583 | 2003-07-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0167364A1 (de) | 1986-01-08 |
Family
ID=24519496
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP85304627A Withdrawn EP0167364A1 (de) | 1984-07-06 | 1985-06-28 | Sprachpausenbestimmung mit Teilbandkodierung |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0167364A1 (de) |
JP (1) | JPS6132900A (de) |
-
1985
- 1985-06-28 EP EP85304627A patent/EP0167364A1/de not_active Withdrawn
- 1985-07-05 JP JP14687785A patent/JPS6132900A/ja active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2451680A1 (fr) * | 1979-03-12 | 1980-10-10 | Soumagne Joel | Discriminateur parole/silence pour interpolation de la parole |
DE3235279A1 (de) * | 1981-09-25 | 1983-04-21 | Nissan Motor Co., Ltd., Yokohama, Kanagawa | Spracherkennungseinrichtung |
EP0110467A1 (de) * | 1982-11-23 | 1984-06-13 | Philips Kommunikations Industrie AG | Anordnung zur Erkennung von Sprachpausen |
Non-Patent Citations (3)
Title |
---|
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-28, no. 5, October 1980, pages 550-561, IEEE, New York, US; B.V. COX et al.: "Nonparametric rank-order statistics applied to robust-voiced-unvoiced-silence classification" * |
IEEE TRANSACTIONS ON COMMUNICATIONS, vol. COM-24, no. 5, May 1976, pages 563-567, New York, US; R.W. SCHAFER et al.: "Detecting the presence of speech using ADPCM coding" * |
TELECOMMUNICATIONS AND RADIO ENGINEERING, vol. 4, April 1965, pages 70-72, Washington, US; V.N. TETEREV: "A combinatorial method of detecting speech signals in a background of smooth noise" * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2631147A1 (fr) * | 1988-05-04 | 1989-11-10 | Thomson Csf | Procede et dispositif de detection de signaux vocaux |
US4982341A (en) * | 1988-05-04 | 1991-01-01 | Thomson Csf | Method and device for the detection of vocal signals |
EP0341128A1 (de) * | 1988-05-04 | 1989-11-08 | Thomson-Csf | Verfahren und Anordnung zur Feststellung der Anwesenheit von Sprachsignalen |
EP0565947A1 (de) * | 1992-04-13 | 1993-10-20 | NOKIA TECHNOLOGY GmbH | Verfahren zum Einfügen digitaler Daten in ein Audiosignal vor der Kanalkodierung |
WO1996002911A1 (en) * | 1992-10-05 | 1996-02-01 | Matsushita Electric Industrial Co., Ltd. | Speech detection device |
US5826230A (en) * | 1994-07-18 | 1998-10-20 | Matsushita Electric Industrial Co., Ltd. | Speech detection device |
AU711401B2 (en) * | 1994-08-10 | 1999-10-14 | Qualcomm Incorporated | Method and apparatus for selecting an encoding rate in a variable rate vocoder |
US5742734A (en) * | 1994-08-10 | 1998-04-21 | Qualcomm Incorporated | Encoding rate selection in a variable rate vocoder |
WO1996005592A1 (en) * | 1994-08-10 | 1996-02-22 | Qualcomm Incorporated | Method and apparatus for selecting an encoding rate in a variable rate vocoder |
EP1233408A1 (de) * | 1994-08-10 | 2002-08-21 | QUALCOMM Incorporated | Verfahren und Vorrichtung zur Auswahl der Kodierrate in einem Vocoder mit Variabler Rate |
KR100455826B1 (ko) * | 1994-08-10 | 2005-04-06 | 콸콤 인코포레이티드 | 가변율보코더의인코딩속도를선택하기위한방법및장치 |
EP1530201A2 (de) * | 1994-08-10 | 2005-05-11 | QUALCOMM Incorporated | Verfahren und Vorrichtung zur Auswahl der Kodierrate in einem Vocoder mit Variabler Rate |
EP1530201A3 (de) * | 1994-08-10 | 2005-08-10 | QUALCOMM Incorporated | Verfahren und Vorrichtung zur Auswahl der Kodierrate in einem Vocoder mit Variabler Rate |
CN1320521C (zh) * | 1994-08-10 | 2007-06-06 | 高通股份有限公司 | 在速率可变的声码器中选择编码速率的方法和装置 |
US6182035B1 (en) * | 1998-03-26 | 2001-01-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and apparatus for detecting voice activity |
WO2000042600A2 (en) * | 1999-01-18 | 2000-07-20 | Nokia Mobile Phones Ltd | Method in speech recognition and a speech recognition device |
WO2000042600A3 (en) * | 1999-01-18 | 2000-09-28 | Nokia Mobile Phones Ltd | Method in speech recognition and a speech recognition device |
Also Published As
Publication number | Publication date |
---|---|
JPS6132900A (ja) | 1986-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0145332B1 (de) | Digitale Übertragung von Audiosignalen | |
CA1181857A (en) | Silence editing speech processor | |
CA1139884A (en) | Half duplex integral vocoder modem system | |
US4912763A (en) | Process for multirate encoding signals and device for implementing said process | |
EP0099397B1 (de) | Anpassungsfähige differential-pcm-kodierung | |
EP0279451B1 (de) | Codierungseinrichtung zur Sprachübertragung | |
US4860313A (en) | Adaptive differential pulse code modulation (ADPCM) systems | |
US7933216B2 (en) | Method and apparatus for coding modem signals for transmission over voice networks | |
EP0228696A2 (de) | ADPCM-Kodierer-Dekodierer mit Energiebandübergangsdetektion | |
NO146521B (no) | Fremgangsmaate og innretning for detektering av naervaer eller fravaer av et talesignal paa en talekanal | |
JPH0234497B2 (de) | ||
US4386237A (en) | NIC Processor using variable precision block quantization | |
US4464782A (en) | Transmission process and device for implementing the so-improved process | |
EP1042861B1 (de) | Vorrichtung und verfahren zur detektion von zuvor entstandenen digitalen pcm-fehlern in einem kommunikationsnetz | |
US6424940B1 (en) | Method and system for determining gain scaling compensation for quantization | |
US4319082A (en) | Adaptive prediction differential-PCM transmission method and circuit using filtering by sub-bands and spectral analysis | |
CA1288867C (en) | Adaptive differential pulse code modulation system | |
EP0049271B1 (de) | Prädiktionssignalcodierung mit teilquantisierung | |
EP0167364A1 (de) | Sprachpausenbestimmung mit Teilbandkodierung | |
CA1321025C (en) | Speech signal coding/decoding system | |
US4912765A (en) | Voice band data rate detector | |
Raulin et al. | A 60 Channel PCM-ADPCM Converter | |
US6553074B1 (en) | Method and device for combating PCM line impairments | |
JPH0242258B2 (de) | ||
Cointot | A 32-kbit/sec ADPCM coder robust to channel errors |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Designated state(s): DE FR |
| | 17P | Request for examination filed | Effective date: 19860611 |
| | 17Q | First examination report despatched | Effective date: 19871020 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 19880104 |
| | RIN1 | Information on inventor provided before grant (corrected) | Inventor name: SCHOENHERR, BRIAN WILLIAM; Inventor name: DONVITO, MARC BERNARD |