CN113875264A

CN113875264A - Microphone configuration, system, device and method for an eyewear apparatus

Info

Publication number: CN113875264A
Application number: CN202080038007.6A
Authority: CN
Inventors: 樊大申; 陈曦
Original assignee: Solos Technology Ltd
Current assignee: Solos Technology Ltd
Priority date: 2019-05-22
Filing date: 2020-05-21
Publication date: 2021-12-31
Also published as: WO2021048632A3; GB202115400D0; WO2021048632A2; JP7350092B2; GB2597009A; JP2022533391A; GB2597009B

Abstract

Systems and methods for extracting a desired sound signal from a device worn on a user's head are described. The apparatus includes a headset and an array having at least three microphones. The at least three microphones are arranged along a plurality of at least two non-parallel axes. The selection logic is configured to identify a selected axis from the plurality of non-parallel axes and to identify two microphones from the array forming the selected axis. The beamformer is configured to receive as inputs signals from the two microphones and to output a main microphone channel and a reference microphone channel.

Description

Microphone configuration, system, device and method for an eyewear apparatus

This patent application is filed 2015 at 10-18, entitled "head mounted acoustic system and method with noise canceling microphone geometry device", a partial continuation of U.S. non-provisional patent application serial No. 14/886, 077, filed 2014 at 3-12, entitled "dual stage noise reduction structure for desired signal extraction", a partial continuation of U.S. non-provisional patent application serial No. 14/207, 163, U.S. non-provisional patent application serial No. 14/207, 163, requiring that us patent application filed 2013-13, entitled "noise canceling microphone device", priority of U.S. provisional patent application serial No. 61/780, 108 and filed 2014 at 2-18, entitled "system and method for processing sound signals", serial No. 61/941, 088, U.S. provisional patent application.

Patent application serial No. 14/886, 077 is also filed on 14.2.2014 entitled "glasses with microphone arrays", a partial continuation of U.S. non-provisional patent application serial No. 14/180, 994, U.S. non-provisional patent application serial No. 14/180, 994, claiming priority of U.S. provisional patent application serial No. 61/780, 108 filed on 13.3.2013 and priority of U.S. provisional patent application serial No. 61/839, 211 filed on 25.6.2013 and priority of U.S. provisional patent application serial No. 61/839, 227 filed on 25.6.2013 and priority of U.S. provisional patent application serial No. 61/912, 844 filed on 6.12.6.2013.

This patent application also claims priority from U.S. provisional patent application serial No. 62/801, 618 entitled "microphone configuration for eyewear equipment apparatus and method", filed 2019 on 5.2.4.

U.S. provisional patent application serial No. 62/801, 618 is incorporated herein by reference. U.S. provisional patent application serial No. 61/780, 108 is incorporated herein by reference. U.S. provisional patent application serial No. 61/941, 088 is incorporated herein by reference. U.S. non-provisional patent application serial No. 14/207, 163 is incorporated herein by reference. U.S. non-provisional patent application serial No. 14/180, 994 is hereby incorporated by reference. U.S. provisional patent application serial No. 61/839, 211 is incorporated herein by reference. U.S. provisional patent application serial No. 61/839, 227 is hereby incorporated by reference. U.S. provisional patent application serial No. 61/912, 844 is incorporated herein by reference.

Technical Field

The present invention relates generally to wearable devices that detect and process sound signal data, and more particularly to reducing noise in head-mounted acoustic systems and assisting user hearing.

Background

Acoustic systems receive audio signals using acoustic sensors such as microphones. Typically, these systems are used in real-world environments that provide both desired and undesired sound signals (also referred to as noise) to the receiving microphone. Such a receiving microphone is part of various systems, such as: mobile phones, hand-held microphones, hearing aids, etc. These systems typically perform speech recognition processing on the received sound signal. Receiving both the desired and undesired sound signals may adversely affect the quality of the desired sound signal. The degradation of the quality of the desired sound signal may cause the desired sound signal output to the user to be difficult to understand by the user. The use of degraded desired sound signals in algorithms such as Speech Recognition (SR) or Automatic Speech Recognition (ASR) results in an increased error rate, which makes the reconstructed speech difficult to understand. Either of which can present problems.

Handheld systems require the fingers of a user to grasp and/or operate the device in which they are implemented. Such as a mobile telephone. Engaging the user's fingers may prevent the user from performing mission critical functions. This may cause some problems.

The undesired sound signal (noise) may come from a variety of sources that are not the source of the desired sound signal. Thus, the source of the undesired sound signal is statistically uncorrelated with the desired sound signal. These sources may be non-stationary sources or may be from stationary sources. Stationary conditions apply to time and space where the amplitude, frequency and direction of the sound signal do not change significantly. For example, in an automotive environment, engine noise at constant speed is quiet, as is road noise or wind noise, etc. In the case of non-stationary signals, the noise amplitude, frequency distribution and direction of the sound signal vary over time and/or space. Non-stationary noise originates, for example, from car stereo, from transient, i.e., evanescent, noise such as bumps, door openings or closures, background dialogue such as chats in the back seats of vehicles, and the like. Stationary and non-stationary sources of undesired sound signals are present in office environments, concert halls, soccer stadiums, airplane cabins, anywhere users use acoustic systems (e.g., mobile phones, tablets, etc. equipped with microphones, earphones, ear microphones, etc.) sometimes the environment in which the acoustic systems are used is reverberant, causing noise to reverberate in the environment, taking multiple paths of the undesired sound signals to the microphone locations. Either noise source, i.e., non-stationary or stationary undesired sound signals, can increase the error rate of speech recognition algorithms such as SR or ASR, or can easily make it difficult for the system to output intelligible desired sound signals to the user. All this causes problems.

Various noise cancellation methods have been employed to reduce noise from both stationary and non-stationary sources. Existing noise cancellation methods work well in environments where the intensity of the noise is less than the intensity of the desired sound signal, such as in relatively low noise environments. Spectral subtraction is used to reduce noise in speech recognition algorithms and various acoustic systems, such as hearing aids.

When used in Automatic Speech Recognition (ASR) applications, systems that employ spectral subtraction can produce unacceptable error rates when the intensity of undesired sound signals becomes large. This can present problems.

In addition, existing algorithms, such as spectral subtraction, employ nonlinear processing of the spectral subtraction of the sound signal. Non-linear processing of the sound signal results in a disproportionate correlation of the output to the input. Speech Recognition (SR) algorithms are developed using speech signals recorded in a quiet, noise-free environment. Thus, speech recognition algorithms (developed in quiet, noiseless environments) can produce high error rates when nonlinear distortion is introduced in the speech processing process by nonlinear signal processing. Non-linear processing of the sound signal can result in non-linear distortion of the desired sound signal, which disturbs the feature extraction required for speech recognition, which can result in a high error rate. All this causes problems.

Various approaches have been used to attempt to suppress or remove undesired sound signals from acoustic systems, such as in Speech Recognition (SR) or Automatic Speech Recognition (ASR) applications. One method is known as a Voice Activity Detector (VAD). The VAD attempts to detect when there is desired speech and when there is undesired speech. Accepting only the desired speech and treating it as noise by not transmitting the undesired speech. Conventional voice activity detection is only applicable to a single sound source or stationary noise (undesired sound signal) having a small intensity relative to the intensity of the desired sound signal. Thus, conventional voice activity detection makes VADs perform poorly in noisy environments. Furthermore, when the desired and undesired sound signals arrive at the receiving microphone at the same time, it is not efficient to use VAD to remove the undesired sound signal. This may cause some problems.

One problem with acoustic systems having a single microphone for use in noisy environments is that the desired sound signal and the undesired sound signal are received simultaneously on a single channel. The undesired sound signal may make the desired sound signal unintelligible to a human user or to an algorithm designed to use the received speech, such as a Speech Recognition (SR) or Automatic Speech Recognition (ASR) algorithm. This may cause some problems. Multiple channels have been used to solve the problem of receiving both desired and undesired sound signals. Thus, on one channel, a desired sound signal and an undesired sound signal are received, while on the other channel, a sound signal is received which also contains the desired sound signal and the undesired sound signal. Over time, the sensitivity of the various channels may drift, causing the undesired sound signals to become unbalanced between the channels. Drifting the channel sensitivity may result in an inability to accurately remove the undesired sound signal from the desired sound signal. Non-linear distortion of the original desired sound signal may be due to processing of the sound signal obtained from the channel whose sensitivity drifts over time. This may cause some problems.

Drawings

The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate various embodiments of the invention. The present invention is illustrated by way of example in the embodiments and is not limited by the figures of the accompanying drawings, in which like references indicate similar elements.

Fig. 1 is a schematic diagram of a general flow of configuring a microphone on a headset according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a microphone layout geometry according to an embodiment of the invention.

Fig. 3A is a schematic diagram of a general microphone layout with a primary microphone in a first position according to an embodiment of the invention.

Fig. 3B is a schematic diagram of a measured value of a signal-to-noise ratio difference of the primary microphone of fig. 3A according to an embodiment of the present invention.

Fig. 3C is a schematic illustration of the signal-to-noise ratio difference of the data shown in fig. 3B and increasing the microphone acoustic separation distance in accordance with an embodiment of the present invention.

Fig. 4A is a schematic diagram of a general microphone layout with a primary microphone in a second position according to an embodiment of the invention.

Fig. 4B is a schematic diagram of a measured value of a signal-to-noise ratio difference of the main microphone of fig. 4A according to an embodiment of the present invention.

Fig. 4C is a schematic illustration of the signal-to-noise ratio difference of the data shown in fig. 4B and increasing the microphone acoustic separation distance in accordance with an embodiment of the present invention.

Fig. 5A is a schematic diagram of a general microphone layout with a primary microphone in a third position according to an embodiment of the invention.

Fig. 5B is a schematic diagram of a measured value of a signal-to-noise ratio difference of the primary microphone of fig. 5A according to an embodiment of the present invention.

Fig. 5C is a schematic illustration of the signal-to-noise ratio difference of the data shown in fig. 5B and increasing the microphone acoustic separation distance in accordance with an embodiment of the present invention.

Fig. 6 is a schematic diagram of a microphone directivity pattern in accordance with an embodiment of the present invention.

FIG. 7 is a schematic illustration of a misaligned reference microphone response axis according to an embodiment of the invention.

Fig. 8 is a schematic diagram of eyewear with two embedded microphones in an embodiment of the present invention.

Fig. 9 is a schematic diagram of eyeglasses with three embedded microphones in an embodiment of the invention.

Fig. 10 is a schematic diagram of another embodiment of the present invention utilizing four omnidirectional microphones for use at four acoustic ports in place of two bidirectional microphones.

Figure 11 is a schematic view of eyeglasses of the present invention employing two omnidirectional microphones placed diagonally across the lens opening defined by the front frame of the eyeglasses.

Figure 12 is a schematic diagram of another embodiment of the present invention employing four omnidirectional microphones placed along the top and bottom of the eyeglass frame.

Fig. 13 is a schematic view of another embodiment of the invention in which the microphones are placed on the temple portion of the eyeglasses with the microphones facing inward and the front frame of the eyeglasses with the lower center angle and facing downward.

Fig. 14 is a schematic view of another embodiment of the invention in which the microphones are placed on the temple portion of the eyeglasses with the microphones facing inward and the front frame of the eyeglasses with the lower center angle and facing downward.

Fig. 15 is a schematic diagram of eyewear with a built-in acoustic noise cancellation system according to an embodiment of the present invention.

Fig. 16 is a schematic diagram of a primary microphone location in the headset from fig. 15, in accordance with an embodiment of the present invention.

Figure 17 is a schematic diagram of eyewear with a built-in acoustic noise cancellation system according to an embodiment of the present invention.

FIG. 18 is a schematic view of a visor cap with a built-in acoustic noise cancellation system according to an embodiment of the present invention.

Fig. 19 is a schematic diagram of a helmet with a built-in acoustic noise cancellation system according to an embodiment of the present invention.

Fig. 20 is a schematic diagram of a process of extracting a desired sound signal according to an embodiment of the present invention.

FIG. 21 is a schematic diagram of a system architecture according to an embodiment of the invention.

Fig. 22 is a schematic diagram of filter control according to an embodiment of the invention.

FIG. 23 is another schematic diagram of a system architecture according to an embodiment of the invention.

Fig. 24A is another schematic diagram of a system architecture including auto-balancing according to an embodiment of the present invention.

Fig. 24B is a schematic diagram of a noise reduction flow according to an embodiment of the present invention.

Fig. 25A is a schematic diagram of beamforming according to an embodiment of the invention.

Fig. 25B is another schematic diagram of beamforming according to an embodiment of the invention.

Fig. 25C is a schematic diagram of beamforming with shared acoustic elements according to an embodiment of the invention.

FIG. 26 is a schematic diagram of multi-channel adaptive filtering according to an embodiment of the invention.

Fig. 27 is a schematic diagram of single channel filtering according to an embodiment of the invention.

FIG. 28A is a schematic illustration of a desired voice activity detection according to an embodiment of the present invention.

FIG. 28B is a diagram of a normalized speech threshold comparator, according to an embodiment of the present invention.

FIG. 28C is a schematic illustration of desired voice activity detection using multiple reference channels in accordance with an embodiment of the present invention.

FIG. 28D is a schematic diagram of a process flow using compression according to an embodiment of the invention.

FIG. 28E is a schematic diagram of different functions for providing compression according to embodiments of the invention.

Fig. 29A is a diagram illustrating an auto-balancing architecture according to an embodiment of the invention.

Fig. 29B is a schematic diagram of automatic balancing according to an embodiment of the present invention.

Fig. 29C is a schematic diagram of filtering according to an embodiment of the invention.

Fig. 30 is a schematic diagram of a process for automatic balancing according to an embodiment of the present invention.

Fig. 31 is a schematic diagram of a speech signal processing system according to an embodiment of the present invention.

Fig. 32A is a perspective view of a microphone configuration on a headset according to an embodiment of the invention.

Fig. 32B is a top view of a microphone configuration on a headset corresponding to fig. 32A, according to an embodiment of the invention.

Fig. 32C is a bottom view of a microphone configuration on a headset corresponding to fig. 32A, in accordance with an embodiment of the invention.

Fig. 32D is a perspective view of another set of microphone layouts on a headset according to an embodiment of the invention.

Fig. 32E is a bottom view of a microphone layout on a headset corresponding to fig. 32D, in accordance with an embodiment of the invention.

Fig. 33 is a schematic illustration of the headset from fig. 32A-D relative to different sound sources according to an embodiment of the present invention.

Fig. 34 is a schematic diagram of processing sound signals from a microphone array configured with a headset according to an embodiment of the present invention.

Detailed Description

In the following detailed description of the various embodiments of the invention, reference is made to the accompanying drawings in which like references indicate similar elements, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.

Apparatus and methods for detecting and processing acoustic signals including desired and undesired acoustic signals within a headset are described herein. In one or more embodiments, the noise cancellation architecture combines multi-channel noise cancellation and single-channel noise cancellation to extract the desired sound signal from the undesired sound signal. In one or more embodiments, multi-channel sound signal compression is used for desired voice activity detection. In one or more embodiments, the channels are automatically balanced. In one or more embodiments, the system automatically selects a subset of microphones from an array of possible microphones for sound signal extraction. In one or more embodiments, a hearing aid is provided to a user to facilitate hearing sounds from a local environment.

Fig. 1 shows a general flow diagram of configuring a microphone on a headset according to an embodiment of the invention at 100. Referring to FIG. 1, flow begins at block 102. At block 104, a "primary" or "main" microphone channel is created on the headset using one or more microphones. The primary microphone is mounted to optimize reception of a desired sound signal to enhance a first signal-to-noise ratio, expressed as SNR, associated with the primary microphone_M. At block 106, a reference microphone channel is created on the headset using one or more microphones. The reference microphone is located on the head-mounted device to provide a lower signal-to-noise ratio for detection of the desired sound signal from the user, resulting in a signal expressed as SNR_ROf the second signal-to-noise ratio. Thus, a signal-to-noise ratio difference is achieved at block 108 by the layout geometry of the microphones on the headset, resulting in a first signal-to-noise ratio, SNR_MSNR greater than the second signal-to-noise ratio_R。

At block 110, signal-to-noise ratio differences are achieved by beamforming by creating different response patterns (directivity patterns) for the primary and reference microphone channels. The generation of the signal-to-noise ratio difference using different directivity patterns is described in more detail below with reference to the drawings.

In various embodiments, the signal-to-noise ratio difference is achieved at block 112 by a combination of one or more of microphone placement geometry, beamforming, and using different directivity patterns for the main and reference channels. Flow ends at block 114.

Fig. 2 illustrates a microphone layout geometry, generally at 200, in accordance with an embodiment of the present invention. Referring to FIG. 200, aThe desired audio signal 204 is emitted from the desired audio signal source, the user's mouth as shown at 202. The source 202 provides the desired sound signal 204 to a microphone mounted on the headset. The first microphone 206 is located at a distance d from the source 202₁208, at the distance indicated. Second microphone 210 is located a distance d from source 202₂212, at the distance indicated by 212. The system 200 is also subject to undesired sound signals as shown at 218.

With respect to the source 202, the acoustic distances of the first microphone 206 and the second microphone 210 from the source 202 are different, as indicated by Δ L at 214. The difference in acoustic distance Δ L214 is given by equation 216. As used in the description of the embodiments, the distance d₁And d₂Representing the path of sound waves to the

microphones

206 and 210, respectively. Thus, these distances may be linear or curved depending on the particular location of the microphone on the headset and the frequencies of sound of interest. For clarity of illustration, these paths and corresponding distances are shown as straight lines, but this is not meant to be limiting.

The undesired sound signal 218 is generally represented by a distance d₁And d₂Various sources at greater distances. E.g. building noise, car noise, aircraft noise, etc., all originating from the usual ratio d₁And d₂At distances several orders of magnitude greater. Thus, the undesired sound signals 218 substantially affect each other at the

microphone locations

206 and 210, or at least are received at a fairly uniform level at each location. Due to various mechanisms, the difference in acoustic distance Δ L relative to the first microphone 208, 214 reduces the amplitude of the desired sound signal 204 received at the second microphone 210. One such mechanism is, for example, spherical dispersion, which results in a desired acoustic signal following 1/r²Is attenuated. Where r is the distance (e.g., 208 or 212) between the source (e.g., 202) and the receiving location (e.g., 206 or 210). The reduction in the desired sound signal at the second microphone location 210 reduces the signal-to-noise ratio at 210 relative to 206 because the noise amplitude is substantially the same at each location, but the signal amplitude is reduced at 210 relative to the amplitude received at 206. Dependent on path lengthOne mechanism is the difference in acoustic impedance along one path and the other, resulting in a curved acoustic path rather than a straight path. In general, these mechanisms combine to reduce the amplitude of the desired sound signal received at the reference microphone location relative to the primary microphone location. Thus, the layout geometry is used to provide a signal-to-noise ratio difference between the two microphone positions used by the noise cancellation system, as will be described further below, to reduce unwanted sound signals from the primary microphone channel.

The microphone layout geometry allows for various configurations for placement of the primary and reference microphones. In various embodiments, a general microphone placement method that allows microphones to be placed in different locations on a headgear device is described and presented below in connection with fig. 3A-5C.

Fig. 3A shows a general microphone layout with primary microphones in a first position, generally at 300, according to an embodiment of the present invention. Referring to fig. 3A, a head-mounted device 302 is shown. As used in the detailed description of the present embodiments, the head-mounted device may be any device configured to be worn on the head of a user, such as, but not limited to, eyeglasses, goggles, helmets, sun visors, headbands, and the like. In the discussion given below in connection with fig. 3A through 5C, it will be appreciated that the discussion is equally applicable to any head-mounted device, such as those shown in fig. 8 through 19, as well as those head-mounted devices not specifically shown in the figures herein. Thus, embodiments of the present invention may be applicable to headset devices that have not been named or invented.

Referring back to fig. 3A, in one embodiment, the headset has a frame 302, a lens 308, and a lens 310, the frame 302 having attached a temple 304 and a temple 306. In various embodiments, the head mounted device 302 is a pair of eyeglasses worn on the head of the user. A plurality of microphones, such as microphone 1, microphone 2, microphone 3, microphone 4, microphone 5, microphone 6, microphone 7, microphone 8, and optionally microphone 9 and microphone 10, are located on the head-mounted device 302. In various embodiments, the headset as shown including frame 302/

temples

304 and 306 may be sized to include electronics 318 for signal processing as further described below. Electronics 318 provide electrical connections to a microphone mounted on the head-mounted device 302.

The head-mounted device 302 has an interior volume defined by its structure within which electronics 318 may be mounted. Alternatively, the electronic device 318 may be mounted external to the structure. In one or more embodiments, an access panel is provided to access the electronics 318. In other embodiments, an access door is not explicitly provided, but electronics 318 may be contained within the volume of the head-mounted device 302. In this case, the electronics 318 may be embedded prior to assembly of the headset, with one or more components interlocked together to form a housing that captures the electronics 318 therein. In other embodiments, the headset is molded around the electronics 318, thereby encapsulating the electronics 318 within the volume of the headset 302. In various non-limiting embodiments, electronics 318 includes an adaptive noise cancellation unit, a single channel noise cancellation unit, a filter controller, a power supply, a desired voice activity detector, a filter, and so forth. Other components of the electronics 118 are described in the following figures.

The headset 302 may include a switch (not shown) for powering the headset 302 on or off.

The head-mounted device 302 may contain within its volume a data processing system for processing sound signals received by its associated microphone. The data processing system may include one or more of the elements of the system shown in FIG. 31, and described further below. Accordingly, the illustrations of fig. 3A-5C do not limit the embodiments of the invention.

The headgear device of fig. 3A illustrates that the microphone may be placed anywhere on the device. The ten positions selected for illustration in the drawings are chosen merely to illustrate the general principles of layout geometry and do not constitute a limitation on embodiments of the invention. Thus, microphones may be used in different locations than those shown, and different microphones may be used in different locations. For purposes of illustration and without limitation, the measurements made in connection with fig. 3A-5C use omni-directional microphones. In other embodiments, directional microphones are used. In an exemplary configuration for signal-to-noise ratio measurements, each microphone is mounted in a housing, each housing having a port to the environment. Arrow 1b shows the direction of the port associated with the microphone 1. Arrow 2b shows the direction of the port associated with the microphone 2. Arrow 3b shows the direction of the port associated with the microphone 3. The arrow 4b shows the direction of the port associated with the microphone 4. The arrow 5b shows the direction of the port associated with the microphone 5. Arrow 6b shows the direction of the port associated with the microphone 6. The arrow 7b shows the direction of the port associated with the microphone 7. Arrow 8b shows the direction of the port associated with microphone 8.

The user's mouth is shown at 312 and is similar to the source of the desired sound signal shown at 202 in fig. 2. The acoustic path length (referred to herein as the acoustic distance or distance) from the user's mouth 312 to each microphone is illustrated by the arrow from the user's mouth 312 to the respective microphone location. E.g. d₁Representing the acoustic distance from the user's mouth 312 to the microphone 1. d₂Representing the acoustic distance from the user's mouth 312 to the microphone 2. d₃Representing the acoustic distance from the user's mouth 312 to the microphone 3. d₄Representing the acoustic distance from the user's mouth 312 to the microphone 4. d₅Representing the acoustic distance from the user's mouth 312 to the microphone 5. d₆Representing the acoustic distance from the user's mouth 312 to the microphone 6. d₇Representing the acoustic distance from the user's mouth 312 to the microphone 7. d₈Representing the acoustic distance from the user's mouth 312 to the microphone 8. Likewise, the optional microphone 9 and the microphone 10 also have an acoustic distance; however, to maintain clarity in the figures, they have not been so labeled.

In fig. 3A, the

microphones

1, 2, 3 and 6 and the user's mouth 312 lie substantially on an X-Z plane (see coordinate system 316), corresponding to an acoustic distance d₁、d₂、d₃And d₆Generally indicated by a straight line. Arrival wheatPaths of the

winds

4, 5, 7 and 8, i.e. d₄、d₅、d₇And d₈Is represented as a curved path, which reflects the fact that the user's head is opaque to the sound field. Thus, in this case, there is some bending of the acoustic path. In general, the acoustic path between the source of the desired sound signal and the microphone on the headset may be linear or curved. As long as the difference in path length between the main microphone and the reference microphone is large enough, the necessary signal-to-noise ratio difference required by the noise cancellation system is obtained to achieve an acceptable level of noise cancellation.

To make the measurements shown in fig. 3B and 3C, an acoustic testing device is used to measure the signal-to-noise ratio difference between the primary microphone and the reference microphone location. The test facility includes a mannequin with built-in speakers for simulating a user wearing the head mounted device. A speaker located at the user's mouth is used to generate the desired sound signal. The mannequin was placed in the anechoic chamber of the acoustic testing apparatus. Background noise is generated in anechoic chambers with speaker arrays. The pink noise spectrum is used in the measurement process; however, other weights in frequency may be used for the background noise field. During these measurements, the spectral amplitude level of the background noise was set to 75 dB/upa/Hz. The head-mounted device is placed on the mannequin. During testing, the microphone was located on the headset in the position shown in fig. 3A. The microphone for the main or main channel is selected as the microphone 1 for the first measurement sequence, as shown in fig. 3B and 3C below.

The desired audio signal is constituted by the word "Camera". This word is delivered through the speakers in the manikin. The received signal corresponding to the word "Camera" at microphone 1 is processed by a noise cancellation system (as described in the figures below), gated in time, and averaged to produce a "signal" amplitude corresponding to microphone 1. The corresponding signal corresponding to the word "Camera" is measured at each of the other microphones at

locations

2, 3, 4, 5, 6, 7, and 8 in turn. Similarly, at each microphone location, the background noise spectral level is measured. From these measurements, the signal-to-noise ratio is calculated at each microphone location, and then the signal-to-noise ratio difference of the microphone pair is calculated, as shown in the following figure.

Fig. 3B shows generally at 320 a measure of the signal-to-noise ratio difference for the primary microphone of fig. 3A, in accordance with an embodiment of the present invention. Referring to fig. 3B and 3A, at 314, the microphone 1 is used as the primary or main microphone. The various positions are then used to place reference microphones such as microphone 2, microphone 3, microphone 6, microphone 4, microphone 5, microphone 7, and microphone 8. In fig. 3B, column 322 represents a microphone pair for a set of measurements. Column 324 represents the approximate difference in acoustic path length between a given microphone pair of column 322. The approximate acoustic path length difference Δ L is given by equation 216 in FIG. 2. Column 326 lists the dimensionless number ranges 1 to 7 for seven different microphone pairs used for signal-to-noise ratio measurements. Column 328 lists the signal-to-noise ratio difference for a given microphone pair listed in column 322. Each

row

330, 332, 334, 336, 338, 340, and 342 lists a different pair of microphones where the reference microphone is changed while the primary microphone 314 remains unchanged as microphone 1. Note that the approximate differences in acoustic path lengths of the various microphone pairs may be arranged in increasing order, as shown in equation 344. The microphone pairs have been arranged in rows 330-342 in a manner that increases the approximate acoustic path length difference 324 according to equation 344. The signal-to-noise ratio difference changes from 5.55dB for microphone 2 to act as a reference microphone to 10.48dB when microphone 8 is used as a reference microphone.

Figure 3C illustrates generally by 350 a signal-to-noise ratio difference and increasing microphone acoustic separation distance for the data shown in figure 3B, in accordance with an embodiment of the present invention. Referring to fig. 3C, the signal-to-noise ratio difference is plotted on the vertical axis, indicated by 352, and the dimensionless X value from column 326 (fig. 3B) is plotted on the horizontal axis, indicated by 354. Note that, as described above, the dimensionless X value represents the approximate acoustic path length difference Δ L. The X-axis 354 does not correspond exactly to Δ L, but it does relate to Δ L because the data is arranged and plotted in a manner that increases the approximate acoustic path length difference Δ L. This data ordering helps to illustrate the characteristics of the signal-to-noise ratio difference described above in connection with fig. 2, i.e., the signal-to-noise ratio difference will increase as the acoustic path length difference between the primary microphone and the reference microphone increases. This behavior is identified by observing that the signal-to-noise ratio difference increases with changes in Δ L using curve 356, which curve 356 plots the data from column 328 as a function of the data from column 326 (fig. 3B).

Fig. 4A shows a general microphone layout with the primary microphones in the second position, generally at 420, according to an embodiment of the present invention. In fig. 4A, the second position for primary microphone 414 is where microphone 2 is located. The above test is repeated using microphone 2 as the primary microphone and the reference microphone positions may alternatively be the positions where microphone 6, microphone 3, microphone 4, microphone 5, microphone 7 and microphone 8 are located. These data are explained below with reference to fig. 4B and 4C.

Fig. 4B illustrates a measurement of a signal-to-noise ratio difference of the primary microphone of fig. 4A, according to an embodiment of the present invention. Referring to fig. 4B and 4A, microphone 2 is used as the primary or main microphone 414. The various positions are then used to place reference microphones such as microphone 6, microphone 3, microphone 4, microphone 5, microphone 7 and microphone 8. In fig. 4B, column 422 represents a microphone pair for a set of measurements. Column 424 represents the approximate difference in acoustic path length between a given microphone pair of column 422. The approximate acoustic path length difference Δ L is given by equation 216 in FIG. 2. Column 426 lists the dimensionless number ranges 1 to 6 for six different microphone pairs used for signal-to-noise ratio measurements. Column 428 lists the signal-to-noise ratio difference for a given microphone pair listed in column 422. Each

row

430, 432, 434, 336, 438 and 440 lists a different microphone pair, with the reference microphone changed while the primary microphone 414 remains unchanged as microphone 2. Note that the approximate differences in acoustic path lengths for the various microphone pairs may be arranged in increasing order, as shown in equation 442. According to equation 442, the microphone pairs have been arranged in rows 430-440 in a manner that increases the approximate acoustic path length difference 424. The signal-to-noise ratio difference changes from 1.2dB for microphone 6 to act as a reference microphone to 5.2dB when microphone 8 is used as a reference microphone.

Fig. 4C illustrates signal-to-noise ratio differences and increasing microphone acoustic separation distances for the data shown in fig. 4B, in accordance with an embodiment of the present invention. Referring to fig. 4C, the signal-to-noise ratio difference is plotted on the vertical axis, shown by 452, and the dimensionless X value from column 426 (fig. 4B) is plotted on the horizontal axis, shown by 454. Note that, as described above, the dimensionless X value represents the approximate acoustic path length difference Δ L. The X-axis 454 does not correspond exactly to Δ L, but it does relate to Δ L because the data is arranged and plotted in a manner that increases the approximate acoustic path length difference Δ L. This data ordering helps to illustrate the characteristics of the signal-to-noise ratio difference described above in connection with fig. 2, i.e., the signal-to-noise ratio difference will increase as the acoustic path length difference between the primary microphone and the reference microphone increases. This behavior is identified by observing that the signal-to-noise ratio difference increases with changes in Δ L using curve 456, curve 456 plotting the data from column 428 as a function of the data from column 426 (fig. 4B).

Fig. 5A shows a generic microphone layout with the primary microphones in a third position, according to an embodiment of the present invention. In fig. 5A, the third position for the primary microphone 514 is where the microphone 3 is located. The above test is repeated using microphone 3 as the primary microphone and the reference microphone positions may alternatively be the positions of microphone 6, microphone 4, microphone 5, microphone 7, and microphone 8. These data are explained below with reference to fig. 5B and 5C.

Fig. 5B illustrates a measurement of a signal-to-noise ratio difference of the primary microphone of fig. 5A, according to an embodiment of the present invention. Referring to fig. 5B and 5A, the microphone 3 is used as the main or primary microphone 514. The various positions are then used to place reference microphones such as microphone 6, microphone 4, microphone 5, microphone 7 and the position of microphone 8. In fig. 5B, column 522 represents a microphone pair for a set of measurements. Column 524 represents the approximate difference in acoustic path length between a given microphone pair of column 522. The approximate acoustic path length difference Δ L is given by equation 216 in FIG. 2. Column 526 lists the dimensionless number ranges 1 to 5 for five different microphone pairs used for signal-to-noise ratio measurements. Column 528 lists the signal-to-noise ratio difference for a given microphone pair listed in column 522. Each row 530,532, 534,536 and 538 lists a different pair of microphones where the reference microphone is changed while the primary microphone 514 remains unchanged as microphone 3. Note that the approximate differences in acoustic path lengths of the various microphone pairs may be arranged in increasing order as shown in equation 540. From equation 540, the microphone pairs have been arranged in

rows

530 and 538 in a manner that increases the approximate acoustic path length difference 524. The signal-to-noise ratio difference changes from 0dB for the microphone 6 to act as a reference microphone to 5.16dB when the microphone 7 is used as a reference microphone.

Fig. 5C illustrates the signal-to-noise ratio difference and increased microphone acoustic separation distance for the data shown in fig. 5B, in accordance with an embodiment of the present invention. Referring to fig. 5C, the signal-to-noise ratio difference is plotted on the vertical axis, indicated by 552, and the dimensionless X value from column 526 (fig. 5B) is plotted on the horizontal axis, indicated by 554. Note that, as described above, the dimensionless X value represents the approximate acoustic path length difference Δ L. The X-axis 554 does not correspond exactly to Δ L, but it does relate to Δ L because the data is arranged and plotted in a manner that increases the approximate acoustic path length difference Δ L. This data ordering helps to illustrate the characteristics of the signal-to-noise ratio difference described above in connection with fig. 2, i.e., the signal-to-noise ratio difference will increase as the acoustic path length difference between the primary microphone and the reference microphone increases. This behavior is identified by observing that the signal-to-noise ratio difference increases with changes in Δ L using curve 556, which curve 556 plots data from column 528 as a function of data from column 526 (fig. 5B).

Note that the particular location of the microphones in the views presented in the above figures is chosen for illustrative purposes only. These positions are not intended to limit embodiments of the present invention. Other locations for the microphones on the head-mounted device are used in other embodiments.

Thus, as described above in connection with fig. 1 block 108 and fig. 2-5C, in various embodiments, the microphone layout geometry is used to produce an acoustic path length difference between the two microphones and a corresponding signal-to-noise ratio difference between the primary microphone and the reference microphone. The signal-to-noise ratio difference may also be achieved by using different directivity patterns for the primary and reference microphones. In some embodiments, beamforming is used to create different directivity patterns for the primary and reference channels. For example, in FIG. 5A, the acoustic path length d₃And d₆Are too similar, so this choice of positions for the primary and reference microphones does not produce a sufficient signal-to-noise ratio difference (0 dB at column 528 and row 530 in fig. 5B). In this case, variations in microphone directivity pattern (one or two microphones) and/or beamforming may be used to produce the desired signal-to-noise ratio difference between the main channel and the reference channel.

Directional microphones may be used to reduce the reception of desired sound signals and/or increase the reception of undesired sound signals, thereby reducing the signal-to-noise ratio of the second microphone (reference microphone), which may result in an increased signal-to-noise ratio difference between the primary microphone and the reference microphone. An example of the use of a second microphone (not shown) and the techniques taught in fig. 6 and 7 below is shown in fig. 3A. In some embodiments, the second microphone may be substantially co-located with the microphone 1. In other embodiments, the second microphone is at a distance from source 312 that is equal to the distance of the first microphone from source 312. In some embodiments, the second microphone is a directional microphone whose primary response axis is substantially perpendicular (or equivalent to misaligned) to the acoustic path d₁. Thus, in the desired sound signal d₁There is a direction of zero or small response of the second microphone to the desired sound signal from 312. The result is that the signal-to-noise ratio of the second microphone decreases and the calculated signal-to-noise ratio difference between the first microphone and the second microphone increases. Note that the two microphones may be placed anywhere on the headset 302, including the same locations as described above. In other embodiments, one or more microphone elements are used as inputs to the beamformer, resulting in the main and reference channels having different directivity patterns and resulting in a signal-to-noise ratio difference between them.

Fig. 6 illustrates a microphone directivity pattern, generally at 600, in accordance with an embodiment of the present invention. Referring to fig. 6, an omni-directional microphone directivity pattern is shown by circle 602, circle 602 having a constant radius 604, constant radius 604 representing uniform sensitivity as a function of angle α (α) as represented by 608 measured from reference 606.

An example of a directional microphone having a cardioid directional pattern 622 is shown in diagram 620, where cardioid directional pattern 622 has a peak sensitivity axis indicated at 624 and a null indicated at 626. The cardioid directional pattern may be formed by two omnidirectional microphones or by one omnidirectional microphone and a suitable mounting structure for the microphone.

One example of a directional microphone having a dual directivity pattern 642/644 is shown in diagram 640, where the first lobe 642 of the dual directivity pattern has a first peak sensitivity axis indicated at 648 and the second lobe 644 has a second peak sensitivity axis indicated at 646. A first zero exists in direction 650 and a second zero exists in direction 652.

An example of a directional microphone having a super-cardioid directional pattern is shown in diagram 660, where super-cardioid directional pattern 664/665 has a peak sensitivity axis indicated by direction 662, a short sensitivity axis indicated by direction 666, and nulls indicated by directions 668 and 670.

Figure 7 illustrates a misaligned reference microphone response axis, generally at 700, in accordance with an embodiment of the present invention. Referring to fig. 7, a microphone is indicated at 702. Microphone 702 is a directional microphone having a main response axis 706 and nulls in its directivity pattern represented by 704. The incident sound field is indicated as arriving from direction 708. In various embodiments, microphone 702 is a two-way microphone, such as that shown in fig. 6. Directional microphone 702, properly positioned on the headset, when used as a reference microphone, reduces the signal-to-noise ratio by limiting the response to desired sound signals from direction 708 while responding to undesired sound signals from direction 710. As described above, the response of the directional microphone 702 will increase the signal-to-noise ratio difference.

Thus, it is within the teachings of the embodiments presented herein that one or more primary microphones and one or more reference microphones are placed at multiple locations on the headset to obtain a suitable signal-to-noise ratio difference between the primary and reference microphones. Such a signal-to-noise ratio difference enables a desired sound signal to be extracted from an acoustic signal containing the desired sound signal and an undesired sound signal, as described below in connection with the following figures. The microphones may be placed at different locations on the headset, including co-placing the primary microphone and the reference microphone at a common location on the headset.

In some embodiments, the technique of microphone placement geometry is combined with different directivity patterns obtained at the microphone level or by beamforming to produce a signal-to-noise ratio difference between the main channel and the reference channel, according to block 112 (fig. 1).

In various embodiments, the head-mounted device is an eyeglass device as described below in connection with the following figures. Fig. 8 is an exemplary diagram of an eyewear device 800 in accordance with an embodiment of the present invention. As shown in the figure, the eyewear apparatus 800 includes eyewear 802 having an embedded microphone. The eyewear 802 has two

microphones

804 and 806. The first microphone 804 is disposed in the middle of the frame of the eyeglasses 802. The second microphone 806 is disposed on the side of the frame of the eyeglasses 802.

Microphones

804 and 806 may be bi-directional or unidirectional pressure gradient microphone elements. In one or more embodiments, each

microphone

804 and 806 is a microphone assembly within a rubber boot. The rubber boot provides an acoustic port with an acoustic conduit on the front and back of the microphone. The two

microphones

804 and 806 and their respective jackets may be identical.

Microphones

804 and 806 may be hermetically sealed (e.g., hermetically sealed). The acoustic duct is filled with a windshield material. The ports are sealed with a woven fabric layer. The lower and upper acoustic ports are sealed with a water-resistant membrane. The microphone may be built into the structure of the spectacle frame. Each microphone has a top and bottom aperture as acoustic ports, and in one embodiment,

microphones

804 and 806, which may be pressure gradient microphone elements, may each be replaced by two omnidirectional microphones.

Fig. 9 is a schematic diagram of another example of an embodiment of the present invention. As shown in fig. 9, the eyewear device 900 includes eyewear 952 having three embedded microphones. The eyeglasses 952 in fig. 9 are similar to the eyeglasses 802 in fig. 8, but instead employ three microphones instead of two microphones. The eyeglasses 952 in fig. 9 have a first microphone 954 disposed in the middle of the eyeglasses 952, a second microphone 956 disposed on the left side of the eyeglasses 952, and a third microphone 958 disposed on the right side of the eyeglasses 952. These three microphones may be used in the three microphone embodiment described above.

Fig. 10 is a schematic diagram of eyewear 1000 in an embodiment of the present invention in which the two bi-directional microphones shown in fig. 8 are replaced with four omni-

directional microphones

1002, 1004, 1006, 1008 and electronic beam steering, for example. Replacing two bi-directional microphones with four omni-directional microphones provides more flexibility and manufacturability for the designers of the eyeglass frame. In an exemplary embodiment with four omnidirectional microphones, the four omnidirectional microphones may be located anywhere on the eyeglass frame, preferably with the microphone pairs arranged vertically around the lens. In the present embodiment, the

omnidirectional microphones

1002 and 1004 are main microphones for detecting main sounds to be separated from interference; the

microphones

1004, 1008 are reference microphones for detecting background noise to be separated from the main sound. The microphone array may be an omnidirectional microphone, wherein the omnidirectional microphone may be any combination of the following microphones: an electret condenser microphone, an analog micro-electro-mechanical system (MEMS) microphone, or a digital MEMS microphone.

Another exemplary embodiment of the present invention, as shown in fig. 11, includes an eyewear apparatus having a noise canceling microphone array, the eyewear apparatus including a frame 1100, the microphone array coupled to the frame, the microphone array including at least a first microphone 1102 and a second microphone 1104, the first microphone coupled to the frame proximate a temple area, the temple area may be located generally between a top corner of a lens opening and a support arm and provide a first audio channel output, and the second microphone coupled to the frame proximate an inner lower corner of the lens opening and provide a second audio channel output. The second microphone is located on the diagonal of the lens opening 1106, but it can be located anywhere along the inner frame of the lens, such as the lower corner, upper corner or edge of the inner frame. Further, the second microphone may be disposed along an inner edge of the lens on the left or right side of the nose bridge.

In yet another embodiment of the present invention, at least one flexible Printed Circuit Board (PCB) strip may be used to connect the microphone array to the frame, as shown in fig. 12. In the present embodiment, the eyewear device 1200 of the present invention includes: an upper flexible PCB strip 1202 including a first microphone 1204 and a fourth microphone 1206, and a lower flexible PCB strip 1208 including a second microphone 1210 and a third microphone 1212.

In further exemplary embodiments, the frame may further include an array of vent holes corresponding to the array of microphones. The microphone array may be a bottom port or top port microelectromechanical system (MEMS) microphone. As shown in fig. 13, fig. 13 is a microphone assembly of the eyewear of fig. 12, a MEMS microphone assembly 1300 comprising a MEMS microphone 1302 affixed to a flexible Printed Circuit Board (PCB) 1304. A gasket 1306 separates the flexible PCB1304 from the device housing 1308. A vent 1310 is defined by the flexible PCB1304, gasket 1306 and device housing 1308, the vent 1310 being an audio channel that directs sound waves to the MEMS microphone 1302. The first and fourth MEMS microphones may be connected to the upper flexible PCB strip, the second and third MEMS microphones may be connected to the lower flexible PCB strip, and the MEMS microphone array may be arranged such that the bottom port or the top port receives sound signals through the corresponding vent holes.

Fig. 14 shows another alternative embodiment of eyewear 1400 in which

microphones

1402, 1404 are placed at temple region 1406 and front frame 1408, respectively.

Fig. 15 illustrates generally at 1500 eyewear having a built-in acoustic noise cancellation system in accordance with an embodiment of the present invention. Referring to fig. 15, the headset 1502 includes one or more microphones for the primary channel and one or more microphones for the reference channel. The head-mounted device 1502 is configured as a wearable computer with an information display 1504. In various embodiments, electronic devices are included at 1506 and/or 1508. In various embodiments, the electronics may include noise cancellation electronics, which are described more fully below in connection with the figures. In other embodiments, the noise cancellation electronics are not co-located with the headset 1502, but are located external to the headset 1502. These embodiments provide wireless communication links, e.g. with

Protocol,

And the like to transmit sound signals received from the microphone to an external location for processing by the noise cancellation electronics.

Fig. 16 shows the primary microphone locations in the headset from fig. 15 generally at 1600 in accordance with an embodiment of the present invention. Referring to fig. 16, the primary microphone location is shown by 1602.

Fig. 17 illustrates generally at 1700 a goggle having a built-in acoustic noise cancellation system in accordance with an embodiment of the present invention. Referring to fig. 17, a head-mounted device in the form of goggles 1702 is configured with a primary microphone at location 1704 and a reference microphone at location 1706. In various embodiments, noise cancellation electronics are included in the goggles 1702. The noise cancellation electronics will be described more fully below with reference to the accompanying drawings. In other embodiments, the noise cancellation electronics are not co-located with the headset 1702, but are located external to the headset 1702. These embodiments provide wireless communication links, e.g. with

Protocol,

Protocols, etc. to transmit sound signals received from the microphones to an external location for processing by noise cancellation electronics.

FIG. 18 illustrates a visor cap having a built-in acoustic noise cancellation system, shown generally at 1800, in accordance with an embodiment of the present invention. Referring to fig. 18, a head-mounted device in the form of a visor 1802 has a primary microphone 1804 and a reference microphone 1806. In various embodiments, visor 1802 includes noise cancellation electronics. The noise cancellation electronics will be described more fully below with reference to the accompanying drawings. In other embodiments, the noise cancellation electronics are not co-located with the headset 1802, but ratherIs located outside the headset 1802. These embodiments provide wireless communication links, e.g. with

Protocol,

Fig. 19 shows a helmet having a built-in acoustic noise cancellation system, generally at 1900, in accordance with an embodiment of the invention. Referring to fig. 19, a head-mounted device in the form of a helmet 1902 has one primary microphone 1904 and one reference microphone 1906. In various embodiments, noise cancellation electronics are included in the helmet 1902. The noise cancellation electronics will be described more fully below with reference to the accompanying drawings. In other embodiments, the noise cancellation electronics are not co-located with the headset 1902, but are located external to the headset 1902. These embodiments provide wireless communication links, e.g. with

Protocol,

Fig. 20 shows a flow, generally at 2000, of extracting a desired sound signal, in accordance with an embodiment of the present invention. Referring to FIG. 20, flow begins at block 2002. At block 2004, a primary sound signal is received from a primary microphone located on the headset. At block 2006, a reference sound signal is received from a reference microphone located on the headset. At block 2008, a normalized primary sound signal is formed. In various embodiments, the normalized primary sound signal is formed using one or more reference sound signals, as described in the figures below. At block 2010, the normalized primary sound signal is used to control noise cancellation using a sound signal processing system contained within the headset. The flow stops at block 2012.

Fig. 21 illustrates a system architecture, generally at 2100, in accordance with an embodiment of the invention. Referring to fig. 21, two channels are input to one adaptive noise canceling unit 2106. The first channel, referred to herein as the primary channel 2102, is synonymously referred to as the "primary" or "primary" channel in the description of the present embodiment. The primary channel 2102 contains desired sound signals and undesired sound signals. As described in detail in the following figures, the sound signal input on the primary channel 2102 results from the presence of desired and undesired sound signals on one or more acoustic elements. Depending on the configuration of the one or more microphones for the main channel, the microphone elements may output analog signals. The analog signal is converted to a digital signal using an analog-to-digital converter (AD) (not shown). Further, the amplifier may be located near the microphone element or the AD converter. The second channel, referred to herein as the reference channel 2104, provides a sound signal that is also produced by the presence of the desired sound signal and the undesired sound signal. Optionally, the second reference channel 2104b may be input to an adaptive noise cancellation unit 2106. Similar to the main channel, the microphone elements may output analog signals depending on the configuration of the one or more microphones for the reference channel. The analog signal is converted to a digital signal using an analog-to-digital converter (AD) (not shown). Further, the amplifier may be located near the microphone element or the AD converter. In some embodiments, the microphone is implemented as a digital microphone.

In some embodiments, the primary channel 2102 has an omnidirectional response and the reference channel 2104 has an omnidirectional response. In some embodiments, the acoustic beam patterns of the acoustic elements of the main channel 2102 and the reference channel 2104 are different. In other embodiments, the beam patterns of the acoustic elements of the primary channel 2102 and the reference channel 2104 are the same. However, the desired sound signal received on the primary channel 2102 is different from the desired sound signal received on the reference channel 2104. Thus, the signal-to-noise ratio of the main channel 2102 is different from the signal-to-noise ratio of the reference channel 2104. In general, the signal-to-noise ratio of the reference channel is less than the signal-to-noise ratio of the main channel. In various embodiments, the difference between the main channel signal-to-noise ratio and the reference channel signal-to-noise ratio is about 1 or 2 decibels (dB) or greater, as non-limiting examples. In other non-limiting embodiments, the difference between the main channel signal-to-noise ratio and the reference channel signal-to-noise ratio is about 1 decibel (dB) or less. Thus, embodiments of the present invention are suitable for high noise environments that may result in a low signal-to-noise ratio relative to the desired sound signal, and low noise environments that may have a higher signal-to-noise ratio. As used in the description of the embodiments, the signal-to-noise ratio means a ratio of a desired sound signal to an undesired sound signal in one channel. Further, the term "main channel signal-to-noise ratio" is used interchangeably with the term "main signal-to-noise ratio". Likewise, the term "reference channel signal-to-noise ratio" is used interchangeably with the term "reference signal-to-noise ratio".

The primary channel 2102, the reference channel 2104, and the optional second reference channel 2104b provide inputs to an adaptive noise cancellation unit 2106. Although a second reference channel is shown, in various embodiments, more than two reference channels are used. The adaptive noise cancellation unit 2106 filters the undesired sound signal from the main channel 2102, thereby providing a first stage of filtering with multiple channel inputs. In various embodiments, the adaptive noise cancellation unit 2106 utilizes an adaptive Finite Impulse Response (FIR) filter. A reverberant sound field may be rendered using the environment in embodiments of the invention. Thus, the adaptive noise cancellation unit 2106 includes a delay for the main channel that is sufficient to approximate the impulse response of the system usage environment. The size of the delay used depends on the specific application for which the system is designed, including whether reverberation must be considered in the design. In some embodiments, the magnitude of the delay may be a fraction of a millisecond for microphone channels that are placed very close together (where reverberation is insignificant). It is noted that at the low end of the range of values available for delay, the acoustic propagation time between channels may represent a minimum delay value. Thus, in various embodiments, the delay value may range from about a fraction of a millisecond to about 500 milliseconds or longer, depending on the application. Further explanation of the adaptive noise cancellation unit 1106 and components associated therewith is provided below in connection with the following figures.

The output 2107 of the adaptive noise cancellation unit 2106 is input to a single channel noise cancellation unit 2118. The single channel noise cancellation unit 2118 filters the output 2107 and further reduces the undesired sound signal from the output 2107, thereby providing a second stage of filtering. The filtering of the undesired sound signal by the single-channel noise cancellation unit 2118 is mainly a stationary effect. The single channel noise cancellation unit 2118 includes a linear filter, such as a wiener filter, a Minimum Mean Square Error (MMSE) filter implementation, a linear stationary noise filter, or other bayesian filtering method using a priori information about the parameters to be estimated. The filter used by the single-channel noise cancellation unit 2118 will be more fully described with reference to the drawings.

The sound signal from the main channel 2102 is input into the filter controller 2112 at 2108. Likewise, the sound signal from the reference channel 2104 is input into the filter controller 2112 at 2110. The optional second reference channel is input into filter control 2112 at 2108 b. Filter control 2112 provides a control signal 2114 to an adaptive noise cancellation unit 2106 and a control signal 2116 to a single channel noise cancellation unit 2118. In various embodiments, the operation of the filter controller 2112 will be described more fully below with reference to the accompanying drawings. The output 2120 of the single channel noise cancellation unit 2118 provides a sound signal that contains mostly the desired sound signal and a reduced amount of undesired sound signals.

The system architecture shown in fig. 21 may be applied in a variety of different systems for processing sound signals, according to embodiments of the present invention. Some examples of different acoustic systems are, but are not limited to, mobile communications, hand-held microphones, boom microphones, microphone-headset hearing aids, hands-free microphone devices, wearable systems embedded in a spectacle frame, near-to-eye (NTE) headset displays or headset computing devices, commonly configured head-mounted devices such as, but not limited to, spectacle goggles, sun visors, headbands, helmets, and the like. The environment in which these acoustic systems are used may have multiple sources of acoustic energy incident on the acoustic elements that provide the acoustic signals for the main channel 2102 and the reference channel 2104. In various embodiments, the desired sound signal is typically the result of the user's own voice (see FIG. 2 above). In various embodiments, the undesired sound signals are generally a result of a combination of undesired acoustic energy from multiple acoustic sources incident on the acoustic elements of the main and reference channels. Therefore, the undesired sound signal is not statistically correlated with the desired sound signal. Furthermore, there is a non-causal relationship between the undesired sound signal in the main channel and the undesired sound signal in the reference channel. In this case, echo cancellation does not work due to non-causal relationships, and because there is no measurement of a pure noise signal (undesired sound signal) other than the signal of interest (desired sound signal). In an echo canceling noise reduction system, a loudspeaker producing an acoustic signal provides a measure of the pure noise signal. In the context of the system embodiments described herein, there is no speaker or noise source from which a pure noise signal can be extracted.

Figure 22 illustrates generally at 2112 a filter controller in accordance with an embodiment of the invention. Referring to fig. 22, the sound signal from the main channel 2102 is input into the desired voice activity detection unit 2202 at 2108. The primary channel activity detector 2206 monitors the sound signal at 2108 to create a signature associated with activity on the primary channel 2102 (fig. 21). Optionally, a second reference channel activity detector (not shown) monitors the sound signal at 2110b to create a signature associated with activity on the second reference channel. Optionally, the output of the second reference channel activity detector is connected to disable control logic 2214. The reference channel activity detector 2208 monitors the sound signal at 2110 to create markers associated with the activity on the reference channel 2104 (fig. 21). The desired voice activity detection unit 2202 utilizes the sound signal input from 2110, 2108 and optionally 2110b to generate the desired voice activity signal 2204. The operation of the desired voice activity detection unit 2202 will be described more fully in the following figures.

In various embodiments, the disable logic 2214 receives as inputs information about the main channel activity at 2210, information about the reference channel activity at 2212, and information about whether the desired sound signal is present at 2204. In various embodiments, disable logic 2214 outputs filter control signal 2114/2116, which is sent to, for example, adaptive noise cancellation unit 2106 and single channel noise cancellation unit 2118 of fig. 21. The implementation and operation of the main channel activity detector 2206, reference channel activity detector 2208 and disable logic 2214 are more fully described in U.S. patent US 7386135 entitled "cardioid beam with desired null based acoustic apparatus, system and method," which is incorporated herein by reference.

In operation, in various embodiments, the system of fig. 21 and the filter controller of fig. 22 provide filtering and removal of undesired sound signals from the main channel 2102 when the adaptive noise cancellation unit 2106 and the single channel noise cancellation unit 2118 apply successive filtering stages. In one or more embodiments, the application of signal processing throughout the system is a linear application. In linear signal processing, the output is linearly related to the input. Thus, changing the input value results in a proportional change in the output. The linear application of the signal processing procedure to the signal preserves the quality and fidelity of the desired sound signal, thereby substantially eliminating or minimizing any non-linear distortion of the desired sound signal. Maintaining the signal quality of the desired sound signal is useful to the user because accurate reproduction of speech helps to facilitate accurate delivery of information.

Furthermore, algorithms for processing speech, such as Speech Recognition (SR) algorithms or Automatic Speech Recognition (ASR) algorithms, benefit from accurate rendering of sound signals substantially free of non-linear distortions. Thus, by embodiments of the present invention, distortions that may be generated due to the application of non-linear signal processing procedures are eliminated. The linear noise cancellation algorithm taught by embodiments of the present invention makes modifications to the desired sound signal transparent to the operation of the SR and ASR algorithms used by the speech recognition engine. Therefore, the error rate of the speech recognition engine is greatly reduced by applying the embodiment of the invention.

Figure 23 illustrates another system architecture diagram, generally designated 2300, in accordance with an embodiment of the present invention. Referring to fig. 23, in the system architecture presented herein, at 2302, a first channel provides an acoustic signal (nominally labeled MIC1 in the figure) from a first microphone. At 2304, a second channel provides an acoustic signal (nominally labeled MIC2 in the figure) from a second microphone. In various embodiments, one or more microphones may be used to create the signal from the first microphone 2302. In various embodiments, one or more microphones may be used to create the signal from the second microphone 2304. In some embodiments, one or more acoustic elements may be used to create a signal that is useful for the signal from the first microphone 2302 and the signal from the second microphone 2304 (see fig. 25C described below). Thus, one acoustic element may be shared by 2302 and 2304. In various embodiments, the arrangement of acoustic elements providing signals at 2302, 2304, the main channel, and the reference channel are described below in connection with the following figures.

The beamformer 2305 receives as input signals from the first microphone 2302 and from the second microphone 2304 and optionally from a third microphone 2304b (nominally labeled as MIC3 in the figure). The beamformer 2305 uses the

signals

2302, 2304 and optionally 2304b to create a main channel 2308a containing desired sound signals and undesired sound signals. The beamformer 2305 also creates one or more reference channels 2310a and optionally 2311a using the

signals

2302, 2304 and optionally 2304 b. The reference channel contains a desired sound signal and an undesired sound signal. The signal-to-noise ratio of the main channel, referred to as the "main channel signal-to-noise ratio" is greater than the signal-to-noise ratio of the reference channel, referred to herein as the "reference channel signal-to-noise ratio". The beamformer 2305 and/or the arrangement of acoustic elements for MIC1 and MIC2 provide a main channel signal-to-noise ratio that is greater than a reference channel signal-to-noise ratio.

The beamformer 2305 is connected to an adaptive noise cancellation unit 2306 and a filter control unit 2312. The main channel signal is output from the beamformer 2305 at 2308a and input to the adaptive noise cancellation unit 2306. Likewise, the reference channel signal is output from the beamformer 2305 at 2310a and input to the adaptive noise cancellation unit 2306. The main channel signal is also output from the beamformer 2305 and input to the filter controller 2312 at 2308 b. Likewise, the reference channel signal is output from the beamformer 2305 and input to the filter controller 2312 at 2310 b. Optionally, the second reference channel signal is output at 2311a and input to adaptive noise cancellation unit 2306, and the optional second reference channel signal is output at 2311b and input to filter controller 2012.

The filter controller 2312 uses the

inputs

2308b, 2310b and optionally 2311b to generate a channel activity flag and desired voice activity detection to provide a filter control signal 2314 to the adaptive noise cancellation unit 2306 and a filter control signal 2316 to the single channel noise reduction unit 2318.

At 2307, the adaptive noise cancellation unit 2306 provides multi-channel filtering and filters a first number of undesired sound signals from the primary channel 2308a during a first stage of filtering to output a filtered primary channel. A mono noise reduction unit 2318 receives the filtered primary channel 2307 as input and provides a second stage of filtering to further reduce the undesired sound signal from 2307. The single-channel noise reduction unit 2318 outputs the most desirable sound signal at 2320.

In various embodiments, different types of microphones may be used to provide the sound signals required by the embodiments of the invention presented herein. Any transducer that converts acoustic waves into an electrical signal is suitable for use with embodiments of the present invention taught herein. Some non-limiting examples of microphones are, but are not limited to, moving coil microphones, condenser microphones, Electret Condenser Microphones (ECM), and micro-electro-mechanical systems (MEMS) microphones. In other embodiments, a Condenser Microphone (CM) is used. In other embodiments, a micromachined microphone is used. Piezoelectric film based microphones are used with other embodiments. The piezoelectric element is made of a ceramic material, a plastic material, or a thin film. In other embodiments, a micromachined microphone array is used. In other embodiments, silicon or polysilicon micromachined microphones are used. In some embodiments, a bi-directional pressure gradient microphone is used to provide multiple sound channels. Various microphones or microphone arrays including the systems described herein may be mounted on or within a structure such as glasses or headphones.

Figure 24A illustrates another system architecture diagram, generally designated 2400, incorporating auto-balancing, in accordance with an embodiment of the present invention. Referring to fig. 24A, in the system architecture presented herein, at 2402, a first channel provides an acoustic signal (nominally labeled MIC1 in the figure) from a first microphone. At 2404, a second channel provides an acoustic signal (nominally labeled MIC2 in the figure) from a second microphone. In various embodiments, one or more microphones may be used to create the signal from the first microphone 2402. In various embodiments, one or more microphones may be used to create a signal from second microphone 2404. In some embodiments, one or more acoustic elements may be used to create a signal that becomes part of the signal from the first microphone 2402 and the signal from the second microphone 2404, as described above in connection with the previous figures. In various embodiments, the arrangement of acoustic elements providing signals at 2302, 2304, the main channel, and the reference channel are described below in connection with the following figures.

The beamformer 2405 receives as input the signal from the first microphone 2402 and the signal from the second microphone 2404. The beamformer 2405 uses

signals

2402 and 2404 to create a main channel containing both desired and undesired sound signals. The beamformer 2405 also creates reference

channels using signals

2402 and 2404. Optionally, at 2404b, a third channel provides an acoustic signal (nominally labeled MIC3 in the figure) from a third microphone. The signal is input to beamformer 2405. In various embodiments, one or more microphones may be used to create signal 2404b from a third microphone. The reference channel contains both the desired and undesired sound signals. The signal-to-noise ratio of the main channel, referred to as the "main channel signal-to-noise ratio" is greater than the signal-to-noise ratio of the reference channel, referred to herein as the "reference channel signal-to-noise ratio". The beamformer 2405 and/or arrangement of acoustic elements for MIC1, MIC2 and optionally MIC3 provides a main channel signal-to-noise ratio that is greater than a reference channel signal-to-noise ratio. In some embodiments, a bi-directional pressure gradient microphone element provides

signals

2402, 2404, and optionally 2404 b.

The beamformer 2405 is connected to an adaptive noise cancellation unit 2406 and a desired voice activity detector 2412 (filter controller). The main channel signal is output from the beamformer 2405 at 2408a and input to the adaptive noise cancellation unit 2406. Likewise, the reference channel signal is output from the beamformer 2405 at 2410a and input to the adaptive noise cancellation unit 2406. The main channel signal is also output from the beamformer 2405 and input to the desired voice activity detector 2412 at 2408 b. Likewise, reference channel signals are output from the beamformer 2405 and input to the desired voice activity detector 2412 at 2410 b. Optionally, a second reference channel signal is output from the beamformer 2405 at 2409a and input to the adaptive noise cancellation unit 2406, and a second reference channel signal is output from the beamformer 2405 at 2409b and input to the desired voice activity detector 2412.

Desired voice activity detector 2412 uses

inputs

2408b, 2410b, and optionally 2409b to generate filter control signal 2414 for adaptive noise cancellation unit 2408 and filter control signal 2416 for single channel noise reduction unit 2418. At 2407, the adaptive noise cancellation unit 2406 provides multi-channel filtering and filters a first number of undesired sound signals from the primary channel 2408a during the first stage of filtering to output a filtered primary channel. A single channel noise reduction unit 2418 receives the filtered main channel 2407 as input and provides a second stage of filtering to further reduce the undesired sound signal from 2407. The single channel noise reduction unit 2418 outputs the most desirable sound signal at 2420.

The desired voice activity detector 2412 provides a control signal 2422 to an auto-balance unit 2424. An auto-balancing unit 2424 is connected to the signal path of the first microphone 2402 at 2426. The auto-balancing unit 2424 is also connected to the signal path of the second microphone 2404 at 2428. Optionally, an auto-balancing unit 2424 is also connected to the signal path of the third microphone 2404b at 2429. An auto-balancing unit 2424 balances the microphone response to far-field signals over the life of the system. Maintaining microphone channel balance may improve system performance and maintain a high level of performance by preventing microphone sensitivity drift. The automatic balancing unit will be described more fully below in conjunction with the following figures.

Fig. 24B illustrates generally at 2450 a flow for noise reduction, in accordance with an embodiment of the present invention. Referring to FIG. 24B, flow begins at block 2452. At block 2454, a primary sound signal is received by the system. The primary sound signal may be, for example, in various embodiments, a signal as represented by 2102 (FIG. 21), 2302/2308a/2308b (FIG. 23), or 2402/2408a/2408b (FIG. 24A). At block 2456, a reference sound signal is received by the system. The reference acoustic signal may be, for example, in various embodiments, a signal as represented by 2104 and optionally 2104b (fig. 21), 2304/2310a/2310b and optionally 2304b/2311a/2311b (fig. 23), or 2404/2410a/2410b and optionally 2404b/2409a/2409b (fig. 24A). At block 2458, adaptive filtering is performed using the multiple input channels, e.g., using adaptive filtering units 2106 (fig. 21), 2306 (fig. 23), and 2406 (fig. 24A) to provide filtered sound signals as represented by 2107 (fig. 21), 2307 (fig. 23), and 2407 (fig. 24A). At block 2460, the filtered sound signal resulting from the processing of block 2458 is filtered using a single channel unit. The single channel cell may be, for example, a cell as represented by 2118 (fig. 21), 2318 (fig. 23), or 2418 (fig. 24A) in various embodiments. The flow stops at block 2462.

In various embodiments, the adaptive noise cancellation units, such as 2106 (fig. 21), 2306 (fig. 23), and 2406 (fig. 24A), are implemented in an integrated circuit device, which may include: an integrated circuit package containing an integrated circuit. In some embodiments, the adaptive

noise cancellation units

2106 or 2306 or 2406 are implemented in a single integrated circuit die. In other embodiments, the adaptive

noise cancellation unit

2106 or 2306 or 2406 is implemented in more than one integrated circuit die of an integrated circuit device, which may include: a multi-chip package containing the integrated circuit.

In various embodiments, the single channel noise cancellation units, such as 2018 (fig. 21), 2318 (fig. 23), and 2418 (fig. 24A), are implemented in an integrated circuit device, which may include: an integrated circuit package containing an integrated circuit. In some embodiments, the single channel

noise cancellation unit

2118 or 2318 or 2418 is implemented in a single integrated circuit die. In other embodiments, the single channel

noise cancellation unit

2118 or 2318 or 2418 is implemented in more than one integrated circuit die of an integrated circuit device, which may include: a multi-chip package containing the integrated circuit.

In various embodiments, a filter controller, such as 2112 (fig. 21 and 22) or 2312 (fig. 23), is implemented in an integrated circuit device, which may include: an integrated circuit package containing an integrated circuit. In some embodiments,

filter controller

2112 or 2312 is implemented in a single integrated circuit die. In other embodiments,

filter controller

2112 or 2312 is implemented in more than one integrated circuit die of an integrated circuit device, which may include: a multi-chip package containing the integrated circuit.

In various embodiments, the beamformer, such as 2305 (fig. 23) or 2405 (fig. 24A), is implemented in an integrated circuit device, which may include: an integrated circuit package containing an integrated circuit. In some embodiments, the

beamformer

2305 or 2405 is implemented in a single integrated circuit die. In other embodiments, the

beamformer

2305 or 2405 is implemented in more than one integrated circuit die of an integrated circuit device, which may include: a multi-chip package containing the integrated circuit.

Fig. 25A illustrates beamforming, generally at 2500, in accordance with an embodiment of the present invention. Referring to fig. 25A, a beamforming module is applied to both

microphone inputs

2502 and 2504. In one or more embodiments, microphone input 2502 may originate from a first directional microphone, microphone input 2504 may originate from a second directional microphone, or

microphone signals

2502 and 2504 may originate from omni-directional microphones. In other embodiments,

microphone signals

2502 and 2504 are provided by the output of a bi-directional pressure gradient microphone. Various directional microphones may be used, such as, but not limited to, microphones having a cardioid beam pattern, a dipole beam pattern, an omni-directional beam pattern, or a user-defined beam pattern. In some embodiments, one or more acoustic elements are configured to provide

microphone inputs

2502 and 2504.

In various embodiments, beamforming module 2506 includes filter 2508. Depending on the type of microphone used and the particular application, filter 2508 may provide a Direct Current (DC) blocking filter that filters DC and very low frequency components of microphone input 2502. After filter 2508, in some embodiments, additional filtering is provided by filter 2510. Some microphones have a non-flat response as a function of frequency. In this case, it may be desirable to flatten the frequency response of the microphone with a de-emphasis filter. The filter 2510 may provide de-emphasis, thereby flattening the frequency response of the microphone. After de-emphasis filtering by filter 2510, the primary microphone channel is provided to an adaptive noise cancellation unit at 2512a and to a desired voice activity detector at 2512 b.

Microphone input 2504 is input into beamforming module 2506 and, in some embodiments, filtered by filter 2512. Depending on the type of microphone used and the particular application, filter 2512 can provide a Direct Current (DC) blocking filter that filters DC and very low frequency components of microphone input 2504. The filter 2514 filters the sound signal output from the filter 2512. The filter 2514 adjusts gain, phase, and may also determine the formation of the frequency response of the acoustic signal. After filter 2514, in some embodiments, additional filtering is provided by filter 2516. Some microphones have a non-flat response as a function of frequency. In this case, it may be desirable to flatten the frequency response of the microphone with a de-emphasis filter. The filter 2516 may provide de-emphasis, thereby flattening the frequency response of the microphone. After de-emphasis filtering by filter 2516, the reference microphone channel is provided to an adaptive noise cancellation unit at 2518a and to a desired voice activity detector at 2518 b.

Optionally, a third microphone channel is input into the beamforming module 2506 at 2504 b. Similar to the signal path described above for channel 2504, filter 2512b filters the third microphone channel. Depending on the type of microphone used and the particular application, filter 2512b can provide a Direct Current (DC) blocking filter that filters DC and very low frequency components of microphone input 2504 b. The filter 2514b filters the sound signal output from the filter 2512 b. The filter 2514b adjusts gain, phase, and may also determine the formation of the frequency response of the acoustic signal. After filter 2514b, in some embodiments, additional filtering is provided by filter 2516 b. Some microphones have a non-flat response as a function of frequency. In this case, it may be desirable to flatten the frequency response of the microphone with a de-emphasis filter. The filter 2516b may provide de-emphasis, thereby flattening the frequency response of the microphone. After de-emphasis filtering by filter 2516b, the second reference microphone channel is provided to the adaptive noise cancellation unit at 2520a and to the desired voice activity detector at 2520 b.

Figure 25B shows another schematic diagram of beamforming generally at 2530 in accordance with an embodiment of the present invention. Referring to fig. 25B, a beam pattern is created for the main channel using the first and

second microphones

2532 and 2538. A signal 2534 output from the first microphone 2532 is input to an adder 2536. The signal 2540 output from the second microphone 2538 is adjusted in amplitude at block 2542 and in phase by applying a delay at block 2544 to produce a signal 2546 which is input to the summer 2536. Adder 2536 subtracts one signal from the other, resulting in output signal 2548. The output signal 2548 has a beam pattern that may take on a variety of forms depending on the initial beam pattern of the

microphones

2532 and 2538 and the gain applied at 2542 and the delay applied at 2544. By way of non-limiting example, the beam pattern may include a cardioid, dipole, or the like.

A beam pattern is created for the reference channel using the third and

fourth microphones

2552 and 2558. A signal 2554 output from the third microphone 2552 is input to the adder 2556. The signal 2560 output from the fourth microphone 2558 is adjusted in amplitude at block 2562 and in phase by applying a delay at block 2564 to produce the signal 2566 which is input to the summer 2556. Adder 2556 subtracts one signal from the other, resulting in output signal 2568. Depending on the initial beam pattern of

microphones

2552 and 2558 and the gain applied at 2562 and the delay applied at 2564, output signal 2568 has a beam pattern that may take on a variety of forms. By way of non-limiting example, the beam pattern may include a cardioid, dipole, or the like.

Fig. 25C illustrates generally 2570 beamforming with shared acoustic elements, according to an embodiment of the present invention. Referring to fig. 25C, a microphone 2552 is shared between the main channel and the reference channel. The output from the microphone 2552 is split and delivered to a gain 2574 and delay 2576 at 2572, and then input to the summer 2536 at 2586. The appropriate gain at 2574 and delay at 2576 may be selected to equivalently achieve an output 2578 from adder 2536 that is equivalent to output 2548 from adder 2536 (fig. 25B). Similarly, the gain 2582 and delay 2584 may be adjusted to provide an output signal 2588 equivalent to 2568 (fig. 25B). By way of non-limiting example, the beam pattern may include a cardioid, dipole, or the like.

Fig. 26 illustrates multi-channel adaptive filtering, generally 2600, in accordance with an embodiment of the invention. Referring to fig. 26, an embodiment of an adaptive filtering unit is shown with a main channel 2604 (containing a microphone signal) input to a delay element 2606. The reference channel 2602 (containing the microphone signal) is input to an adaptive filter 2608. In various embodiments, the adaptive filter 2608 may be an adaptive FIR filter designed to implement normalized least mean square adaptation (NLMS) or another algorithm. Embodiments of the invention are not limited to NLMS adaptation. The adaptive FIR filter filters the estimated desired sound signal from the reference signal 2602. In one or more embodiments, the output 2609 of the adaptive filter 2608 is input to an adder 2610. The delayed main channel signal 2607 is input to an adder 2610 and the output 2609 is subtracted from the delayed main channel signal 2607. The output of the adder 2616 provides a signal containing a desired sound signal with a reduced number of undesired sound signals.

Many situations in which acoustic systems using embodiments of the present invention are used are in the presence of reverberation. Reverberation results in a form of noise and in undesired sound signals, the subject of filtering and signal extraction as described herein. In various embodiments, the two-channel adaptive FIR filtering, shown by 2600, models the reverberation between the two channels and the environment in which they are applied. Thus, the undesired sound signal propagates along the direct path and the reverberant path that requires the impulse response of the adaptive FIR filter to simulate the environment. Various approximations of the impulse response of the environment may be made depending on the accuracy required. In one non-limiting example, the amount of delay is approximately equal to the impulse response time of the environment. In one non-limiting example, the amount of delay is greater than the impulse response of the environment. In an embodiment, the amount of delay is approximately equal to n times the impulse response time of the environment, where n may be equal to 2 or 3 or more, for example. Alternatively, the delay amount is not an integer multiple of the impulse response time, e.g., 0.5, 1.4, 2.75, etc. For example, in one embodiment, the filter length is approximately equal to twice the delay selected for 2606. Thus, if an adaptive filter with 200 taps is used, the length of the delay 2606 would be approximately equal to a time delay of 100 taps. The time delay corresponding to the propagation time through 100 taps is provided for illustration only and is not meant to be limiting in any way to embodiments of the present invention.

Embodiments of the present invention may be used in a variety of environments having a range of impulse response times. Some examples of impulse response times are given as non-limiting examples for illustrative purposes only and constitute a limitation of embodiments of the present invention. For example, office environments typically have impulse response times of about 100 to 200 milliseconds. The cabin interior may provide an impulse response time of 30 milliseconds to 60 milliseconds. In general, embodiments of the present invention are useful in environments where the impulse response time may range from a few milliseconds to 500 milliseconds or more.

The adaptive filtering unit 2600 communicates with disable logic, such as disable logic 2214, and filter control signals 2114 (fig. 22) at 2614. The signal 2614 controlled by the disable logic 2214 is used to control the filtering performed by the filter 2608 and the adaptation of the filter coefficients. The output 2616 of the adaptive filtering unit 2600 is input to single channel noise cancellation units such as those described in the previous figures above, e.g., 2118 (fig. 21), 2318 (fig. 23), and 2418 (fig. 24A). The first level of undesired sound signal has been extracted from the primary channel resulting in output 2616. Under various operating conditions, the level of noise, i.e., the undesired sound signal, may be very high relative to the signal of interest, i.e., the desired sound signal. Embodiments of the present invention are operable in situations where there is a certain difference in signal-to-noise ratio between the main channel and the reference channel. In some embodiments, the difference in signal-to-noise ratio is on the order of 1 decibel (dB) or less. In other embodiments, the difference in signal-to-noise ratio is on the order of 1 decibel (dB) or greater. The output 2616 is additionally filtered to reduce the amount of undesired sound signals contained therein during subsequent use of the single channel noise reduction unit.

The disable logic described above in fig. 22, including signal 2614 (fig. 26), provides substantial nulling of filter 2608 and non-adaptation of filter coefficients when the primary or reference channel is determined to be inactive. In this case, the signal present on the main channel 2604 is output at 2616.

If the main and reference channels are active and the desired sound signal is detected or the pause threshold has not been reached, then adaptation is disabled and the filter coefficients are frozen and the signal on the reference channel 2602 is filtered through the filter 2608 and subtracted from the main channel 2607 using adder 2610 and output at 2616.

The filter coefficients are adjusted if the main and reference channels are active and no desired sound signal is detected and a pause threshold (also referred to as pause time) is exceeded. The pause threshold depends on the application. For example, in one non-limiting example, in the case of Automatic Speech Recognition (ASR), the pause threshold may be on the order of a fraction of a second.

Fig. 27 illustrates generally by 2700 single channel filtering, according to an embodiment of the present invention. Referring to fig. 27, the single-channel noise reduction unit utilizes a linear filter with a single-channel input. Examples of filters suitable for use therein are wiener filters, filters employing Minimum Mean Square Error (MMSE), etc. The output from the adaptive noise cancellation unit (as described above) is input to the filter 2702 at 2704. Input signal 2704 contains a desired sound signal and noise components, i.e., undesired sound signals, represented as total power in equation 2714

Filter 2702 applies the equation shown by 2714 to input signal 2704. Total power

Is one of the numerators of equation 2714 and is obtained from the input of filter 2704. When the desired sound signal is absent from signal 2704, an estimate of the noise is obtained

I.e. undesired sound signals. Noise estimation value

Is the total power in the molecule

The other term subtracted. The total power is the term in the denominator of equation 2714. Estimation of noise

Obtained in the absence of the desired sound signal is obtained from an input signal 2704 that is notified by a signal 2716 received from inhibit logic, such as inhibit logic 2214 (fig. 22), that indicates when the desired sound is presentTone signals and when the desired sound signal is not present. When the desired sound signal is not present on signal 2704, the noise estimate is updated. When the desired sound signal is present, the noise estimate is frozen and filtered using the noise estimate previously determined during the last interval when the desired sound signal was not present.

Fig. 28A illustrates generally by 2800 desired voice activity detection, in accordance with an embodiment of the present invention. Referring to FIG. 28A, a dual input required speech detector is shown at 2806. Sound signals from the main channels are input at 2802 to a first signal path 2807a of a two input desired speech detector 2806, from, for example, a beamformer or from the main channels described above in connection with the previous figures. The first signal path 2807a includes a voice band filter 2808. The voice band filter 2808 captures most of the required voice energy in the main channel 2802. In various embodiments, voice band filter 2808 is a band pass filter characterized by a lower corner frequency, an upper corner frequency, and a roll off from the upper corner frequency. In various embodiments, the lower corner frequency may range from 50 to 300Hz, depending on the application. For example, in a broadband phone, the lower corner frequency is about 50 Hz. In a standard telephone, the lower corner frequency is about 300 Hz. The upper corner frequency is selected to allow the filter to pass most of the speech energy picked up by the relatively flat part of the frequency response of the microphone. Thus, the upper corner frequency can be set at different positions depending on the application. A non-limiting example of a location is 2500 Hz. Another non-limiting location for the upper corner frequency is 4000 Hz.

The first signal path 2807a includes a short term power calculator 2810. Short-term power calculator 2810 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so forth. The short term power calculator 2810 may be synonymously referred to as a short term power calculator 2810. The short term power detector 2810 approximately calculates the instantaneous power in the filtered signal. The output of the short-term power detector 2810(Y1) is input to the signal compressor 2812. In various embodiments, compressor 2812 converts signals to Log₂Domain, Log₁₀Domain, etc. In other embodiments, the compressor 2812 performs a user-defined compression algorithm on signal Y1.

Similar to the first signal path described above, the sound signal from the reference channel is input at 2804 to a second signal path 2807b of a two-input desired speech detector 2806, from, for example, a beamformer or from the reference channel described above in connection with previous figures. The second signal path 2807b includes a voice band filter 2816. The voice band filter 2816 captures most of the required voice energy in the reference channel 2804. In various embodiments, voice band filter 2816 is a band pass filter characterized by having a lower corner frequency, an upper corner frequency, and roll-off from the upper corner frequency as described above for first signal path and voice band filter 2808.

The second signal path 2807b includes a short term power calculator 2818. Short-term power calculator 2818 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so forth. The short term power calculator 2818 may be synonymously referred to as a short term power calculator 2818. The short term power detector 2818 approximately calculates the instantaneous power in the filtered signal. The output of the short term power detector 2818(Y2) is input to the signal compressor 2820. In various embodiments, compressor 2820 converts the signal to a Log₂Domain, Log₁₀Domain, etc. In other embodiments, compressor 2820 performs a user-defined compression algorithm on signal Y2.

At subtractor 2824, the compressed signal from second signal path 2822 is subtracted from the compressed signal from first signal path 2814, which results in a normalized primary signal at 2826 (Z). In other embodiments, different compression functions are applied at 2812 and 2820, which results in different normalization of the signal at 2826. In other embodiments, when log compression is not implemented, a division operation may be applied at 2824 to complete the normalization. For example when compression based on a square root function is implemented.

The normalized primary signal 2826 is input into a single channel normalized voice threshold comparator (SC-NVTC)2828, which results in a normalized desired voice activity detection signal 2830. Note that the architecture of the dual channel voice activity detector provides detection of the desired voice using a normalized desired voice activity detection signal 2830 based on the overall difference in signal-to-noise ratios of the two input channels. Thus, the normalized desired voice activity detection signal 2830 is based on the integration of energy in the voice band rather than energy in specific frequency bins, thereby preserving linearity within the noise cancellation unit described above.

Signals

2814 and 2822 are compressed, using logarithmic compression, to provide an input at 2826(Z) that has a noise floor that can take values that vary from below zero to above zero (see

column

2895c, 2895d, or 2895E in fig. 28E), unlike uncompressed single-channel inputs, whose noise floor is always above zero (see column 2895b in fig. 28E).

FIG. 28B illustrates generally 2850 a single-channel normalized Speech threshold comparator (SC-NVTC) according to an embodiment of the invention. Referring to fig. 28B, the normalized primary signal 2826 is input to a long-term normalized power estimator 2832. The long-term normalized power estimator 2832 provides a running estimate of the normalized primary signal 2826. The operating estimate provides a lower limit for the desired sound signal. The offset value 2834 is added to the running estimate of the output of the long term normalized power estimator 2832 in adder 2836. The output of the adder 2838 is input to a comparator 2840. The instantaneous estimate 2842 of the normalized primary signal 2826 is input to a comparator 2840. Comparator 2840 contains logic that compares the instantaneous value at 2842 with the running ratio at 2838 plus an offset. If the value at 2842 is greater than the value at 2838, the desired voice signal is detected and a flag is set accordingly and transmitted as part of the normalized desired voice activity detection signal 2830. If the value at 2842 is less than the value at 2838, the desired sound signal is not detected and a flag is set accordingly and transmitted as part of the normalized desired voice activity detection signal 2830. The long-term normalized power estimator 2832 averages the normalized primary signal 2826 over a sufficiently long length of time to mitigate variations in amplitude fluctuations. Thus, the amplitude fluctuations change slowly at 2833. The average time may vary from a fraction of a second to several minutes, as a non-limiting example. In various embodiments, the averaging time is selected to provide a slowly varying amplitude fluctuation at the output of 2832.

Fig. 28C illustrates generally by 2846 a desired voice activity detection using multiple reference channels in accordance with an embodiment of the present invention. Referring to FIG. 28C, the required speech detector is shown at 2848. The desired speech detector 2848 includes as inputs a main channel 2802 and a first signal path 2807a (described above in connection with fig. 28A) and a reference channel 2804 and a second signal path 2807b (also described above in connection with fig. 28A). In addition, there is a second reference channel 2850 that is input to the desired speech detector 2848 and is part of the third signal path 2807 c. Similar to the second signal path 2807b (described above), the sound signal from the second reference channel is input at 2850 to the third signal path 2807b of the multiple input desired speech detector 2848, e.g., from a beamformer or from the second reference channel described above in connection with the previous figures. The third signal path 2807b includes a voice band filter 2852. The speech band filter 2852 captures most of the required speech energy in the second reference channel 2850. In various embodiments, voice band filter 2852 is a band pass filter characterized by having a lower corner frequency, an upper corner frequency, and roll-off from the upper corner frequency as described above for second signal path and voice band filter 2808.

The third signal path 2807c includes a short term power calculator 2854. The short-term power calculator 2854 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so on. The short term power calculator 2854 may be synonymously referred to as the short term power calculator 2854. The short term power detector 2854 approximately calculates the instantaneous power in the filtered signal. The output of the short term power detector 2854 is input to a signal compressor 2856. In various embodiments, compressor 2856 converts the signal to a Log₂Domain, Log₁₀Domain, etc. In other embodiments, compressor 2854 performs a user-defined compression algorithm on signal Y3.

At subtractor 2860, the compressed signal from third signal path 2858 is subtracted from the compressed signal from first signal path 2814, which results in a normalized primary signal at 2862 (Z2). In other embodiments, different compression functions are applied at 2856 and 2812, which results in different normalization of the signal at 2862. In other embodiments, when log compression is not implemented, a division operation may be applied at 2860. For example when compression based on a square root function is implemented.

The normalized primary signal 2862 is input into a single channel normalized voice threshold comparator (SC-NVTC)2864, which results in a normalized desired voice activity detection signal 2868. Note that the architecture of the multi-channel voice activity detector provides detection of the desired speech using a normalized desired voice activity detection signal 2868 based on the overall difference in signal-to-noise ratios of the two input channels. Thus, the normalized desired voice activity detection signal 2868 is based on the integration of energy in the voice band rather than energy in specific frequency bins, thereby preserving linearity within the noise cancellation unit described above.

Signals

2814 and 2858 are compressed, using logarithmic compression, to provide an input at 2862(Z2) that has a noise floor (see column 2895c, column 2895d, or column 2895E in fig. 28E) that can take values that vary from below zero to above zero, unlike uncompressed single-channel inputs, whose noise floor is always above zero (see column 2895b in fig. 28E).

The desired speech detector 2848, having a multi-channel input using at least two reference channel inputs, provides two normalized desired speech

activity detection signals

2868 and 2870 for outputting a desired voice activity signal 2874. In an embodiment, the normalized desired voice

activity detection signals

2868 and 2870 are input to a logical or gate 2872. The logical or gate outputs the desired voice activity signal 2874 according to its

inputs

2868 and 2870. In other embodiments, additional reference channels may be added to the desired speech detector 2848. Each additional reference channel is used to create another normalized primary channel that is input to another single-channel normalized speech threshold comparator (SC-NVTC) (not shown). The output from an additional single channel normalized voice threshold comparator (SC-NVTC) (not shown) is combined with 2874 through an additional xor gate (also not shown) (in one embodiment) to provide the desired voice activity signal, which is output as described above in connection with the previous figures. The additional reference channel is utilized in the multi-channel desired speech detector, as described above, resulting in more robust detection of the desired speech signal, since more information about the noise field is obtained by the multiple reference channels.

Fig. 28D illustrates generally at 2880 one flow of using compression in accordance with an embodiment of the present invention. Referring to FIG. 28D, flow begins at block 2882. Using as shown in connection with FIG. 28A or FIG. 28C, e.g. Log₁₀Compression, or user-defined compression, the main channel is compressed at block 2884. Using as shown in connection with FIG. 28A or FIG. 28C, e.g. Log₁₀Compression, or user-defined compression, the reference channel is compressed at block 2886. A normalized primary sound signal is created at block 2888. The normalized sound signal is used to detect the desired speech at block 2890. The flow stops at block 2892.

FIG. 28E illustrates generally by 2893 different functions for providing compression in accordance with embodiments of the present invention. Referring to FIG. 28E, table 2894 illustrates several compression functions for purposes of illustration, but this is not meant to be limiting. Column 2895a contains six sample values of variable x. In this example, the variable X ranges from 0.01 to 1000.0, as shown at 2896. Column 2895b shows no compression, where Y ═ X. Column 2895c shows radix-10 log compression, where the compression value Y is log10 (X). Column 2895d shows ln (x) compression, where the compression value Y ═ ln (x). Column 2895e shows radix-2 Log compression, where Y is Log2 (X). User-defined compression (not shown) may also be implemented as desired to provide more or less compression than 2895c, 2895d, or 2895 e. Compressing the results of the short-

term power detectors

2810 and 2818 with compression functions at 2812 and 2820 (fig. 28A) reduces the dynamic range of the normalized primary signal at 2826(Z) input to the single-channel normalized speech threshold comparator (SC-NVTC) 2828. Similarly, compressing the results of the short-

term power detectors

2810, 2818 and 2854 with compression functions at 2812, 2820 and 2856 (fig. 28C) reduces the dynamic range of the normalized primary signal at 2826(Z) and 2862(Z2) input to the SC-NVTC828 and SC-NVTC 864, respectively. The reduced dynamic range achieved by compression may result in more accurate detection of the presence of a desired acoustic signal, and thus a greater degree of noise reduction may be achieved by embodiments of the invention presented herein.

In various embodiments, the components of the multiple input desired speech detector as shown in fig. 28A, 28B, 28C, 28D, and 28E are implemented in an integrated circuit device that may include: an integrated circuit package containing an integrated circuit. In some embodiments, the multiple-input required speech detector is implemented in a single integrated circuit die. In other embodiments, the multiple-input desired speech detector is implemented in more than one integrated circuit die of an integrated circuit device, which may include: a multi-chip package containing the integrated circuit.

Fig. 29A illustrates generally at 2900 an auto-balancing architecture, according to an embodiment of the invention. Referring to fig. 29A, an auto-balancing assembly 2903 has a first signal path 2905a and a second signal path 2905 b. At 2902b, a first channel 2902a (MIC 1) is connected to a first signal path 2905 a. At 2904b, the second sound channel 2904a is connected to a second signal path 2905 b. The sound signal is input to the voice band filter 2906 at 2902 b. The voice band filter 2906 captures most of the required voice energy in the first channel 2902 a. In various embodiments, the voice band filter 1906 is a band pass filter characterized by a lower corner frequency, an upper corner frequency, and a roll-off from the upper corner frequency. In various embodiments, the lower corner frequency may range from 50 to 300Hz, depending on the application. For example, in a broadband phone, the lower corner frequency is about 50 Hz. In a standard telephone, the lower corner frequency is about 300 Hz. The upper corner frequency is selected to allow the filter to pass most of the speech energy picked up by the relatively flat part of the frequency response of the microphone. Thus, the upper corner frequency can be set at different positions depending on the application. A non-limiting example of a location is 2500 Hz. Another non-limiting location for the upper corner frequency is 4000 Hz.

First signal path 2905a includes a long-term power calculator 2908. Long-term power calculator 2908 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so forth. Long-term power calculator 2908 may be synonymously referred to as long-term power calculator 2908. Long-term power detector 2908 approximately calculates a running average long-term power in the filtered signal. The output 2909 of the long term power detector 2908 is input to a divider 2917. Control signal 2914 is input to long-term power calculator 2908 at 2916. Control signal 2914 provides signals as described above in connection with the desired sound signal detector, fig. 28A, 28B, 28C, which indicate when the desired sound signal is present and when the desired sound signal is not present. Segments of the acoustic signal on the first channel 2902b, which have the desired acoustic signal present, are excluded from the long-term power averaging produced at 2908.

The sound signal is input at 2904b to the voice band filter 2910 of the second signal path 2905 b. The voice band filter 2910 captures most of the required voice energy in the second channel 2904 a. In various embodiments, the voice band filter 2910 is a band pass filter characterized by a lower corner frequency, an upper corner frequency, and a roll off from the upper corner frequency. In various embodiments, the lower corner frequency may range from 50 to 300Hz, depending on the application. For example, in a broadband phone, the lower corner frequency is about 50 Hz. In a standard telephone, the lower corner frequency is about 300 Hz. The upper corner frequency is selected to allow the filter to pass most of the speech energy picked up by the relatively flat part of the frequency response of the microphone. Thus, the upper corner frequency can be set at different positions depending on the application. A non-limiting example of a location is 2500 Hz. Another non-limiting location for the upper corner frequency is 4000 Hz.

Second signal path 2905b includes long-term power calculator 2912. The long-term power calculator 2912 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so forth. Long-term power calculator 2912 may be synonymously referred to as long-term power calculator 2912. The long-term power detector 2912 approximately calculates a running average long-term power in the filtered signal. The output 2913 of the long-term power detector 2912 is input to a divider 2917. Control signal 2914 is input at 2916 to long-term power calculator 2912. Control signal 2914 provides signals as described above in connection with the desired sound signal detector, fig. 28A, 28B, 28C, which indicate when the desired sound signal is present and when the desired sound signal is not present. The segments of the acoustic signal on the second channel 2904b having the desired acoustic signal present are excluded from the long-term power average generated at 2912.

In one embodiment, output 2909 is normalized by output 2913 at 2917 to produce an amplitude correction signal 2918. In one embodiment, a divider is used at 2917. The amplitude correction signal 2918 is multiplied by the instantaneous value of the second microphone signal at 2904a at multiplier 2920 to produce a corrected second microphone signal at 2922.

In another embodiment, output 2913 is normalized at 2917 by output 2909 to produce an amplitude correction signal 2918. In one embodiment, a divider is used at 2917. Amplitude correction signal 2918 is multiplied by the instantaneous value of the first microphone signal on 1902a using a multiplier connected to 2902a (not shown) to produce a corrected first microphone signal for first microphone channel 2902 a. Thus, in various embodiments, either the second microphone signal is automatically balanced with respect to the first microphone signal or, alternatively, the first microphone signal is automatically balanced with respect to the second microphone signal.

It should be noted that the long term average powers calculated at 2908 and 2912 are performed when the desired sound signal is not present. Thus, the average power represents the average of the undesired sound signal that is typically generated in the far field. In various embodiments, as a non-limiting example, in some embodiments, the duration of the long-term power calculator ranges from about a fraction of a second, e.g., one-half second to five seconds to several minutes, and is application dependent.

FIG. 29B illustrates auto-balancing generally at 2950, according to an embodiment of the invention. Referring to fig. 29B, an automatic balancing component 2952 is configured to receive a main channel 2954a and a reference channel 2956a as inputs. The balance function continues similar to the description given above in connection with fig. 29A using the first channel 2902a (MIC 1) and the second channel 2904a (MIC 2).

Referring to fig. 29B, an auto-balancing component 2952 has a first signal path 2905a and a second signal path 2905B. At 2954b, first channel 2954a (main) is connected to first signal path 2905 a. At 2956b, second channel 2956a is connected to second signal path 2905 b. The sound signal is input to the voice band filter 2906 at 2954 b. The voice band filter 2906 captures most of the required voice energy in the first channel 2954 a. In various embodiments, voice band filter 2906 is a band pass filter characterized by a lower corner frequency, an upper corner frequency, and a roll off from the upper corner frequency. In various embodiments, the lower corner frequency may range from 50 to 300Hz, depending on the application. For example, in a broadband phone, the lower corner frequency is about 50 Hz. In a standard telephone, the lower corner frequency is about 300 Hz. The upper corner frequency is selected to allow the filter to pass most of the speech energy picked up by the relatively flat part of the frequency response of the microphone. Thus, the upper corner frequency can be set at different positions depending on the application. A non-limiting example of a location is 2500 Hz. Another non-limiting location for the upper corner frequency is 4000 Hz.

First signal path 2905a includes a long-term power calculator 2908. Long-term power calculator 2908 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so forth. Long-term power calculator 2908 may be synonymously referred to as long-term power calculator 2908. Long-term power detector 2908 approximately calculates a running average long-term power in the filtered signal. The output 2909b of the long term power detector 2908 is input to the divider 2917. Control signal 2914 is input to long-term power calculator 2908 at 2916. Control signal 2914 provides signals as described above in connection with the desired sound signal detector, fig. 28A, 28B, 28C, which indicate when the desired sound signal is present and when the desired sound signal is not present. Segments of the sound signal on the first channel 2954b, which have the desired sound signal present, are excluded from the long-term power averaging produced at 2908.

The sound signal is input to the voice band filter 2910 of the second signal path 2905b at 2956 b. The voice band filter 2910 captures most of the required voice energy in the second channel 2956 a. In various embodiments, the voice band filter 2910 is a band pass filter characterized by a lower corner frequency, an upper corner frequency, and a roll off from the upper corner frequency. In various embodiments, the lower corner frequency may range from 50 to 300Hz, depending on the application. For example, in a broadband phone, the lower corner frequency is about 50 Hz. In a standard telephone, the lower corner frequency is about 300 Hz. The upper corner frequency is selected to allow the filter to pass most of the speech energy picked up by the relatively flat part of the frequency response of the microphone. Thus, the upper corner frequency can be set at different positions depending on the application. A non-limiting example of a location is 2500 Hz. Another non-limiting location for the upper corner frequency is 4000 Hz.

Second signal path 2905b includes long-term power calculator 2912. The long-term power calculator 2912 is implemented in various embodiments as a Root Mean Square (RMS) measurement, a power detector, an energy detector, and so forth. Long-term power calculator 2912 may be synonymously referred to as long-term power calculator 2912. The long-term power detector 2912 approximately calculates a running average long-term power in the filtered signal. The output 2913b of the long-term power detector 2912 is input to a divider 2917. Control signal 2914 is input at 2916 to long-term power calculator 2912. Control signal 2916 provides signals as described above in connection with the desired sound signal detector, fig. 28A, 28B, 28C, which indicate when the desired sound signal is present and when the desired sound signal is not present. The segment of the acoustic signal on the second channel 2956b having the desired acoustic signal present is excluded from the long term power averaging produced at 2912.

In one embodiment, output 2909b is normalized at 2917 by output 2913b to produce amplitude correction signal 2918 b. In one embodiment, a divider is used at 2917. The amplitude correction signal 2918b is multiplied at multiplier 2920 by the instantaneous value of the second microphone signal at 2956a to produce a corrected second microphone signal at 2922 b.

In another embodiment, either output 2913b is normalized at 2917 by output 2909b to produce the amplitude correction signal 2918 b. In one embodiment, a divider is used at 2917. The amplitude correction signal 2918b is multiplied by the instantaneous value of the first microphone signal on 2954a using a multiplier connected to 2954a (not shown) to produce a corrected first microphone signal for the first microphone channel 2954 a. Thus, in various embodiments, either the second microphone signal is automatically balanced with respect to the first microphone signal or, alternatively, the first microphone signal is automatically balanced with respect to the second microphone signal.

Embodiments of the auto-balancing assembly 2902 or 2952 are configured for auto-balancing multiple microphone channels, as shown in fig. 24A. In such a configuration, the plurality of channels (e.g., the plurality of reference channels) are balanced with respect to the main channel. Or multiple reference channels and a main channel are balanced with respect to a particular reference channel as described above in connection with fig. 29A or 29B.

Fig. 29C illustrates filtering according to an embodiment of the present invention. Referring to fig. 29C, 2960a shows two

microphone signals

2966a and 2968a having an amplitude 2962 plotted as a function of frequency 2964. In some embodiments, the microphone does not have a constant sensitivity as a function of frequency. For example, microphone response 2966a may represent a microphone output (response) having a non-flat frequency response excited by a broadband excitation that is flat in frequency. Microphone response 2966a includes non-flat region 2974 and flat region 2970. For this example, the microphone producing response 2968a has uniform sensitivity with respect to frequency; thus, 2968a is generally flat in response to broadband excitation, which is flat in frequency. In some embodiments, it makes sense to balance the flat area 2970 of the microphone response. In this case, the non-flat region 2974 is filtered out so that the energy in the non-flat region 2974 does not affect the microphone auto-balance procedure. Of interest is the difference 2972 between the flat areas of the two microphone responses.

In 2960b, a filter function 2978a is shown and plotted with amplitude 2976 plotted as a function of frequency 2964. In various embodiments, the filter function is selected to eliminate the non-flat portion 2974 of the microphone response. Filter function 2978a is characterized by a lower corner frequency 2978b and an upper corner frequency 2978 c. The filter function of 2960b is applied to the two

microphone signals

2966a and 2968a, the result of which is shown in 2960 c.

In 2960c, filtered

representations

2966c and 2968c of the microphone signals 2966a and 2968a are plotted as a function of amplitude 2980 and frequency 2966. The difference 2972 represents the sensitivity difference between the two filtered

microphone signals

2966c and 2968 c. This difference between the two microphone responses is balanced by the system described above in connection with fig. 29A and 29B. Referring back to fig. 29A and 29B, in various embodiments, the voice band filters 2906 and 2910, in a non-limiting example, may apply the filtering function shown in 2960B to the microphone channels 2902B and 2904B (fig. 29A) or the main and reference channels 2954B and 2956B (fig. 29B). The difference 2972 between the two microphone channels may be minimized or eliminated by the auto-balancing procedure described in fig. 29A or fig. 29B above.

Figure 30 illustrates a flow of auto-balancing, generally at 3000, in accordance with an embodiment of the present invention. Referring to FIG. 30, flow begins at block 3002. At block 3004, an average long-term power in the first microphone channel is calculated. The average long-term power calculated for the first microphone channel does not include segments of the microphone signal that occur when the desired sound signal is present. The input from the desired voice activity detector is used to exclude relevant portions of the desired sound signal. At block 3006, an average power in the second microphone channel is calculated. The average long-term power calculated for the second microphone channel does not include segments of the microphone signal that occur when the desired sound signal is present. The input from the desired voice activity detector is used to exclude relevant portions of the desired sound signal. At block 3008, an amplitude correction signal is calculated using the average values calculated in

blocks

3004 and 3006.

In various embodiments, the components of the auto-

balancing component

2903 or 2952 are implemented in an integrated circuit device, which may include: an integrated circuit package containing an integrated circuit. In some embodiments, the auto-

balancing component

2903 or 2952 is implemented in a single integrated circuit die. In other embodiments, the auto-balancing

assembly

2903 or 2952 is implemented in more than one integrated circuit die of an integrated circuit device, which may include: a multi-chip package containing the integrated circuit.

Fig. 31 illustrates, generally at 3100, a sound signal processing system in which embodiments of the invention can be used. The block diagram is a high-level conceptual representation that may be implemented in various ways and with various architectures. Referring to fig. 31, a bus system 3102 interconnects a Central Processing Unit (CPU)3104, a Read Only Memory (ROM)3106, a Random Access Memory (RAM)3108, a memory 3110, a display 3120, audio 3122, a keyboard 3124, a pointer 3126, a Data Acquisition Unit (DAU)3128, and a communication 3130. The bus system 3102 may be one or more of a system bus, a Peripheral Component Interconnect (PCI), an Advanced Graphics Port (AGP), a Small Computer System Interface (SCSI), Institute of Electrical and Electronics Engineers (IEEE) standard number 1394 (firewire), a Universal Serial Bus (USB), or a dedicated bus designed for custom applications, for example. The CPU 3104 may be a single, multiple, or even a distributed computing resource or Digital Signal Processing (DSP) chip. The memory 3110 may be a Compact Disc (CD), Digital Versatile Disc (DVD), Hard Disk (HD), optical disc, magnetic tape, flash memory, memory stick, video recorder, or the like. The sound signal processing system 3100 may be used to receive sound signals input from multiple microphones (e.g., a first microphone, a second microphone, etc.), or from a primary channel and multiple reference channels as described above in connection with previous figures. Note that the sound signal processing system may include some, all, more, or a rearrangement of components in the block diagrams, depending on the actual implementation of the sound signal processing system. In some embodiments, aspects of system 3100 are implemented in software, while in some embodiments, aspects of system 3100 are implemented in dedicated hardware, such as a Digital Signal Processing (DSP) chip, and in a combination of dedicated hardware and software as is known and understood by those of ordinary skill in the art.

Thus, in various embodiments, the sound signal data is received at 3129 for processing by the sound signal processing system 3100. Such data may be transmitted at 3132 via communications interface 3130 for further processing at the remote location. As will be appreciated by those skilled in the art, obtaining a connection to a network, such as an intranet or the internet, through 3132 enables the sound signal processing system 3100 to communicate with other data processing devices or systems at a remote location.

For example, embodiments of the invention may be implemented on a computer system 3100 configured as a desktop computer or workstation, e.g., running a software program such as

XP Home or

Compatibility of XP Professional, Linux, Unix and other operating systems

The computer of (2) and a computer from apple computer corporation running an operating system such as OS X. Alternatively, in conjunction with such implementations, embodiments of the present invention may be configured with devices such as speakers, headphones, video monitors, etc. configured for use with a bluetooth communication channel. In other embodiments, embodiments of the invention are configured to be implemented by mobile devices such as smart phones, tablets, wearable devices such as glasses, near-eye (NTE) headsets, commonly configured head-worn devices such as, but not limited to, glasses, goggles, visors, headbands, helmets, and the like, or the like.

In one or more embodiments, a hearing aid is provided to a user to facilitate hearing sounds from a local environment.

Fig. 32A illustrates generally at 3200 a microphone configuration on a head-mounted device, according to an embodiment of the invention. Fig. 32B shows generally a top view of a microphone configuration on a headset corresponding to fig. 32A, by 3220, according to an embodiment of the present invention. Fig. 32C illustrates a bottom view, generally 3240, of a microphone configuration on a headset corresponding to fig. 32A, in accordance with an embodiment of the invention. Fig. 33 shows the headset from fig. 32A with respect to different sound sources generally by 3300 according to an embodiment of the present invention. Referring to fig. 32A to 33 together, the head mounted device 3201 is presented in the shape of glasses used in a three-dimensional space. The three-dimensional space is represented by the X, Y, Z axis at 3301 (fig. 33). The three-dimensional space is represented by a cartesian coordinate system as known in the art. However, this is not meant to be limiting. The three-dimensional space may be represented by another coordinate system. In other embodiments, the head-mounted device is in the shape of goggles or the like, which is not meant to be limiting. Herein, the term "eyewear" or "eyewear device" is synonymous with head-mounted devices. The head-mounted device 3201 has a front frame containing one or more lenses made of glass or plastic, a left frame 3214, and a right frame 3212. The left and right frames are also known in the art as temples. The headset is shown with four microphones, microphone 0(3202), microphone 1(3204), microphone 2(3206), and microphone 3 (3210). In one or more embodiments, microphone 0(3202) is located below the bottom of left side frame 3214, and microphones 1(3204) and 2(3206) are located at the top of left side frame 3214. A microphone 3(3210) is located on top of the right side frame 3212. Alternatively, microphones 0(3202), 1(3204), and 2(3206) are located on the right side frame 3212, and microphones 3(3210) are located on the left side frame 3214.

In various embodiments, the eyewear apparatus includes a microphone array coupled to at least one side frame member. The microphone array includes at least a first and a second microphone. In one or more embodiments, first and second microphones, e.g., 3202 and 3204, are located near the side frame member 3214 of the front frame member. As at 3209L2, the first and second microphones are approximately between 5mm and 30mm from the front frame member, and may be approximately 15mm (fig. 32B). The first microphone (microphone 0(3202)) is located on the bottom side of side frame member 3214, and the second microphone (microphone 1(3204)) is located on the top side of side member 3214 just or near the top of side frame member 3214. In another embodiment, the third microphone (microphone 2(3206)) is located on the side frame member 3214 and away from the front frame member. Such as L at 3208₁The third microphone (microphone 2(3206)) is shown to be between about 10mm and 20mm from the location of the first and/or second microphone (3202/3204), and may be about 15 mm. If the distance L is₁Too long, the third microphone (microphone 2(3206)) may be close to the speaker embedded in the side frame member and located near the ear of the wearer. In this case, there may be an echo from the speaker to the microphone 2 (3206). For certain embodiments, this echo is reduced by reducing the distance L₁Is solved. Distance L₁The reduction in (b) increases the separation distance between the microphones 2(3206) and the loudspeaker 3350, thereby reducing any echo.

In another embodiment, a fourth microphone (microphone 3(3210)) is located on the other side frame member 3212. Microphone 3(3210) is shown near the front frame member, but may be elsewhere along the frame member 3212. The distance between microphone 1(3204) and microphone 3(3210) is determined by the width of the eyeglass frame and is large enough for the system to detect the difference in signal levels from the two microphones. The distance between microphones 1(3204) and 3(3210) is not a constant number, but is typically provided by the geometry and size of the head-mounted device. Likewise, the distance between microphones 0(3202) and microphones 3(3210) is not a fixed number, but is typically provided by the geometry and size of the headset.

Fig. 32D shows a perspective view of another set of microphone layouts on a headset, generally indicated by 3260, in accordance with an embodiment of the present invention. Fig. 32E shows a bottom view, generally 3280, of a layout of microphones on a headset, corresponding to fig. 32D, in accordance with an embodiment of the present invention. Referring to fig. 32D, microphones 0(3202) and microphones 1(3204) are located at temples 3212 on the inner surface thereof. Microphone 2(3206) is located on the bottom surface of right temple 3212 and is set back from microphones 0 (3202)/microphones 1(3204) with L as described above₁An equal amount. Microphone 0 (3202)/distance between microphone 1(3204) and front frame L as described above₂(FIG. 32B). Referring back to fig. 32D, the microphones 3(3210) are located on the bottom sides of the temples 3210. Alternatively, one or both of microphones 2(3206) and 3(3210) may be located on the top surface of their respective temples.

In an alternative embodiment, the microphone layout shown in fig. 32D/32E may be reversed with respect to the temple. For example, microphones 0(3202), microphones 1(3204), microphones 2(3206) may be located on the inner surface of left temple 3214, and microphones 3(3210) may be located on right temple 3212.

These four microphones support three or more microphone combinations for different usage scenarios described herein as configuration 1 using microphone 0 and microphone 1, configuration 2 using microphone 1 and microphone 2, and configuration 3 using microphone 1 and microphone 3. In some embodiments, a software interface is used to control switching between these microphone combinations and sequencing between configurations.

In various embodiments, the eyewear will have more than four microphones or less than four microphones. Four microphones are used to illustrate one or more embodiments described herein and do not constitute a limitation on embodiments of the invention. Three configurations of microphones are described below for use by a user of the head-mounted device, receiving and processing sound signals to aid the user's hearing, and in some cases for remote use by way of, for example, speech recognition, command and control, reception and listening by another user, and local use by embedded speech recognition, etc. The configuration described below may be used to provide the main sound signal and the reference sound signal used in the above-described noise canceling system.

Configuration 1

In one or more embodiments, when the user wears the head mounted device 101 and speaks simultaneously, microphone 0 and microphone 1 are used to process the sound signals. In configuration 1, the signals output from microphone 0 and microphone 1 are beamformed so that the primary sound signal response is at a position down the axis 3302. The axis 3302 is in the nominal direction of the user's mouth 3310, but need not be precisely aligned therewith. Microphone 0 and microphone 1 have different acoustic distances to the user's mouth 3320, where the acoustic distance of microphone 0 is less than the acoustic distance of microphone 1. The sound signal 3312 emanating from the user's mouth 3310 is received with maximum acoustic sensitivity to the direction of the user 3310, relative to the microphone pair microphone 0 and microphone 1. The sound signal thus obtained is used as the main signal input to the multi-channel noise canceling system. By beamforming the microphone pair microphone 0 and microphone 1 with the primary response steered 180 degrees away from the sound source 3310, a reference signal is obtained that contains mostly noise (mostly undesired sound signals). Thus, the reference signal is obtained in a direction looking up along the axis 3302, away from the user's mouth 3310, and towards a potential noise source, such as the noise source represented by 3360, which emits noise 3362 (an undesired sound signal). The signal obtained so as to be hidden from the user's mouth 3310 is used as a reference signal to be input to the above-described multi-channel noise canceling system. The beamforming applied to the reference signal minimizes the acoustic sensitivity 3 to signals from the user's mouth 3310 and maximizes the sensitivity to noise generated in directions away from the user's mouth. Thus, the signal-to-noise ratio difference between microphone 0 and microphone 1 is maximized for reducing noise in the primary signal by subsequent application of noise cancellation.

Processing to reduce noise (undesired sound signals) in the signal of interest (desired sound signal) allows the combination of microphone 0 and microphone 1 to help enhance the user's speech when making a call in a noisy environment. It also helps to improve the performance of the system in command and control when used in noisy environments. In a noisy environment, the user's voice is hidden in background noise and is difficult to understand by far-end listeners and to recognize by the speech engine during a call. The combination of microphone 0 and microphone 1 uses beamforming techniques to improve the signal-to-noise ratio (SNR) of the user's voice relative to background noise (and to increase the signal-to-noise ratio difference between microphone 0 and microphone 1), thereby improving the accuracy of voice activity detection for noise cancellation. This combination provides useful performance gains even in extremely noisy environments with background noise amplitudes of 90-dB or greater. As described above, the microphone 0 and the microphone 1 may be implemented using omni-directional microphones.

Configuration 2

In one or more embodiments, when a user is listening to a remote sound source, such as 3330, while wearing the head mounted device 3201, microphone 1 and microphone 2 are used to process the sound signals. In configuration 2, the signals output from microphone 1 and microphone 2 are beamformed to place the primary sound signal response at a forward position along axis 3304 to receive a sound signal 3332 emitted by a sound source shown at 3330 with maximum acoustic sensitivity directed relative to the microphone to the direction of the sound source 3330 of microphone 1 and microphone 2. The signal thus obtained is used as the main signal input to the multi-channel noise cancellation system. The reference signal, which mainly contains noise, can be obtained from the microphone 2 with or without beamforming. When omni-directional microphones are used for both microphone 1 and microphone 2, beamforming for both microphone 1 and microphone 2 to obtain the primary signal while using only microphone 2 for the reference signal, without beamforming for microphone 1, increases the sensitivity of beamforming to the source 3330 direction by about 6dB relative to the sensitivity of microphone 2 alone to the source 3330. This processing provides a significant signal-to-noise ratio difference between microphone 1 and microphone 2 that is beneficial for noise cancellation performance. Axis 3304 points in a nominal direction in front of the user, but need not be precisely aligned therewith. Microphone 1 and microphone 2 have different acoustic distances to a sound source, e.g. 3330, located in front of the user. The acoustic distance between the sound source 3330 and the microphone 1 is smaller than the acoustic distance between the microphone 2 and the sound source 3330. Thus, the

microphones

1 and 2 can be flexibly arranged on the head mounted device to provide different acoustic distances with respect to a sound source located in front of the head mounted device without having to be directed directly to the sound source 3330.

In an alternative embodiment, beamforming microphone 1 and microphone 2 for a microphone having a primary response steered 180 degrees away from the acoustic source 3330 may be used to provide a reference signal (primarily an undesired sound signal). Note that the reference signal obtained in combination with it preferably has the least amount of desired sound signal. The reference signal may be obtained according to two methods, compared and then a selection may be made based on the best system performance. Thus, the reference signal obtained by either method has a signal-to-noise ratio that is less than the signal-to-noise ratio of the primary signal. Thus, a signal-to-noise ratio difference is obtained for the microphone 1/microphone 2 pair relating to the signal of interest originating from what is nominally the front of the headset 3201, e.g., 3330/3332. The signal obtained by evading the acoustic source 3330 by any of the methods described above is used as a reference signal input to the multi-channel noise cancellation system. The beamforming for the reference signal is selected to provide minimum acoustic sensitivity to signals from the front of the user (the desired sound signal), e.g., source 3330, and to maximize sensitivity to noise generated from directions other than source 3330. Thus, the signal-to-noise ratio difference between the microphone 1 and the microphone 2 is maximized for reducing noise in the primary signal by subsequent application of noise cancellation.

The output of the noise cancellation system is then provided to a speaker 3350 to assist the user in hearing the sound from the sound source 3330. The speaker 3350 is incorporated into one or both side frames of the glasses 3201. Thus, in various embodiments, the combination of

microphones

1, 2 is used to enhance the hearing of the user, for example, during some activities such as watching television or talking to a person in front of the user wearing glasses 3201. Some people with hearing difficulties are unable to understand sound signals clearly, especially in noisy environments. Combination 2 applies beamforming techniques to help the user focus on the sound signal of interest by spatially eliminating background noise.

Configuration 3

In one or more embodiments, microphone 1 and microphone 3 are used to process sound signals when a user is listening to or interacting with a remote sound source, such as 3320 or 3340, from one side or the other while wearing the head mounted device 3201. Alternatively, microphone 3 and microphone 2 are used to process signals for configuration 3, or microphone 3 and microphone 0 are used. The following description of the arrangement 3 is provided in the form of the microphone 3 and the microphone 1 and is not intended to be limiting in this regard. In configuration 3, the acoustic energy output from the

microphones

1 and 3 is compared to determine which side of the user the largest sound comes from. This information is useful because in a meeting, for example, people sit around a table and different people may speak from time to time, thereby creating different directions of arrival relative to the user wearing the glasses 3201. In configuration 3, the signals output from a selected pair of microphones are processed so that the primary sound signal response is at a position along axis 3306, which axis 3306 is in the nominal direction of the sound source, but need not be precisely aligned therewith. A selected pair of microphones, for example, microphone 3 and microphone 0, microphone 3 and microphone 1, or microphone 3 and microphone 2, one of which has a different acoustic distance to the sound source.

According to one method of operation, the primary microphone is the microphone from the pair of microphone 1, microphone 3 having the greatest acoustic energy output. The other microphone of the pair of

microphones

1, 3 is then designated as the reference microphone. After determining which microphone outputs the greatest acoustic energy, an alternating process of the primary signal and the reference signal may be performed. For example, in one or more embodiments, beamforming is applied to the signals output by

microphones

1 and 3. In one example, the primary signal is obtained when the main response axis in the beamforming process is steered to one side (direction) of the maximum acoustic energy being measured. In this example, the reference signal is obtained by turning the main response axis in the beamforming process to the opposite side of the main response axis.

One variation of this process is to use beamforming to obtain the primary signals, i.e., the output of the beamforming microphone 1 and microphone 3 (which is steered to the side of the microphone 1 and microphone 3 where the greatest acoustic energy is measured, while the non-beamformed output of the microphone with the lower acoustic energy is used for the reference signal.

Another variation of this process is to obtain the reference signals, i.e. the outputs of the beamforming microphone 1 and microphone 3 (which are turned to the side of the one of the microphone 1 and microphone 3 where the minimum acoustic energy is measured, while the non-beamformed outputs of the microphone with the maximum acoustic energy are used for the primary signals, using beamforming.

In a non-limiting example, referring to FIG. 33, there is a hypothetical use scenario when the sound source 3320 is louder than the sound source 3340. In one or more embodiments, the system is designed to select the microphone 3 as the side that receives the primary signal. Receiving the primary signal may be accomplished by any of the methods described directly above, such as beamforming the microphone 1 and the microphone 3 while placing the primary response axis 3306 in the direction of the acoustic source 3320. Alternatively, the output of the microphone 3 may be used as the primary signal without beamforming. By beamforming the microphone 1 and the microphone 3 while placing the primary response axis 3306 in the opposite direction to the sound source 3320, a reference signal may be obtained. Alternatively, the output of the microphone 1 may be used as a reference signal without beamforming.

In some embodiments, a system is implemented by the above method to sort, e.g., beamform, to select a primary signal or reference signal and use the non-beamformed output of the microphone for the primary signal or reference signal. A performance metric of each method, for example, a signal-to-noise ratio difference between the main signal and the reference signal, is calculated, and the method having the largest signal-to-noise ratio difference is a method for processing signals from the

microphones

1 and 3. The ranking by these methods may be performed at the beginning of signal processing, or may be performed continuously to monitor performance metrics, which may then be dynamically updated based on evolving changes in the performance metrics. Thus, many different approaches may be used in the implementation of configuration 3. The output of the noise cancellation system is then provided to one or more speakers 3350 to assist the user in hearing the sound from the sound source 3320. The speaker 3350 is incorporated into one or both side frames (temples) of the eyeglasses 3201.

A similar process is achieved when the acoustic source 3340 generates greater acoustic energy 3342 at the microphone 1 relative to the level of acoustic energy received at the microphone 3. In this case, the system may use a beamforming process to adjust the direction of the primary response axis of the microphone pair in the direction of the acoustic source 3340.

The pair of

microphones

1 and 3 helps the user to pick up stronger sounds from around the user during a conversation, especially from the left and right sides, by comparing the sound energy picked up from the

microphones

1 and 3, and the voice signals may come from different directions (right or left) of the user during a group conference or chat. Configuration 3 compares the sound signal energy on each of the two microphones to determine from which side the sound signal is coming, thereby helping the user to focus on the person speaking actively during the conversation. The output of the noise cancellation system is then provided to a speaker 3350 to assist the user in hearing the sound from the

sound source

3320 or 3340. The speaker 3350 is incorporated into one or both side frames of the glasses 3201.

Configuration switching and scanning

In various embodiments, the system may be configured to switch between two, three, or more configurations. The scanning configuration or scanning of the different beams (or selected pairs of microphones) formed by the array of microphones incorporated in the head-mounted device may also be done automatically by signal processing (hardware or a combination of hardware and software) built into the head-mounted device. Thus, in some embodiments, a system is implemented that forms beams (or processes selected pairs of microphones) by scanning in multiple directions relative to a user and provides assistance to the user with sound signals that have been received and improved by one or more of beamforming, noise cancellation, and/or volume adjustment, prior to presentation to the user, either locally or remotely.

For example, when watching television and making a call simultaneously, the system may be configured to switch between configuration 1 (place a call) and configuration 2 (watch television). The metric for switching to configuration 1 (phone function) may be related to detecting a change in acoustic energy on microphone 0.

Another example of a configuration switch may be a switch from configuration 3 to configuration 2 during a conversation. For example, in a conference, a person sitting on the right of the user wearing the glasses 3201 starts speaking. Such geometry is represented by the source 3320 outputting acoustic energy 3322 and the output of the microphone 3, where the output of the microphone 3 is greater than the output of the microphone 1. At this point, the system operates under configuration 3. When the user listens to and recognizes that the speaker is on the right, the user may turn his or her head to the right to face the speaker. Now facing speaker 3320, the difference between the acoustic energy received at microphone 1 and microphone 3 is reduced while the acoustic energy at microphone 1 is increased. In this case, the system switches to configuration 2 described above.

In one mode of operation, the user does not have to turn the head from side to face the speaker in the conference. As the person speaking actively changes from one location to another, for example, from location 3320 (right side relative to glasses 3201) to location 3340 (left side relative to glasses 3201) to location 3330 (front of glasses 3201) to location 3380 (back of glasses 3201). The system will switch between microphone pairs and directions to select the primary microphone (individual or beamformed output) in the direction of the speaker and the reference microphone (individual or beamformed output) in the direction of the noise (primarily the undesired sound signal).

Thus, embodiments of the present invention are implemented by a system that switches between

configurations

1, 2, and 3 (or any subset thereof), the switching of

configurations

1, 2, and 3 (or any subset thereof) may operate by mechanical switching, audio switching, or by an intelligent design that may operate by analyzing one or more performance metrics, including, for example and without limitation, maximum signal-to-noise ratio differences, maximum acoustic energy output from a microphone or beamforming output, and the like.

Three configurations using three or four microphones have been described above in connection with the figures. Note that more than four microphones may be used with the headset to provide an approximate n number of directions (axes) and possible configurations for processing the acoustic signals. Also, beamforming may be performed with more than two microphones.

Fig. 34 illustrates generally at 3400 processing sound signals from a microphone array configured with a headset according to an embodiment of the present invention. Referring to FIG. 34, flow begins at block 3402. At block 3404, a microphone that is part of an array of microphones attached to the headset is scanned. Scanning involves analyzing the sound signal from the microphone for signal amplitude levels and, in some cases, other parameters. At block 3406, a configuration is selected based on the scan from block 3404. In some embodiments, selection logic is used to select between the configurations available with a given microphone array. At block 3408, the sound signal from the configuration selected at block 3406 is processed to improve the sound signal. Improving the sound signal may include inputting the sound signal to a noise cancellation module to remove the bottom sound signal from the primary channel. Improving the sound signal may include amplifying the sound signal and presenting the amplified sound signal to a user of the headset over a speaker associated with the headset. The flow stops at block 3412.

For purposes of discussion and understanding of embodiments of the present invention, it is understood that various terms are used by those skilled in the art to describe techniques and methods. Furthermore, in the description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the embodiments may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical and other changes may be made without departing from the scope of the present invention.

Some portions of the description may be presented in terms of algorithms and symbolic representations of operations on data bits within, for example, a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of acts leading to a desired result. The acts are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, waveforms, data, time series, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.

An apparatus for performing the operations herein may implement the present invention. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. The computer program may be stored on a computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, hard disks, optical disks, compact disk read-only memories (CD-ROMs), and magneto-optical disks, read-only memories (ROMs), Random Access Memories (RAMs), electrically programmable read-only memories (EPROM), electrically erasable programmable read-only memories (EEPROMs), flash memories, magnetic or optical cards, etc., or any type of media suitable for storing electronic instructions local to or remote from a computer.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method. For example, any of the methods according to the present invention may be implemented by hard-wired circuitry resulting from programming a general-purpose processor, or by any combination of hardware and software. Those skilled in the art will appreciate that the invention may be practiced with other computer system configurations than those described, including: hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, Digital Signal Processing (DSP) devices, network personal computers, minicomputers, mainframe computers, and the like. The embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In other examples, the embodiments of the invention described above in fig. 1-31 may be implemented using a system on a chip (SOC), a bluetooth chip, a Digital Signal Processing (DSP) chip, a codec with Integrated Circuit (ICs), or other hardware and software implementations.

The method of the present invention may be implemented using computer software. If written in a programming language conforming to a recognized standard, sequences of instructions designed to implement the methods can be compiled for execution on a variety of hardware platforms and for interface to a variety of operating systems. In addition, the present invention is not described with reference to any particular programming language. It should be appreciated that a variety of programming languages may be used to implement the embodiments described herein. Additionally, it is common in the art to speak of software, in one form or another (e.g., program, procedure, application, driver … …) as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computer causes the processor of the computer to perform an action or produce a result.

It is to be understood that various terms and techniques are used by those skilled in the art to describe communications, protocols, applications, implementations, mechanisms, and the like. A similar technique is to describe the implementation of the technique in terms of algorithms or mathematical expressions. That is, although a technique may be, for example, implemented as executing code on a computer, the expression of that technique may be more aptly and succinctly conveyed or communicated as a formula, algorithm, mathematical expression, block diagram, or flowchart. Thus, those of ordinary skill in the art will recognize that the implementation of a + B ═ C as a block of an addition function, in hardware and/or software, will take two inputs (a and B) and produce one summed output (C). Thus, the use of formula, algorithm, or mathematical expression as descriptions is to be understood as having a physical embodiment in at least hardware and/or software (e.g., a computer system in which the techniques of the present invention may be implemented and realized as an embodiment).

A non-transitory machine-readable medium is understood to include any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium, synonymously referred to as a computer-readable medium, includes Read Only Memory (ROM); random Access Memory (RAM); a magnetic disk storage medium; an optical storage medium; a flash memory device; electrical, optical, acoustical or other form of information transfer, other than by way of a propagated signal (e.g., a carrier wave, an infrared signal, a digital signal, etc.); and so on.

As used in this specification, the word "one embodiment" or "an embodiment" or similar phrases means that the feature being described is included in at least one embodiment of the present invention. References to "one embodiment" in this description do not necessarily refer to the same embodiment, however, the embodiments are not mutually exclusive. Nor does "one embodiment" imply that there is only one embodiment of the invention. For example, features, structures, acts, etc. described in connection with one embodiment may be included in other embodiments. Thus, the invention may include various combinations and/or integrations of the embodiments described herein.

Thus, embodiments of the present invention may be used to reduce or eliminate undesired sound signals from an acoustic system that processes and transmits the sound signals. Some non-limiting examples of systems may be, but are not limited to, use in short-arm headsets, such as telephone audio headsets suitable for enterprise call centers, industrial and general mobile use, in-line "earbuds" headsets, near-eye (NTE) headset displays or headset computing devices with input lines (wires, cables or other connectors) mounted on or within the eyeglass frame, long-arm headsets for high noise environments such as industrial, military and aerospace applications, and off-the-shelf microphones that may be used to provide sound effects of theater or symphony hall type sound quality. Other embodiments of the present invention are readily implemented in head-mounted devices having a general configuration, such as, but not limited to, eyeglasses, goggles, visors, headbands, helmets, and the like.

While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

Claims

1. A device to be worn on a user's head, comprising:

a head-mounted device;

an array having at least three microphones, the at least three microphones being arranged along a plurality of at least two non-parallel axes;

selection logic configured to identify a selected axis from the plurality of non-parallel axes and to identify two microphones from an array forming the selected axis; and

a beamformer configured to receive as input signals from the two microphones and to output a main microphone channel and a reference microphone channel.

2. The apparatus of claim 1, wherein the selection logic identifies the selected axis using a metric.

3. The apparatus of claim 2, wherein the metric comprises a microphone that receives a maximum sound pressure level.

4. The apparatus of claim 3, wherein the metric comprises a microphone that receives a minimum sound pressure level.

5. The apparatus of claim 2, wherein the selection logic is configured to monitor the metric and select a new selected axis from the plurality of non-parallel axes based on a new value of the metric.

6. The apparatus of claim 2, wherein the metric is a maximum signal-to-noise ratio difference of the two microphones.

7. The apparatus of claim 1, further comprising:

a switch to select the selected axis based on a state of the switch.

8. The apparatus of claim 1,

the primary microphone channel and the reference microphone channel are input to a two-stage noise cancellation module.

9. The apparatus of claim 1, further comprising:

a speaker connected to the head-mounted device and configured to provide a signal that the user can hear.

10. The apparatus of claim 9, wherein the primary microphone channel and the reference microphone channel are used to create a signal input to the speaker when the selected axis is not directed toward the user's mouth.

11. The apparatus of claim 1, wherein the first microphone, the second microphone, and the third microphone of the array are located on a first temple of the headset.

12. The apparatus of claim 11, wherein the first microphone and the second microphone are located on an inner surface of the first temple.

13. The apparatus of claim 11, wherein the array further comprises:

a fourth microphone located on a second temple of the headset and the first microphone and the fourth microphone form a third axis, the second microphone and the fourth microphone form a fourth axis and the third axis is different from the fourth axis, the selection logic selects an activity direction from at least one of the first axis, the second axis, and the third axis.

14. A device to be worn on a user's head, comprising:

a head-mounted device, the head-mounted device further comprising:

an array of three microphones, the array being connected to the headset, wherein a first microphone and a second microphone of the array define a first axis; the second and third microphones define a second axis; and

a speaker connected to the head-mounted device and configured to provide a signal that the user can hear; and

selection logic to select an active direction from among the first axis and the second axis when the active direction is the first axis;

a. output from the first microphone and the second microphone is to be processed for transmission from the headset; when the direction of motion is the second axis;

b. the outputs from the second and third microphones are to be processed for use as inputs to the loudspeaker.

15. The apparatus of claim 14, wherein the first microphone, the second microphone, and the third microphone are located on a first temple of the headset.

16. The apparatus of claim 15, wherein the first microphone and the second microphone are located on an interior surface of the first temple, and the third microphone is located on a bottom surface of the first temple.

17. The apparatus of claim 15, wherein the array further comprises:

18. A method for selecting a sound signal received on a device worn on a user's head, comprising:

comparing sound signals from an array having at least three microphones, wherein the positions of the at least three microphones define three non-parallel axes;

selecting a first microphone pair from the array, wherein the first microphone pair comprises a first microphone and a second microphone;

forming a primary microphone signal from the first microphone pair; and

forming a reference microphone signal from the first microphone pair; wherein the primary microphone signal and the reference microphone signal are input to a noise cancellation module to reduce noise from the primary microphone signal.

19. The method of claim 18, wherein the comparing forms at least three microphone pairs from the array, identifies a potential primary microphone and a potential reference microphone from each microphone pair, calculates a signal-to-noise ratio difference for each microphone pair, and the first microphone pair is the microphone pair with the largest SNR difference.

20. The method of claim 18, wherein the forming the primary microphone signal is accomplished by beamforming the first microphone pair, and wherein forming the reference microphone signal is not accomplished by beamforming the first microphone pair.

21. The method of claim 18, wherein the forming the primary microphone signal is not accomplished by beamforming the first microphone pair, and wherein forming the reference microphone signal is accomplished by beamforming the first microphone pair.

22. The method of claim 19, wherein beamforming is performed on the pair of microphones during the comparing of the sound signals.

23. A device to be worn on a user's head, comprising:

a head-mounted device configured to be worn on the head of the user;

a first microphone connected to the head-mounted device to receive a first sound signal from a sound source;

a second microphone connected to the head mounted device to receive a second sound signal from the sound source; and

a beamformer, the beamformer further comprising:

a first input configured to receive the first sound signal;

a second input configured to receive the second sound signal;

a main signal output from which the beamformer is configured to form a main signal from the first and second sound signals, wherein the main signal is formed by steering a main response axis in a first direction, the main signal being output; and

a reference signal output, the beamformer configured to form a reference signal from the first sound signal and the second sound signal, wherein the reference signal is formed by steering a reference response axis in a second direction, wherein the first direction is different from the second direction, the reference signal being output from the reference signal output.

24. The apparatus of claim 23, wherein a first axis formed between the first microphone and the second microphone is directed toward the mouth of the user when the headset is worn on the head of the user.

25. The apparatus of claim 24, wherein a second axis formed between the first microphone and the second microphone is directed forward of the user when the headset is worn on the head of the user.

26. The apparatus of claim 25, wherein a third axis formed between the first microphone and the second microphone is directed to a side of the user when the headset is worn on the head of the user.

27. The apparatus of claim 26, further comprising:

selection logic configured to select the first direction from one of the first axis, the second axis, and the third axis based on a predefined criteria.

28. The apparatus of claim 23, wherein the main signal is input to a two-stage noise cancellation unit as a main channel, and wherein the reference signal is input to the two-stage noise cancellation unit as a reference channel.

29. A device to be worn on a user's head, comprising:

a head-mounted device configured to be worn on the head of the user;

a first microphone connected to a first temple of the headset to receive a first sound signal from a sound source, the first microphone being a first distance from the sound source;

a second microphone connected to a first temple of the headset to receive a second sound signal from the sound source, the second microphone being a second distance from the sound source; and

a beamformer, the beamformer further comprising:

a first input configured to receive the first sound signal;

a second input configured to receive the second sound signal;

a reference signal, the second sound signal being for the reference signal, the second distance being greater than the first distance.