CN112927685B

CN112927685B - Dynamic voice recognition method and device thereof

Info

Publication number: CN112927685B
Application number: CN201911242880.2A
Authority: CN
Inventors: 王美华; 陈庆隆
Original assignee: Realtek Semiconductor Corp
Current assignee: Realtek Semiconductor Corp
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2024-09-03
Anticipated expiration: 2039-12-06
Also published as: CN112927685A

Abstract

The invention provides a dynamic speech recognition method and a device thereof. The dynamic speech recognition method comprises executing a first stage: sound data is detected by a digital microphone and stored in a first memory; a human voice is detected in the sound data to generate a human voice detection signal; and a first processing circuit selectively determines whether to execute a second stage or a third stage according to a total effective data amount, a transmission bit rate of the digital microphone, and a recognition interval time. In the second stage, the first processing circuit outputs a first instruction to a second processing circuit, and the second processing circuit, according to the first instruction, causes a memory access circuit to transfer the sound data to a second memory and store it as voice data. In the third stage, the first processing circuit outputs a second instruction to the second processing circuit; the second processing circuit, according to the second instruction, causes the memory access circuit to transfer the sound data to the second memory and store it as voice data, and the second processing circuit confirms whether the voice data matches a preset voice instruction.

Description

Dynamic speech recognition method and device

Technical Field

The present invention relates to speech detection and recognition technology, and in particular to a dynamic speech recognition method and a device thereof.

Background Art

In existing electronic devices, voice assistant technology is widely used in various fields and supports a voice wake-up function. While the voice assistant is in standby mode, it must still listen for hot words and respond when one occurs. The voice assistant must therefore be woken up periodically: its processing system starts up in standby mode, uses a voice activity detection circuit to detect whether a human voice is present, and enters speech recognition only when a human voice appears, in order to confirm whether the voice contains a hot word and, on that basis, to determine whether to boot the system of the electronic device or perform a corresponding operation.

However, waking up the voice assistant for detection at a fixed frequency yields poor sensitivity. At the same time, the processing system of the voice assistant must operate at low power in order to comply with the relevant energy regulations.

Summary of the Invention

In view of this, the present invention proposes a dynamic speech recognition method, which includes executing a first stage: using a digital microphone to detect sound data and storing the sound data in a first memory; detecting a human voice in the sound data and generating a human voice detection signal; and selectively determining, by a first processing circuit, whether to execute a second stage or a third stage according to a total effective data amount, a transmission bit rate of the digital microphone, and a recognition interval time. In the second stage, the first processing circuit outputs a first instruction to a second processing circuit, and the second processing circuit, according to the first instruction, causes a memory access circuit to transfer the sound data to a second memory and store it as voice data. In the third stage, the first processing circuit outputs a second instruction, the second processing circuit, according to the second instruction, causes the memory access circuit to transfer the sound data to the second memory and store it as voice data, and the second processing circuit confirms whether the voice data in the second memory matches a preset voice instruction.

The present invention further proposes a dynamic speech recognition device comprising a digital microphone, a first memory, a voice activity detection circuit, a memory access circuit, a second memory, a first processing circuit and a second processing circuit. The digital microphone is used to detect sound data. The first memory is electrically connected to the digital microphone and stores the sound data. The voice activity detection circuit is electrically connected to the digital microphone, detects the sound data and generates a human voice detection signal. The memory access circuit is electrically connected to the first memory and transfers the sound data to the second memory according to a first instruction, so that it is stored as voice data. The first processing circuit is electrically connected to the voice activity detection circuit. The second processing circuit is electrically connected to the first processing circuit, the second memory and the memory access circuit. The dynamic speech recognition device is used to execute the aforementioned dynamic speech recognition method.

According to some embodiments, when the first processing circuit receives the human voice detection signal, the first processing circuit outputs the first instruction or the second instruction after the recognition interval time has elapsed.

According to some embodiments, the recognition interval time is determined by a budget relationship value. When the budget relationship value is less than or equal to (target average power consumption × previous cycle time × 1/3), the recognition interval time is 2 seconds. When the budget relationship value is greater than (target average power consumption × previous cycle time × 1/3) and less than or equal to (target average power consumption × previous cycle time × 2/3), the recognition interval time is 1.5 seconds. When the budget relationship value is greater than (target average power consumption × previous cycle time × 2/3), the recognition interval time is 1 second.

According to some embodiments, the budget relationship value is (target average power consumption × previous cycle time) − (first average power consumption of the first stage × first time of the first stage + second average power consumption of the second stage × second time of the second stage + third average power consumption of the third stage × third time of the third stage), where the previous cycle time is equal to the sum of the first time, the second time and the third time.
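For readability, the two relations above can be written compactly as follows. The symbols are introduced here only as shorthand and are not taken from the source: $B$ is the budget relationship value, $P_{\text{target}}$ the target average power consumption, $P_1, P_2, P_3$ the average power consumption of the first, second and third stages, $T_a, T_b, T_c$ the times spent in those stages, $T$ the previous cycle time, and $T_i$ the recognition interval time.

```latex
\[
B = P_{\text{target}}\,T - \left(P_1 T_a + P_2 T_b + P_3 T_c\right),
\qquad T = T_a + T_b + T_c
\]
\[
T_i =
\begin{cases}
2\ \text{s},   & B \le \tfrac{1}{3}\,P_{\text{target}}\,T \\[4pt]
1.5\ \text{s}, & \tfrac{1}{3}\,P_{\text{target}}\,T < B \le \tfrac{2}{3}\,P_{\text{target}}\,T \\[4pt]
1\ \text{s},   & B > \tfrac{2}{3}\,P_{\text{target}}\,T
\end{cases}
\]
```

Read this way, a larger remaining budget from the previous cycle leads to a shorter recognition interval, and a smaller remaining budget leads to a longer one.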

According to some embodiments, the third average power consumption is greater than the second average power consumption, and the second average power consumption is greater than the first average power consumption.

According to some embodiments, after the human voice detection signal is generated, the first processing circuit determines whether the first memory is full of sound data, and proceeds to the next step when it is full.

In summary, the present invention takes user experience into consideration when performing dynamic speech recognition and, when a search for the preset voice instruction (hot word) is triggered in standby mode, can reduce average power consumption, thereby providing a method with better sensitivity.

Brief Description of the Drawings

The above and other objects, features and advantages of the present invention will become more apparent from the detailed description of exemplary embodiments with reference to the attached drawings.

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention.

FIG. 2 is a schematic flow chart of a dynamic speech recognition method according to an embodiment of the present invention.

FIG. 3 is a waveform diagram of a dynamic speech recognition device according to an embodiment of the present invention.

FIG. 4 is a schematic flow chart of a dynamic speech recognition method according to another embodiment of the present invention.

Description of reference numerals:

10 electronic device

20 dynamic speech recognition device

21 digital microphone

22 first memory

23 voice activity detection circuit

24 memory access circuit

25 first processing circuit

26 second processing circuit

27 second memory

30 audio and video processing circuit

31 to 33 core processing circuits

34 to 36 third memories

C1 first instruction

C2 second instruction

SD1 sound data

SD2 voice data

SS human voice detection signal

ST1 first stage

ST2 second stage

ST3 third stage

T cycle time

T1 to T2 times

Ti recognition interval time

S10 to S28 steps

S30 to S36 steps

Detailed Description

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention. Referring to FIG. 1, the electronic device 10 includes a dynamic speech recognition device 20, an audio and video processing circuit 30, a plurality of core processing circuits 31 to 33 and a plurality of third memories 34 to 36, and the core processing circuits 31 to 33 are electrically connected to the third memories 34 to 36. When the dynamic speech recognition device 20 recognizes a preset voice instruction in standby mode, the electronic device 10 executes a system boot procedure so that the audio and video processing circuit 30, the core processing circuits 31 to 33 and the third memories 34 to 36 can cooperate with one another to play the audio and video signal received by the electronic device 10. In one embodiment, the electronic device 10 can be a television, but is not limited thereto.

The dynamic speech recognition device 20 includes a digital microphone 21, a first memory 22, a voice activity detection circuit 23, a memory access circuit 24, a first processing circuit 25, a second processing circuit 26 and a second memory 27. The digital microphone 21 is used to detect sound data SD1. The first memory 22 is electrically connected to the digital microphone 21 and stores the sound data SD1. In one embodiment, the first memory 22 can be, but is not limited to, a static random access memory (SRAM).

The voice activity detection circuit 23 is electrically connected to the digital microphone 21, detects the sound data SD1 and generates a human voice detection signal SS. In one embodiment, the voice activity detection circuit 23 can be, but is not limited to, a speech recognition chip or a speech recognition processing circuit.

The memory access circuit 24 is electrically connected to the first memory 22 and the second memory 27, and transfers the sound data SD1 to the second memory 27 according to a first instruction, so that the sound data SD1 is stored as voice data SD2. In one embodiment, the memory access circuit 24 can be, but is not limited to, a direct memory access (DMA) circuit, and the second memory 27 can be, but is not limited to, a dynamic random access memory (DRAM).

The first processing circuit 25 is electrically connected to the voice activity detection circuit 23 and generates a first instruction C1 or a second instruction C2 in response to the human voice detection signal SS. The second processing circuit 26 is electrically connected to the first processing circuit 25, the second memory 27 and the memory access circuit 24. According to the first instruction C1, the second processing circuit 26 causes the memory access circuit 24 to transfer the sound data SD1 to the second memory 27 and store it as the voice data SD2. Alternatively, according to the second instruction C2, the second processing circuit 26 causes the memory access circuit 24 to transfer the sound data SD1 to the second memory 27 and store it as the voice data SD2, and also confirms whether the voice data SD2 in the second memory 27 matches a preset voice instruction. In one embodiment, the first processing circuit 25 can be a microcontroller with low power consumption, such as an 8051 microcontroller, but the present invention is not limited thereto. The second processing circuit 26 can be any of various types of processing circuit, such as a general-purpose microprocessor, a microcontroller or a central processing unit, but the present invention is not limited thereto.

In one embodiment, the first instruction C1 or the second instruction C2 is an instruction that modifies a shared state.

FIG. 2 is a schematic flow chart of a dynamic speech recognition method according to an embodiment of the present invention, and FIG. 3 is a waveform diagram of a dynamic speech recognition device according to an embodiment of the present invention. Referring to FIG. 1, FIG. 2 and FIG. 3 together, the dynamic speech recognition method includes using the dynamic speech recognition device 20 to execute a first stage ST1 (steps S10 to S18 and step S22) and then either a second stage ST2 (step S20) or a third stage ST3 (steps S24 to S26). Each stage is described in detail below.

In the first stage ST1 (pure standby stage), as shown in step S10, the digital microphone 21 detects the sound data SD1, and the sound data SD1 is stored in the first memory 22. As shown in step S12, the voice activity detection circuit 23 detects whether a human voice is present in the sound data SD1; when a human voice is detected, the circuit is triggered to generate the human voice detection signal SS and transmits it to the first processing circuit 25. As shown in step S14, the first processing circuit 25 determines whether the first memory 22 is full of sound data SD1, and proceeds to step S16 when it is full, which ensures that enough sound data SD1 is available for the subsequent steps. As shown in step S16, the first processing circuit 25 selectively determines whether to execute the second stage ST2 (DMA stage) or the third stage ST3 (speech recognition stage) according to a total effective data amount, the transmission bit rate of the digital microphone 21 and a recognition interval time Ti.

In one embodiment, the target average power consumption, the first average power consumption of the first stage ST1, the second average power consumption of the second stage ST2 and the third average power consumption of the third stage ST3 are known, and the time occupied by each stage within the previous cycle time T has been obtained: the first time Ta of the first stage ST1, the second time Tb of the second stage ST2 and the third time Tc of the third stage ST3. The previous cycle time T is equal to the sum of the first time Ta, the second time Tb and the third time Tc, that is, T = Ta + Tb + Tc. In one embodiment, the cycle time T can be, but is not limited to, 16 seconds. From these parameters, a budget relationship value (Budget) for power use can be obtained: the budget relationship value is (target average power consumption × previous cycle time T) − (first average power consumption of the first stage ST1 × first time Ta + second average power consumption of the second stage ST2 × second time Tb + third average power consumption of the third stage ST3 × third time Tc).
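The budget calculation described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names `StageUsage` and `budget_relationship_value` are hypothetical, and the power and time figures in the example are assumed values chosen so that the cycle adds up to 16 seconds.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StageUsage:
    """Average power (watts) and time (seconds) of one stage in the previous cycle."""
    avg_power_w: float
    duration_s: float


def budget_relationship_value(target_avg_power_w: float,
                              stages: Iterable[StageUsage]) -> float:
    """Target power times cycle time, minus the power-time products actually used."""
    stages = list(stages)
    # The previous cycle time T is the sum of the per-stage times Ta + Tb + Tc.
    cycle_time_s = sum(s.duration_s for s in stages)
    consumed = sum(s.avg_power_w * s.duration_s for s in stages)
    return target_avg_power_w * cycle_time_s - consumed


# Assumed example values (not from the source): ST1, ST2, ST3 in a 16 s cycle.
previous_cycle = [
    StageUsage(avg_power_w=0.5, duration_s=12.0),  # first stage ST1
    StageUsage(avg_power_w=2.0, duration_s=3.0),   # second stage ST2
    StageUsage(avg_power_w=4.0, duration_s=1.0),   # third stage ST3
]
budget = budget_relationship_value(1.5, previous_cycle)
print(budget)  # 1.5 * 16 - (6 + 6 + 4) = 8.0
```

Because each term is a power multiplied by a time, the resulting value has the dimension of energy (watt-seconds) in this sketch.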

After the budget relationship value is obtained, the recognition interval time Ti can be determined dynamically from it. Specifically, when the budget relationship value is less than or equal to (target average power consumption × previous cycle time T × 1/3), the recognition interval time Ti is set to 2 seconds. When the budget relationship value is greater than (target average power consumption × previous cycle time T × 1/3) and less than or equal to (target average power consumption × previous cycle time T × 2/3), the recognition interval time Ti is set to 1.5 seconds. When the budget relationship value is greater than (target average power consumption × previous cycle time T × 2/3), the recognition interval time Ti is set to 1 second. Next, the total effective data amount, which is the sum of the effective data amount of the first memory 22 and the effective data amount of the second memory 27, and the transmission bit rate of the digital microphone 21 are both known. Therefore, when the total effective data amount is less than the product of the transmission bit rate of the digital microphone 21 and the recognition interval time Ti, the first processing circuit 25 decides to execute the second stage ST2 (DMA stage). When the total effective data amount is greater than or equal to the product of the transmission bit rate of the digital microphone 21 and the recognition interval time Ti, the first processing circuit 25 decides to execute the third stage ST3 (speech recognition stage).
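The two decisions described above, choosing Ti from the budget relationship value and then choosing between the second and third stages, can be sketched as below. The function names are hypothetical, the data amount is assumed to be measured in bits, and the 256,000 bit/s microphone rate in the example is an assumed figure rather than one given in the source.

```python
def recognition_interval_s(budget: float,
                           target_avg_power_w: float,
                           prev_cycle_time_s: float) -> float:
    """Pick the recognition interval Ti (seconds) from the budget relationship value."""
    full_budget = target_avg_power_w * prev_cycle_time_s
    if budget <= full_budget / 3:
        return 2.0   # little budget left: recognize less often
    if budget <= full_budget * 2 / 3:
        return 1.5
    return 1.0       # plenty of budget left: recognize more often


def choose_stage(total_valid_bits: float,
                 mic_bit_rate_bps: float,
                 interval_s: float) -> str:
    """Return "ST2" (DMA only) or "ST3" (DMA plus speech recognition)."""
    # Recognition runs only once at least Ti seconds of microphone data exist.
    if total_valid_bits < mic_bit_rate_bps * interval_s:
        return "ST2"
    return "ST3"


# Assumed example: target 1.5 W, previous cycle 16 s, budget value 10.
ti = recognition_interval_s(10.0, 1.5, 16.0)   # 8 < 10 <= 16, so Ti = 1.5 s
print(ti, choose_stage(300_000, 256_000, ti))  # below 384,000 bits, so ST2
print(ti, choose_stage(384_000, 256_000, ti))  # threshold reached, so ST3
```

The comparison operators mirror the text: the interval thresholds use "less than or equal to", while the stage decision uses a strict "less than" for the second stage.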

When the first processing circuit 25 decides to execute the second stage ST2, as shown in step S18, the first processing circuit 25 first wakes up the second processing circuit 26 and then enters the second stage ST2. In the second stage ST2, as shown in step S20, the first processing circuit 25 outputs the first instruction C1 to the second processing circuit 26, and the second processing circuit 26, according to the first instruction C1, causes the memory access circuit 24 to transfer the sound data SD1 in the first memory 22 to the second memory 27, where it is stored as the voice data SD2. In the second stage ST2, the data is only transferred to the second memory 27 through the memory access circuit 24 and stored as the voice data SD2; no speech recognition is performed.

When the first processing circuit 25 decides to execute the third stage ST3, as shown in step S22, the first processing circuit 25 first wakes up the second processing circuit 26 and then enters the third stage ST3. In the third stage ST3, as shown in step S24, the first processing circuit 25 outputs the second instruction C2 to the second processing circuit 26, and the second processing circuit 26, according to the second instruction C2, causes the memory access circuit 24 to transfer the sound data SD1 in the first memory 22 to the second memory 27, where it is stored as the voice data SD2, and confirms whether the voice data SD2 in the second memory 27 matches the preset voice instruction. As shown in step S26, the second processing circuit 26 determines whether the voice data SD2 in the second memory 27 matches the preset voice instruction. If the voice data SD2 is confirmed to match the preset voice instruction, the system boot procedure is executed as shown in step S28 to wake up the other circuits, including the audio and video processing circuit 30, the core processing circuits 31 to 33 and the third memories 34 to 36, so as to boot the system.
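The difference between how the second processing circuit reacts to the first instruction C1 and to the second instruction C2 can be illustrated with a small software model. Everything here is a hypothetical stand-in: Python lists play the role of the two memories, strings play the role of audio frames, `matches_preset` is an assumed callback for the recognizer, and emptying the first memory after the transfer is an assumption of this sketch.

```python
from enum import Enum
from typing import Callable, List


class Instruction(Enum):
    C1 = "transfer only (second stage ST2)"
    C2 = "transfer and match (third stage ST3)"


def handle_instruction(instruction: Instruction,
                       first_memory: List[str],
                       second_memory: List[str],
                       matches_preset: Callable[[List[str]], bool]) -> bool:
    """Return True when the system boot procedure should be started."""
    # Both instructions move the buffered sound data SD1 into the second
    # memory, where it is kept as voice data SD2.
    second_memory.extend(first_memory)
    first_memory.clear()
    # Only the second instruction C2 triggers the comparison against the
    # preset voice instruction.
    if instruction is Instruction.C2:
        return matches_preset(second_memory)
    return False


def demo_matcher(voice_data: List[str]) -> bool:
    """Toy matcher: the last two frames must spell the assumed hot word."""
    return voice_data[-2:] == ["hi", "tv"]


sram: List[str] = ["hi"]   # stand-in for the first memory 22
dram: List[str] = []       # stand-in for the second memory 27
boot_after_c1 = handle_instruction(Instruction.C1, sram, dram, demo_matcher)
sram.append("tv")
boot_after_c2 = handle_instruction(Instruction.C2, sram, dram, demo_matcher)
print(boot_after_c1, boot_after_c2)  # False True
```

In this model the C1 pass accumulates data without ever running the matcher, which corresponds to the lower-power DMA stage, while the C2 pass runs the matcher over everything accumulated so far.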

FIG. 4 is a schematic flow chart of a dynamic speech recognition method according to another embodiment of the present invention. Referring to FIG. 1, FIG. 3 and FIG. 4 together, the dynamic speech recognition method includes using the dynamic speech recognition device 20 to execute a first stage ST1 (steps S10 to S16) and then either a second stage ST2 (step S30) or a third stage ST3 (steps S32 to S34). Each stage is described in detail below.

In the first stage ST1 (pure standby stage), as shown in step S10, the digital microphone 21 detects the sound data SD1, and the sound data SD1 is stored in the first memory 22. As shown in step S12, the voice activity detection circuit 23 detects whether a human voice is present in the sound data SD1; when a human voice is detected, the circuit is triggered to generate the human voice detection signal SS and transmits it to the first processing circuit 25. As shown in step S14, the first processing circuit 25 determines whether the first memory 22 is full of sound data SD1, and proceeds to step S16 when it is full, which ensures that enough sound data SD1 is available for the subsequent steps. As shown in step S16, the first processing circuit 25 selectively determines whether to execute the second stage ST2 (DMA stage) or the third stage ST3 (speech recognition stage) according to a total effective data amount, the transmission bit rate of the digital microphone 21 and a recognition interval time Ti.

When the first processing circuit 25 decides to execute the second stage ST2, as shown in step S30, the first processing circuit 25 outputs the first instruction C1 and wakes up the second processing circuit 26 in the second stage ST2. The second processing circuit 26, according to the first instruction C1, causes the memory access circuit 24 to transfer the sound data SD1 in the first memory 22 to the second memory 27, where it is stored as the voice data SD2.

When the first processing circuit 25 decides to execute the third stage ST3, as shown in step S32, the first processing circuit 25 outputs the second instruction C2 and wakes up the second processing circuit 26 in the third stage ST3. The second processing circuit 26, according to the second instruction C2, causes the memory access circuit 24 to transfer the sound data SD1 in the first memory 22 to the second memory 27, where it is stored as the voice data SD2, and confirms whether the voice data SD2 in the second memory 27 matches the preset voice instruction. As shown in step S34, the second processing circuit 26 determines whether the voice data SD2 in the second memory 27 matches the preset voice instruction. If the voice data SD2 is confirmed to match the preset voice instruction, the system boot procedure is executed as shown in step S28 to wake up all circuits and boot the system.

The steps of the above dynamic speech recognition method (S10 to S26 and S30 to S34) are only examples and need not be executed in the order given. Without departing from the spirit and scope of the present invention, the operations of the dynamic speech recognition method may be added to, replaced, omitted or executed in a different order as appropriate.

In one embodiment, when the first processing circuit 25 receives the human voice detection signal SS, the first processing circuit 25 outputs the first instruction C1 or the second instruction C2 after the recognition interval time Ti. As shown in FIG. 1 and FIG. 3, when the first processing circuit 25 receives the human voice detection signal SS at time T1, it outputs the first instruction C1 or the second instruction C2 at time T2, which follows T1 by the recognition interval time Ti. The recognition interval time Ti can be determined dynamically in the manner described above, so that the second processing circuit 26 and the second memory 27 are enabled only after the received sound data SD1 is sufficient to reflect the preset voice instruction. Low-power operation can thus be achieved in compliance with the relevant energy regulations.

In one embodiment, suppose the keyword set for the preset voice instruction is "Hi, TV". Referring to FIG. 1 and FIG. 3, at time T1 the digital microphone 21 detects external sound and generates the sound data SD1, and the first memory 22 stores the sound data SD1. For example, the digital microphone 21 detects the user saying a voice instruction such as "Hi, TV..." to the dynamic speech recognition device 20. At the same time, the voice activity detection circuit 23 determines that the sound data SD1 contains a human voice and outputs the human voice detection signal SS. At time T2, the first processing circuit 25 outputs the first instruction C1 or the second instruction C2, and the second processing circuit 26 and the second memory 27 are enabled. The second processing circuit 26 then enables the memory access circuit 24 according to the first instruction C1 or the second instruction C2, so that the sound data SD1 is transferred to the second memory 27 and stored as the voice data SD2. The second processing circuit 26 can therefore analyze the voice data SD2 to confirm whether it matches the preset voice instruction "Hi, TV", and, when the second processing circuit 26 confirms that the voice data SD2 matches the preset voice instruction, the other circuits are woken up to execute the system boot procedure.

In one embodiment, the first stage ST1 uses the digital microphone 21, the first memory 22, the voice activity detection circuit 23 and the first processing circuit 25 of the dynamic speech recognition device 20. The second stage ST2 uses the digital microphone 21, the first memory 22, the voice activity detection circuit 23, the memory access circuit 24, the first processing circuit 25, part of the second processing circuit 26 (only the portion of its functionality that enables the second memory) and the second memory 27. The third stage ST3 uses all of the circuits of the dynamic speech recognition device 20: the digital microphone 21, the first memory 22, the voice activity detection circuit 23, the memory access circuit 24, the first processing circuit 25, the second processing circuit 26 and the second memory 27. Therefore, the third average power consumption of the third stage ST3 is greater than the second average power consumption of the second stage ST2, and the second average power consumption is greater than the first average power consumption of the first stage ST1. For example, if the power consumption of the first stage ST1 is about 0.5 watts and the power consumption of the third stage ST3 is 4 watts, the power consumption of the second stage ST2 lies between the two.

Therefore, the present invention can determine the budget relationship value from the time occupied by each stage within the previous cycle time T (the first time, the second time and the third time) and the average power consumption of each stage, dynamically set the length of the recognition interval time Ti according to the budget relationship value, and decide on that basis whether the voice data needs to be recognized (that is, whether to execute the second stage ST2 or the third stage ST3). Voice recognition is thus performed dynamically according to the power actually consumed in operation. The invention can therefore take user experience into account when performing dynamic voice recognition and can reduce average power consumption when a search for the preset voice command is triggered in standby mode, providing a method with better sensitivity.
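The budget calculation can be sketched as follows. This is one reading of claims 3 and 4, in which the budget relationship value is the target average power consumption multiplied by the previous cycle time, minus the energy actually used in the three stages, and the recognition interval is chosen by comparing that value with 1/3 and 2/3 of the same product. The function names, the 2.0 W second-stage power, the target values and the stage durations are illustrative assumptions.

```python
def budget_relationship_value(target_avg_power_w, stage_powers_w, stage_times_s):
    """Remaining energy budget (joules) for the previous cycle."""
    cycle_time = sum(stage_times_s)  # T = first + second + third time
    used = sum(p * t for p, t in zip(stage_powers_w, stage_times_s))
    return target_avg_power_w * cycle_time - used


def recognition_interval(budget, target_avg_power_w, cycle_time_s):
    """Map the budget to a recognition interval Ti in seconds."""
    allowance = target_avg_power_w * cycle_time_s
    if budget <= allowance / 3:
        return 2.0   # little budget left: recognize less often
    if budget <= 2 * allowance / 3:
        return 1.5
    return 1.0       # ample budget left: recognize more often


powers = (0.5, 2.0, 4.0)   # ST2 value is an assumed figure
times = (12.0, 2.0, 2.0)   # hypothetical 16 s previous cycle
budget = budget_relationship_value(1.5, powers, times)
print(budget)                                   # 24 - 18 = 6.0
print(recognition_interval(budget, 1.5, 16.0))  # 6 <= 8 -> 2.0
```

Under this reading, a cycle that spent more time in the higher-power stages leaves a smaller budget, which lengthens Ti and lowers the power spent on recognition in the next cycle.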

The embodiments described above merely illustrate the technical ideas and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly; they are not to be used to limit the patent scope of the invention. All equivalent changes or modifications made in accordance with the spirit disclosed by the present invention shall still fall within its patent scope.

Claims

1. A dynamic speech recognition method, comprising:

performing a first stage:

detecting sound data by a digital microphone and storing the sound data in a first memory;

detecting a human voice in the sound data to generate a voice detection signal; and

selectively determining, by a first processing circuit, to execute a second stage or a third stage according to a total effective data amount, a transmission bit rate of the digital microphone and a recognition interval time;

performing the second stage:

the first processing circuit outputs a first instruction to a second processing circuit, and the second processing circuit, according to the first instruction, enables a memory access circuit to transfer the sound data to a second memory and store the sound data as voice data; and

performing the third stage:

the first processing circuit outputs a second instruction to the second processing circuit, the second processing circuit, according to the second instruction, enables the memory access circuit to transfer the sound data to the second memory and store the sound data as the voice data, and the second processing circuit confirms whether the voice data in the second memory matches a preset voice instruction; wherein the first processing circuit determines to execute the second stage when the total effective data amount is less than the product of the transmission bit rate of the digital microphone and the recognition interval time, and determines to execute the third stage when the total effective data amount is greater than or equal to that product; and wherein the total effective data amount is the sum of the effective data amount of the first memory and the effective data amount of the second memory.

2. The method of claim 1, wherein, upon receiving the voice detection signal, the first processing circuit outputs the first instruction or the second instruction after the recognition interval time has elapsed.

3. The method of claim 2, wherein the recognition interval time is determined by a budget relationship value: the recognition interval time is 2 seconds when the budget relationship value is less than or equal to 1/3 of the product of a target average power consumption and a previous period time; the recognition interval time is 1.5 seconds when the budget relationship value is greater than 1/3 of that product and less than or equal to 2/3 of that product; and the recognition interval time is 1 second when the budget relationship value is greater than 2/3 of that product.

4. The method of claim 3, wherein the budget relationship value is equal to the target average power consumption multiplied by the previous period time, minus the sum of: a first average power consumption of the first stage multiplied by a first time, a second average power consumption of the second stage multiplied by a second time, and a third average power consumption of the third stage multiplied by a third time; wherein the previous period time is equal to the sum of the first time, the second time and the third time.

5. The method of claim 4, wherein the third average power consumption is greater than the second average power consumption, and the second average power consumption is greater than the first average power consumption.

6. The method of claim 1, further comprising, after the step of generating the voice detection signal: determining whether the first memory is full of the sound data, and proceeding to the next step when the first memory is full.

7. The method of claim 1, wherein the step of performing the first stage further comprises, after selectively determining to execute the second stage or the third stage: waking up the second processing circuit by the first processing circuit.

8. The method of claim 1, wherein the first processing circuit wakes up the second processing circuit when the first processing circuit outputs the first instruction or the second instruction.

9. A dynamic speech recognition device, comprising:

a digital microphone for detecting sound data;

a first memory electrically connected to the digital microphone for storing the sound data;

a voice activity detection circuit electrically connected to the digital microphone for detecting a human voice in the sound data and generating a voice detection signal;

a memory access circuit electrically connected to the first memory, the memory access circuit transferring the sound data to a second memory for storage as voice data;

a first processing circuit electrically connected to the voice activity detection circuit; and

a second processing circuit electrically connected to the first processing circuit, the second memory and the memory access circuit;

wherein the dynamic speech recognition device is configured to execute the following steps:

performing a first stage:

detecting the sound data by the digital microphone and storing the sound data in the first memory;

detecting, by the voice activity detection circuit, a human voice in the sound data to generate the voice detection signal; and

selectively determining, by the first processing circuit, to execute a second stage or a third stage according to a total effective data amount, a transmission bit rate of the digital microphone and a recognition interval time;

performing the second stage:

the first processing circuit outputs a first instruction to the second processing circuit, and the second processing circuit, according to the first instruction, enables the memory access circuit to transfer the sound data to the second memory and store the sound data as the voice data; and

performing the third stage: