CN101154385A - Control method for robot voice motion and its control system - Google Patents
- Publication number
- CN101154385A (application CNA200610047912XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- robot
- control method
- speech
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Manipulator (AREA)
Abstract
The invention belongs to the field of computer control of humanoid robots, within computer intelligent control and robot speech technology, and in particular relates to a control method for robot voice-driven motion and its control system. The method converts the speech signal by digital filtering, detection and extraction of the speech-signal feature start point, and acoustic-model pattern matching, and comprises the following steps: (1) read the speech signal and data; (2) invoke the digital filtering module; (3) detect and extract the start point of the speech-signal features; if this succeeds, execute step (5), otherwise execute step (4); (4) adjust the data-sequence length; (5) compute the speech duration and speech feature values from the start and stop points of the speech signal; (6) complete acoustic-model pattern matching; (7) output the information or motion corresponding to the best-matching feature. The system comprises a robot speech-signal conversion part, a phoneme-processing part, a DSP digital-processing part, an actuator drive-control part, an actuator, and a feedback part.
Description
Technical field
The invention belongs to the field of computer control of humanoid robots, within computer intelligent control and robot speech technology, and in particular relates to a control method for robot voice-driven motion and the control system it employs.
Background art
Robot speech is the technology by which a machine recognizes input speech and converts the speech signal into corresponding information or actions. As a research field in its own right, speech technology is an interdisciplinary subject, closely linked to acoustics, phonetics, linguistics, digital signal processing, information theory, control theory, computer science, and many other disciplines.
Humanoid robots now have a human-like appearance, adapt more easily to human living environments, and therefore have broad application prospects. The humanoid robot is an integrated high-technology platform and has become a commanding height of robotics development. Humanoid-robot research not only advances mechatronics, sensing and control, artificial intelligence, and other disciplines, but will also open new applications in home services, social entertainment, hazardous operations, and military fields. Professor Hiroshi Ishiguro of Osaka University developed one of the most human-like robots produced to date. In June 2005, during the Aichi World Expo in Japan, Ishiguro's "Repliee Q1" appeared in public for the first time; the robot's lip shape changes with its pronunciation. Control of a humanoid robot's speech-driven motion is therefore of great significance for robot applications.
Robot speech technology is among the most difficult yet most important techniques in basic computer algorithm design, and its applications are very wide; it therefore plays a key role in improving the design and analysis of computer algorithms and in solving practical computing problems.
Summary of the invention
The present invention aims to overcome the deficiencies of the prior art by providing a control method for robot voice-driven motion, and the control system it employs, offering good response speed and sensitivity and a lifelike combination of motion and speech: when the robot engages in spoken communication, the humanoid robot's voice remains consistent with its lip movements, making its expression richer and its motion more lifelike.
The object of the invention is achieved as follows. A control method for robot voice-driven motion converts the speech signal mainly by digital filtering of the speech, detection and extraction of the speech-signal feature start point, and acoustic-model pattern matching. The invention may be implemented by the following steps:
1) Read the speech signal and data.
2) Invoke the digital filtering module.
3) Detect and extract the speech-feature start point; if this succeeds, go to step 5, otherwise go to step 4.
4) Adjust the data-sequence length.
5) Determine the start and stop points of the speech signal, the speech duration, and the speech feature values.
6) Perform acoustic-model pattern matching.
7) Output the information or action corresponding to the best-matching feature.
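The steps above can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: the filter coefficients, the threshold, and the template layout (a name mapped to a scalar feature value) are all illustrative assumptions.

```python
def digital_filter(samples, coeffs=(0.25, 0.25, 0.25, 0.25)):
    # Step 2: a simple FIR smoothing filter stands in for the patent's filter.
    n = len(coeffs)
    return [sum(c * samples[i - k] for k, c in enumerate(coeffs))
            for i in range(n - 1, len(samples))]

def find_endpoints(data, threshold):
    # Steps 3-5: first and last samples whose magnitude exceeds the threshold.
    idx = [i for i, x in enumerate(data) if abs(x) > threshold]
    return (idx[0], idx[-1]) if idx else None

def match_template(feature, templates):
    # Steps 6-7: pick the template whose stored feature value is closest.
    return min(templates, key=lambda name: abs(templates[name] - feature))

def voice_to_action(samples, templates, threshold=0.5):
    data = digital_filter(samples)              # step 2
    ends = find_endpoints(data, threshold)      # steps 3-5
    if ends is None:
        return None                             # no start point found
    start, stop = ends
    duration = stop - start + 1                 # speech duration
    feature = sum(abs(x) for x in data[start:stop + 1]) / duration
    return match_template(feature, templates)   # steps 6-7
```

A short burst of signal is then mapped to whichever stored action template its average magnitude most resembles.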
As a preferred scheme, after step 6) the invention continues by relaxing the matching restriction points; if matching then succeeds, step 7) is executed, otherwise the restriction points are relaxed further.
As another preferred scheme, the digital filtering of speech may be realized with an FIR filter and a multiply-accumulate (MAC) function.
As a third preferred scheme, from the start and stop points of the speech signal the invention may simultaneously compare the maximum Max and minimum Min of the speech signal, compute the speech duration and feature values, and from these derive the corresponding action data by further processing.
The invention may use nonlinear time-warping templates to match the acoustic model and patterns.
A control system for robot voice-driven motion comprises: a robot speech-signal conversion part, a phoneme-processing part, a DSP digital-processing part, an actuator drive-control part, an actuator, and a feedback part. The external analog speech signal, after A/D conversion, is sent to the phoneme-processing part for acoustic-model unit computation, then passed to the DSP digital-processing part for the corresponding data processing; the resulting control signals are transferred to the actuator drive-control part to drive the actuator. Meanwhile, the feedback part feeds the actuator's operating-state signal back to the DSP digital-processing part in real time.
As a preferred scheme, the robot speech-signal conversion part of the invention may use chip U1.
As another preferred scheme, the actuator drive-control part of the invention uses chip U2 and transistors Q1, Q2, Q3, Q4, Q5, and Q6.
The actuator of the invention uses motor MG1; the feedback part comprises resistors R12 and R13.
In addition, the DSP digital-processing part of the invention may use chip U3.
The invention achieves a lifelike combination of voice and motion for the humanoid robot, with good response speed and sensitivity. Through adjustment by the control circuit during spoken communication, the robot's voice is kept consistent with its lip movements, making the humanoid robot's expression richer and its motion more lifelike; in dialogue with people, the robot's sound and manner of expression come closer to human, better satisfying functional demands on humanoid robots.
Description of drawings
The invention is further described below with reference to the drawings and specific embodiments. The scope of protection of the invention is not limited to the following description.
Fig. 1 is a block diagram of the sampling and filtering principle of the invention;
Fig. 2 is a schematic diagram of the acoustic model and pattern matching of the invention;
Fig. 3 is a flowchart of one control method for robot voice-driven motion according to the invention;
Fig. 4 is a flowchart of another control method for robot voice-driven motion according to the invention;
Fig. 5 is a block diagram of the voice-motion control circuit system of the invention;
Fig. 6 is a voice-motion processing flowchart of the invention;
Fig. 7 is a basic block diagram of the speech analysis system of the invention;
Fig. 8 is the circuit schematic of the invention.
Embodiment
As shown in Figs. 1-4, the speech signals used by the robot are characterized by variability, dynamics, transience, and continuity. A time-domain speech signal is difficult to use directly, so speech features must be extracted from it to obtain the essential characteristics of the speech. Acoustic feature extraction substantially compresses the information; because of the time-varying nature of the speech signal, feature extraction must be performed on short frames of the signal, i.e., by short-time analysis. The signal is usually pre-emphasized and filtered first. The performance of a speech system is affected by many factors, including speaker differences, speaking style, environmental noise, and the transmission channel. Improving the system's ability to overcome these uncertainties keeps its performance stable across different environments and conditions and improves robustness; the speech system adjusts itself automatically and specifically according to the source of interference, giving it good performance.
The following sections introduce, in turn, the techniques by which the speech signal is converted into corresponding information or actions: the digital filtering of speech that improves system performance, the frame-based detection and extraction of the speech-signal feature start point, and the acoustic-model pattern matching.
(1) Digital filtering of the speech signal
The robot's speech signal must be filtered before further feature processing. The system realizes this with an FIR filter and a multiply-accumulate (MAC) function.
Specific implementation:
For each sampling period, the FIR filter response of the digital speech signal requires a digital MAC operation of the following general instruction form:

MR = [Rd] * [Rs] {,ss} {,n};
MR = [Rd] * [Rs] ,us {,n};

where:
- MR is the accumulator register pair, formed by R3 and R4;
- Rd is the destination register, used here as the sampled-data pointer of the digital speech signal;
- Rs is the source register, used here as the coefficient-register pointer of the digital speech signal;
- n is the number of speech samples taking part in the filtering operation;
- ss selects signed multiplication, which is the default setting; us selects unsigned multiplication.
Suppose the number of samples taking part in filtering is 4, and that before execution the memory contents pointed to by the sampled-data pointer R1 and the coefficient-register pointer R2 are as shown in Fig. 1(a). Executing the instruction MR = [R1] * [R2], 4 then produces the following actions:

1) After the accumulator MR is cleared, the MAC computation

MR = C1*Xn-1 + C2*Xn-2 + C3*Xn-3 + C4*Xn-4

is performed, and the pointers move right by n (n = 4) words, as shown in Fig. 1(b).

2) The memory contents pointed to by the sampled-data pointer R1 move forward by one word. This means that when a new data sample Xn is taken, it is stored after Xn-1 in sequence, and the oldest data sample (Xn-4) is replaced by the next-oldest sample (Xn-3), as shown in Fig. 1.
The above operation effectively filters the noise in the speech system, including environmental noise and the electronic and transmission noise added during speech processing. It strengthens the system's insensitivity to noise, guarantees accurate sampled speech data, and achieves the expected effect.
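The multiply-accumulate cycle described above can be mimicked in software. This is a sketch of the assumed behavior only, not the DSP instruction itself; register names and buffer layout are taken from the description.

```python
def mac_fir_step(samples, coeffs, new_sample):
    """One MAC filter cycle as described in the text (assumed behavior):
    clear the accumulator, sum C_k * X_{n-k} over the n taps, then shift
    the newest sample into the buffer so the oldest sample is discarded."""
    mr = 0.0                                   # accumulator MR cleared
    for c, x in zip(coeffs, samples):          # n multiply-accumulate steps
        mr += c * x
    samples = [new_sample] + samples[:-1]      # X_n enters; X_{n-4} drops out
    return mr, samples
```

Calling this once per sampling period reproduces the sliding-window FIR behavior the instruction form above implements in hardware.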
(2) Detection and extraction of the speech-signal feature start point
The speech-signal features must be extracted accurately. The method uses an average feature E:

E = Mn / N

where N is the data-sequence length.
The program first stores the measured energies E of the 10 frames from the initial static moment, and at the same time determines the upper threshold ITU and the lower threshold ITL. Later data are then judged comprehensively against ITU and ITL. In the decision logic, the initial start point N1 computed from ITL and ITU is first defined as the frame at which the average amplitude first rises above ITL. If, as time advances, the frame amplitude drops below ITL before rising above ITU, the original N1 is discarded and the next point rising above ITL becomes the new N1, and so on. Once m consecutive frames exceed the upper threshold ITU, that point is judged to be the start point.
From the start and stop points of the speech signal, the maximum Max and minimum Min are compared simultaneously, the speech duration and feature values are computed, and the corresponding action data are derived by further processing.
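The two-threshold start-point logic above might be sketched as follows. The thresholds ITL and ITU are taken as given, and the confirmation count m is illustrative; this is a minimal reading of the description, not the patent's code.

```python
def find_speech_start(energies, itl, itu, m=3):
    """Two-threshold start-point detection (sketch).

    n1 is the first frame whose energy rises above the lower threshold ITL.
    If the energy drops back below ITL before reaching ITU, the candidate
    is discarded; once m consecutive frames exceed ITU, n1 is confirmed."""
    n1, run = None, 0
    for i, e in enumerate(energies):
        if n1 is None and e >= itl:
            n1 = i                       # candidate start point N1
        if n1 is not None:
            if e >= itu:
                run += 1
                if run >= m:
                    return n1            # confirmed start point
            elif e < itl:
                n1, run = None, 0        # fell below ITL before ITU: reset
            else:
                run = 0                  # between thresholds: keep candidate
    return None
```

Returning None corresponds to the failure branch that triggers the data-sequence-length adjustment in step 4.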
(3) Acoustic model and pattern matching
Because the speech signal has considerable randomness, a nonlinear time-warping template-matching algorithm should be used.
The algorithm is described with reference to Fig. 2. Suppose there are two sequences Q and C (Q the sequence to be matched, C the template sequence), of lengths n and m respectively:

Q = q1, q2, ..., qi, ..., qn (1)
C = c1, c2, ..., cj, ..., cm (2)

To match the two sequences, we construct an n x m matrix whose element (i, j) is the "distance" d(qi, cj) between qi and cj, taking d(q, c) = (q - c)^2; this gives the warping matrix between the Q and C sequences. The shaded line in Fig. 2 is the resulting optimal path.
The search for the optimal path starts from the final stage of the process; that is, the optimizing decision is a backward decision process. During time warping, for each value of i one considers every point reachable along the j direction from the current i (all points inside the allowed region); the path constraints reduce these to a few possible predecessor points. For each candidate, the best predecessor is found from the total-cost function, giving the cost of that point. As the process proceeds the path branches, and the number of possible branches keeps growing. Repeating this process yields the optimal path from point (n, m) back to (1, 1).
The warping result satisfies the cumulative-distance recurrence below, where d[Q(i), C(j)] is the distance measure between the i-th test vector Q(i) and the j-th template vector C(j), and D is the matching path between the two vectors under optimal time warping. The shortest path is obtained by iterating the formula, in which the cumulative distance γ(i, j) is the local distance d plus the minimum of the cumulative distances of the adjacent predecessor elements:

γ(i, j) = d(qi, cj) + min{ γ(i-1, j-1), γ(i-1, j), γ(i, j-1) }

The speech-feature input is matched and compared with the acoustic model (pattern) in this way to obtain the best feature result.
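The recurrence can be implemented directly. Below is a minimal dynamic-time-warping sketch; the infinite-border initialization is a common convention not spelled out in the text, and the sequences here are scalar-valued for simplicity.

```python
def dtw_distance(q, c):
    """Dynamic time warping with the recurrence from the text:
    g(i,j) = d(q_i, c_j) + min(g(i-1,j-1), g(i-1,j), g(i,j-1)),
    with local distance d(q, c) = (q - c)**2."""
    n, m = len(q), len(c)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0                      # warping starts at (1, 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            g[i][j] = d + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return g[n][m]                     # cost of the optimal path to (n, m)
```

The template at minimum warped distance would then be selected as the best feature result.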
The steps of the control method for robot voice-driven motion are as follows:
1) Read the speech signal and data. The main routine reads the A/D-converted digital speech signal input through the MIC channel, or digital speech data from the system's memory unit.
2) Invoke the digital filtering module. The noise in the speech system, including environmental noise and the electronic and transmission noise added during processing, is effectively filtered, strengthening the system's insensitivity to noise and guaranteeing accurate sampled speech data.
3) Detect and extract the speech-feature start point; if this succeeds, go to step 5, otherwise go to step 4. The program first stores the measured energies E of the 10 initial static frames while determining the thresholds ITU and ITL; later data are judged comprehensively against both. The candidate start point N1 is the frame at which the average amplitude first rises above ITL; if the amplitude drops below ITL before rising above ITU, N1 is discarded and the next rise above ITL becomes the new N1, and so on. Once m consecutive frames exceed ITU, that point is judged to be the start point.
4) Adjust the data-sequence length. If no start point is determined, adjust the data-sequence length (the frame count m) and repeat step 3 until the start point of the speech signal is found.
5) Determine the start and stop points of the speech signal, the speech duration, and the feature values. From the start and stop points, compare the maximum Max and minimum Min simultaneously, compute the speech duration and feature values, and derive the corresponding action data by further processing.
6) Perform acoustic-model pattern matching. The acoustic model and the reference template are compared over time to measure the similarity between the two templates.
7) Output the information or action corresponding to the best feature result. The feature value follows from the acoustic-model match and the degree of pattern similarity. Based on the speech-feature result, the application program sets the corresponding information flags and actions so that, during processing, the program invokes or sets the lip motion matching the speech.
Then return to step 1 and analyze the next speech unit.
As shown in Fig. 4, the steps of another control method for robot voice-driven motion are as follows:
1) Read the speech signal. The main routine reads the A/D-converted digital speech signal input through the MIC channel, or digital speech data from the system's memory unit.
2) Invoke the digital filtering module. The noise in the speech system, including environmental noise and the electronic and transmission noise added during processing, is effectively filtered, strengthening the system's insensitivity to noise and guaranteeing accurate sampled speech data.
3) Detect and extract the speech-feature start point; if this succeeds, go to step 5, otherwise go to step 4. The program first stores the measured energies E of the 10 initial static frames while determining the thresholds ITU and ITL; later data are judged comprehensively against both. The candidate start point N1 is the frame at which the average amplitude first rises above ITL; if the amplitude drops below ITL before rising above ITU, N1 is discarded and the next rise above ITL becomes the new N1, and so on. Once m consecutive frames exceed ITU, that point is judged to be the start point.
4) Adjust the data-sequence length. If no start point is determined, adjust the data-sequence length (the frame count m) and repeat step 3 until the start point of the speech signal is found.
5) Determine the start and stop points of the speech signal. From the start and stop points, compare the maximum Max and minimum Min simultaneously, compute the speech duration and feature values, and derive the corresponding action data by further processing.
6) Compute the speech duration and feature values from the results of steps 3 to 5.
7) Invoke the acoustic-model library that the speech system stores in the memory unit.
8) Perform acoustic-model pattern matching; if this succeeds, go to step 10, otherwise go to step 9. The acoustic model and the reference template are compared over time to measure the similarity between the two templates.
9) Relax the restriction points. If matching is unsuccessful, the comparison restriction points between the recognition template and the reference template are relaxed.
10) Obtain the best feature result from the match between the acoustic model and the pattern.
11) Execute the corresponding information or action. Based on the speech-feature result, the application program sets the corresponding information flags and actions so that, during processing, the program invokes or sets the lip motion matching the speech.
Then return to step 1 and analyze the next speech unit.
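Steps 7) to 10), matching with progressive relaxation of the restriction points, might be sketched as follows. All names, the scalar model bank, and the numeric limits are illustrative assumptions; the patent does not fix a concrete restriction measure.

```python
def match_with_relaxation(feature, model_bank, base_limit=0.1,
                          step=0.1, max_tries=5):
    """Try to match a feature value against the model bank; if no template
    falls within the restriction limit, relax the limit and retry
    (steps 8-10 of the flow above, under the stated assumptions)."""
    limit = base_limit
    for _ in range(max_tries):
        best = min(model_bank, key=lambda k: abs(model_bank[k] - feature))
        if abs(model_bank[best] - feature) <= limit:
            return best        # step 10: best feature result
        limit += step          # step 9: relax the restriction point
    return None                # no acceptable match within max_tries
```

The loop terminates either with the best template or, after a bounded number of relaxations, with no match.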
Speech technology greatly helps human-machine communication. At present, robots play back sound through computer playback technology without coordinated lip motion, which appears stiff and expressionless. In the invention, the robot's speech undergoes phoneme processing and acoustic-model unit computation (word-pronunciation model, semi-syllable model, and phoneme model); after the DSP's pre-filtering, sampling and quantization, windowing, endpoint detection, and pre-emphasis, control signals are transferred to the motor drive-control section to drive the motor. Meanwhile, the feedback circuit samples the motor's operating state as the real-time lip state and feeds it back to the DSP, which adjusts the motion so that the lip movements accurately track the voice.
Data processing is based mainly on frame-based feature extraction: the speech signal is divided into overlapping frames and speech features are extracted from each frame. Because the speech-signal frequency usually does not exceed 3400 Hz, by the Nyquist theorem a sampling rate of 8 kHz meets the requirement.
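The overlapping framing described here might be sketched as follows. The text fixes only the 8 kHz rate; the 30 ms frame length and 10 ms hop are common illustrative choices, not values from the patent.

```python
def frame_signal(samples, frame_len=240, hop=80):
    """Split a signal into overlapping frames for short-time analysis.
    At the 8 kHz rate mentioned in the text, 240 samples = 30 ms frames
    advancing by an 80-sample (10 ms) hop, so consecutive frames overlap."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]
```

Each returned frame is then short enough that the speech within it can be treated as quasi-stationary for feature extraction.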
The system of the invention comprises: a main control part, an amplifier, a driver, a feedback circuit, an actuator, and the speech pre-processing, feature-extraction, measurement-estimation, and post-processing stages.
As shown in Figs. 5-8, the control system for robot voice-driven motion comprises: a robot speech-signal conversion part, a phoneme-processing part, a DSP digital-processing part, an actuator drive-control part, an actuator, and a feedback part. The external analog speech signal, after A/D conversion, is sent to the phoneme-processing part for acoustic-model unit computation, then passed to the DSP digital-processing part for the corresponding data processing; the resulting control signals are transferred to the actuator drive-control part to drive the actuator, while the feedback part feeds the actuator's operating-state signal back to the DSP digital-processing part in real time. The robot speech-signal conversion part uses chip U1; the actuator drive-control part uses chip U2 and transistors Q1, Q2, Q3, Q4, Q5, and Q6; the actuator uses motor MG1; the feedback part comprises resistors R12 and R13; the DSP digital-processing part uses chip U3.
In the block diagram of the voice-motion control circuit in Fig. 5, the analog speech signal input through MIC becomes a digital signal after A/D conversion. The robot system performs phoneme processing on the digital speech signal and computes the acoustic-model units (word-pronunciation model, semi-syllable model, and phoneme model); after the DSP's pre-filtering, sampling and quantization, windowing, endpoint detection, and pre-emphasis, the data are processed by frame-based feature extraction: the speech signal is divided into overlapping frames and features are extracted from each. Because the speech-signal frequency usually does not exceed 3400 Hz, by the Nyquist theorem an 8 kHz sampling rate suffices. The control signals are then transferred to the motor drive-control section to drive the motor, while the feedback circuit samples the motor's operating state as the real-time lip state and feeds it back to the DSP part, which adjusts the motion in time so the lip movements accurately track the voice.
1) In the control-circuit principle of voice-driven motion, Fig. 5, the input analog speech signal becomes digital after A/D conversion. Since the time-domain signal is hard to use directly, speech features must be extracted: the robot's speech undergoes phoneme processing and acoustic-model unit processing (word-pronunciation model, semi-syllable model, and phoneme model) to obtain the essential speech characteristics. The data processing, via the DSP's pre-filtering, sampling and quantization, frame-based feature extraction, endpoint detection, and pre-emphasis, divides the signal into overlapping frames and extracts features from each; this both yields the essential characteristics of the speech and compresses the data. The model of a robot speech system usually consists of two parts, an acoustic model and a language model.
The acoustic model is the bottom layer of the system; its purpose is to provide an efficient method for computing the distance between the feature-vector sequence of the speech and each pronunciation template. Acoustic-model design is closely tied to the pronunciation characteristics of the language, and the choice of acoustic-model unit strongly affects the data volume, system efficiency, and flexibility. After the DSP processes the acoustic-model data, the result is transferred to the motor drive-control section to drive the motor, while the feedback circuit samples the motor's operating state as the real-time lip state and feeds it back to the DSP for motion adjustment, realizing accurate voice-driven motion.
2) Robot speech is the technology of converting a speech signal into corresponding information or actions; the speech signal itself is characterized by variability, dynamics, transience, and continuity.
In essence, a speech system is a multi-dimensional pattern system. Speech technology is based on the theory of statistical models, and a speech system divides into three parts:
(1) Speech feature extraction: its purpose is to extract a time-varying sequence of speech features from the speech waveform.
(2) Acoustic model and pattern matching: the extracted speech features of the input are matched and compared with the acoustic model (pattern) to obtain the best feature result.
(3) Language model and language processing: the language model comprises a grammar network built from voice commands or a language model built by statistical methods; language processing can perform morphological analysis. The general voice-motion processing flow is shown in Fig. 6.
Basic structure of the speech analysis system
A digital filter transforms an input digital signal or sequence into another output sequence; digital filters are applied in digital speech, digital image processing, spectrum analysis, and so on. A digital signal processor (DSP) represents signals and their information by series of numbers, and transforms and processes those signals by numerical computation. To build a DSP, a component is needed that can quickly multiply two values and accumulate the product in a register; the multiply-accumulate (MAC) unit must therefore have a high operation speed. The hardware configuration of the speech analysis system, combined with its instruction set, is sufficient to form the hardware MAC unit a DSP uses, and is suitable for these DSP applications.
Digital speech-signal processing
Digital speech processing is built on DSP hardware. DSPs are usually divided into fixed-point and floating-point classes by computational complexity: fixed-point DSPs use integer arithmetic and suit high-volume speech pre-processing; floating-point DSPs handle high-performance, complex real-number arithmetic, suited to the frequency-shift computations in speech processing.
The storage space for the standard patterns is called the "model bank". Building the model bank means performing spectrum analysis on the voice commands to be processed and extracting their characteristic parameters as the standard speech patterns.
The analysis process first filters the noise from the input speech signal and applies pre-emphasis to boost the high-frequency components; it then performs spectrum analysis using methods such as linear-prediction coefficients and takes the resulting characteristic parameters of the speech as the unknown pattern, which is compared with the pre-stored standard patterns. When the unknown pattern of the input agrees with the features of a standard pattern, the system confirms it and produces action-data output. Exact agreement between the input speech and a standard pattern would of course be ideal, but speech contains uncertain factors and perfect agreement rarely occurs; therefore, the similarity between the feature pattern of the input speech and each stored feature pattern is computed in advance, the most similar pattern (the one at minimum distance) is taken as the corresponding voice command, and action data are produced for it.
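The minimum-distance template decision described here can be sketched as follows. Euclidean distance is an assumption (the text leaves the distance measure open), and the model-bank layout, a command name mapped to a feature vector, is hypothetical.

```python
def match_command(feature_vec, model_bank):
    """Nearest-template matching: compute the distance from the unknown
    pattern to each stored standard pattern and pick the minimum
    (Euclidean distance assumed for illustration)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    best = min(model_bank, key=lambda name: dist(feature_vec, model_bank[name]))
    return best, dist(feature_vec, model_bank[best])
```

Returning the distance alongside the winner lets a caller reject matches whose minimum distance is still too large to trust.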
The basic structure of the speech analysis system is shown in Fig. 7.
In the circuit schematic of the system, Fig. 8, the analog speech signal input through MIC is converted by U1 into a digital signal after A/D conversion. The robot system performs phoneme processing on the digital speech signal and computes the acoustic-model units (word-pronunciation model, semi-syllable model, and phoneme model); after the DSP's digital filtering, sampling and quantization, windowing, endpoint detection, and pre-emphasis, frame-based feature extraction divides the signal into overlapping frames and extracts features from each. The analysis first filters noise from the input speech and applies pre-emphasis to boost the high frequencies, then performs spectrum analysis with methods such as linear-prediction coefficients; the resulting characteristic parameters form the unknown pattern, which is compared with the pre-stored standard patterns, and when they agree the system confirms the match and outputs action data. The control signals are transferred to the motor drive-control section, U2 with Q1, Q2, Q3, Q4, Q5, and Q6, which drives the external motion through motor MG1; R12, R13, etc. feed back the motor's position, U3 performs the signal processing, and the result is passed to the U1 feedback path, which samples the motor's operating state as the real-time lip state and feeds it back to the DSP part for timely motion adjustment, realizing accurate voice-driven lip motion.
As shown in Figure 8, VCC is connected with R12-1, R12-2, R13-1, R13-2. VDD is connected with Q1-1, Q4-1, R1-1, U1-7, U1-15, U1-36, U1-51, U1-52, U1-75, U2-36, U3-36. GND is connected with C1-1, C2-1, C3-2, C4-1, C5-1, C6-2, C10-2, C11-2, C12-2, C13-2, LS1-2, Q5-1, Q6-1, R6-2, R9-1, S1-2, S2-2, U1-9, U1-19, U1-24, U1-38, U1-49, U1-50, U1-62, U2-3, U2-14, U3-3, U3-14. D0 is connected with U1-53. D1 is connected with U1-54. D2 is connected with U1-55. D3 is connected with U1-56. D4 is connected with U1-57. D5 is connected with U1-58. D6 is connected with U1-59. D7 is connected with U1-60. IOD0 is connected with U1-41, U2-11, U3-11. IOD1 is connected with U1-42, U2-10, U3-10. IOD2 is connected with U1-43, U2-9, U3-9. IOD3 is connected with U1-44, U2-8, U3-8. IOD4 is connected with U1-45, U2-7, U3-7. IOD5 is connected with U1-46, U2-6, U3-6. IOD6 is connected with U1-47, U2-5, U3-5. IOD7 is connected with U1-48, U2-4, U3-4. IOIN0 is connected with U3-35. IOIN1 is connected with U3-34. IOIN2 is connected with U3-33. IOIN3 is connected with U3-32. IOIN4 is connected with U3-31. IOIN5 is connected with U3-30. IOIN6 is connected with U3-29. IOIN7 is connected with U3-28. IOIN8 is connected with U3-44. IOIN9 is connected with U3-43. IOIN10 is connected with U3-42. IOIN11 is connected with U3-41. IOIN12 is connected with U3-40. IOIN13 is connected with U3-39. IOIN14 is connected with U3-38. IOIN15 is connected with U3-37. IOIN16 is connected with U3-20. IOIN17 is connected with U3-21. IOIN18 is connected with U3-22. IOIN19 is connected with U3-23. IOIN20 is connected with U3-24. IOIN21 is connected with U3-25. IOOUT0 is connected with R10-2, U2-35. IOOUT1 is connected with R11-2, U2-34. IOOUT2 is connected with U2-33. IOOUT3 is connected with U2-32. IOOUT4 is connected with U2-31. IOOUT5 is connected with U2-30. IOOUT6 is connected with U2-29. IOOUT7 is connected with U2-28. IOOUT8 is connected with U2-44. IOOUT9 is connected with U2-43. IOOUT10 is connected with U2-42. IOOUT11 is connected with U2-41. IOOUT12 is connected with U2-40. IOOUT13 is connected with U2-39. IOOUT14 is connected with U2-38. IOOUT15 is connected with U2-37. IOOUT16 is connected with U2-20. IOOUT17 is connected with U2-21. NetC1_2 is connected with C1-2, R2-2, U1-8. NetC2_2 is connected with C2-2, R2-1. NetC3_1 is connected with C3-1, R1-2, U1-6. NetC6_1 is connected with C6-1, R3-2, R4-1. NetC7_1 is connected with C7-1, R5-2, U1-33. NetC7_2 is connected with C7-2, MK1-1, R3-1. NetC8_1 is connected with C8-1, R7-2, U1-28. NetC8_2 is connected with C8-2, MK1-2, R6-1. NetC9_2 is connected with C9-2, U1-27. NetC10_1 is connected with C10-1, C11-1, R5-1, R7-1, U1-34. NetMG1_2 is connected with MG1-2, Q4-2, Q6-2. NetQ1_2 is connected with MG1-1, Q1-2, Q5-2. NetQ1_3 is connected with Q1-3, Q2-2. NetQ2_1 is connected with Q2-1, R15-2. NetQ2_3 is connected with Q2-3, R10-1. NetQ3_2 is connected with Q3-2, Q4-3. NetQ3_3 is connected with Q3-3, R11-1. NetQ5_3 is connected with Q5-3, R14-1. NetQ6_3 is connected with Q6-3, R15-1. NetR4_2 is connected with R4-2, U1-37. NetR8_1 is connected with C9-1, R8-1. NetR8_2 is connected with C12-1, R8-2, U1-26. NetR14_2 is connected with Q3-1, R14-2. NetU1_13 is connected with C5-2, U1-13, Y1-2. NetU1_2 is connected with LS1-1, U1-21. NetU1_25 is connected with C13-1, R9-2, U1-25. NetY1_1 is connected with C4-2, U1-12, Y1-1. PC0-PC1 is connected with U2-16, U3-17. PC1-IOB5 is connected with U1-81, U2-17. PMC0-IOB3 is connected with U1-2, U2-2. PMC1-IOB4 is connected with U1-1, U2-1.
Resources provided by the system:
Two 16-bit programmable timers/counters (initial count values can be preset automatically);
Two 10-bit DAC (digital-to-analogue conversion) output channels;
32 general-purpose programmable input/output ports;
14 interrupt sources, from timer A/B, the time base, and two external clock inputs;
A phase-locked loop (PLL) oscillator providing the system clock signal;
A 7-channel 10-bit voltage analogue-to-digital converter (ADC) and a single-channel audio analogue-to-digital converter;
A built-in microphone amplifier and automatic gain control (AGC) on the audio ADC input channel;
Low-voltage reset (LVR) and low-voltage detection (LVD) functions;
A built-in in-circuit emulator (ICE) interface.
The protection scope of the present invention is not limited to the foregoing; any replacement, transformation or modification that does not depart from the scope of the claims shall be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A control method for robot voice action, which mainly employs speech digital filtering, judgement and extraction of the speech-signal feature start point, and acoustic-model and pattern matching to convert the voice signal; characterised in that it is implemented in the following steps:
1) read the voice signal and data;
2) invoke the digital filtering module;
3) perform judgement and extraction of the speech feature start point; on success, go to step (5); otherwise go to step (4);
4) adjust the data sequence length;
5) compute the start and end points of the voice signal, the voice duration and the speech feature value;
6) perform acoustic-model and pattern matching;
7) output the information or action corresponding to the best feature result.
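The seven steps of claim 1 can be sketched as a minimal Python control loop. This is an illustrative sketch only: the moving-average filter, the amplitude-threshold endpoint detector, the mean-amplitude feature and the Euclidean matching below are simplistic stand-ins for the FIR filtering, endpoint detection and acoustic-model matching described in the patent, and the 8 kHz sampling rate is an assumption.

```python
import numpy as np

def find_endpoints(x, thresh=0.1):
    """Steps 3/5: locate the start and end of speech by an (assumed)
    amplitude threshold; return None if no speech is detected."""
    idx = np.where(np.abs(x) > thresh)[0]
    return None if idx.size == 0 else (idx[0], idx[-1])

def control_pipeline(x, templates, fs=8000):
    """Steps 1-7 of claim 1 with simplistic stand-in processing."""
    x = np.asarray(x, dtype=float)                      # step 1: read signal
    x = np.convolve(x, np.ones(3) / 3, mode="same")     # step 2: filtering
    ends = find_endpoints(x)                            # step 3: start point
    if ends is None:
        x = np.pad(x, (0, len(x)))                      # step 4: adjust length
        ends = find_endpoints(x)
        if ends is None:
            return None
    start, stop = ends
    voice_time = (stop - start) / fs                    # step 5: duration
    feature = np.array([voice_time,                     # step 5: feature value
                        np.abs(x[start:stop + 1]).mean()])
    dists = [np.linalg.norm(feature - t) for t in templates]  # step 6: match
    return int(np.argmin(dists))                        # step 7: best action
```

A short speech burst embedded in silence is then mapped to the index of the nearest stored template, which the system would translate into the corresponding action data.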
2. The control method of robot voice action according to claim 1, characterised in that: after step 6) is executed, the matching restriction points are progressively relaxed; on success, step 7) is executed, otherwise the relaxation of restriction points continues.
3. The control method of robot voice action according to claim 1 or 2, characterised in that: the speech digital filtering is realised with FIR filtering using multiply-accumulate operations.
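A direct-form FIR filter is precisely a chain of multiply-accumulate operations, as claim 3 notes. The following is a minimal illustrative sketch; the two-tap averaging coefficients in the usage example are arbitrary, not taken from the patent.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: each output sample is a multiply-accumulate
    over the tap coefficients, y[n] = sum_k h[k] * x[n-k]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]   # the multiply-accumulate step
        y[n] = acc
    return y
```

On a DSP, the inner loop maps directly onto the hardware MAC instruction, which is why FIR filtering is the natural choice for this class of chip.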
4. The control method of robot voice action according to claim 1 or 2, characterised in that: in combination with the start and end points of the voice signal, the maximum value Max and minimum value Min of the voice signal are compared, the voice duration and speech feature value are computed, and the corresponding action data are derived by further processing.
5. The control method of robot voice action according to claim 1 or 2, characterised in that: a nonlinear time-warped (time-normalised) template is used to match the acoustic model and pattern.
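Nonlinear time-normalised template matching of the kind named in claim 5 is conventionally realised as dynamic time warping (DTW), which compares feature sequences of different lengths by finding the minimum-cost alignment. A minimal sketch over one-dimensional feature sequences, offered as an illustration of the standard technique rather than the patent's exact algorithm:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences:
    a nonlinear time alignment that tolerates speaking-rate variation."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignment moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may stretch or compress either sequence, `[1, 2, 3]` matches `[1, 2, 2, 3]` at zero cost, exactly the tolerance to timing variation that a linear comparison lacks.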
6. The control method of robot voice action according to claim 3, characterised in that: in combination with the start and end points of the voice signal, the maximum value Max and minimum value Min of the voice signal are compared, the voice duration and speech feature value are computed, and the corresponding action data are derived by further processing.
7. The control method of robot voice action according to claim 3, characterised in that: a nonlinear time-warped (time-normalised) template is used to match the acoustic model and pattern.
8. A control system used by the control method of robot voice action according to claim 1, characterised by comprising: a robot voice signal conversion part, a phoneme processing part, a DSP digital processing part, an execution-unit drive control part, an actuator and a feedback part; the externally input analogue voice signal, after A/D conversion, is sent to the phoneme processing part for acoustic-model-unit calculation, then undergoes the corresponding data processing in the DSP digital processing part, and the relevant control signals are transferred to the execution-unit drive control part to drive the actuator; meanwhile, the feedback part feeds the operating-state signals of the actuator back to the DSP digital processing part in real time.
9. The control system used by the control method of robot voice action according to claim 8, characterised in that: the DSP digital processing part uses a chip (U1).
10. The control system used by the control method of robot voice action according to claim 8 or 9, characterised in that: the execution-unit drive control part uses a chip (U2) and transistors (Q1, Q2, Q3, Q4, Q5, Q6).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA200610047912XA CN101154385A (en) | 2006-09-28 | 2006-09-28 | Control method for robot voice motion and its control system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA200610047912XA CN101154385A (en) | 2006-09-28 | 2006-09-28 | Control method for robot voice motion and its control system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101154385A true CN101154385A (en) | 2008-04-02 |
Family
ID=39256001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA200610047912XA Pending CN101154385A (en) | 2006-09-28 | 2006-09-28 | Control method for robot voice motion and its control system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101154385A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184732A (en) * | 2011-04-28 | 2011-09-14 | 重庆邮电大学 | Fractal-feature-based intelligent wheelchair voice identification control method and system |
CN105845139A (en) * | 2016-05-20 | 2016-08-10 | 北方民族大学 | Off-line speech control method and device |
CN107240401A (en) * | 2017-06-13 | 2017-10-10 | 厦门美图之家科技有限公司 | A kind of tone color conversion method and computing device |
CN108520741A (en) * | 2018-04-12 | 2018-09-11 | 科大讯飞股份有限公司 | A kind of whispering voice restoration methods, device, equipment and readable storage medium storing program for executing |
US11508366B2 (en) | 2018-04-12 | 2022-11-22 | Iflytek Co., Ltd. | Whispering voice recovery method, apparatus and device, and readable storage medium |
TWI735168B (en) * | 2020-02-27 | 2021-08-01 | 東元電機股份有限公司 | Voice robot |
CN113359538A (en) * | 2020-03-05 | 2021-09-07 | 东元电机股份有限公司 | Voice control robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101154385A (en) | Control method for robot voice motion and its control system | |
Chambers et al. | Hierarchical recognition of intentional human gestures for sports video annotation | |
CN108268452A (en) | A kind of professional domain machine synchronous translation device and method based on deep learning | |
CN107316638A (en) | A kind of poem recites evaluating method and system, a kind of terminal and storage medium | |
CN103065629A (en) | Speech recognition system of humanoid robot | |
CN104102627A (en) | Multi-mode non-contact emotion analyzing and recording system | |
CN106782591A (en) | A kind of devices and methods therefor that phonetic recognization rate is improved under background noise | |
CN110047474A (en) | A kind of English phonetic pronunciation intelligent training system and training method | |
JP7335569B2 (en) | Speech recognition method, device and electronic equipment | |
CN200953239Y (en) | Robot sound action control system | |
Liu et al. | Design and implementation of human-computer interaction intelligent system based on speech control | |
Zhou et al. | Meta-SE: a meta-learning framework for few-shot speech enhancement | |
ATE255762T1 (en) | NON-INFLUENCING DETERMINATION OF LANGUAGE QUALITY | |
CN214504972U (en) | Intelligent musical instrument | |
Jin | Design of Students' Spoken English Pronunciation Training System Based on Computer VB Platform. | |
Kaur | Mouse movement using speech and non-speech characteristics of human voice | |
Junling | Online learning system for English speech automatic recognition based on hidden Markov model algorithm and conditional random field algorithm | |
Sui et al. | Intelligent drumming robot for human interaction | |
Jiaqi et al. | Research on intelligent voice interaction application system based on NAO robot | |
Juang et al. | Intelligent Speech Communication Using Double Humanoid Robots. | |
Liu | Voice control system based on Zynq FPGA | |
Kenshimov et al. | Development of a Verbal Robot Hand Gesture Recognition System | |
Chen | Accelerometer-based hand gesture recognition using fuzzy learning vector quantization | |
Dong et al. | Speech interface ASIC of SOC architecture for embedded application | |
CN113257210B (en) | Multi-mode spectrum conversion method and system for copper or wooden musical instrument |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Open date: 20080402