US20020178009A1 - Voice controlled computer interface - Google Patents
Voice controlled computer interface
- Publication number
- US20020178009A1 (application US10/102,047)
- Authority
- US
- United States
- Prior art keywords
- command
- mouse
- operating system
- language
- menu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/038—Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- This invention relates to voice controlled computer interfaces.
- Voice recognition systems can convert human speech into computer information.
- voice recognition systems have been used, for example, to control text-type user interfaces, e.g., the text-type interface of the disk operating system (DOS) of the IBM Personal Computer.
- Voice control has also been applied to graphical user interfaces, such as the one implemented by the Apple Macintosh computer, which includes icons, pop-up windows, and a mouse. These voice control systems use voiced commands to generate keyboard keystrokes.
- the invention features enabling voiced utterances to be substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard; a voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals which will directly create a desired action aided by the operating system without first being converted into control signals expressed in the predetermined format specific to the keyboard.
- voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer, converting some voiced utterances into commands corresponding to actions to be taken by said operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under the operating system.
- the invention features generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons; the instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry in the table containing a command corresponding to the menu entry or control button.
- The invention features enabling a user to create an instance in a formal language of the kind which has a strictly defined syntax; the entries of a graphically displayed list are expressed in a natural language and do not comply with the syntax, the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry in the list is automatically generated in response to the pointing.
- the invention enables a user to easily control the graphical interface of a computer. Any actions that the operating system can be commanded to take can be commanded by voiced utterances.
- the commands may include commands that are normally entered through the keyboard as well as commands normally entered through a mouse or any other input device.
- the user may switch back and forth between voiced utterances that correspond to commands for actions to be taken and voiced utterances that correspond to text strings to be used in an application program without giving any indication that the switch has been made.
- Any application may be made susceptible to a voice interface by automatically parsing the application instruction sequence for menus and control buttons that control the application.
- FIG. 1 is a functional block diagram of a Macintosh computer served by a Voice Navigator voice controlled interface system.
- FIG. 2A is a functional block diagram of a Language Maker system for creating word lists for use with the Voice Navigator interface of FIG. 1.
- FIG. 2B depicts the format of the voice files and word lists used with the Voice Navigator interface.
- FIG. 3 is an organizational block diagram of the Voice Navigator interface system.
- FIG. 4 is a flow diagram of the Language Maker main event loop.
- FIG. 5 is a flow diagram of the Run Edit module.
- FIG. 6 is a flow diagram of the Record Actions submodule.
- FIG. 7 is a flow diagram of the Run Modal module.
- FIG. 8 is a flow diagram of the In Button? routine.
- FIG. 9 is a flow diagram of the Event Handler module.
- FIG. 10 is a flow diagram of the Do My Menu module.
- FIGS. 11A through 11I are flow diagrams of the Language Maker menu submodules.
- FIG. 12 is a flow diagram of the Write Production module.
- FIG. 13 is a flow diagram of the Write Terminal submodule.
- FIG. 14 is a flow diagram of the Voice Control main driver loop.
- FIG. 15 is a flow diagram of the Process Input module.
- FIG. 16 is a flow diagram of the Recognize submodule.
- FIG. 17 is a flow diagram of the Process Voice Control Commands routine.
- FIG. 18 is a flow diagram of the ProcessQ module.
- FIG. 19 is a flow diagram of the Get Next submodule.
- FIG. 20 is a chart of the command handlers.
- FIGS. 21A through 21G are flow diagrams of the command handlers.
- FIG. 22 is a flow diagram of the Post Mouse routine.
- FIG. 23 is a flow diagram of the Set Mouse Down routine.
- FIGS. 24 and 25 illustrate the screen displays of Voice Control.
- FIGS. 26 through 29 illustrate the screen displays of Language Maker.
- FIG. 30 is a listing of a language file.
- a Macintosh operating system 132 provides a graphical interactive user interface by processing events received from a mouse 134 and a keyboard 136 and by providing displays including icons, windows, and menus on a display device 138 .
- Operating system 132 provides an environment in which application programs such as MacWrite 139, desktop utilities such as Calculator 137, and a wide variety of other programs can be run.
- the operating system 132 also receives events from the Voice Navigator voice controlled computer interface 102 to enable the user to control the computer by voiced utterances.
- the user speaks into a microphone 114 connected via a Voice Navigator box 112 to the SCSI (Small Computer Systems Interface) port of the computer 100 .
- the Voice Navigator box 112 digitizes and processes analog audio signals received from a microphone 114 , and transmits processed digitized audio signals to the Macintosh SCSI port.
- the Voice Navigator box includes an analog-to-digital converter (A/D) for digitizing the audio signal, a DSP (Digital Signal Processing) chip for compressing the resulting digital samples, and protocol interface hardware which configures the digital samples to obey the SCSI protocols.
- Recognizer Software 120 (available from Dragon Systems, Newton, Mass.) runs under the Macintosh operating system, and is controlled by internal commands 123 received from Voice Control driver 128 (which also operates under the Macintosh operating system).
- One possible algorithm for implementing Recognizer Software 120 is disclosed by Baker et al, in U.S. Pat. No. 4,783,803, incorporated by reference herein.
- Recognizer Software 120 processes the incoming compressed, digitized audio, and compares each utterance of the user to prestored utterance macros. If the user utterance matches a prestored utterance macro, the utterance is recognized, and a command string 121 corresponding to the recognized utterance is delivered to a text buffer 126 .
- Command strings 121 delivered from the Recognizer Software represent commands to be issued to the Macintosh operating system (e.g., menu selections to be made or text to be displayed), or internal commands 123 to be issued to the Recognizer Software.
- the Recognizer Software 120 compares the incoming samples of an utterance with macros in a voice file 122 . (The system requires the user to space apart his utterances briefly so that the system can recognize when each utterance ends.) The voice file macros are created by a “training” process, described below. If a match is found (as judged by the recognition algorithm of the Recognizer Software 120 ), a Voice Control command string from a word list 124 (which has been directly associated with voice file 122 ) is fetched and sent to text buffer 126 .
- command strings in text buffer 126 are relayed to Voice Control driver 128 , which drives a Voice Control interpreter 130 in response to the strings.
- a command string 121 may indicate an internal command 123 , such as a command to the Recognizer Software to “learn” new voice file macros, or to adjust the sensitivity of the recognition algorithm.
- Voice Control interpreter 130 sends the appropriate internal command 123 to the Recognizer Software 120 .
- the command string may represent an operating system manipulation, such as a mouse movement.
- Voice Control interpreter 130 produces the appropriate action by interacting with the Macintosh operating system 132 .
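- The routing of command strings described above can be sketched in a few lines of Python. This is an illustration only: the `@NAME(arg,...)` pattern is generalized from the single “@MENU(font,2)” example, the internal command names `LEARN` and `SENSITIVITY` are hypothetical, and treating any other string as text to type (`TYPE`) is an assumption. The actual command set is the one defined in Appendix A.

```python
import re

# Hypothetical names for internal recognizer commands (not taken from the patent).
INTERNAL_COMMANDS = {"LEARN", "SENSITIVITY"}

# Assumed shape of a structured command string: @NAME(arg1,arg2,...)
_COMMAND = re.compile(r"@(\w+)\(([^)]*)\)")

def parse_command(command_string):
    """Split '@NAME(a,b)' into ('NAME', ['a', 'b']); anything else is text to type."""
    match = _COMMAND.fullmatch(command_string)
    if match is None:
        return ("TYPE", [command_string])
    name, raw_args = match.groups()
    args = [a.strip() for a in raw_args.split(",")] if raw_args else []
    return (name.upper(), args)

def route(command_string):
    """Decide whether a command goes to the recognizer or to the operating system."""
    name, args = parse_command(command_string)
    target = "recognizer" if name in INTERNAL_COMMANDS else "operating_system"
    return (target, name, args)
```

- In this sketch `route("@MENU(font,2)")` is directed to the operating system, while a string naming one of the assumed internal commands is directed back to the recognizer.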
- Each application or desktop accessory is associated with a word list 124 and a corresponding voice file 122 ; these are loaded by the Recognition Software when the application or desktop accessory is opened.
- the voice files are generated by the Recognizer Software 120 in its “learn” mode, under the control of internal commands from the Voice Control driver 128 .
- the word lists are generated by the Language Maker desktop accessory 140 , which creates “languages” of utterance names and associated Voice Control command strings, and converts the languages into the word lists.
- Voice Control command strings are strings such as “ESC”, “TEXT”, “@MENU(font,2)”, and belong to a Voice Control command set, the syntax of which will be described later and is set forth in Appendix A.
- the Voice Control and Language Maker software includes about 30,000 lines of code, most of which is written in the C language, the remainder being written in assembly language. A listing of the Voice Control and Language Maker software is provided in microfiche as appendix C.
- the Voice Control software will operate on a Macintosh Plus or later models, configured with a minimum of 1 Mbyte RAM (2 Mbyte for HyperCard and other large applications), a Hard Disk, and with Macintosh operating system version 6.01 or later.
- Macintosh operating system 132 is “event driven”.
- the operating system maintains an event queue (not shown); input devices such as the mouse 134 or the keyboard 136 “post” events to this queue to cause the operating system to, for example, create the appropriate text entry, or trigger a mouse movement.
- the operating system 132 then, for example, passes messages to Macintosh applications (such as MacWrite 139 ) or to desktop accessories (such as Calculator 137 ) indicating events on the queues (if any).
- Voice Control interpreter 130 likewise controls the operating system (and hence the applications and desktop accessories which are currently running) by posting events to the operating system queues.
- the events posted by the Voice Control interpreter typically correspond to mouse activity or to keyboard keystrokes, or both, depending upon the voice commands.
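- The event-posting model above can be illustrated with a small Python sketch. The `Event` fields, the event kind names, and the helper `post_text_as_keystrokes` are assumptions made for illustration; they are not the Macintosh event record or its API.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str       # illustrative kinds such as "key_down" or "mouse_down"
    payload: tuple  # e.g. a character, or screen coordinates
    source: str     # "keyboard", "mouse", or "voice"

class EventQueue:
    """First-in, first-out queue shared by all input sources."""
    def __init__(self):
        self._events = deque()

    def post(self, event):
        self._events.append(event)

    def next_event(self):
        """Return the oldest pending event, or None if the queue is empty."""
        return self._events.popleft() if self._events else None

def post_text_as_keystrokes(queue, text):
    """Post one key event per character, as a voice interpreter might do for text."""
    for ch in text:
        queue.post(Event("key_down", (ch,), "voice"))
```

- Because voice events enter the same queue as mouse and keyboard events, an application reading the queue does not need to know which source produced them.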
- the Voice Navigator system 102 provides an additional user interface.
- the “voice” events may comprise text strings to be displayed or included with text being processed by the application program.
- the Recognizer Software 120 may be trained to recognize an utterance of a particular user and to associate a corresponding text string with each utterance.
- the Recognizer Software 120 displays to the user a menu of the utterance names (such as “file”, “page down”) which are to be recognized. These names, and the corresponding Voice Control command strings (indicating the appropriate actions) appear in a current word list 124 .
- the user designates the utterance name of interest and then is prompted to speak the utterance corresponding to that name. For example, if the utterance name is “file”, the user might utter “FILE” or “PLEASE FILE”.
- the digitized samples from the Voice Navigator box 112 corresponding to that utterance are then used by the Recognizer Software 120 to create a “macro” representing the utterance, which is stored in the voice file 122 and subsequently associated with the utterance name in the word list 124 .
- the utterance is repeated more than once, in order to create a macro for the utterance that accommodates variation in a particular speaker's voice.
- the meaning of the spoken utterance need not correspond to the utterance name, and the text of the utterance name need not correspond to the Voice Control command strings stored in the word list.
- the user may wish a command string that causes the operating system to save a file to have the utterance name “save file”; the associated command string may be “@MENU(file,2)”; and the utterance that the user trains for this utterance name may be the spoken phrase “immortalize”.
- the Recognizer Software and Voice Control cause that utterance, name, and command string to be properly associated in the voice file and word list 124 .
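- The three-way association of spoken utterance, utterance name, and command string can be sketched as two tables keyed by the utterance name. The class and field names below are hypothetical, and the training samples are represented as opaque values rather than real audio macros.

```python
from dataclasses import dataclass, field

@dataclass
class WordListEntry:
    utterance_name: str
    command_string: str

@dataclass
class Vocabulary:
    word_list: dict = field(default_factory=dict)   # name -> WordListEntry
    voice_file: dict = field(default_factory=dict)  # name -> list of training samples

    def add_entry(self, name, command_string):
        self.word_list[name] = WordListEntry(name, command_string)

    def train(self, name, sample):
        """Store one spoken sample for a name that already exists in the word list."""
        if name not in self.word_list:
            raise KeyError(name)
        self.voice_file.setdefault(name, []).append(sample)

    def is_trained(self, name):
        return bool(self.voice_file.get(name))
```

- With this layout the name “save file” can carry the command string “@MENU(file,2)” while the trained samples are recordings of the word “immortalize”; nothing ties the spoken sound to the text of the name.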
- the word lists 124 used by the Voice Navigator are created by the Language Maker desk accessory 140 running under the operating system.
- Each word list 124 is hierarchical, that is, some utterance names in the list link to sub-lists of other utterance names. Only the list of utterance names at a currently active level of the hierarchy can be recognized. (In the current embodiment, the number of utterance names at each level of the hierarchy can be as large as 1000.)
- Some utterances, such as “file”, may summon the file menu on the screen, and link to a subsequent list of utterance names at a lower hierarchical level.
- the file menu may list subsequent commands such as “save”, “open”, or “save as”, each associated with an utterance.
- Language Maker enables the user to create a hierarchical language of utterance names and associated command strings, rearrange the hierarchy of the language, and add new utterance names. Then, when the language is in the form that the user desires, the language is converted to a word list 124 . Because the hierarchy of the utterance names and command strings can be adjusted, when using the Voice Navigator system the user is not bound by the preset menu hierarchy of an application. For example, the user may want to create a “save” command at the top level of the utterance hierarchy that directly saves a file without first summoning the file menu. Also, the user may, for example, create a new utterance name “goodbye”, that saves a file and exits all at once.
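- A hierarchical word list of this kind might be modeled as nested levels, with only the active level recognizable. The sketch below is an assumption-laden illustration: the command strings are placeholders rather than Appendix A syntax, and returning to the top level after a leaf utterance is a design choice made here, not behavior stated in the text.

```python
class Level:
    """One level of the hierarchy: utterance name -> (commands, sub-level)."""
    def __init__(self):
        self.entries = {}

    def add(self, name, commands, sublevel=None):
        self.entries[name] = (list(commands), sublevel)

class Navigator:
    def __init__(self, top):
        self.top = top
        self.active = top

    def recognizable(self):
        """Names that can be recognized at the currently active level."""
        return sorted(self.active.entries)

    def say(self, name):
        """Return the commands for a name, or [] if it is not active right now."""
        if name not in self.active.entries:
            return []
        commands, sublevel = self.active.entries[name]
        # Descend into a sub-level if one is linked; otherwise go back to the top.
        self.active = sublevel if sublevel is not None else self.top
        return commands

# Placeholder command strings, for illustration only.
file_level = Level()
file_level.add("open", ["OPEN"])
file_level.add("save", ["SAVE"])
top = Level()
top.add("file", ["SHOW_FILE_MENU"], sublevel=file_level)
top.add("goodbye", ["SAVE", "QUIT"])  # one utterance, two actions
```

- The “goodbye” entry shows how a user-defined top-level utterance can bundle several actions without following the application's menu hierarchy.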
- Each language created by Language Maker 140 also contains the command strings which represent the actions (e.g. clicking the mouse at a location, typing text on the screen) to be associated with utterances and utterance names.
- the user does not specify the command strings to describe the actions he wishes to be associated with an utterance and utterance name. In fact, the user does not need to know about, and never sees, the command strings stored in the Language Maker language or the resulting word list 124 .
- In “record” mode, to associate a series of actions with an utterance name, the user simply performs the desired actions (such as typing the text at the keyboard, or clicking the mouse at a menu). The actions performed are converted into the appropriate command strings, and when the user turns off the record mode, the command strings are associated with the selected utterance name.
- the user can cause the creation of a language by entering utterance names by typing the names at the keyboard 142 , by using a “create default text” procedure 146 (to parse a text file on the clipboard, in which case one utterance name is created for each word in the text file, and the names all start at the same hierarchical level), or by using a “create default menus” procedure (to parse the executable code 144 for an application, and create a set of utterance names which equal the names of the commands in the menus of the application, in which case the initial hierarchy for the names is the same as the hierarchy of the menus in the application).
- If the names are typed at the keyboard or created by parsing a text file, the names are initially associated with the keystrokes which, when typed at the keyboard, produce the name. Therefore, the name “text” would initially be associated with the keystrokes t-e-x-t. If the names are created by parsing the executable code 144 for an application, then the names are initially associated with the command strings which execute the corresponding menu commands for the application. These initial command strings can be changed by simply selecting the utterance name to be changed and putting Language Maker into record mode.
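- The “create default text” behavior can be sketched as follows. The function names are hypothetical, whitespace splitting is an assumed definition of a “word”, and collapsing repeated words into a single utterance name is also an assumption.

```python
def create_default_text(clipboard_text):
    """Create one utterance name per distinct word, all at the same level.

    Each name is initially associated with the keystrokes that type it.
    """
    language = {}
    for word in clipboard_text.split():
        language.setdefault(word, list(word))
    return language

def rerecord(language, name, command_strings):
    """Replace the initial keystrokes of an existing name with recorded commands."""
    if name not in language:
        raise KeyError(name)
    language[name] = list(command_strings)
```

- For instance, `create_default_text("text save text")` yields two names, and `rerecord` stands in for selecting a name and putting Language Maker into record mode.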
- the output of Language Maker is a language file 148 .
- This file contains the utterance names and the corresponding command strings.
- the language file 148 is formatted for input to a VOCAL compiler 150 (available from Dragon Systems), which converts the language file into a word list 124 for use with the Recognition Software.
- the syntax of language files is specified in the Voice Navigator Developer's Reference Manual, provided as Appendix D, and incorporated by reference.
- a macro 147 of each learned utterance is stored in the voice file 122 .
- a corresponding utterance name 149 and command string 151 are associated with one another and with the utterance and are stored in the word list 124 .
- The word list 124 is created and modified by Language Maker 140, and the voice file 122 is created and modified by the Recognition Software 120 in its learn mode, under the control of the Voice Control driver 128.
- the Voice Navigator hardware box 152 includes an analog-to-digital (A/D) converter 154 for converting the analog signal from the microphone into a digital signal for processing, a DSP section 156 for filtering and compacting the digitized signal, a SCSI manager 158 for communication with the Macintosh, and a microphone control section 160 for controlling the microphone.
- the Voice Navigator system also includes the Recognition Software voice drivers 120 which include routines for utterance detection 164 and command execution 166 .
- For utterance detection 164, the voice drivers periodically poll 168 the Voice Navigator hardware to determine if an utterance is being received by Voice Navigator box 152, based on the amplitude of the signal received by the microphone.
- the voice drivers create a speech buffer of encoded digital samples (tokens) to be used by the command execution drivers 166 .
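- Amplitude-based utterance detection of the kind described above might look like the following sketch. The threshold and the minimum silence length are hypothetical parameters, and the input is assumed to be a plain list of sample amplitudes rather than the encoded tokens produced by the hardware.

```python
def detect_utterances(amplitudes, threshold, min_silence):
    """Return (start, end) index pairs, end exclusive, for each utterance.

    An utterance is a run of samples at or above `threshold`; it ends once
    `min_silence` consecutive quiet samples have been seen.
    """
    utterances = []
    start = None   # index where the current utterance began
    quiet = 0      # consecutive quiet samples since the last loud one
    for i, a in enumerate(amplitudes):
        if abs(a) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                utterances.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Input ended mid-utterance: trim any trailing quiet samples.
        utterances.append((start, len(amplitudes) - quiet))
    return utterances
```

- The `min_silence` parameter reflects the requirement that the user space utterances apart briefly so that the end of each one can be recognized.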
- the recognition drivers can learn new utterances by token-to-terminal conversion 174 .
- the token is converted to a macro for the utterance, and stored as a terminal in a voice file 122 (FIG. 1).
- Recognition and pattern matching 172 is also performed on command by the voice drivers.
- a stored token of incoming digitized samples is compared with macros for the utterances in the current level of the recognition hierarchy. If a match is found, terminal to output conversion 176 is also performed, selecting the command string associated with the recognized utterance from the word list 124 (FIG. 1).
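- As a rough illustration of matching restricted to the active level, the sketch below uses nearest-template matching on fixed-length feature vectors with a distance cutoff. This is a stand-in technique chosen for brevity; it is not the recognition algorithm of the Recognizer Software, and the names `recognize`, `terminal_to_output`, and `max_distance` are hypothetical.

```python
import math

def recognize(token, macros, active_names, max_distance):
    """Return the best-matching active utterance name, or None if nothing is close.

    token        -- feature vector for the incoming utterance
    macros       -- name -> stored feature vector (same length as token)
    active_names -- names at the currently active hierarchy level
    max_distance -- largest Euclidean distance still accepted as a match
    """
    best_name, best_distance = None, math.inf
    for name in active_names:
        macro = macros.get(name)
        if macro is None:
            continue  # name exists in the word list but has not been trained
        distance = math.dist(token, macro)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None

def terminal_to_output(name, word_list):
    """Look up the command string associated with a recognized name."""
    return word_list[name]
```

- Names outside `active_names` are never considered, even if their stored macro is the closest one, which mirrors the rule that only the current level of the hierarchy can be recognized.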
- State management 178, such as changing of sensitivity controls, is also performed on command by the voice drivers.
- The Voice Control driver 128 forms an interface 182 to the voice drivers 120 through control commands, an interface 184 to the Macintosh operating system 132 (FIG. 1) through event posting and operating system hooks, and an interface 186 to the user through display menus and prompts.
- the interface 182 to the drivers allows Voice Control access to the Voice Driver command functions 166 .
- This interface allows Voice Control to monitor 188 the status of the recognizer, for example to check for an utterance token in the utterance queue buffered 170 to the Macintosh. If there is an utterance, and if processor time is available, Voice Control issues command sdi_recognize 190 , calling the recognition and pattern match routine 172 in the voice drivers.
- The interface to the drivers may issue command sdi_output 192, which controls the terminal to output conversion routine 176 in the voice drivers, converting a recognized utterance to a command string for use by Voice Control.
- the command string may indicate mouse or keystroke events to be posted to the operating system, or may indicate commands to Voice Control itself (e.g. enabling or disabling Voice Control).
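- The monitor, recognize, and output sequence can be sketched with a stand-in for the voice drivers. Only the command names sdi_recognize and sdi_output come from the text; the method signatures, the `FakeVoiceDrivers` class, and the `poll_once` helper are assumptions for illustration.

```python
from collections import deque

class FakeVoiceDrivers:
    """Stand-in for the voice drivers; signatures here are assumptions."""
    def __init__(self, word_list, tokens):
        self.word_list = word_list            # terminal name -> command string
        self.utterance_queue = deque(tokens)  # buffered utterance tokens

    def has_utterance(self):
        return bool(self.utterance_queue)

    def sdi_recognize(self):
        """Consume one token; return the matched terminal or None."""
        token = self.utterance_queue.popleft()
        return token if token in self.word_list else None

    def sdi_output(self, terminal):
        """Convert a recognized terminal to its command string."""
        return self.word_list[terminal]

def poll_once(drivers, text_buffer, processor_time_available=True):
    """One pass of the monitor: returns True if an utterance was processed."""
    if not (drivers.has_utterance() and processor_time_available):
        return False
    terminal = drivers.sdi_recognize()
    if terminal is not None:
        text_buffer.append(drivers.sdi_output(terminal))
    return True
```

- Recognition is attempted only when an utterance is waiting and processor time is available; an unrecognized utterance is consumed without adding anything to the text buffer.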
- Voice Control is simply a Macintosh driver with internal parameters, such as sensitivity, and internal commands, such as commands to learn new utterances.
- The processing which the user perceives as Voice Control may be performed by Voice Control itself or by the Voice Drivers, depending upon the function. For example, the utterance learning procedures are performed by the Voice Drivers under the control of Voice Control.
- the interface 184 to the Macintosh operating system allows Voice Control, where appropriate, to manipulate the operating system (e.g., by posting events or modifying event queues).
- the macro interpreter 194 takes the command strings delivered from the voice drivers via the text buffer and interprets them to decide what actions to take. These commands may indicate text strings to be displayed on the display or mouse movements or menu selections to be executed.
- In the interpretive execution of the command strings, Voice Control must manipulate the Macintosh event queues. This task is performed by OS event management 196. As discussed above, voice events may simulate events which are ordinarily associated with the keyboard or with the mouse. Keyboard events are handled by OS event management 196 directly. Mouse events are handled by mouse handler 198. Mouse events require an additional level of handling because mouse events can require operating system manipulation outside of the standard event post routines which are accomplished by the OS event management 196.
- The main interface into the Macintosh operating system 132 is event based, and is used in the majority of the commands which are voice recognized and issued to the Macintosh. However, there are other “hooks” to the operating system state which are used to control parameters such as mouse placement and mouse motion. For example, as will be discussed later, pushing the mouse button down generates an event; however, keeping the mouse button pushed down and dragging the mouse across a menu requires the use of an operating system hook. For reference, the operating system hooks used by the Voice Navigator are listed in Appendix B.
- the operating system hooks are implemented by the trap filters 200 , which are filters used by Voice Control to force the Macintosh operating system to accept the controls implemented by OS event management 196 and mouse handler 198 .
- the Macintosh operating system traps are held in Macintosh read only memories (ROMs), and implement high level commands for controlling the system. Examples of these high level commands are: drawing a string onto the screen, window zooming, moving windows to the front and back of the screen, and polling the status of the mouse button. In order for the Voice Control driver to properly interface with the Macintosh operating system it must control these operating system traps to generate the appropriate events.
- To generate menu events, for example, Voice Control “seizes” the menu select trap (i.e. takes control of the trap from the operating system). Once Voice Control has seized the trap, application requests for menu selections are forwarded to Voice Control. In this way Voice Control is able to modify, where necessary, the operating system output to the program, thereby controlling the system behavior as desired.
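- Seizing a trap is, in essence, replacing an entry in a dispatch table with a filter that can answer the request itself or fall through to the original routine. The sketch below models that idea in Python; the `TrapTable` class, the `"menu_select"` key, and the `(menu, item)` result shape are hypothetical and do not reproduce the Macintosh trap mechanism.

```python
class TrapTable:
    """Dispatch table mapping trap names to handler functions."""
    def __init__(self):
        self._traps = {}

    def install(self, name, handler):
        self._traps[name] = handler

    def call(self, name, *args):
        return self._traps[name](*args)

    def seize(self, name, make_filter):
        """Replace a trap with a filter built around the original handler."""
        original = self._traps[name]
        self._traps[name] = make_filter(original)
        return original

def make_menu_filter(pending_selections):
    """Build a filter that answers menu requests from voice-selected items first."""
    def build(original_menu_select):
        def filtered_menu_select(start_point):
            if pending_selections:
                # A voice command already chose a menu item: report it directly.
                return pending_selections.pop(0)
            # No voice selection pending: defer to the original routine.
            return original_menu_select(start_point)
        return filtered_menu_select
    return build
```

- Because the application still calls the same trap, it cannot tell whether the selection it receives came from mouse tracking or from a voiced utterance.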
- the interface 186 to the user provides user control of the Voice Control operations.
- Prompts 202 display the name of each recognized utterance on the Macintosh screen so that the user may determine if the proper utterance has been recognized.
- On-line training 204 allows the user to access, at any time while using the Macintosh, the utterance names in the word list 124 currently in use. The user may see which utterance names have been trained and may retrain the utterance names in an on-line manner (these functions require Voice Control to use the Voice Driver interface, as discussed above).
- User options 206 provide selection of various Voice Control settings, such as the sensitivity and confidence level of the recognizer (i.e., the level of certainty required to decide that an utterance has been recognized). The optimal values for these parameters depend upon the microphone in use and the speaking voice of the user.
- the interface 186 to the user does not operate via the Macintosh event interface. Rather, it is simply a recursive loop which controls the Recognition Software and the state of the Voice Control driver.
- Language Maker 140 includes an application analyzer 210 and an event recorder 212 .
- Application analyzer 210 parses the executable code of applications as discussed above, and produces suitable default utterance names and pre-programmed command strings.
- the application analyzer 210 includes a menu extraction procedure 214 which searches executable code to find text strings corresponding to menus.
- the application analyzer 210 also includes control identification procedures 216 for creating the command strings corresponding to each menu item in an application.
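- Once menu titles and items have been extracted, producing default utterance names and command strings is a simple mapping. The sketch below assumes the menus are already available as a dictionary and assumes that the second argument of “@MENU(...)” is the 1-based position of the item; the text does not define the argument meaning, so both points are illustrative guesses.

```python
def default_menu_language(menus):
    """Build default utterance names and command strings from menu contents.

    menus -- {menu title: [item titles in display order]}
    Returns {menu name: {item name: command string}}, preserving the
    application's own menu hierarchy as the initial utterance hierarchy.
    """
    language = {}
    for menu_title, items in menus.items():
        menu_name = menu_title.lower()
        language[menu_name] = {
            item.lower(): f"@MENU({menu_name},{position})"
            for position, item in enumerate(items, start=1)
        }
    return language
```

- The menu contents passed in are supplied by the caller; extracting them from executable code is outside the scope of this sketch.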
- the event recorder 212 is a driver for recording user commands and creating command strings for utterances. This allows the user to easily create and edit command strings as discussed above.
- Types of events which may be entered into the event recorder include: text entry 218 , mouse events 220 (such as clicking at a specified place on the screen), special events 222 which may be necessary to control a particular application, and voice events 224 which may be associated with operations of the Voice Control driver.
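- An event recorder along these lines might accumulate user actions while recording is on and convert them to command strings when recording stops. In the sketch below the `@CLICK(x,y)` form is a hypothetical command invented for illustration, and merging consecutive keystrokes into one text string is an assumed convenience.

```python
class EventRecorder:
    def __init__(self):
        self.recording = False
        self._actions = []  # list of (kind, value) pairs

    def start(self):
        self.recording = True
        self._actions = []

    def on_key(self, char):
        if not self.recording:
            return
        if self._actions and self._actions[-1][0] == "text":
            # Extend the current run of typed text.
            self._actions[-1] = ("text", self._actions[-1][1] + char)
        else:
            self._actions.append(("text", char))

    def on_menu(self, menu, item):
        if self.recording:
            self._actions.append(("menu", f"@MENU({menu},{item})"))

    def on_click(self, x, y):
        if self.recording:
            # Hypothetical command form, not taken from Appendix A.
            self._actions.append(("mouse", f"@CLICK({x},{y})"))

    def stop(self, language, utterance_name):
        """End recording and attach the recorded commands to the utterance name."""
        self.recording = False
        commands = [value for _kind, value in self._actions]
        language[utterance_name] = commands
        return commands
```

- The user never sees the resulting strings; actions performed while recording is off are ignored.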
- The Language Maker main event loop 230 is similar in structure to main event loops used by other desk accessories in the Macintosh operating system. If a desk accessory is selected from the “Apple” menu, an “open” event is transmitted to the accessory. In general, if the application in which it resides quits or if the user quits it using its menus, a “close” event is transmitted to the accessory. Otherwise, control events are transmitted to the accessory. The message parameter of a control event indicates the kind of event. As seen in FIG. 4, the Language Maker main event loop 230 begins with an analysis 232 of the event type.
- On an open event, Language Maker tests 234 whether it is already opened. If Language Maker is already opened 236, the current language (i.e. the list of utterance names from the current word list) is displayed and Language Maker returns 237 to the operating system. If Language Maker is not open 238, it is initialized and then returns 239 to the operating system.
- Language Maker prompts the user 240 to save the current language as a language file. If the user commands Language Maker to save the current language, the current language is converted by the Write Production module 242 to a language file, and then Language Maker exits 244 . If the current language is not saved, Language Maker exits directly.
- the way in which Language Maker responds to the event depends upon the mode that Language Maker is in, because Language Maker has a utility for recording events (i.e. the mouse movements and clicks or text entry that the user wishes to assign to an utterance), and must record events which do not involve the Language Maker window. However, when not recording, Language Maker should only respond to events in its window. Therefore, Language Maker may respond to events in one mode but not in another.
- a control event 246 is forwarded to one of three branches 248 , 250 , 252 . All menu events are forwarded to the accMenu branch 252 . (Only menu events occurring in desk accessory menus will be forwarded to Language Maker.) All window events for the Language Maker window are forwarded to the accEvent branch 250 . All other events received by Language Maker, which correspond to events for desktop accessories or applications other than Language Maker, initiate activity in the accRun branch 248 , to enable recording of actions.
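The three-way routing described above can be sketched as follows. This is a minimal illustration in Python, not the original Macintosh code; the event fields (`kind`, `window`), the window constant, and the function name `route_control_event` are assumptions made for the example. Only the branch names come from the text.

```python
# Hypothetical sketch of the control-event routing; field names are assumed.
LANGUAGE_MAKER_WINDOW = "language_maker"

def route_control_event(event):
    """Return the name of the branch that would handle a control event."""
    if event["kind"] == "menu":
        # Only desk accessory menu events reach Language Maker at all.
        return "accMenu"
    if event.get("window") == LANGUAGE_MAKER_WINDOW:
        # Window events for the Language Maker window itself.
        return "accEvent"
    # Everything else belongs to other applications or desk accessories
    # and is used to record actions.
    return "accRun"
```

For example, a mouse click in another application's window would be routed to `accRun`, which is what makes recording outside the Language Maker window possible.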
- Language Maker can record dialog events (i.e. events which involve modal dialog, where the user cannot do anything except respond to the actions in modal dialog boxes). To accomplish this, the user must be able to produce actions (i.e. mouse clicks, menu selections) in the current application so that the dialog boxes are prompted to the screen. Then the user can initialize recording and respond to the dialog boxes. When modal dialog boxes should be produced, events received by Language Maker are also forwarded to the operating system. Otherwise, events are not forwarded to the operating system. Language Maker's modal dialog recording is performed by the Run Modal module 260 .
- the menu indicated by the desk accessory menu event is checked 266 . If the event occurred in the Language Maker menu, it is forwarded to the Do My Menu module 268 . Other events are ignored 270 .
- the Run Edit module 262 performs a loop 272 , 274 . Each action is recorded by the Record Actions submodule 272 . If there are more actions in the event queue then the loop returns to the Record Actions submodule. If a cancel action appears 276 in the event queue then Run Edit returns 277 without updating the current language in memory. Otherwise, if the events are completed successfully, Run Edit updates the language in memory, turns off recording 278 , and returns to the operating system 280 .
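The record-or-cancel behavior of Run Edit can be illustrated with a short Python sketch. The representation of actions as strings, the `language` dictionary, and the function name `run_edit` are assumptions for illustration only; the point is that nothing is committed to the language in memory if a cancel action is seen.

```python
def run_edit(event_queue, language, utterance):
    """Record queued actions for an utterance; discard them all on cancel."""
    recorded = []
    for action in event_queue:
        if action == "cancel":
            # Cancel: return without updating the language in memory.
            return False
        recorded.append(action)
    # All actions completed: update the language (recording then stops).
    language[utterance] = recorded
    return True
```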
- in the Record Actions submodule 272 , actions performed by the user in record mode are recorded.
- the event is checked by Record Actions.
- Each non-null event i.e. each action
- In Record Actions, the type of action is first checked 282 . If the action selects a menu 284 , then the selected menu is recorded. If the action is a mouse click 286 , the In Button? routine (see FIG. 8) checks if the click occurred inside of a button (a button is a menu selection area in the front window) or not. If so, the button is recorded 288 . If not, the location of the click is recorded 290 .
- actions are recorded by special handlers. These actions include group actions 292 , mouse down actions 294 , mouse up actions 296 , zoom actions 298 , grow actions 300 , and next window actions 302 .
- Some actions in menus can create pop-up menus with subchoices. These actions are handled by popping up the appropriate pop-up menu so that the user may select the desired subchoice. Move actions 304 , pause actions 306 , scroll actions 308 , text actions 310 and voice actions 312 pop up respective menus and Record Actions checks 314 for the menu selection made by the user (with a mouse drag). If no menu selection is made, then no action is recorded 316 . Otherwise, the choice is recorded 318 .
- Other actions may launch applications.
- the selected application is determined. If no application has been selected then no action is recorded 322 , otherwise the selected application is recorded 324 .
- Run Modal procedure 260 allows recording of the modal dialogs of the Macintosh computer. During modal dialogs, the user cannot do anything except respond to the actions in the modal dialog box. In order to record responses to those actions, Run Modal has several phases, each phase corresponding to a step in the recording process.
- Run Modal prompts the user with a Language Maker dialog box that gives the user the options “record” and “cancel” (see FIG. 25). The user may then interact with the current application until arriving at the dialog click that is to be recorded.
- all calls to Run Modal are routed through Select Dialog 326 , which produces the initial Language Maker dialog box, and then returns 327 , ignoring further actions.
- the In Button? procedure 286 determines whether a mouse click event occurred on a button.
- In Button? gets the current window control list 342 (a Macintosh global which contains the locations of all of the button rectangles in the current window, refer to Appendix B) from the operating system and parses the list with a loop 344 - 350 . Each control is fetched 350 , and then the rectangle of the control is found 346 . Each rectangle is analyzed 348 to determine if the click occurred in the rectangle. If not, the next control is fetched 350 , and the loop recurses. If, 344 , the list is emptied then the click did not occur on a button, and no is returned 352 .
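The hit test performed by In Button? amounts to a linear scan over control rectangles. The sketch below is an assumption-laden illustration: rectangles are taken to be `(top, left, bottom, right)` tuples and points `(vertical, horizontal)` pairs, with the bottom and right edges excluded. The function name `in_button` and the return convention are hypothetical.

```python
def in_button(control_rects, click):
    """Return the index of the control containing the click, or None."""
    v, h = click  # (vertical, horizontal)
    for index, (top, left, bottom, right) in enumerate(control_rects):
        # Test whether the click falls inside this control's rectangle.
        if top <= v < bottom and left <= h < right:
            return index
    # List exhausted: the click did not occur on a button.
    return None
```

A caller would record the button when an index is returned and record the raw click location otherwise.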
- Event Handler module 264 deals with standard Macintosh events in the Language Maker display window.
- the Language Maker display window lists the utterance names in the current language.
- Event Handler determines 358 whether the event is a mouse or keyboard event and subsequently performs the proper action on the Language Maker window.
- Mouse events include: dragging the window 360 , growing the window 362 , scrolling the window 364 , clicking on the window 368 (which selects an utterance name), and dragging on the window 370 (which moves an utterance name from one location on the screen to another, potentially changing the utterance's position in the language hierarchy). Double-clicking 366 on an utterance name in the window selects that utterance name for action recording, and therefore starts the Run Edit module.
- Keyboard events include the standard cut 372 , copy 374 , and paste 376 routines, as well as cursor movements down 380 , up 382 , right 384 , and left 386 . Pressing return at the keyboard 378 , as with a double click at the mouse, selects the current utterance name for action recording by Run Edit. After the appropriate command handler is called, Event Handler returns 388 . The modifications to the language hierarchy performed by the Event Handler module are reflected in hierarchical structure of the language file produced by the Write Production module during close and save operations.
- the Do My Menu module 268 controls all of the menu choices supported by Language Maker. After summoning the appropriate submodule (discussed in detail in FIGS. 11A through 11I), Do My Menu returns 408 .
- the New submodule 390 creates a new language.
- the New submodule first checks 410 if Language Maker is open. If so, it prompts the user 412 to save the current language as a language file. If the user saves the current language, New calls Write Production module 414 to save the language. New then calls Create Global Words 416 and forms a new language 418 . Create Global Words 416 will automatically enter a few global (i.e. resident in all languages) utterance names and command strings into the new language.
- utterance names and command strings allow the user to make Voice Control commands, and correspond to utterances such as “show me the active words” and “bring up the voice options” (the utterance macros for the corresponding voice file are trained by the user, or copied from an existing voice file, after the new language is saved).
- the Open submodule 392 opens an existing language for modification.
- the Open submodule 392 checks 420 if Language Maker is open. If so, it prompts the user 422 to save the current language, calling Write Production 424 if yes. Open then prompts the user to open the selected language 426 . If the user cancels, Open returns 428 . Otherwise, the language is loaded 430 and Open returns 432 .
- the Save submodule 394 saves the current language in memory as a language file. Save prompts the user to save the current language 434 . If the user cancels, Save returns 436 , otherwise, Save calls Write Production 438 to convert the language into a state machine control file suitable for use by VOCAL (FIG. 2). Finally, Save returns 440 .
- the New Action submodule 396 initializes the event recorders to begin recording a new sequence of actions.
- New Action initializes the event recorder by displaying an action window to the user 442 , setting up a tool palette for the user to use, and initializing recording of actions. Then New Action returns 444 . After New Action is started, actions are not delivered to the operating system directly; rather they are filtered through Language Maker.
- the Record Dialog submodule 398 records responses to dialog boxes through the use of the Run Modal module. Record Dialog 398 gives the user a way to record actions in modal dialog; otherwise the user would be prevented from performing the actions which bring up the dialog boxes. Record Dialog displays 446 the dialog action window (see FIG. 25) and turns recording on. Then Record Dialog returns 448 .
- the Create Default Menus submodule 400 extracts default utterance names (and generates associated command strings) from the executable code for an application.
- Create Default Menus 270 is ordinarily the first choice selected by a user when creating a language for a particular application.
- This submodule looks at the executable code of an application and creates an utterance name for each menu command in the application, associating the utterance name with a command string that will select that menu command.
- a first loop 452 , 456 , 458 , 460 locates the current (X th ) menu handle 456 , initializes menu parsing, checks if the current menu is fully parsed 458 , and reiterates by updating the current menu to the next menu.
- a second loop 458 , 462 , 464 finds each menu name 462 , and checks 464 if the name is hierarchical (i.e. if the name points to further menus). If the names are not hierarchical, the loop recurses. Otherwise, the hierarchical menu is fetched 466 , and a third loop 470 , 472 starts. In the third loop, each item name in the hierarchical menu is fetched 472 , and the loop checks if all hierarchical item names have been fetched 470 .
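The nested traversal of menus, item names, and hierarchical submenus might look like the following Python sketch. The input layout (a dictionary of menu names to item lists, with tuples marking hierarchical items) and the function name are assumptions; the `@MENU(...)` form follows the examples given elsewhere in this description, but the exact syntax is defined in Appendix A, and names are used here where the real command takes numbers.

```python
def create_default_menus(menus):
    """Build {utterance name: command string} from a nested menu description.

    `menus` maps a menu name to a list of items; an item is either a string
    or a (name, sub_items) tuple for a hierarchical menu. (Assumed layout.)
    """
    language = {}
    for menu_name, items in menus.items():        # first loop: each menu
        for item in items:                        # second loop: each item name
            if isinstance(item, tuple):           # hierarchical item
                sub_menu, sub_items = item
                for sub_item in sub_items:        # third loop: each sub-item
                    language[sub_item] = f"@MENU({sub_menu},{sub_item})"
            else:
                language[item] = f"@MENU({menu_name},{item})"
    return language
```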
- the Create Default Text submodule 402 allows the user to convert a text file on the clipboard into a list of utterance names.
- Create Default Text 402 creates an utterance name for each unique word in the clipboard 474 , and then returns 476 .
- the utterance names are associated with the keyboard entries which will type out the name. For example, a business letter can be copied from the clipboard into default text. Utterances would then be associated with each of the common business terms in the letter. After ten or twelve business letters have been converted the majority of the business letter words would be stored as a set of utterances.
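As a rough illustration of Create Default Text, the sketch below extracts each unique word from a block of text and pairs it with the keystrokes that would type it. The word pattern, the lower-casing of words, and the function name are assumptions for the example, not details taken from the source.

```python
import re

def create_default_text(clipboard_text):
    """Map each unique word to the keystrokes that type it (assumed format)."""
    language = {}
    for word in re.findall(r"[A-Za-z']+", clipboard_text):
        key = word.lower()  # assumption: treat "The" and "the" as one word
        if key not in language:
            language[key] = key  # the keyboard entry simply spells the word
    return language
```

Running this over several letters and merging the results would gradually accumulate the common vocabulary, as the text describes.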
- the Alphabetize Group submodule 404 allows the user to alphabetize the utterance names in a language.
- the selected group of names (created by dragging the mouse over utterance names in the Language Maker window) is alphabetized 478 , and then Alphabetize Group returns 480 .
- the Preferences submodule 406 allows the user to select standard graphic user interface preferences such as font style 482 and font size 484 .
- the Preferences submenu 486 allows the user to state the metric by which mouse locations of recorded actions are stored. The coordinates for mouse actions can be relative to the global window coordinates or relative to the application window coordinates. In the case where application menu selections are performed by mouse clicks, the mouse clicks must always be in relative coordinates so that the window may be moved on the screen without affecting the function of the mouse click.
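The difference between the two coordinate conventions can be shown with a small Python sketch. It assumes points are `(vertical, horizontal)` pairs and that a window is described by the global position of its top-left corner; the function names are hypothetical.

```python
def to_global(local_point, window_origin):
    """Convert window-relative coordinates to global screen coordinates."""
    v, h = local_point
    origin_v, origin_h = window_origin
    return (v + origin_v, h + origin_h)

def to_local(global_point, window_origin):
    """Convert global screen coordinates to window-relative coordinates."""
    v, h = global_point
    origin_v, origin_h = window_origin
    return (v - origin_v, h - origin_h)
```

A click recorded as the local point `(10, 20)` still lands on the same control after the window moves, because only the window origin changes.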
- the Preferences submenu 486 also determines whether, when a mouse action is recorded, the mouse is left at the location of a click or returned to its original location after a click.
- the user is prompted whether he wants to update the current preference settings for Language Maker. If so, the file is updated 490 and Preferences returns 492 . If not, Preferences returns directly to the operating system 494 without saving.
- the Write Production module 242 is called when a file is saved.
- Write Production saves the current language and converts it from an outline processor format such as that used in the Language Maker application to a hierarchical text format suitable for use with the state machine based Recognition Software.
- Language files are associated with applications and new language files can be created or edited for each additional application to incorporate the various commands of the application into voice recognition.
- the embodiment of the Write Production module depends upon the Recognition Software in use. In general, the Write Production module is written to convert the current language to a suitable format for the Recognition Software in use.
- the particular embodiment of Write Production shown in FIG. 12 applies to the syntax of the VOCAL compiler for the Dragon Systems Recognition Software.
- Write Production checks 512 for sublevels in the language. If no sub-levels exist, Write Production returns 514 . Otherwise, the sub-levels are processed by another call 516 to Write Production on the sub-level of the language. After the sub-level is processed, Write Production writes the string “)” and returns 518 .
- the Write Terminal submodule 496 writes each utterance name and the associated command string to the language file.
- Write Terminal checks 520 if it is at a terminal. If not, it returns 530 . Otherwise, Write Terminal writes 522 the string corresponding to the utterance name to the language file.
- Write Terminal writes the command string (i.e. “output”) to the language file.
- Write Terminal writes 528 the string “;” to the language file and returns 530 .
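The recursive structure of Write Production and Write Terminal can be sketched in Python as below. This is only an illustration of the recursion: the node layout (`name`, `output`, `children`), the opening “(”, the space between name and command string, and the indentation are all assumptions, and the result is not claimed to be valid VOCAL syntax. Only the closing “)” after a sub-level and the “;” after a terminal come from the text.

```python
def write_production(node, depth=0):
    """Return an illustrative hierarchical text form of a language node."""
    indent = "  " * depth
    if not node["children"]:
        # Terminal: utterance name, its command string, then ";".
        return f'{indent}{node["name"]} {node["output"]};\n'
    text = f'{indent}{node["name"]} (\n'   # assumed opening delimiter
    for child in node["children"]:
        text += write_production(child, depth + 1)  # recurse into sub-level
    text += f'{indent})\n'                 # ")" closes the sub-level
    return text
```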
- the Voice Control software serves as a gate between the operating system and the applications running on the operating system. This is accomplished by setting the Macintosh operating system's get_next_event procedure equal to a filter procedure created by Voice Control.
- the get_next_event procedure runs when each next_event request is generated by the operating system or by applications. Ordinarily the get_next_event procedure is null, and next_event requests go directly to the operating system.
- the filter procedure passes control to Voice Control on every request. This allows Voice Control to perform voice actions by intercepting mouse and keyboard events, and create new events corresponding to spoken commands.
- the get_next_event filter procedure 540 is called before an event is generated by the operating system.
- the event is first checked 542 to see if it is a null event. If so, the Process Input module 544 is called directly.
- the Process Input routine 544 checks for new speech input and processes any that has been received.
- the Voice Control driver proceeds through normal filter processing 546 (i.e., any filter processing caused by other applications) and returns 548 . If the next event is not a null event, then displays are hidden 550 . This allows Voice Control to hide any Voice Control displays (such as current language lists) which could have been generated by a previous non-null action.
- If any prompt windows have been produced by Voice Control, they are hidden when a non-null event occurs.
- key down events are checked 552 . Because the recognizer is controlled (i.e. turned on and off) by certain special key down events, if the event is a key down event then Voice Control must do further processing. Otherwise, the Voice Control driver procedure moves directly to Process Input 544 . If a key down event has occurred 554 , where appropriate, software latches which control the recognizer are set. This allows activation of the Recognizer Software, the selection of Recognizer options, or the display of languages. Thereafter, the Voice Control driver moves to Process Input 544 .
- the Process Input routine is the heart of the Voice Control driver. It manages all voice input for the Voice Navigator.
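The order of checks in the filter procedure can be summarized by a Python sketch that returns the list of steps taken for a given event. The event fields (`what`, `key`), the `latch_keys` set, and the step names are assumptions introduced for the example; the real filter is installed in the operating system rather than called directly.

```python
def get_next_event_filter(event, state):
    """Return the steps the filter would take for an event, in order."""
    steps = []
    if event["what"] != "null":
        # A non-null event hides any Voice Control displays and prompts.
        steps.append("hide_displays")
        if event["what"] == "key_down" and event.get("key") in state["latch_keys"]:
            # Special key downs set the latches that control the recognizer.
            steps.append("set_latches")
    steps.append("process_input")             # always reached
    steps.append("normal_filter_processing")  # other installed filters
    return steps
```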
- the Process Input module is called each time an event is processed by the operating system.
- First 546 any latches which need to be set are processed, and the Macintosh waits for a number of delay ticks, if necessary. Delay ticks are included, for example, where a menu drag is being performed by Voice Control, to allow the menu to be drawn on the screen before starting the drag. Also, some applications require delay between mouse or keyboard events.
- If recognition is activated 548 , the Process Input routine proceeds to do recognition 562 . If recognition is deactivated, Process Input returns 560 .
- the recognition routine 562 prompts the recognition drivers to check for an utterance (i.e., sound that could be speech input). If there is recognized speech input 564 , Process Input checks the vertical blanking interrupt VBL handler 566 , and deactivates it where appropriate.
- the vertical blanking interrupt cycle is a very low level cycle in the operating system. Every time the screen is refreshed, as the raster is moving from the bottom right to the top left of the screen, the vertical blanking interrupt time occurs. During this blanking time, very short and very high priority routines can be executed. The cycle is used by the Process Input routine to move the mouse continuously by very slowly incrementing the mouse coordinates where appropriate. To accomplish this, mouse move events are installed onto the VBL queue. Therefore, where appropriate, the VBL handler must be deactivated to move the mouse.
- the Recognize submodule 562 checks for encoded utterances queued by the Voice Navigator box, and then calls the recognition drivers to attempt to recognize any utterances. Recognize returns the number of commands in (i.e. the length of) the command string returned from the recognizer. If, 572 , no utterance is returned from the recognizer, then Recognize returns a length of zero ( 574 ), indicating no recognition has occurred. If an utterance is available, then Recognize calls sdi_recognize 576 , instructing the Recognizer Software to attempt recognition on the utterance. If, 578 , recognition is successful, then the name of the utterance is displayed 582 to the user.
- any close call windows i.e. windows associated with close call choices, prompted by Voice Control in response to the Recognizer Software
- any close call windows are cleared from the display. If recognition is unsuccessful, the Macintosh beeps 580 and zero length is returned 574 .
- Recognize searches 584 for an output string associated with the utterance. If there is an output string, Recognize checks if it is asleep 586 . If it is not asleep 590 , the output count is set to the length of the output string and, if the command is a control command 592 (such as “go to sleep” or “wake up”), it is handled by the Process Voice Commands routine 594 .
- the Process Voice Commands module deals with commands that control the recognizer.
- the module may perform actions, or may flag actions to be performed by the Process States block 596 (FIG. 16). If the recognizer is put to sleep 600 or awakened 604 , the appropriate flags are set 602 , 606 , and zero is returned 626 , 628 for the length of the command string, indicating to Process States to take no further actions. Otherwise, if the command is scratch_that 608 (ignore last utterance), first_level 612 (go to top of language hierarchy, i.e.
- the ProcessQ module 570 pulls speech input from the speech queue and processes it. If, 630 , the event queue is empty then ProcessQ may proceed, otherwise ProcessQ aborts 632 because the event queue may overflow if speech events are placed on the queue along with other events. If, 634 , the speech queue has any events then ProcessQ checks to see if, 636 , delay ticks for menu drawing or other related activities have expired. If no events are on the speech queue then ProcessQ aborts 636 . If delay ticks have expired, then ProcessQ calls Get Next 642 and returns 644 . Otherwise, if delay ticks have not expired, ProcessQ aborts 640 .
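The three guards in ProcessQ can be written as a short Python sketch. The queues are modeled as plain lists and the remaining delay as an integer tick count; these representations and the returned strings are assumptions for illustration.

```python
def process_q(event_queue, speech_queue, delay_ticks_left):
    """Decide whether speech input may be processed now."""
    if event_queue:
        # Other events are pending; adding speech events could overflow.
        return "abort"
    if not speech_queue:
        # Nothing to process.
        return "abort"
    if delay_ticks_left > 0:
        # Still waiting, e.g. for a menu to finish drawing.
        return "abort"
    return "get_next"
```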
- the Get Next submodule 642 gets characters from the speech queue and processes them. If, 646 , there are no characters in the speech queue then the procedure simply returns 648 . If there are characters in the speech queue then Get Next checks 650 to see if the characters are command characters. If they are, then Get Next calls Check Command 660 . If not, then the characters are text, and Get Next sets the meta bits 652 where appropriate.
- the meta bits are used as flags for conditioning keystrokes such as the condition key, the option key, or the command key. These keys condition the character pressed at the keyboard and create control characters. To create the proper operating system events, therefore, the meta bits must be set where necessary.
- a key down event is posted 654 to the Macintosh event queue, simulating a keypush at the keyboard.
- a key up is posted 656 to the event queue, simulating a key up. If, 658 , there is still room in the event queue, then further speech characters are obtained and processed 646 . If not, then the Get Next procedure returns 676 .
- If the command string input corresponds to a command rather than simple key strokes, the string is handled by the Check Command procedure 660 as illustrated in FIG. 19.
- the next four characters from the speech queue (four characters is the length of all command strings, see Appendix A) are fetched 662 and compared 664 to a command table. If, 666 , the characters equal a voice command, then a command is recognized, and processing is continued by the Handle Command routine 668 . Otherwise, the characters are interpreted as text and processing returns to the meta bits step 652 .
- each command is referenced into a table of command procedures by first computing 670 the command handler offset into the table and then referencing the table, and calling the appropriate command handler 672 . After calling the appropriate command handler, Get Next exits the Process Input module directly 674 (the structure of the software is such that a return from Handle Command would return to the meta bits step 652 , which would be incorrect).
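The table lookup used by Check Command and Handle Command can be sketched as a dispatch dictionary in Python. The handlers, the `ZOOM` code, and the treatment of `@` as a leading command marker are assumptions made for illustration (the actual command syntax is given in Appendix A); only the four-character comparison, the fall-back to text, and the `@MENU(...)` example are taken from the description.

```python
def handle_menu(args):
    return ("menu", args)

def handle_zoom(args):
    return ("zoom", args)

# Four-character command codes mapped to handlers (codes are illustrative).
COMMAND_TABLE = {"MENU": handle_menu, "ZOOM": handle_zoom}

def check_command(chars):
    """Dispatch a command string, or report it as plain text."""
    code = chars[1:5]  # the four characters after the assumed '@' marker
    handler = COMMAND_TABLE.get(code)
    if handler is None:
        # Not a known command: treat the characters as text to be typed.
        return ("text", chars)
    open_paren = chars.find("(")
    close_paren = chars.find(")")
    if open_paren != -1 and close_paren > open_paren:
        args = chars[open_paren + 1:close_paren].split(",")
    else:
        args = []
    return handler(args)
```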
- The command handlers available to the Handle Command routine are illustrated in FIG. 20. Each command handler is detailed by a flow diagram in FIGS. 21A through 21G. The syntax for the commands is detailed in Appendix A.
- the Menu command will pull down a menu, for example, @MENU(apple,0) (where apple is the menu number for the apple menu) will pull down the apple menu.
- Menu command will also select an item from the menu, for example, @MENU(apple,calculator) (where calculator is the item number for the calculator in the apple menu) will select the calculator from the apple menu.
- Menu command initializes by running the Find Menu routine 678 which queues the menu id and the item number for the selected menu. (If the item number in the menu is 0 then Find Menu simply clicks on the menu bar.) After Find Menu returns, if 680 , there are no menus queued for posting, the Menu command simply returns 690 .
- the Menu Select trap is set equal to the My Menu Select routine 692 .
- the cursor coordinates are hidden 684 so that the mouse cannot be seen as it moves on the screen.
- When the mouse down occurs on the menu bar, the Macintosh operating system generates a menu event for the application.
- Each application receiving a menu event requests service from the operating system to find out what the menu event is. To do this the application issues a Menu Select trap.
- the menu select trap then places the location of the mouse on the stack.
- Menu Command sets 688 the wait ticks to 30, which gives the operating system time to draw the menu, and returns 690 .
- the menuselect global state is reset 694 to clear any previously selected menus, and the desired menu id and the item number are moved to the Macintosh stack 696 , thus selecting the desired menu item.
- the Find Menu routine 700 collects 702 the command parameters for the desired menu. Next, the menuname is compared 704 to the menu name list. If, 706 , there is no menu with the name “menuname”, Find Menu exits 708 . Otherwise, Find Menu compares 710 the itemname to the names of the items in the menu. If, 712 , the located item number is greater than 0, then Find Menu queues 718 the menu id and item number for use by Menu command, and returns 720 . Otherwise, if the item number is 0 then Find Menu simply sets 714 the internal Voice Control flags “mousedown” and “global” flags to true. This indicates to Voice Control that the mouse location should be globally referenced, and that the mouse button should be held down. Then Find Menu calls 716 the Post Mouse routine, which references these flags to manipulate the operating system's mouse state accordingly.
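The decision made by Find Menu can be illustrated with the following Python sketch. The menu table layout, the one-based item numbering, the treatment of an unmatched item name as item number 0, and the returned tuples are assumptions for the example; the flag names `mousedown` and `global` are taken from the text.

```python
def find_menu(menus, menu_name, item_name):
    """Resolve a menu/item pair.

    `menus` maps a menu name to (menu id, [item names]). (Assumed layout.)
    """
    if menu_name not in menus:
        return None  # no menu with that name: exit
    menu_id, items = menus[menu_name]
    # Item numbers start at 1; 0 means "no item, just open the menu".
    item_number = items.index(item_name) + 1 if item_name in items else 0
    if item_number > 0:
        # Queue the menu id and item number for the Menu command.
        return ("queue", menu_id, item_number)
    # Item 0: hold the mouse button down on the menu bar, in global coordinates.
    return ("click_menu_bar", menu_id, {"mousedown": True, "global": True})
```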
- the Control command 722 performs a button push within a menu, invoking actions such as the save command in the file menu of an application.
- the Control command gets the command parameters 724 from the control string, finds the front window 726 , gets the window control list 728 , and checks 730 if the control name exists in the control list. If the control name does exist in the control list then the control rectangle coordinates are calculated 732 , the Post Mouse routine 734 clicks the mouse in the proper coordinates, and the Control command returns 736 . If the control name is not found, the Control command returns directly.
- the Keypad command 738 simulates numerical entries at the Macintosh keypad. Keypad finds the command parameters for the command string 740 , gets the keycode value 742 for the desired key, posts a key down event 744 to the Macintosh event queue, and returns 746 .
- the Zoom command 748 zooms the front window. Zoom obtains the front window pointer 750 in order to reference the mouse to the front window, calculates the location of the zoom box 752 , uses Post Mouse to click in the zoom box 754 , and returns 756 .
- the Local Mouse command 758 clicks the mouse at a locally referenced location.
- Local Mouse obtains the command parameters for the desired mouse location 760 , uses Post Mouse to click at the desired coordinate 762 , and returns 764 .
- the Global Mouse command 766 clicks the mouse at a globally referenced location.
- Global Mouse obtains the command parameters for the desired mouse location 768 , sets the global flag to true 770 (to signal to Post Mouse that the coordinates are global), uses Post Mouse to click at the desired coordinate 772 , and returns 774 .
- Double Click command double clicks the mouse at a locally referenced location. Double Click obtains the command parameters for the desired mouse location 778 , calls Post Mouse twice 780 , 782 (to click twice in the desired location), and returns 784 .
- Mouse Down command 786 sets the mouse button down.
- Mouse Down sets the mousedown flag to true 788 (to signal to Post Mouse that mouse button should be held down), uses Post Mouse to set the button down 790 , and returns 792 .
- Mouse Up command 794 sets the mouse button up.
- Mouse Up sets the mbState global (see Appendix B) to Mouse Button UP 796 (to signal to the operating system that mouse button should be set up), posts a mouse up event to the Macintosh event queue 798 (to signal to applications that the mouse button has gone up), and returns 800 .
- the Screen Down command 802 scrolls the contents of the current window down.
- Screen Down first looks 804 for the vertical scroll bar in the front window. If, 806 , the scroll bar is not found, Screen Down simply returns 814 . If the scroll bar is found, Screen Down calculates the coordinates of the down arrow 808 , sets the mousedown flag to true 810 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 812 , and returns 814 .
- the Screen Up command 816 scrolls the contents of the current window up. Screen Up first looks 818 for the vertical scroll bar in the front window. If, 820 , the scroll bar is not found, Screen Up simply returns 828 . If the scroll bar is found, Screen Up calculates the coordinates of the up arrow 822 , sets the mousedown flag to true 824 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 826 , and returns 828 .
- the Screen Left command 830 scrolls the contents of the current window left.
- Screen Left first looks 832 for the horizontal scroll bar in the front window. If, 834 , the scroll bar is not found, Screen Left simply returns 842 . If the scroll bar is found, Screen Left calculates the coordinates of the left arrow 836 , sets the mousedown flag to true 838 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 840 , and returns 842 .
- the Screen Right command 844 scrolls the contents of the current window right.
- Screen Right first looks 846 for the horizontal scroll bar in the front window. If, 848 , the scroll bar is not found, Screen Right simply returns 856 . If the scroll bar is found, Screen Right calculates the coordinates of the right arrow 850 , sets the mousedown flag to true 852 (indicating to Post Mouse that the mouse button should be set down), uses Post Mouse to set the mouse button down 854 , and returns 856 .
- the Page Down command 858 moves the contents of the current window down a page.
- Page Down first looks 860 for the vertical scroll bar in the front window. If, 862 , the scroll bar is not found, Page Down simply returns 868 . If the scroll bar is found, Page Down calculates the page down button coordinates 864 , uses Post Mouse to click the mouse button down 866 , and returns 868 .
- Page Up command 870 moves the contents of the current window up a page. Page Up first looks 872 for the vertical scroll bar in the front window. If, 874 , the scroll bar is not found, Page Up simply returns 880 . If the scroll bar is found, Page Up calculates the page up button coordinates 876 , uses Post Mouse to click the mouse button down 878 , and returns 880 .
- Page Left command 882 moves the contents of the current window left a page.
- Page Left first looks 884 for the horizontal scroll bar in the front window. If, 886 , the scroll bar is not found, Page Left simply returns 892 . If the scroll bar is found, Page Left calculates the page left button coordinates 888 , uses Post Mouse to click the mouse button down 890 , and returns 892 .
- Page Right command 894 moves the contents of the current window right a page.
- Page Right first looks 896 for the horizontal scroll bar in the front window. If, 898 , the scroll bar is not found, Page Right simply returns 904 . If the scroll bar is found, Page Right calculates the page right button coordinates 900 , uses Post Mouse to click the mouse button down 902 , and returns 904 .
- the Move command 906 moves the mouse from its current location (y, x) to a new location (y+Δy, x+Δx).
- Move gets the command parameters 908 then Move sets the mouse speed to tablet 910 (this cancels the mouse acceleration, which otherwise would make mouse movements uncontrollable), adds the offset parameters to the current mouse location 912 , forces a new cursor position and resets the mouse speed 914 , and returns 916 .
- the Move to Global Coordinate command 918 moves the cursor to the global coordinates given by the Voice Control command string.
- Move to Global gets the command parameters 920 , then Move to Global checks 922 if there is a position parameter. If there is a position parameter, the screen position coordinates are fetched 924 . In either case, the global coordinates are calculated 926 , the mouse speed is set to tablet 928 , the mouse position is set to the new coordinates 930 , the cursor is forced to the new position 932 , and Move to Global returns 934 .
- the Move to Local Coordinate command 936 moves the cursor to the local coordinates given by the Voice Control command string.
- Move to Local gets the command parameters 938 , then Move to Local checks 940 if there is a position parameter. If there is a position parameter, the local position coordinates are fetched 942 . In either case, the global coordinates are calculated 944 , the mouse speed is set to tablet 946 , the mouse position is set to the new coordinates 948 , the cursor is forced to the new position 950 , and Move to Local returns 952 .
- the Move Continuous command 954 moves the mouse continuously from its present location, moving Δy, Δx every refresh of the screen. This is accomplished by inserting 956 the VBL Move routine 960 in the Vertical Blanking Interrupt queue of the Macintosh and returning 958 . Once in the queue, the VBL Move routine 960 will be executed every screen refresh. The VBL Move routine simply adds the Δy and Δx values to the current cursor position 962 , resets the cursor 964 , and returns 966 .
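- The difference between Move and Move Continuous can be illustrated with a small simulation in which a task installed in a queue runs once per simulated screen refresh. This is only a sketch: `Cursor`, `VBLQueue`, `refresh` and `move_continuous` are hypothetical names, and the real routine is installed in the Macintosh Vertical Blanking Interrupt queue rather than a Python list.

```python
class Cursor:
    def __init__(self, y, x):
        self.y, self.x = y, x

    def move(self, dy, dx):
        """Move: add the offsets to the current location."""
        self.y += dy
        self.x += dx


class VBLQueue:
    """Stand-in for the vertical blanking interrupt queue."""

    def __init__(self):
        self.tasks = []

    def install(self, task):
        self.tasks.append(task)

    def remove(self, task):
        self.tasks.remove(task)

    def refresh(self):
        """Simulate one screen refresh: run every installed task once."""
        for task in list(self.tasks):
            task()


def move_continuous(queue, cursor, dy, dx):
    """Install a VBL Move task and return it so it can be removed later."""
    def vbl_move():
        cursor.move(dy, dx)
    queue.install(vbl_move)
    return vbl_move


cursor = Cursor(y=10, x=20)
queue = VBLQueue()
task = move_continuous(queue, cursor, dy=1, dx=2)
for _ in range(3):
    queue.refresh()
position_after_three = (cursor.y, cursor.x)
queue.remove(task)
queue.refresh()   # no task installed, so the cursor stays put
```

- Returning the installed task mirrors the later step in which End Button removes any VBL routine that has been installed.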
- the Option Key Down command 968 sets the option key down. This is done by setting the option key bit in the keyboard bit map to TRUE 970 , and returning 972 .
- the Option Key Up command 974 sets the option key up. This is done by setting the option key bit in the keyboard bit map to FALSE 976 , and returning 978 .
- the Shift Key Down command 980 sets the shift key down. This is done by setting the shift key bit in the keyboard bit map to TRUE 982 , and returning 984 .
- the Shift Key Up command 986 sets the shift key up. This is done by setting the shift key bit in the keyboard bit map to FALSE 988 , and returning 990 .
- the Command Key Down command 992 sets the command key down. This is done by setting the command key bit in the keyboard bit map to TRUE 994 , and returning 996 .
- the Command Key Up command 998 sets the command key up. This is done by setting the command key bit in the keyboard bit map to FALSE 1000 , and returning 1002 .
- the Control Key Down command 1004 sets the control key down. This is done by setting the control key bit in the keyboard bit map to TRUE 1006 , and returning 1008 .
- the Control Key Up command 1010 sets the control key up. This is done by setting the control key bit in the keyboard bit map to FALSE 1012 , and returning 1014 .
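- The eight modifier key commands above all set or clear one bit in the keyboard bit map. A sketch of that operation is shown below; the bit positions are hypothetical, since the layout of the real key map is not given in the text.

```python
# Hypothetical bit positions; the real key map layout is not given in the text.
OPTION, SHIFT, COMMAND, CONTROL = 1, 2, 4, 8


def key_down(key_map, key_bit):
    """Set the bit for a modifier key (the key reads as held down)."""
    return key_map | key_bit


def key_up(key_map, key_bit):
    """Clear the bit for a modifier key (the key reads as released)."""
    return key_map & ~key_bit


def is_down(key_map, key_bit):
    return bool(key_map & key_bit)


key_map = 0
key_map = key_down(key_map, SHIFT)
key_map = key_down(key_map, COMMAND)
key_map = key_up(key_map, SHIFT)
```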
- the Next Window command 1016 moves the front window to the back. This is done by getting the front window 1018 and sending it to the back 1020 , and returning 1022 .
- the Erase command 1024 erases numchars characters from the screen.
- the number of characters typed by the most recent voice command is stored by Voice Control. Therefore, Erase will erase the characters from the most recent voice command. This is done by a loop which posts delete key keydown events 1026 and checks 1028 if the number posted equals numchars. When numchars deletes have been posted, Erase returns 1030 .
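- The Erase loop can be sketched as follows. The event tuple and the list used as an event queue are assumptions made for illustration; the sketch also tests the count before posting, so that a count of zero posts nothing.

```python
DELETE_KEYDOWN = ("keyDown", "delete")   # hypothetical event representation


def erase(event_queue, numchars):
    """Post one delete keydown event per character to be erased."""
    posted = 0
    while posted < numchars:
        event_queue.append(DELETE_KEYDOWN)
        posted += 1
    return posted


events = []
last_command_text = "hello"   # text typed by the most recent voice command
erase(events, len(last_command_text))
```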
- the Capitalize command 1032 capitalizes the next keystroke. This is done by setting the caps flag to TRUE 1034 , and returning 1036 .
- the Launch command 1038 launches an application.
- the application must be on the boot drive no more than one level deep. This is done by getting the name of the application 1040 (“appl_name”), searching for appl_name on the boot volume 1042 , and, if, 1044 , the application is found, setting the volume to the application folder 1048 , launching the application 1050 (no return is necessary because the new application will clear the Macintosh queue). If the application is not found, Launch simply returns 1046 .
- the Post Mouse routine 1052 posts mouse down events to the Macintosh event queue and can set traps to monitor mouse activity and to keep the mouse down.
- the actions of Post Mouse are determined by the Voice Control flags global and mousedown, which are set by command handlers before calling Post Mouse. After a Post Mouse, when an application does a get_next_event it will see a mouse down event in the event queue, leading to events such as clicks, mouse downs or double clicks.
- Post Mouse saves the current mouse location 1054 so that the mouse may be returned to its initial location after the mouse events are produced.
- the cursor is hidden 1056 to shield the user from seeing the mouse moving around the screen.
- the mouse speed is set to tablet 1062 (to avoid acceleration problems), and the mouse down is posted to the Macintosh event queue 1064 . If, 1066 , the mousedown flag is TRUE (i.e. if the mouse button should be held down) then the Set Mouse Down routine is called 1072 and Post Mouse returns 1070 . Otherwise, if the mouse down flag is FALSE, then a click is created by posting a mouse up event to the Macintosh event queue 1068 and returning 1070 .
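- The branch on the mousedown flag in Post Mouse can be sketched as below. The dictionary-based state and the event tuples are hypothetical, and the sketch leaves out the handling of the global flag and the mouse speed change.

```python
def post_mouse(state, target, mousedown):
    """Post a mouse down at target, then either hold the button or click."""
    state["saved_location"] = state["cursor"]   # remember where the mouse was
    state["cursor_visible"] = False             # hide the cursor during the jump
    state["cursor"] = target
    state["events"].append(("mouseDown", target))
    if mousedown:
        state["button_held"] = True             # stands in for Set Mouse Down
    else:
        state["events"].append(("mouseUp", target))   # down plus up is a click
    return state


def new_state():
    return {"cursor": (0, 0), "cursor_visible": True,
            "button_held": False, "events": []}


click = post_mouse(new_state(), (40, 12), mousedown=False)
hold = post_mouse(new_state(), (40, 12), mousedown=True)
```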
- the Set Mouse Down routine 1072 holds the mouse button down by replacing 1074 the Macintosh button trap with a Voice Control trap named My Button.
- the My Button trap then recognizes further voice commands and creates mouse drags or clicks as appropriate.
- Set Mouse Down checks 1076 if the Macintosh is a Macintosh Plus, in which case the Post Event trap must also be reset 1078 to the Voice Control My Post Event trap. (The Macintosh Plus will not simply check the mbState global flag to determine the mouse button state. Rather, the Post Event trap in a Macintosh Plus will poll the actual mouse button to determine its state, and will post mouse up events if the mouse button is up.)
- the Post Event trap is replaced with a My Post Event trap, which will not poll the status of the mouse button.
- the mbState flag is set to MouseDown 1080 (indicating that the mouse button is down) and Set Mouse Down returns 1082 .
- the My Button trap 1084 replaces the Macintosh button trap, thereby seizing control of the button state from the operating system.
- Each time My Button is called, it checks 1086 the Macintosh mouse button state bit mbState. If mbState has been set to UP, My Button moves to the End Button routine 1106 which sets mbState to UP 1108 , removes any VBL routine which has been installed 1110 , resets the Button and Post Event traps to the original Macintosh traps 1112 , resets the mouse speed and couples the cursor to the mouse 1114 , shows the cursor 1102 , and returns 1104 .
- My Button checks for the expiration of wait ticks (which allow the Macintosh time to draw menus on the screen) 1088 , and calls the recognize routine 1090 to recognize further speech commands. After further speech commands are recognized, My Button determines 1092 its next action based on the length of the command string. If the command string length is less than zero, then the next voice command was a Voice Control internal command, and the mouse button is released by calling End Button 1106 . If the command string length is greater than zero, then a command was recognized, and the command is queued onto the voice queue 1094 , and the voice queue is checked for further commands 1096 .
- If nothing was recognized (command string length of zero), then My Button skips directly to checking the voice queue 1096 . If there is nothing in the voice queue, then My Button returns 1104 . However, if there is a command in the voice queue, then My Button checks 1098 if the command is a mouse movement command (which would cause a mouse drag). If it is not a mouse movement, then the mouse button is released by calling End Button 1106 . If the command is a mouse movement, then the command is executed 1100 (which drags the mouse), the cursor is displayed 1102 , and My Button returns.
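- The decisions made by My Button on each call can be summarized as a small function that returns the action taken. This is a sketch under stated assumptions: the string results, the `is_mouse_move` predicate and the "MOVE" command name are hypothetical, the wait-tick check is omitted, and leaving a non-movement command in the queue when the button is released is a guess, since the text does not say what happens to it.

```python
from collections import deque


def my_button(mb_state, length, command, voice_queue, is_mouse_move):
    """Return the action My Button would take for one call."""
    if mb_state == "UP":
        return "end_button"
    if length < 0:
        return "end_button"          # internal Voice Control command
    if length > 0:
        voice_queue.append(command)  # a command was recognized: queue it
    if not voice_queue:
        return "return"              # nothing to do this time
    if not is_mouse_move(voice_queue[0]):
        return "end_button"          # any other command releases the button
    voice_queue.popleft()            # executing the movement drags the mouse
    return "drag"


is_move = lambda cmd: cmd.startswith("MOVE")   # hypothetical command name

idle = my_button("DOWN", 0, None, deque(), is_move)
drag_queue = deque()
drag = my_button("DOWN", 4, "MOVE", drag_queue, is_move)
text_queue = deque()
release = my_button("DOWN", 4, "TEXT", text_queue, is_move)
internal = my_button("DOWN", -1, None, deque(), is_move)
already_up = my_button("UP", 0, None, deque(), is_move)
```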
- In FIG. 24, a screen display of a record actions session is shown.
- the user is recording a local mouse click 1106 , and the click is being acknowledged in the action list 1108 and in the action window 1110 .
- dialog boxes 1112 for recording a manual printer feed are displayed to the user, as well as the Voice Control Run Modal dialog box 1114 prompting the user to record the dialogs.
- the user is preparing to record a click on the Manual Feed button 1116 .
- the user has requested the current language, which is displayed by Voice Control in a pop-up display 1120 .
- In FIG. 30, a listing of the Write Production output file as displayed in FIG. 29 is provided.
- the graphic user interface controlled by a voice recognition system could be other than that of the Apple Macintosh computer.
- the recognizer could be other than that marketed by Dragon Systems.
- Appendix A which sets forth the Voice Control command language syntax
- Appendix B which lists some of the Macintosh OS globals used by the Voice Navigator system
- Appendix C which is a fiche of the Voice Navigator executable code
- Appendix D which is the Developer's Reference Manual for the Voice Navigator system
- Appendix E which is the Voice Navigator User's Manual, all incorporated by reference herein.
Abstract
Description
- This invention relates to voice controlled computer interfaces.
- Voice recognition systems can convert human speech into computer information. Such voice recognition systems have been used, for example, to control text-type user interfaces, e.g., the text-type interface of the disk operating system (DOS) of the IBM Personal Computer.
- Voice control has also been applied to graphical user interfaces, such as the one implemented by the Apple Macintosh computer, which includes icons, pop-up windows, and a mouse. These voice control systems use voiced commands to generate keyboard keystrokes.
- In general, in one aspect, the invention features enabling voiced utterances to be substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard; a voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals which will directly create a desired action aided by the operating system without first being converted into control signals expressed in the predetermined format specific to the keyboard.
- In general, in another aspect of the invention, voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer, converting some voiced utterances into commands corresponding to actions to be taken by said operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under the operating system.
- In general, in another aspect, the invention features generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons; the instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry in the table containing a command corresponding to the menu entry or control button.
- In general, in another aspect, the invention features enabling a user to create an instance in a formal language of the kind which has a strictly defined syntax; a graphically displayed list of entries are expressed in a natural language and do not comply with the syntax, the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry in the list is automatically generated in response to the pointing.
- The invention enables a user to easily control the graphical interface of a computer. Any actions that the operating system can be commanded to take can be commanded by voiced utterances. The commands may include commands that are normally entered through the keyboard as well as commands normally entered through a mouse or any other input device. The user may switch back and forth between voiced utterances that correspond to commands for actions to be taken and voiced utterances that correspond to text strings to be used in an application program without giving any indication that the switch has been made. Any application may be made susceptible to a voice interface by automatically parsing the application instruction sequence for menus and control buttons that control the application.
- Other advantages and features will become apparent from the following description of the preferred embodiment and from the claims.
- We first briefly describe the drawings.
- FIG. 1 is a functional block diagram of a Macintosh computer served by a Voice Navigator voice controlled interface system.
- FIG. 2A is a functional block diagram of a Language Maker system for creating word lists for use with the Voice Navigator interface of FIG. 1.
- FIG. 2B depicts the format of the voice files and word lists used with the Voice Navigator interface.
- FIG. 3 is an organizational block diagram of the Voice Navigator interface system.
- FIG. 4 is a flow diagram of the Language Maker main event loop.
- FIG. 5 is a flow diagram of the Run Edit module.
- FIG. 6 is a flow diagram of the Record Actions submodule.
- FIG. 7 is a flow diagram of the Run Modal module.
- FIG. 8 is a flow diagram of the In Button? routine.
- FIG. 9 is a flow diagram of the Event Handler module.
- FIG. 10 is a flow diagram of the Do My Menu module.
- FIGS. 11A through 11I are flow diagrams of the Language Maker menu submodules.
- FIG. 12 is a flow diagram of the Write Production module.
- FIG. 13 is a flow diagram of the Write Terminal submodule.
- FIG. 14 is a flow diagram of the Voice Control main driver loop.
- FIG. 15 is a flow diagram of the Process Input module.
- FIG. 16 is a flow diagram of the Recognize submodule.
- FIG. 17 is a flow diagram of the Process Voice Control Commands routine.
- FIG. 18 is a flow diagram of the ProcessQ module.
- FIG. 19 is a flow diagram of the Get Next submodule.
- FIG. 20 is a chart of the command handlers.
- FIGS. 21A through 21G are flow diagrams of the command handlers.
- FIG. 22 is a flow diagram of the Post Mouse routine.
- FIG. 23 is a flow diagram of the Set Mouse Down routine.
- FIGS. 24 and 25 illustrate the screen displays of Voice Control.
- FIGS. 26 through 29 illustrate the screen displays of Language Maker.
- FIG. 30 is a listing of a language file.
- Referring to FIG. 1, in an Apple Macintosh computer 100, a Macintosh operating system 132 provides a graphical interactive user interface by processing events received from a mouse 134 and a keyboard 136 and by providing displays including icons, windows, and menus on a display device 138. Operating system 132 provides an environment in which application programs such as MacWrite 139, desktop utilities such as Calculator 137, and a wide variety of other programs can be run.
- The operating system 132 also receives events from the Voice Navigator voice controlled computer interface 102 to enable the user to control the computer by voiced utterances. For this purpose, the user speaks into a microphone 114 connected via a Voice Navigator box 112 to the SCSI (Small Computer Systems Interface) port of the computer 100. The Voice Navigator box 112 digitizes and processes analog audio signals received from the microphone 114, and transmits processed digitized audio signals to the Macintosh SCSI port. The Voice Navigator box includes an analog-to-digital converter (A/D) for digitizing the audio signal, a DSP (Digital Signal Processing) chip for compressing the resulting digital samples, and protocol interface hardware which configures the digital samples to obey the SCSI protocols.
- Recognizer Software 120 (available from Dragon Systems, Newton, Mass.) runs under the Macintosh operating system, and is controlled by internal commands 123 received from Voice Control driver 128 (which also operates under the Macintosh operating system). One possible algorithm for implementing Recognizer Software 120 is disclosed by Baker et al. in U.S. Pat. No. 4,783,803, incorporated by reference herein. Recognizer Software 120 processes the incoming compressed, digitized audio, and compares each utterance of the user to prestored utterance macros. If the user utterance matches a prestored utterance macro, the utterance is recognized, and a command string 121 corresponding to the recognized utterance is delivered to a text buffer 126. Command strings 121 delivered from the Recognizer Software represent commands to be issued to the Macintosh operating system (e.g., menu selections to be made or text to be displayed), or internal commands 123 to be issued by the Voice Control driver.
- During recognition, the Recognizer Software 120 compares the incoming samples of an utterance with macros in a voice file 122. (The system requires the user to space apart his utterances briefly so that the system can recognize when each utterance ends.) The voice file macros are created by a “training” process, described below. If a match is found (as judged by the recognition algorithm of the Recognizer Software 120), a Voice Control command string from a word list 124 (which has been directly associated with voice file 122) is fetched and sent to text buffer 126.
- The command strings in text buffer 126 are relayed to Voice Control driver 128, which drives a Voice Control interpreter 130 in response to the strings.
- A command string 121 may indicate an internal command 123, such as a command to the Recognizer Software to “learn” new voice file macros, or to adjust the sensitivity of the recognition algorithm. In this case, Voice Control interpreter 130 sends the appropriate internal command 123 to the Recognizer Software 120. In other cases, the command string may represent an operating system manipulation, such as a mouse movement. In this case, Voice Control interpreter 130 produces the appropriate action by interacting with the Macintosh operating system 132.
- Each application or desktop accessory is associated with a word list 124 and a corresponding voice file 122; these are loaded by the Recognition Software when the application or desktop accessory is opened.
- The voice files are generated by the Recognizer Software 120 in its “learn” mode, under the control of internal commands from the Voice Control driver 128.
- The word lists are generated by the Language Maker desktop accessory 140, which creates “languages” of utterance names and associated Voice Control command strings, and converts the languages into the word lists. Voice Control command strings are strings such as “ESC”, “TEXT”, “@MENU(font,2)”, and belong to a Voice Control command set, the syntax of which will be described later and is set forth in Appendix A.
- The Voice Control and Language Maker software includes about 30,000 lines of code, most of which is written in the C language, the remainder being written in assembly language. A listing of the Voice Control and Language Maker software is provided in microfiche as Appendix C. The Voice Control software will operate on a Macintosh Plus or later models, configured with a minimum of 1 Mbyte RAM (2 Mbyte for HyperCard and other large applications), a Hard Disk, and with Macintosh operating system version 6.01 or later.
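- To make the shape of a command string concrete, the sketch below splits the three example strings given above into a name and arguments. It is only a guess at the surface form based on “@MENU(font,2)”; the actual syntax is defined in Appendix A, which is not reproduced here, and the function name `parse_command` and its return tuples are hypothetical.

```python
import re

# Assumed surface form, inferred only from the example "@MENU(font,2)".
_CALL = re.compile(r"^@(\w+)\((.*)\)$")


def parse_command(command_string):
    """Split a command string into (kind, name, arguments)."""
    match = _CALL.match(command_string)
    if match is None:
        return ("plain", command_string, [])   # e.g. "ESC" or "TEXT"
    name, arg_text = match.groups()
    args = [a.strip() for a in arg_text.split(",")] if arg_text else []
    return ("call", name, args)


menu = parse_command("@MENU(font,2)")
escape = parse_command("ESC")
```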
- In order to understand the interaction of the Voice Control interpreter 130 and the operating system, note that Macintosh operating system 132 is “event driven”. The operating system maintains an event queue (not shown); input devices such as the mouse 134 or the keyboard 136 “post” events to this queue to cause the operating system to, for example, create the appropriate text entry, or trigger a mouse movement. The operating system 132 then, for example, passes messages to Macintosh applications (such as MacWrite 139) or to desktop accessories (such as Calculator 137) indicating events on the queues (if any). In one mode of operation, Voice Control interpreter 130 likewise controls the operating system (and hence the applications and desktop accessories which are currently running) by posting events to the operating system queues. The events posted by the Voice Control interpreter typically correspond to mouse activity or to keyboard keystrokes, or both, depending upon the voice commands. Thus, the Voice Navigator system 102 provides an additional user interface. In some cases, the “voice” events may comprise text strings to be displayed or included with text being processed by the application program.
- At any time during the operation of the Voice Navigator system, the Recognizer Software 120 may be trained to recognize an utterance of a particular user and to associate a corresponding text string with each utterance. In this mode, the Recognizer Software 120 displays to the user a menu of the utterance names (such as “file”, “page down”) which are to be recognized. These names, and the corresponding Voice Control command strings (indicating the appropriate actions) appear in a current word list 124. The user designates the utterance name of interest and then is prompted to speak the utterance corresponding to that name. For example, if the utterance name is “file”, the user might utter “FILE” or “PLEASE FILE”. The digitized samples from the Voice Navigator box 112 corresponding to that utterance are then used by the Recognizer Software 120 to create a “macro” representing the utterance, which is stored in the voice file 122 and subsequently associated with the utterance name in the word list 124. Ordinarily, the utterance is repeated more than once, in order to create a macro for the utterance that accommodates variation in a particular speaker's voice.
- The meaning of the spoken utterance need not correspond to the utterance name, and the text of the utterance name need not correspond to the Voice Control command strings stored in the word list. For example, the user may wish a command string that causes the operating system to save a file to have the utterance name “save file”; the associated command string may be “@MENU(file,2)”; and the utterance that the user trains for this utterance name may be the spoken phrase “immortalize”. The Recognizer Software and Voice Control cause that utterance, name, and command string to be properly associated in the voice file and word list 124.
- Referring to FIG. 2A, the word lists 124 used by the Voice Navigator are created by the Language Maker desk accessory 140 running under the operating system. Each word list 124 is hierarchical, that is, some utterance names in the list link to sub-lists of other utterance names. Only the list of utterance names at a currently active level of the hierarchy can be recognized. (In the current embodiment, the number of utterance names at each level of the hierarchy can be as large as 1000.) In the operation of Voice Control, some utterances, such as “file”, may summon the file menu on the screen, and link to a subsequent list of utterance names at a lower hierarchical level. For example, the file menu may list subsequent commands such as “save”, “open”, or “save as”, each associated with an utterance.
- Language Maker enables the user to create a hierarchical language of utterance names and associated command strings, rearrange the hierarchy of the language, and add new utterance names. Then, when the language is in the form that the user desires, the language is converted to a word list 124. Because the hierarchy of the utterance names and command strings can be adjusted, when using the Voice Navigator system the user is not bound by the preset menu hierarchy of an application. For example, the user may want to create a “save” command at the top level of the utterance hierarchy that directly saves a file without first summoning the file menu. Also, the user may, for example, create a new utterance name “goodbye”, that saves a file and exits all at once.
- Each language created by Language Maker 140 also contains the command strings which represent the actions (e.g. clicking the mouse at a location, typing text on the screen) to be associated with utterances and utterance names. In order for the training of the Voice Navigator system to be more intuitive, the user does not specify the command strings to describe the actions he wishes to be associated with an utterance and utterance name. In fact, the user does not need to know about, and never sees, the command strings stored in the Language Maker language or the resulting word list 124.
- In a “record” mode, to associate a series of actions with an utterance name, the user simply performs the desired actions (such as typing the text at the keyboard, or clicking the mouse at a menu). The actions performed are converted into the appropriate command strings, and when the user turns off the record mode, the command strings are associated with the selected utterance name.
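- The hierarchical word list described above, in which only the names at the currently active level can be recognized, can be sketched with nested dictionaries. The class name `WordList`, the `reset` method and the placeholder string for the file menu entry are assumptions; only “@MENU(file,2)” and the utterance names come from the text.

```python
class WordList:
    """Hierarchy of utterance names; only the active level is recognizable."""

    def __init__(self, top_level):
        self.top_level = top_level
        self.active = top_level

    def recognize(self, utterance_name):
        entry = self.active.get(utterance_name)
        if entry is None:
            return None                  # not recognizable at this level
        command_string, sublist = entry
        if sublist is not None:
            self.active = sublist        # descend to the linked sub-list
        return command_string

    def reset(self):
        self.active = self.top_level


file_menu = {"save": ("@MENU(file,2)", None)}
words = WordList({
    "file": ("<open file menu>", file_menu),    # placeholder command string
    "save file": ("@MENU(file,2)", None),       # user-made top-level shortcut
})

too_early = words.recognize("save")    # "save" is not at the top level
words.recognize("file")                # activates the file menu sub-list
after_menu = words.recognize("save")
words.reset()
shortcut = words.recognize("save file")
```

- The top-level “save file” entry shows how a rearranged language can reach a menu action without first descending through the menu hierarchy.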
- While using Language Maker, the user can cause the creation of a language by entering utterance names by typing the names at the keyboard 142, by using a “create default text” procedure 146 (to parse a text file on the clipboard, in which case one utterance name is created for each word in the text file, and the names all start at the same hierarchical level), or by using a “create default menus” procedure (to parse the executable code 144 for an application, and create a set of utterance names which equal the names of the commands in the menus of the application, in which case the initial hierarchy for the names is the same as the hierarchy of the menus in the application).
- If the names are typed at the keyboard or created by parsing a text file, the names are initially associated with the keystrokes which, when typed at the keyboard, produce the name. Therefore, the name “text” would initially be associated with the keystrokes t-e-x-t. If the names are created by parsing the executable code 144 for an application, then the names are initially associated with the command strings which execute the corresponding menu commands for the application. These initial command strings can be changed by simply selecting the utterance name to be changed and putting Language Maker into record mode.
- The output of Language Maker is a language file 148. This file contains the utterance names and the corresponding command strings. The language file 148 is formatted for input to a VOCAL compiler 150 (available from Dragon Systems), which converts the language file into a word list 124 for use with the Recognition Software. The syntax of language files is specified in the Voice Navigator Developer's Reference Manual, provided as Appendix D, and incorporated by reference.
- Referring to FIG. 2B, a macro 147 of each learned utterance is stored in the voice file 122. A corresponding utterance name 149 and command string 151 are associated with one another and with the utterance and are stored in the word list 124. The word list 124 is created and modified by Language Maker 140, and the voice file 122 is created and modified by the Recognition Software 120 in its learn mode, under the control of the Voice Control driver 128.
- Referring to FIG. 3, in the Voice Navigator system 102, the Voice Navigator hardware box 152 includes an analog-to-digital (A/D) converter 154 for converting the analog signal from the microphone into a digital signal for processing, a DSP section 156 for filtering and compacting the digitized signal, a SCSI manager 158 for communication with the Macintosh, and a microphone control section 160 for controlling the microphone.
- The Voice Navigator system also includes the Recognition Software voice drivers 120 which include routines for utterance detection 164 and command execution 166. For utterance detection 164, the voice drivers periodically poll 168 the Voice Navigator hardware to determine if an utterance is being received by Voice Navigator box 152, based on the amplitude of the signal received by the microphone. When an utterance is detected 170, the voice drivers create a speech buffer of encoded digital samples (tokens) to be used by the command execution drivers 166. On command 166 from the Voice Control driver 128, the recognition drivers can learn new utterances by token-to-terminal conversion 174. The token is converted to a macro for the utterance, and stored as a terminal in a voice file 122 (FIG. 1).
- Recognition and pattern matching 172 is also performed on command by the voice drivers. During recognition, a stored token of incoming digitized samples is compared with macros for the utterances in the current level of the recognition hierarchy. If a match is found, terminal to output conversion 176 is also performed, selecting the command string associated with the recognized utterance from the word list 124 (FIG. 1). State management 178, such as changing of sensitivity controls, is also performed on command by the voice drivers.
- The Voice Control driver 128 forms an interface 182 to the voice drivers 120 through control commands, an interface 184 to the Macintosh operating system 132 (FIG. 1) through event posting and operating system hooks, and an interface 186 to the user through display menus and prompts.
- The interface 182 to the drivers allows Voice Control access to the Voice Driver command functions 166. This interface allows Voice Control to monitor 188 the status of the recognizer, for example to check for an utterance token in the utterance queue buffered 170 to the Macintosh. If there is an utterance, and if processor time is available, Voice Control issues command sdi_recognize 190, calling the recognition and pattern match routine 172 in the voice drivers. In addition, the interface to the drivers may issue command sdi_output 192 which controls the terminal to output conversion routine 176 in the voice drivers, converting a recognized utterance to a command string for use by Voice Control. The command string may indicate mouse or keystroke events to be posted to the operating system, or may indicate commands to Voice Control itself (e.g. enabling or disabling Voice Control).
- From the user's perspective, Voice Control is simply a Macintosh driver with internal parameters, such as sensitivity, and internal commands, such as commands to learn new utterances. The actual processing which the user perceives as Voice Control may actually be performed by Voice Control, or by the Voice Drivers, depending upon the function. For example, the utterance learning procedures are performed by the Voice Drivers under the control of Voice Control.
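- The monitoring step described above (check for a buffered utterance token, recognize it if processor time is available, then convert the result to a command string) can be sketched as one function. The command names sdi_recognize and sdi_output come from the text, but their signatures are not given there, so they are modelled as plain callables; the token names and the two dictionaries standing in for the voice file and word list are also hypothetical.

```python
from collections import deque


def monitor_once(utterance_queue, processor_free, sdi_recognize, sdi_output,
                 text_buffer):
    """Process at most one buffered utterance; return True if one was output."""
    if not utterance_queue or not processor_free:
        return False                      # nothing buffered, or no time now
    token = utterance_queue.popleft()
    terminal = sdi_recognize(token)       # recognition and pattern matching
    if terminal is None:
        return False                      # no macro matched this token
    text_buffer.append(sdi_output(terminal))   # terminal to output conversion
    return True


voice_file = {"tok-save": "save file"}        # token -> recognized terminal
word_list = {"save file": "@MENU(file,2)"}    # terminal -> command string

utterances = deque(["tok-save", "tok-unknown"])
text_buffer = []
busy = monitor_once(utterances, False, voice_file.get, word_list.get, text_buffer)
first = monitor_once(utterances, True, voice_file.get, word_list.get, text_buffer)
second = monitor_once(utterances, True, voice_file.get, word_list.get, text_buffer)
empty = monitor_once(utterances, True, voice_file.get, word_list.get, text_buffer)
```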
- The
interface 184 to the Macintosh operating system allows Voice Control, where appropriate, to manipulate the operating system (e.g., by posting events or modifying event queues). The macro interpreter 194 takes the command strings delivered from the voice drivers via the text buffer and interprets them to decide what actions to take. These commands may indicate text strings to be displayed on the display or mouse movements or menu selections to be executed. - In the interpretive execution of the command strings, Voice Control must manipulate the Macintosh event queues. This task is performed by
OS event management 196. As discussed above, voice events may simulate events which are ordinarily associated with the keyboard or with the mouse. Keyboard events are handled by OS event management 196 directly. Mouse events are handled by mouse handler 198. Mouse events require an additional level of handling because mouse events can require operating system manipulation outside of the standard event post routines which are accomplished by the OS event management 196. - The main interface into the
Macintosh operating system 132 is event based, and is used in the majority of the commands which are voice recognized and issued to the Macintosh. However, there are other “hooks” to the operating system state which are used to control parameters such as mouse placement and mouse motion. For example, as will be discussed later, pushing the mouse button down generates an event; however, keeping the mouse button pushed down and dragging the mouse across a menu requires the use of an operating system hook. For reference, the operating system hooks used by the Voice Navigator are listed in Appendix B. - The operating system hooks are implemented by the trap filters 200, which are filters used by Voice Control to force the Macintosh operating system to accept the controls implemented by
OS event management 196 and mouse handler 198. - The Macintosh operating system traps are held in Macintosh read only memories (ROMs), and implement high level commands for controlling the system. Examples of these high level commands are: drawing a string onto the screen, window zooming, moving windows to the front and back of the screen, and polling the status of the mouse button. In order for the Voice Control driver to properly interface with the Macintosh operating system it must control these operating system traps to generate the appropriate events.
- To generate menu events, for example, Voice Control “seizes” the menu select trap (i.e. takes control of the trap from the operating system). Once Voice Control has seized the trap, application requests for menu selections are forwarded to Voice Control. In this way Voice Control is able to modify, where necessary, the operating system output to the program, thereby controlling the system behavior as desired.
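The idea of seizing a trap can be illustrated by wrapping an entry in a dispatch table. The sketch below is a Python model under stated assumptions: `TrapTable`, `os_menu_select`, and `make_menu_filter` are hypothetical names, and the real mechanism patches operating system trap addresses rather than a dictionary.

```python
class TrapTable:
    """Hypothetical model of an operating system trap dispatch table."""

    def __init__(self):
        self._traps = {}

    def install(self, name, handler):
        self._traps[name] = handler

    def seize(self, name, make_filter):
        """Replace a trap with a filter that wraps the original handler."""
        original = self._traps[name]
        self._traps[name] = make_filter(original)
        return original  # returned so the caller can restore the trap later

    def call(self, name, *args):
        return self._traps[name](*args)


def os_menu_select(mouse_point):
    # Stand-in for the system routine: report the selection under the mouse.
    return ("menu-at", mouse_point)


def make_menu_filter(original, pending):
    """Build a filter that substitutes a queued voice selection when one exists."""
    def filtered(mouse_point):
        if pending:
            return pending.pop(0)        # voice-selected menu item wins
        return original(mouse_point)     # otherwise defer to the system routine
    return filtered
```

An application that calls `traps.call("MenuSelect", point)` after the trap has been seized receives the queued selection instead of the real mouse result, which is the effect the paragraph above describes.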
- The
interface 186 to the user provides user control of the Voice Control operations. Prompts 202 display the name of each recognized utterance on the Macintosh screen so that the user may determine if the proper utterance has been recognized. On-line training 204 allows the user to access, at any time while using the Macintosh, the utterance names in the word list 124 currently in use. The user may see which utterance names have been trained and may retrain the utterance names in an on-line manner (these functions require Voice Control to use the Voice Driver interface, as discussed above). User options 206 provide selection of various Voice Control settings, such as the sensitivity and confidence level of the recognizer (i.e., the level of certainty required to decide that an utterance has been recognized). The optimal values for these parameters depend upon the microphone in use and the speaking voice of the user. - The
interface 186 to the user does not operate via the Macintosh event interface. Rather, it is simply a recursive loop which controls the Recognition Software and the state of the Voice Control driver. -
Language Maker 140 includes an application analyzer 210 and an event recorder 212. Application analyzer 210 parses the executable code of applications as discussed above, and produces suitable default utterance names and pre-programmed command strings. The application analyzer 210 includes a menu extraction procedure 214 which searches executable code to find text strings corresponding to menus. The application analyzer 210 also includes control identification procedures 216 for creating the command strings corresponding to each menu item in an application. - The
event recorder 212 is a driver for recording user commands and creating command strings for utterances. This allows the user to easily create and edit command strings as discussed above. - Types of events which may be entered into the event recorder include:
text entry 218, mouse events 220 (such as clicking at a specified place on the screen), special events 222 which may be necessary to control a particular application, and voice events 224 which may be associated with operations of the Voice Control driver. - Language Maker
- Referring to FIG. 4, the Language Maker
main event loop 230 is similar in structure to main event loops used by other desk accessories in the Macintosh operating system. If a desk accessory is selected from the “Apple” menu, an “open” event is transmitted to the accessory. In general, if the application in which it resides quits or if the user quits it using its menus, a “close” event is transmitted to the accessory. Otherwise, the accessory is transmitted control events. The message parameter of a control event indicates the kind of event. As seen in FIG. 4, the Language Maker main event loop 230 begins with an analysis 232 of the event type. - If the event is an open event, Language Maker tests 234 whether it is already opened. If Language Maker is already opened 236, the current language (i.e. the list of utterance names from the current word list) is displayed and Language Maker returns 237 to the operating system. If Language Maker is not open 238, it is initialized and then returns 239 to the operating system.
- If the event is a close event, Language Maker prompts the
user 240 to save the current language as a language file. If the user commands Language Maker to save the current language, the current language is converted by the Write Production module 242 to a language file, and then Language Maker exits 244. If the current language is not saved, Language Maker exits directly. - If the event is a
control event 246, then the way in which Language Maker responds to the event depends upon the mode that Language Maker is in, because Language Maker has a utility for recording events (i.e. the mouse movements and clicks or text entry that the user wishes to assign to an utterance), and must record events which do not involve the Language Maker window. However, when not recording, Language Maker should only respond to events in its window. Therefore, Language Maker may respond to events in one mode but not in another. - A
control event 246 is forwarded to one of three branches 248, 250, 252. Menu events are forwarded to the accMenu branch 252. (Only menu events occurring in desk accessory menus will be forwarded to Language Maker.) All window events for the Language Maker window are forwarded to the accEvent branch 250. All other events received by Language Maker, which correspond to events for desktop accessories or applications other than Language Maker, initiate activity in the accRun branch 248, to enable recording of actions. - In the
accRun branch 248, events are recorded and associated with the selected utterance name. Before any events are recorded Language Maker checks 254 if Language Maker is recording; if not, Language Maker returns 256. If recording is on 258, then Language Maker checks the current recording mode. While recording, Language Maker seizes control of the operating system by setting control flags that cause the operating system to call Language Maker every tick of the Macintosh (i.e. every 1/60 second). - If the user has set Language Maker in dialog mode, Language Maker can record dialog events (i.e. events which involve modal dialog, where the user cannot do anything except respond to the actions in modal dialog boxes). To accomplish this, the user must be able to produce actions (i.e. mouse clicks, menu selections) in the current application so that the dialog boxes are prompted to the screen. Then the user can initialize recording and respond to the dialog boxes. When modal dialog boxes should be produced, events received by Language Maker are also forwarded to the operating system. Otherwise, events are not forwarded to the operating system. Language Maker's modal dialog recording is performed by the
Run Modal module 260. - If modal dialog events are not being recorded, the user records with Language Maker in “action” mode, and Language Maker proceeds to the
Run Edit module 262. - In the accEvent branch, all events are forwarded to the
Event Handler module 264. - In the accMenu branch, the menu indicated by the desk accessory menu event is checked 266. If the event occurred in the Language Maker menu, it is forwarded to the Do
My Menu module 268. Other events are ignored 270. - Referring to FIG. 5, the
Run Edit module 262 performs a loop that calls the Record Actions submodule 272. If there are more actions in the event queue then the loop returns to the Record Actions submodule. If a cancel action appears 276 in the event queue then Run Edit returns 277 without updating the current language in memory. Otherwise, if the events are completed successfully, Run Edit updates the language in memory and turns off recording 278 and returns to the operating system 280. - Referring to FIG. 6, in the
Record Actions submodule 272, actions performed by the user in record mode are recorded. When the current application makes a request for the next event on the event queue, the event is checked by Record Actions. Each non-null event (i.e. each action) is processed by Record Actions. First, the type of action is checked 282. If the action selects a menu 284, then the selected menu is recorded. If the action is a mouse click 286, the In Button? routine (see FIG. 8) checks if the click occurred inside of a button (a button is a menu selection area in the front window) or not. If so, the button is recorded 288. If not, the location of the click is recorded 290. - Other actions are recorded by special handlers. These actions include
group actions 292, mouse down actions 294, mouse up actions 296, zoom actions 298, grow actions 300, and next window actions 302. - Some actions in menus can create pop-up menus with subchoices. These actions are handled by popping up the appropriate pop-up menu so that the user may select the desired subchoice. Move
actions 304, pause actions 306, scroll actions 308, text actions 310 and voice actions 312 pop up respective menus and Record Actions checks 314 for the menu selection made by the user (with a mouse drag). If no menu selection is made, then no action is recorded 316. Otherwise, the choice is recorded 318. - Other actions may launch applications. In this
case 320 the selected application is determined. If no application has been selected then no action is recorded 322, otherwise the selected application is recorded 324. - Referring to FIG. 7, the
Run Modal procedure 260 allows recording of the modal dialogs of the Macintosh computer. During modal dialogs, the user cannot do anything except respond to the actions in the modal dialog box. In order to record responses to those actions, Run Modal has several phases, each phase corresponding to a step in the recording process. - In the first phase, when the user selects dialog recording, Run Modal prompts the user with a Language Maker dialog box that gives the user the options “record” and “cancel” (see FIG. 25). The user may then interact with the current application until arriving at the dialog click that is to be recorded. During this phase, all calls to Run Modal are routed through
Select Dialog 326, which produces the initial Language Maker dialog box, and then returns 327, ignoring further actions. - To enter the second, recording, phase, the user clicks on the “record” button in the Language Maker dialog box, indicating that the following dialog responses are to be recorded. In this phase, calls to Run Modal are routed to Record 328, which uses the In Button? routine 330 to check if a button in the current application's dialog box has been selected. If the click occurred in a button, then the button is recorded 332, and Run Modal returns 333. Otherwise, the location of the click is recorded 334 and Run Modal returns 335.
- Finally, when all clicks are recorded, the user clicks on the “cancel” button in the Language Maker dialog box, entering the third phase of the recording session. The click in the “cancel” button causes Run Modal to route to Cancel 336, which updates 338 the current language in memory, then returns 340.
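The three phases of Run Modal described above behave like a small state machine. The following Python sketch is one way to model them; the class `RunModal`, the `Phase` enumeration, and the rectangle-based button lookup are assumptions for illustration and do not reproduce the In Button? routine of FIG. 8.

```python
from enum import Enum, auto


class Phase(Enum):
    SELECT_DIALOG = auto()  # first phase: dialog shown, actions ignored
    RECORD = auto()         # second phase: clicks are recorded
    DONE = auto()           # third phase finished: language updated


class RunModal:
    """Hypothetical model of the Run Modal recording phases."""

    def __init__(self, buttons):
        # buttons: assumed mapping of button name -> (left, top, right, bottom)
        self.buttons = buttons
        self.phase = Phase.SELECT_DIALOG
        self.recorded = []   # responses captured during the RECORD phase
        self.language = []   # stands in for the current language in memory

    def _button_at(self, point):
        x, y = point
        for name, (left, top, right, bottom) in self.buttons.items():
            if left <= x < right and top <= y < bottom:
                return name
        return None

    def press_record(self):
        self.phase = Phase.RECORD

    def press_cancel(self):
        # The "cancel" button ends recording and commits what was captured.
        self.language.extend(self.recorded)
        self.recorded = []
        self.phase = Phase.DONE

    def handle_click(self, point):
        if self.phase is not Phase.RECORD:
            return  # outside the recording phase, clicks are not recorded
        name = self._button_at(point)
        if name is not None:
            self.recorded.append(("button", name))
        else:
            self.recorded.append(("click", point))
```

Recording a button by name rather than by location keeps the recorded response valid if the dialog box appears at a different screen position.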
- Referring to FIG. 8, the In Button?
procedure 286 determines whether a mouse click event occurred on a button. In Button? gets the current window control list 342 (a Macintosh global which contains the locations of all of the button rectangles in the current window, refer to Appendix B) from the operating system and parses the list with a loop 344-350. Each control is fetched 350, and then the rectangle of the control is found 346. Each rectangle is analyzed 348 to determine if the click occurred in the rectangle. If not, the next control is fetched 350, and the loop recurses. If, 344, the list is emptied then the click did not occur on a button, and no is returned 352. However, if the click did occur in a rectangle, then, if, 351, the rectangle is named, the click occurred on a button, and yes is returned 354; if the rectangle is not named 356, the click did not occur on a button, and no is returned 356. - Referring to FIG. 9, the
Event Handler module 264 deals with standard Macintosh events in the Language Maker display window. The Language Maker display window lists the utterance names in the current language. As shown in FIG. 9, Event Handler determines 358 whether the event is a mouse or keyboard event and subsequently performs the proper action on the Language Maker window. - Mouse events include: dragging the
window 360, growing the window 362, scrolling the window 364, clicking on the window 368 (which selects an utterance name), and dragging on the window 370 (which moves an utterance name from one location on the screen to another, potentially changing the utterance's position in the language hierarchy). Double-clicking 366 on an utterance name in the window selects that utterance name for action recording, and therefore starts the Run Edit module. - Keyboard events include the
standard cut 372, copy 374, and paste 376 routines, as well as cursor movements down 380, up 382, right 384, and left 386. Pressing return at the keyboard 378, as with a double click at the mouse, selects the current utterance name for action recording by Run Edit. After the appropriate command handler is called, Event Handler returns 388. The modifications to the language hierarchy performed by the Event Handler module are reflected in the hierarchical structure of the language file produced by the Write Production module during close and save operations. - Referring to FIG. 10, the Do
My Menu module 268 controls all of the menu choices supported by Language Maker. After summoning the appropriate submodule (discussed in detail in FIGS. 11A through 11I), Do My Menu returns 408. - Referring to FIG. 11A, the
New submodule 390 creates a new language. The New submodule first checks 410 if Language Maker is open. If so, it prompts the user 412 to save the current language as a language file. If the user saves the current language, New calls Write Production module 414 to save the language. New then calls Create Global Words 416 and forms a new language 418. Create Global Words 416 will automatically enter a few global (i.e. resident in all languages) utterance names and command strings into the new language. These utterance names and command strings allow the user to make Voice Control commands, and correspond to utterances such as “show me the active words” and “bring up the voice options” (the utterance macros for the corresponding voice file are trained by the user, or copied from an existing voice file, after the new language is saved). - Referring to FIG. 11B, the
Open submodule 392 opens an existing language for modification. The Open submodule 392 checks 420 if Language Maker is open. If so, it prompts the user 422 to save the current language, calling Write Production 424 if yes. Open then prompts the user to open the selected language 426. If the user cancels, Open returns 428. Otherwise, the language is loaded 430 and Open returns 432. - Referring to FIG. 11C, the Save submodule 394 saves the current language in memory as a language file. Save prompts the user to save the
current language 434. If the user cancels, Save returns 436; otherwise, Save calls Write Production 438 to convert the language into a state machine control file suitable for use by VOCAL (FIG. 2). Finally, Save returns 440. - Referring to FIG. 11D, the New Action submodule 396 initializes the event recorders to begin recording a new sequence of actions. New Action initializes the event recorder by displaying an action window to the
user 442, setting up a tool palette for the user to use, and initializing recording of actions. Then New Action returns 444. After New Action is started, actions are not delivered to the operating system directly; rather they are filtered through Language Maker. - Referring to FIG. 11E, the
Record Dialog submodule 398 records responses to dialog boxes through the use of the Run Modal module. Record Dialog 398 gives the user a way to record actions in modal dialog; otherwise the user would be prevented from performing the actions which bring up the dialog boxes. Record Dialog displays 446 the dialog action window (see FIG. 25) and turns recording on. Then Record Dialog returns 448. - Referring to FIG. 11F, the Create
Default Menus submodule 400 extracts default utterance names (and generates associated command strings) from the executable code for an application. Create Default Menus 270 is ordinarily the first choice selected by a user when creating a language for a particular application. This submodule looks at the executable code of an application and creates an utterance name for each menu command in the application, associating the utterance name with a command string that will select that menu command. When called, Create Default Menus gets 450 the menu bar from the executable code of the application, and initializes the current menu to be the first menu (X=1). Next, each menu is processed recursively. When all menus are processed, Create Default Menus returns 454. A first loop menu handle 456, initializes menu parsing, checks if the current menu is fully parsed 458, and reiterates by updating the current menu to the next menu. A second loop menu name 462, and checks 464 if the name is hierarchical (i.e. if the name points to further menus). If the names are not hierarchical, the loop recurses. Otherwise, the hierarchical menu is fetched 466, and a third loop - Referring to FIG. 11G, the Create
Default Text submodule 402 allows the user to convert a text file on the clipboard into a list of utterance names. Create Default Text 402 creates an utterance name for each unique word in the clipboard 474, and then returns 476. The utterance names are associated with the keyboard entries which will type out the name. For example, a business letter can be copied from the clipboard into default text. Utterances would then be associated with each of the common business terms in the letter. After ten or twelve business letters have been converted the majority of the business letter words would be stored as a set of utterances. - Referring to FIG. 11H, the Alphabetize Group submodule 404 allows the user to alphabetize the utterance names in a language. The selected group of names (created by dragging the mouse over utterance names in the Language Maker window) is alphabetized 478, and then Alphabetize Group returns 480.
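As a rough illustration of Create Default Text and Alphabetize Group, the Python sketch below builds one utterance name per unique word and sorts a selected group. The function names, the word-splitting pattern, and the choice to fold case are assumptions for the example; the text above does not say how words are delimited or normalized.

```python
import re


def create_default_text(clipboard_text, language=None):
    """Add one utterance name per unique word; returns name -> keystrokes."""
    language = {} if language is None else language
    # Assumed tokenization: runs of letters and apostrophes count as words.
    for word in re.findall(r"[A-Za-z']+", clipboard_text):
        name = word.lower()              # assumed case folding
        # The command string is simply the keystrokes that type the word.
        language.setdefault(name, name)
    return language


def alphabetize_group(names):
    """Return the selected utterance names in case-insensitive order."""
    return sorted(names, key=str.lower)
```

Calling `create_default_text` repeatedly with the same `language` dictionary models converting several letters in turn: words already present are left alone and only new words are added.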
- Referring to FIG. 11I, the Preferences submodule 406 allows the user to select standard graphic user interface preferences such as
font style 482 and font size 484. The Preferences submenu 486 allows the user to state the metric by which mouse locations of recorded actions are stored. The coordinates for mouse actions can be relative to the global window coordinates or relative to the application window coordinates. In the case where application menu selections are performed by mouse clicks, the mouse clicks must always be in relative coordinates so that the window may be moved on the screen without affecting the function of the mouse click. The Preferences submenu 486 also determines whether, when a mouse action is recorded, the mouse is left at the location of a click or returned to its original location after a click. When the preference selections are done 488, the user is prompted whether he wants to update the current preference settings for Language Maker. If so, the file is updated 490 and Preferences returns 492. If not, Preferences returns directly to the operating system 494 without saving. - Referring to FIG. 12, the
Write Production module 242 is called when a file is saved. Write Production saves the current language and converts it from an outline processor format such as that used in the Language Maker application to a hierarchical text format suitable for use with the state machine based Recognition Software. Language files are associated with applications and new language files can be created or edited for each additional application to incorporate the various commands of the application into voice recognition. - The embodiment of the Write Production module depends upon the Recognition Software in use. In general, the Write Production module is written to convert the current language to suitable format for the Recognition Software in use. The particular embodiment of Write Production shown in FIG. 12 applies to the syntax of the VOCAL compiler for the Dragon Systems Recognition Software.
- Write Production first tests the
language 494 to determine if there are any sub-levels. If not, the Write Terminal submodule 496 saves the top level language, and Write Production returns 498. If sub-levels exist in the language, then each sub-level is processed by a tail-recursive loop. If a root entry exists in the language 500 (i.e. if only one utterance name exists at the current level) then Write Production writes 502 the string “Root=(” to the file, and checks for sub-levels 512. Otherwise, if no root exists, Write Terminal is called 504 to save the names in the current level of the language. Next, the string “TERMINAL=” is written 506, and if, 508, the language level is terminal, the string “(” is written. Next, Write Production checks 512 for sublevels in the language. If no sub-levels exist, Write Production returns 514. Otherwise, the sub-levels are processed by another call 516 to Write Production on the sub-level of the language. After the sub-level is processed, Write Production writes the string “)” and returns 518. - Referring to FIG. 13, the Write Terminal submodule 496 writes each utterance name and the associated command string to the language file. First, Write Terminal checks 520 if it is at a terminal. If not, it returns 530. Otherwise, Write Terminal writes 522 the string corresponding to the utterance name to the language file. Next, if, 524, there is an associated command string, Write Terminal writes the command string (i.e. “output”) to the language file. Finally, Write Terminal writes 528 the string “;” to the language file and returns 530.
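The recursive shape of Write Production and Write Terminal can be sketched as a depth-first serializer. The Python below is a deliberately simplified model: it keeps only the “;” terminator and the parenthesized sub-levels mentioned above, it omits the “Root=(” and “TERMINAL=” productions, and the quoting of the command string is an assumption. It should not be read as the VOCAL syntax.

```python
def write_terminal(name, output, out):
    """Append one utterance name, its optional command string, and ';'."""
    out.append(name)
    if output is not None:
        out.append(' "%s"' % output)   # assumed quoting of the command string
    out.append(";")


def write_production(level, out):
    """Serialize a level given as a list of (name, output, sublevel) tuples."""
    for name, output, sublevel in level:
        write_terminal(name, output, out)
        if sublevel:
            out.append("(")
            write_production(sublevel, out)  # recurse into the sub-level
            out.append(")")
    return out


def serialize(level):
    return "".join(write_production(level, []))
```

For a two-entry language in which the first entry has one sub-level entry, `serialize` yields a single line such as `file;(open "X";)quit "Q";`, where `X` and `Q` are placeholder command strings.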
- Voice Control
- The Voice Control software serves as a gate between the operating system and the applications running on the operating system. This is accomplished by setting the Macintosh operating system's get_next_event procedure equal to a filter procedure created by Voice Control. The get_next_event procedure runs when each next_event request is generated by the operating system or by applications. Ordinarily the get_next_event procedure is null, and next_event requests go directly to the operating system. The filter procedure passes control to Voice Control on every request. This allows Voice Control to perform voice actions by intercepting mouse and keyboard events, and create new events corresponding to spoken commands.
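The gating arrangement described above, in which every next-event request passes through a filter procedure, can be modeled as a hook on event retrieval. The Python sketch below is illustrative only: `EventSource`, `filter_proc`, and `make_voice_filter` are hypothetical names, and the filter here substitutes a synthesized event for a null event directly, whereas the text describes posting new events to the operating system.

```python
class EventSource:
    """Hypothetical model of next-event dispatch with an optional filter hook."""

    def __init__(self, events):
        self._events = list(events)
        self.filter_proc = None  # no filter installed by default

    def get_next_event(self):
        # None stands for a null event (nothing pending).
        event = self._events.pop(0) if self._events else None
        if self.filter_proc is not None:
            event = self.filter_proc(event)  # the filter sees every request
        return event


def make_voice_filter(speech_queue):
    """Build a filter that turns queued speech into key events on idle requests."""
    def voice_filter(event):
        if event is None and speech_queue:
            return ("key_down", speech_queue.pop(0))
        return event  # real events pass through unchanged
    return voice_filter
```

Installing the filter is a single assignment, `source.filter_proc = make_voice_filter(queue)`, which mirrors setting the get_next_event procedure equal to the Voice Control filter.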
- The Voice Control filter procedure is shown in FIG. 14.
- After
installation 538, the get_next_event filter procedure 540 is called before an event is generated by the operating system. The event is first checked 542 to see if it is a null event. If so, the Process Input module 544 is called directly. The Process Input routine 544 checks for new speech input and processes any that has been received. After Process Input, the Voice Control driver proceeds through normal filter processing 546 (i.e., any filter processing caused by other applications) and returns 548. If the next event is not a null event, then displays are hidden 550. This allows Voice Control to hide any Voice Control displays (such as current language lists) which could have been generated by a previous non-null action. Therefore, if any prompt windows have been produced by Voice Control, when a non-null event occurs, the prompt windows are hidden. Next, key down events are checked 552. Because the recognizer is controlled (i.e. turned on and off) by certain special key down events, if the event is a key down event then Voice Control must do further processing. Otherwise, the Voice Control driver procedure moves directly to Process Input 544. If a key down event has occurred 554, where appropriate, software latches which control the recognizer are set. This allows activation of the Recognizer Software, the selection of Recognizer options, or the display of languages. Thereafter, the Voice Control driver moves to Process Input 544. - Referring to FIG. 15, the Process Input routine is the heart of the Voice Control driver. It manages all voice input for the Voice Navigator. The Process Input module is called each time an event is processed by the operating system. First 546, any latches which need to be set are processed, and the Macintosh waits for a number of delay ticks, if necessary. Delay ticks are included, for example, where a menu drag is being performed by Voice Control, to allow the menu to be drawn on the screen before starting the drag. 
Also, some applications require delay between mouse or keyboard events. Next, if recognition is activated 548 the process input routine proceeds to do
recognition 562. If recognition is deactivated, Process Input returns 560. - The
recognition routine 562 prompts the recognition drivers to check for an utterance (i.e., sound that could be speech input). If there is recognized speech input 564, Process Input checks the vertical blanking interrupt VBL handler 566, and deactivates it where appropriate. - The vertical blanking interrupt cycle is a very low level cycle in the operating system. Every time the screen is refreshed, as the raster is moving from the bottom right to the top left of the screen, the vertical blanking interrupt time occurs. During this blanking time, very short and very high priority routines can be executed. The cycle is used by the Process Input routine to move the mouse continuously by slowly incrementing the mouse coordinates where appropriate. To accomplish this, mouse move events are installed onto the VBL queue. Therefore, where appropriate, the VBL handler must be deactivated to move the mouse.
- Other speech input is placed 568 on a speech queue, which stores speech related events for the processor until they can be handled by the ProcessQ routine. However, regardless of whether speech is recognized,
ProcessQ 570 is always called by Process Input. Therefore, the speech events queued to ProcessQ are eventually executed, but not necessarily in the same Process Input cycle. After calling ProcessQ, Process Input returns 571. - Referring to FIG. 16, the Recognize
submodule 562 checks for encoded utterances queued by the Voice Navigator box, and then calls the recognition drivers to attempt to recognize any utterances. Recognize returns the number of commands in (i.e. the length of) the command string returned from the recognizer. If, 572, no utterance is returned from the recognizer, then Recognize returns a length of zero (574), indicating no recognition has occurred. If an utterance is available, then Recognize calls sdi_recognize 576, instructing the Recognizer Software to attempt recognition on the utterance. If, 578, recognition is successful, then the name of the utterance is displayed 582 to the user. At the same time, any close call windows (i.e. windows associated with close call choices, prompted by Voice Control in response to the Recognizer Software) are cleared from the display. If recognition is unsuccessful, the Macintosh beeps 580 and zero length is returned 574. - If recognition is successful, Recognize
searches 584 for an output string associated with the utterance. If there is an output string, Recognize checks if it is asleep 586. If it is not asleep 590, the output count is set to the length of the output string and, if the command is a control command 592 (such as “go to sleep” or “wake up”), it is handled by the Process Voice Commands routine 594. - If there is no output string for the recognized utterance, or if the recognizer is asleep, then the output of Recognize is zero (588). After the output count is determined 596, the state of the recognizer is processed 596. At this time, if the Voice Control state flags have been modified by any of the Recognize subroutines, the appropriate actions are initialized. Finally, Recognize returns 598.
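The return value of Recognize, as described above, reduces to a few cases. The Python sketch below models only that decision logic; the function signature, the dictionary lookups, and the representation of a command string as a list of commands are assumptions for illustration, and the beep, prompt display, and control-command handling are left out.

```python
def recognize(token, voice_file, word_list, asleep):
    """Return the output count for one queued utterance token (0 = no output)."""
    if token is None:
        return 0                        # no utterance was queued
    name = voice_file.get(token)        # stands in for sdi_recognize
    if name is None:
        return 0                        # recognition failed (the routine also beeps)
    output = word_list.get(name)        # search for an associated output string
    if output is None or asleep:
        return 0                        # nothing to emit, or recognizer is asleep
    return len(output)                  # number of commands in the command string
```

A caller can treat any nonzero result as the number of commands to queue for ProcessQ.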
- Referring to FIG. 17, the Process Voice Commands module deals with commands that control the recognizer. The module may perform actions, or may flag actions to be performed by the Process States block 596 (FIG. 16). If the recognizer is put to
sleep 600 or awakened 604, the appropriate flags are set 602, 606, and zero is returned 626, 628 for the length of the command string, indicating to Process States to take no further actions. Otherwise, if the command is scratch_that 608 (ignore last utterance), first_level 612 (go to top of language hierarchy, i.e. set the Voice Control state to the root state for the language), word_list 616 (show the current language), or voice_options 620, the appropriate flags are set 610, 614, 618, 622, and a string length of −1 is returned 624, 628, indicating that the recognizer state should be changed by Process States 596 (FIG. 16). - Referring to FIG. 18 the
ProcessQ module 570 pulls speech input from the speech queue and processes it. If, 630, the event queue is empty then ProcessQ may proceed, otherwise ProcessQ aborts 632 because the event queue may overflow if speech events are placed on the queue along with other events. If, 634, the speech queue has any events then ProcessQ checks to see if, 636, delay ticks for menu drawing or other related activities have expired. If no events are on the speech queue then ProcessQ aborts 636. If delay ticks have expired, then ProcessQ calls Get Next 642 and returns 644. Otherwise, if delay ticks have not expired, ProcessQ aborts 640. - Referring to FIG. 19, the Get Next submodule 642 gets characters from the speech queue and processes them. If, 646, there are no characters in the speech queue then the procedure simply returns 648. If there are characters in the speech queue then Get
Next checks 650 to see if the characters are command characters. If they are, then Get Next calls Check Command 660. If not, then the characters are text, and Get Next sets the meta bits 652 where appropriate. - When the Macintosh posts an event, the meta bits (see Appendix B) are used as flags for conditioning keystrokes such as the condition key, the option key, or the command key. These keys condition the character pressed at the keyboard and create control characters. To create the proper operating system events, therefore, the meta bits must be set where necessary. Once the meta bits are set 652, a key down event is posted 654 to the Macintosh event queue, simulating a keypush at the keyboard. Following this, a key up is posted 656 to the event queue, simulating a key up. If, 658, there is still room in the event queue, then further speech characters are obtained and processed 646. If not, then the Get Next procedure returns 676.
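The text-character path of Get Next, which posts a key down followed by a key up for each character while the event queue has room, can be sketched as follows. This Python model is an assumption-laden illustration: the `capacity` parameter, the tuple event format, and the single `meta_bits` value are invented for the example, and the room check is done before each pair is posted rather than after.

```python
from collections import deque


def get_next(speech_chars, event_queue, capacity, meta_bits=0):
    """Post key down/up pairs for queued characters while the queue has room."""
    # Each character needs two slots: one key down and one key up.
    while speech_chars and len(event_queue) + 2 <= capacity:
        ch = speech_chars.popleft()
        event_queue.append(("key_down", ch, meta_bits))
        event_queue.append(("key_up", ch, meta_bits))
    return event_queue
```

Characters that do not fit stay on the speech queue, so a later call can continue where this one stopped.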
- If the command string input corresponds to a command rather than simple key strokes, the string is handled by the Check Command procedure 660 as illustrated in FIG. 19. In the Check Command procedure 660 the next four characters from the speech queue (four characters is the length of all command strings, see Appendix A) are fetched 662 and compared 664 to a command table. If, 666, the characters equal a voice command, then a command is recognized, and processing is continued by the
Handle Command routine 668. Otherwise, the characters are interpreted as text and processing returns to the meta bits step 652. - In the
Handle Command procedure 668 each command is referenced into a table of command procedures by first computing 670 the command handler offset into the table and then referencing the table, and calling the appropriate command handler 672. After calling the appropriate command handler, Get Next exits the Process Input module directly 674 (the structure of the software is such that a return from Handle Command would return to the meta bits step 652, which would be incorrect). - The command handlers available to the Handle Command routine are illustrated in FIG. 20. Each command handler is detailed by a flow diagram in FIGS. 21A through 21G. The syntax for the commands is detailed in Appendix A.
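The Check Command and Handle Command steps amount to a table lookup followed by an indexed call. The Python sketch below illustrates that pattern only. The four-character codes `MENU` and `ZOOM`, the handler functions, and the name `check_command` are hypothetical, since the actual command codes are defined in Appendix A and are not reproduced here.

```python
from collections import deque
from itertools import islice

COMMAND_LENGTH = 4  # per the text, every command string is four characters long


def handle_menu(queue):
    return "menu handler called"


def handle_zoom(queue):
    return "zoom handler called"


# Parallel tables: the position of a code in COMMAND_CODES is its handler offset.
# The codes themselves are invented for this example.
COMMAND_CODES = ["MENU", "ZOOM"]
COMMAND_HANDLERS = [handle_menu, handle_zoom]


def check_command(speech_queue):
    """Dispatch a command if the next four characters name one.

    Returns the handler's result, or None when the characters are not a
    command and should be treated as text by the caller.
    """
    code = "".join(islice(speech_queue, COMMAND_LENGTH))
    if code not in COMMAND_CODES:
        return None
    for _ in range(COMMAND_LENGTH):
        speech_queue.popleft()           # consume the command code
    offset = COMMAND_CODES.index(code)   # compute the handler offset
    return COMMAND_HANDLERS[offset](speech_queue)
```

Peeking before consuming keeps the characters on the queue when no command matches, so the caller can fall back to treating them as text.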
- Referring to FIG. 21A, the Menu command will pull down a menu, for example, @MENU(apple,0) (where apple is the menu number for the apple menu) will pull down the apple menu. Menu command will also select an item from the menu, for example, @MENU(apple,calculator) (where calculator is the item number for the calculator in the apple menu) will select the calculator from the apple menu. Menu command initializes by running the Find Menu routine 678 which queues the menu id and the item number for the selected menu. (If the item number in the menu is 0 then Find Menu simply clicks on the menu bar.) After Find Menu returns, if, 680, there are no menus queued for posting, the Menu command simply returns 690. However, if menus are queued for posting, Menu command intercepts 682 one of the Macintosh internal traps called Menu Select. The Menu Select trap is set equal to the My Menu Select routine 692. Next the cursor coordinates are hidden 684 so that the mouse cannot be seen as it moves on the screen. Next, Menu command posts 686 a mouse down (i.e. pushes the mouse button down) on the menu bar. When the mouse down occurs on the menu bar the Macintosh operating system generates a menu event for the application. Each application receiving a menu event requests service from the operating system to find out what the menu event is. To do this the application issues a Menu Select trap. The Menu Select trap then places the location of the mouse on the stack. However, when the application issues a Menu Select trap in this case, it is serviced by the My Menu Select routine 692 instead, thereby allowing Menu command to insert the desired menu coordinates in the place of the real coordinates. After posting a mouse down in the appropriate menu bar, Menu Command sets 688 the wait ticks to 30, which gives the operating system time to draw the menu, and returns 690.
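The trap interception used by the Menu command is a form of routine patching: the entry the application calls is swapped for a replacement that reports the desired selection. The Python sketch below models this with a toy dispatch table. `TrapTable`, `system_menu_select`, `make_my_menu_select`, and the example menu id and item number are assumptions made for illustration and do not correspond to actual Macintosh Toolbox interfaces.

```python
class TrapTable:
    """Toy dispatch table standing in for the operating system's trap table."""

    def __init__(self):
        self._routines = {}

    def install(self, name, routine):
        """Install a routine for a trap and return the one it replaced."""
        previous = self._routines.get(name)
        self._routines[name] = routine
        return previous

    def call(self, name, *args):
        return self._routines[name](*args)


def system_menu_select(mouse_point):
    """Stand-in for the OS routine; here no menu is ever under the mouse."""
    return (0, 0)


def make_my_menu_select(menu_id, item_number):
    """Build a replacement that ignores the real mouse location."""
    def my_menu_select(mouse_point):
        return (menu_id, item_number)
    return my_menu_select


def menu_command(traps, event_queue, menu_id, item_number):
    """Patch the trap, post the mouse down, and return the wait ticks."""
    traps.install("MenuSelect", make_my_menu_select(menu_id, item_number))
    event_queue.append(("mouseDown", "menu bar"))
    return 30  # wait ticks from the text, giving the OS time to draw the menu


traps = TrapTable()
traps.install("MenuSelect", system_menu_select)
event_queue = []
wait_ticks = menu_command(traps, event_queue, menu_id=1, item_number=3)
selection = traps.call("MenuSelect", (12, 40))  # what the application would see
```

Because `install` returns the routine it replaced, a caller could keep that value and restore the original trap once the selection has been delivered.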
- In the My
Menu Select trap 692 the menuselect global state is reset 694 to clear any previously selected menus, and the desired menu id and the item number are moved to the Macintosh stack 696, thus selecting the desired menu item. - The
Find Menu routine 700 collects 702 the command parameters for the desired menu. Next, the menuname is compared 704 to the menu name list. If, 706, there is no menu with the name “menuname”, Find Menu exits 708. Otherwise, Find Menu compares 710 the itemname to the names of the items in the menu. If, 712, the located item number is greater than 0, then Find Menu queues 718 the menu id and item number for use by Menu command, and returns 720. Otherwise, if the item number is 0 then Find Menu simply sets 714 the internal Voice Control flags “mousedown” and “global” to true. This indicates to Voice Control that the mouse location should be globally referenced, and that the mouse button should be held down. Then Find Menu calls 716 the Post Mouse routine, which references these flags to manipulate the operating system's mouse state accordingly. - Referring to FIG. 21B, the
Control command 722 performs a button push within a menu, invoking actions such as the save command in the file menu of an application. To do this, the Control command gets the command parameters 724 from the control string, finds the front window 726, gets the window command list 728, and checks 730 if the control name exists in the control list. If the control name does exist in the control list then the control rectangle coordinates are calculated 732, the Post Mouse routine 734 clicks the mouse in the proper coordinates, and the Control command returns 736. If the control name is not found, the Control command returns directly. - The
Keypad command 738 simulates numerical entries at the Macintosh keypad. Keypad finds the command parameters for the command string 740, gets the keycode value 742 for the desired key, posts a key down event 744 to the Macintosh event queue, and returns 746. - The
Zoom command 748 zooms the front window. Zoom obtains the front window pointer 750 in order to reference the mouse to the front window, calculates the location of the zoom box 752, uses Post Mouse to click in the zoom box 754, and returns 756. - The Local Mouse command 758 clicks the mouse at a locally referenced location. Local Mouse obtains the command parameters for the desired
mouse location 760, uses Post Mouse to click at the desired coordinate 762, and returns 764. - The
Global Mouse command 766 clicks the mouse at a globally referenced location. Global Mouse obtains the command parameters for the desired mouse location 768, sets the global flag to true 770 (to signal to Post Mouse that the coordinates are global), uses Post Mouse to click at the desired coordinate 772, and returns 774. - The Double Click command double clicks the mouse at a locally referenced location. Double Click obtains the command parameters for the desired
mouse location 778, calls Post Mouse twice 780, 782 (to click twice in the desired location), and returns 784. - The Mouse Down
command 786 sets the mouse button down. Mouse Down sets the mousedown flag to true 788 (to signal to Post Mouse that the mouse button should be held down), uses Post Mouse to set the button down 790, and returns 792. - The Mouse Up
command 794 sets the mouse button up. Mouse Up sets the mbState global (see Appendix B) to Mouse Button UP 796 (to signal to the operating system that the mouse button should be set up), posts a mouse up event to the Macintosh event queue 798 (to signal to applications that the mouse button has gone up), and returns 800. - Referring to FIG. 21D, the Screen Down command 802 scrolls the contents of the current window down. Screen Down
first looks 804 for the vertical scroll bar in the front window. If, 806, the scroll bar is not found, Screen Down simply returns 814. If the scroll bar is found, Screen Down calculates the coordinates of the down arrow 808, sets the mousedown flag to true 810 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 812, and returns 814. - The Screen Up
command 816 scrolls the contents of the current window up. Screen Up first looks 818 for the vertical scroll bar in the front window. If, 820, the scroll bar is not found, Screen Up simply returns 828. If the scroll bar is found, Screen Up calculates the coordinates of the up arrow 822, sets the mousedown flag to true 824 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 826, and returns 828. - The Screen Left command 830 scrolls the contents of the current window left. Screen Left first looks 832 for the horizontal scroll bar in the front window. If, 834, the scroll bar is not found, Screen Left simply returns 842. If the scroll bar is found, Screen Left calculates the coordinates of the
left arrow 836, sets the mousedown flag to true 838 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 840, and returns 842. - The Screen Right command 844 scrolls the contents of the current window right. Screen Right first looks 846 for the horizontal scroll bar in the front window. If, 848, the scroll bar is not found, Screen Right simply returns 856. If the scroll bar is found, Screen Right calculates the coordinates of the
right arrow 850, sets the mousedown flag to true 852 (indicating to Post Mouse that the mouse button should be set down), uses Post Mouse to set the mouse button down 854, and returns 856. - Referring to FIG. 21E, the
Page Down command 858 moves the contents of the current window down a page. Page Down first looks 860 for the vertical scroll bar in the front window. If, 862, the scroll bar is not found, Page Down simply returns 868. If the scroll bar is found, Page Down calculates the page down button coordinates 864, uses Post Mouse to click the mouse button down 866, and returns 868. - The
Page Up command 870 moves the contents of the current window up a page. Page Up first looks 872 for the vertical scroll bar in the front window. If, 874, the scroll bar is not found, Page Up simply returns 880. If the scroll bar is found, Page Up calculates the page up button coordinates 876, uses Post Mouse to click the mouse button down 878, and returns 880. - The Page Left command 882 moves the contents of the current window left a page. Page Left first looks 884 for the horizontal scroll bar in the front window. If, 886, the scroll bar is not found, Page Left simply returns 892. If the scroll bar is found, Page Left calculates the page left button coordinates 888, uses Post Mouse to click the mouse button down 890, and returns 892.
- The Page
Right command 894 moves the contents of the current window right a page. Page Right first looks 896 for the horizontal scroll bar in the front window. If, 898, the scroll bar is not found, Page Right simply returns 904. If the scroll bar is found, Page Right calculates the page right button coordinates 900, uses Post Mouse to click the mouse button down 902, and returns 904. - Referring to FIG. 21F, the
Move command 906 moves the mouse from its current location (y,x), to a new location (y+δy,x+δx). First, Move gets the command parameters 908, then Move sets the mouse speed to tablet 910 (this cancels the mouse acceleration, which otherwise would make mouse movements uncontrollable), adds the offset parameters to the current mouse location 912, forces a new cursor position and resets the mouse speed 914, and returns 916. - The Move to Global Coordinate
command 918 moves the cursor to the global coordinates given by the Voice Control command string. First, Move to Global gets the command parameters 920, then Move to Global checks 922 if there is a position parameter. If there is a position parameter, the screen position coordinates are fetched 924. In either case, the global coordinates are calculated 926, the mouse speed is set to tablet 928, the mouse position is set to the new coordinates 930, the cursor is forced to the new position 932, and Move to Global returns 934. - The Move to Local Coordinate
command 936 moves the cursor to the local coordinates given by the Voice Control command string. First, Move to Local gets the command parameters 938, then Move to Local checks 940 if there is a position parameter. If there is a position parameter, the local position coordinates are fetched 942. In either case, the global coordinates are calculated 944, the mouse speed is set to tablet 946, the mouse position is set to the new coordinates 948, the cursor is forced to the new position 950, and Move to Local returns 952. - The Move
Continuous command 954 moves the mouse continuously from its present location, moving δy, δx every refresh of the screen. This is accomplished by inserting 956 the VBL Move routine 960 in the Vertical Blanking Interrupt queue of the Macintosh and returning 958. Once in the queue, the VBL Move routine 960 will be executed every screen refresh. The VBL Move routine simply adds the δy and δx values to the current cursor position 962, resets the cursor 964, and returns 966. - Referring to FIG. 21G, the Option Key Down
command 968 sets the option key down. This is done by setting the option key bit in the keyboard bit map to TRUE 970, and returning 972. - The Option Key Up
command 974 sets the option key up. This is done by setting the option key bit in the keyboard bit map to FALSE 976, and returning 978. - The Shift Key Down
command 980 sets the shift key down. This is done by setting the shift key bit in the keyboard bit map to TRUE 982, and returning 984. - The Shift Key Up
command 986 sets the shift key up. This is done by setting the shift key bit in the keyboard bit map to FALSE 988, and returning 990. - The Command
Key Down command 992 sets the command key down. This is done by setting the command key bit in the keyboard bit map to TRUE 994, and returning 996. - The Command
Key Up command 998 sets the command key up. This is done by setting the command key bit in the keyboard bit map to FALSE 1000, and returning 1002. - The Control
Key Down command 1004 sets the control key down. This is done by setting the control key bit in the keyboard bit map to TRUE 1006, and returning 1008. - The Control
Key Up command 1010 sets the control key up. This is done by setting the control key bit in the keyboard bit map to FALSE 1012, and returning 1014. - The
Next Window command 1016 moves the front window to the back. This is done by getting the front window 1018 and sending it to the back 1020, and returning 1022. - The Erase
command 1024 erases numchars characters from the screen. The number of characters typed by the most recent voice command is stored by Voice Control. Therefore, Erase will erase the characters from the most recent voice command. This is done by a loop which posts delete key keydown events 1026 and checks 1028 if the number posted equals numchars. When numchars deletes have been posted, Erase returns 1030. - The Capitalize
command 1032 capitalizes the next keystroke. This is done by setting the caps flag to TRUE 1034, and returning 1036. - The
Launch command 1038 launches an application. The application must be on the boot drive no more than one level deep. This is done by getting the name of the application 1040 (“appl_name”), searching for appl_name on the boot volume 1042, and, if, 1044, the application is found, setting the volume to the application folder 1048 and launching the application 1050 (no return is necessary because the new application will clear the Macintosh queue). If the application is not found, Launch simply returns 1046. - Referring to FIG. 22, the Post Mouse routine 1052 posts mouse down events to the Macintosh event queue and can set traps to monitor mouse activity and to keep the mouse down. The actions of Post Mouse are determined by the Voice Control flags global and mousedown, which are set by command handlers before calling Post Mouse. After a Post Mouse, when an application does a get_next_event it will see a mouse down event in the event queue, leading to events such as clicks, mouse downs or double clicks.
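How the global and mousedown flags steer Post Mouse, as detailed in the paragraph that follows, can be sketched in Python. This is a simplified model under stated assumptions: the function name `post_mouse`, its keyword parameters, the `window_origin` offset used for the local-to-global conversion, and the tuple-based event records are all invented for illustration. Saving and restoring the mouse location, hiding the cursor, setting the tablet mouse speed, and installing the button trap are omitted.

```python
def post_mouse(point, event_queue, *, is_global=False, mousedown=False,
               window_origin=(0, 0)):
    """Post a click or a held mouse down at point, given as (y, x).

    Local points are offset by window_origin to obtain global coordinates.
    Returns True when the button is left held down.
    """
    y, x = point
    if not is_global:
        y, x = y + window_origin[0], x + window_origin[1]
    event_queue.append(("mouseDown", (y, x)))
    if mousedown:
        return True   # the button stays down; no mouse up is posted
    event_queue.append(("mouseUp", (y, x)))
    return False
```

Under this model a handler such as Local Mouse would call `post_mouse` with both flags false to produce a click, while Screen Down would pass `mousedown=True` so that only the mouse down is posted.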
- First, Post Mouse saves the
current mouse location 1054 so that the mouse may be returned to its initial location after the mouse events are produced. Next the cursor is hidden 1056 to shield the user from seeing the mouse moving around the screen. Next the global flag is checked. If, 1058, the coordinates are local (i.e. global=FALSE) then they are converted 1060 to global coordinates. Next, the mouse speed is set to tablet 1062 (to avoid acceleration problems), and the mouse down is posted to the Macintosh event queue 1064. If, 1066, the mousedown flag is TRUE (i.e. if the mouse button should be held down) then the Set Mouse Down routine is called 1072 and Post Mouse returns 1070. Otherwise, if the mouse down flag is FALSE, then a click is created by posting a mouse up event to the Macintosh event queue 1068 and returning 1070. - Referring to FIG. 23, the Set
Mouse Down routine 1072 holds the mouse button down by replacing 1074 the Macintosh button trap with a Voice Control trap named My Button. The My Button trap then recognizes further voice commands and creates mouse drags or clicks as appropriate. After initializing My Button, Set Mouse Down checks 1076 if the Macintosh is a Macintosh Plus, in which case the Post Event trap must also be reset 1078 to the Voice Control My Post Event trap. (The Macintosh Plus will not simply check the mbState global flag to determine the mouse button state. Rather, the Post Event trap in a Macintosh Plus will poll the actual mouse button to determine its state, and will post mouse up events if the mouse button is up. Therefore, to force the Macintosh Plus to accept the mouse button state as dictated by Voice Control, during voice actions, the Post Event trap is replaced with a My Post Event trap, which will not poll the status of the mouse button.) Next, the mbState flag is set to MouseDown 1080 (indicating that the mouse button is down) and Set Mouse Down returns 1082. - The My
Button trap 1084 replaces the Macintosh button trap, thereby seizing control of the button state from the operating system. Each time My Button is called, it checks 1086 the Macintosh mouse button state bit mbState. If mbState has been set to UP, My Button moves to the End Button routine 1106 which sets mbState to UP 1108, removes any VBL routine which has been installed 1110, resets the Button and Post Event traps to the original Macintosh traps 1112, resets the mouse speed and couples the cursor to the mouse 1114, shows the cursor 1102, and returns 1104. - However, if the mouse button is to remain down, My Button checks for the expiration of wait ticks (which allow the Macintosh time to draw menus on the screen) 1088, and calls the recognize routine 1090 to recognize further speech commands. After further speech commands are recognized, My Button determines 1092 its next action based on the length of the command string. If the command string length is less than zero, then the next voice command was a Voice Control internal command, and the mouse button is released by calling
End Button 1106. If the command string length is greater than zero, then a command was recognized, and the command is queued onto the voice queue 1094, and the voice queue is checked for further commands 1096. If nothing was recognized (command string length of zero), then My Button skips directly to checking the voice queue 1096. If there is nothing in the voice queue, then My Button returns 1104. However, if there is a command in the voice queue, then My Button checks 1098 if the command is a mouse movement command (which would cause a mouse drag). If it is not a mouse movement, then the mouse button is released by calling End Button 1106. If the command is a mouse movement, then the command is executed 1100 (which drags the mouse), the cursor is displayed 1102, and My Button returns. - Screen Displays
- Referring to FIG. 24, a screen display of a record actions session is shown. The user is recording a
local mouse click 1106, and the click is being acknowledged in the action list 1108 and in the action window 1110. - Referring to FIG. 25, a record actions session using dialog boxes is shown. The
dialog boxes 1112 for recording a manual printer feed are displayed to the user, as well as the Voice Control RunModal dialog box 1114 prompting the user to record the dialogs. The user is preparing to record a click on the Manual Feed button 1116. - Referring to FIG. 26, the Language Maker menu 1118 is shown.
- Referring to FIG. 27, the user has requested the current language, which is displayed by Voice Control in a pop-up display 1120. - Referring to FIG. 28, the user has clicked on the utterance name “apple” 1122, requesting a retraining of the utterance for “apple”. Voice Control has responded with a
dialog box 1124 asking the user to say “apple” twice into the microphone. - Referring to FIG. 29, the text format of a Write Production output file 1126 (to be compiled by VOCAL) and the corresponding Language Maker display for the file 1128 are shown. It is clear from FIG. 29 that the Language Maker display is far more intuitive.
- Referring to FIG. 30, a listing of the Write Production output file as displayed in FIG. 29 is provided.
- Other Embodiments
- Other embodiments of the invention are within the scope of the claims which follow the appendices. For example, the graphic user interface controlled by a voice recognition system could be other than that of the Apple Macintosh computer. The recognizer could be other than that marketed by Dragon Systems.
- Included in the Appendices are Appendix A, which sets forth the Voice Control command language syntax, Appendix B, which lists some of the Macintosh OS globals used by the Voice Navigator system, Appendix C, which is a fiche of the Voice Navigator executable code, Appendix D, which is the Developer's Reference Manual for the Voice Navigator system, and Appendix E, which is the Voice Navigator User's Manual, all incorporated by reference herein.
- A portion of the disclosure of this patent document contains material which is subject to copyright protection (for example, the microfiche Appendix, the User's Manual, and the Reference Manual). The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/102,047 US20020178009A1 (en) | 1989-06-23 | 2002-03-20 | Voice controlled computer interface |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US37077989A | 1989-06-23 | 1989-06-23 | |
US97343592A | 1992-11-09 | 1992-11-09 | |
US08/165,014 US5377303A (en) | 1989-06-23 | 1993-12-09 | Controlled computer interface |
US20088694A | 1994-02-23 | 1994-02-23 | |
US66549396A | 1996-06-12 | 1996-06-12 | |
US10/102,047 US20020178009A1 (en) | 1989-06-23 | 2002-03-20 | Voice controlled computer interface |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US66549396A Continuation | 1989-06-23 | 1996-06-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020178009A1 (en) | 2002-11-28 |
Family
ID=23461140
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/165,014 Expired - Lifetime US5377303A (en) | 1989-06-23 | 1993-12-09 | Controlled computer interface |
US09/783,725 Abandoned US20020010582A1 (en) | 1989-06-23 | 2001-02-14 | Voice controlled computer interface |
US09/852,049 Abandoned US20020128843A1 (en) | 1989-06-23 | 2001-05-09 | Voice controlled computer interface |
US10/102,047 Abandoned US20020178009A1 (en) | 1989-06-23 | 2002-03-20 | Voice controlled computer interface |
Country Status (2)
Country | Link |
---|---|
US (4) | US5377303A (en) |
JP (1) | JPH03163623A (en) |
Families Citing this family (192)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03163623A (en) * | 1989-06-23 | 1991-07-15 | Articulate Syst Inc | Voice control computor interface |
US6092043A (en) * | 1992-11-13 | 2000-07-18 | Dragon Systems, Inc. | Apparatuses and method for training and operating speech recognition systems |
US6101468A (en) * | 1992-11-13 | 2000-08-08 | Dragon Systems, Inc. | Apparatuses and methods for training and operating speech recognition systems |
US5890122A (en) * | 1993-02-08 | 1999-03-30 | Microsoft Corporation | Voice-controlled computer simulateously displaying application menu and list of available commands |
JP3530591B2 (en) * | 1994-09-14 | 2004-05-24 | キヤノン株式会社 | Speech recognition apparatus, information processing apparatus using the same, and methods thereof |
EP0747807B1 (en) * | 1995-04-11 | 2002-03-06 | Dragon Systems Inc. | Moving an element shown on a computer display |
US5761641A (en) * | 1995-07-31 | 1998-06-02 | Microsoft Corporation | Method and system for creating voice commands for inserting previously entered information |
US5903864A (en) * | 1995-08-30 | 1999-05-11 | Dragon Systems | Speech recognition |
US5903870A (en) * | 1995-09-18 | 1999-05-11 | Vis Tell, Inc. | Voice recognition and display device apparatus and method |
US6601027B1 (en) | 1995-11-13 | 2003-07-29 | Scansoft, Inc. | Position manipulation in speech recognition |
US5794189A (en) * | 1995-11-13 | 1998-08-11 | Dragon Systems, Inc. | Continuous speech recognition |
US5799279A (en) * | 1995-11-13 | 1998-08-25 | Dragon Systems, Inc. | Continuous speech recognition of text and commands |
US6064959A (en) * | 1997-03-28 | 2000-05-16 | Dragon Systems, Inc. | Error correction in speech recognition |
US5920841A (en) * | 1996-07-01 | 1999-07-06 | International Business Machines Corporation | Speech supported navigation of a pointer in a graphical user interface |
US5873064A (en) * | 1996-11-08 | 1999-02-16 | International Business Machines Corporation | Multi-action voice macro method |
US5930757A (en) * | 1996-11-21 | 1999-07-27 | Freeman; Michael J. | Interactive two-way conversational apparatus with voice recognition |
US6108515A (en) * | 1996-11-21 | 2000-08-22 | Freeman; Michael J. | Interactive responsive apparatus with visual indicia, command codes, and comprehensive memory functions |
KR100288976B1 (en) * | 1997-01-08 | 2001-05-02 | 윤종용 | Method for constructing and recognizing menu commands of television receiver |
US5924068A (en) * | 1997-02-04 | 1999-07-13 | Matsushita Electric Industrial Co. Ltd. | Electronic news reception apparatus that selectively retains sections and searches by keyword or index for text to speech conversion |
US5909667A (en) * | 1997-03-05 | 1999-06-01 | International Business Machines Corporation | Method and apparatus for fast voice selection of error words in dictated text |
US5893063A (en) * | 1997-03-10 | 1999-04-06 | International Business Machines Corporation | Data processing system and method for dynamically accessing an application using a voice command |
US5897618A (en) * | 1997-03-10 | 1999-04-27 | International Business Machines Corporation | Data processing system and method for switching between programs having a same title using a voice command |
US5884265A (en) * | 1997-03-27 | 1999-03-16 | International Business Machines Corporation | Method and system for selective display of voice activated commands dialog box |
US6212498B1 (en) | 1997-03-28 | 2001-04-03 | Dragon Systems, Inc. | Enrollment in speech recognition |
US5966691A (en) * | 1997-04-29 | 1999-10-12 | Matsushita Electric Industrial Co., Ltd. | Message assembler using pseudo randomly chosen words in finite state slots |
US6038534A (en) * | 1997-09-11 | 2000-03-14 | Cowboy Software, Inc. | Mimicking voice commands as keyboard signals |
ATE254327T1 (en) * | 1997-12-30 | 2003-11-15 | Koninkl Philips Electronics Nv | VOICE RECOGNITION APPARATUS USING A COMMAND LEXICO |
US6438523B1 (en) | 1998-05-20 | 2002-08-20 | John A. Oberteuffer | Processing handwritten and hand-drawn input and speech input |
US6195635B1 (en) | 1998-08-13 | 2001-02-27 | Dragon Systems, Inc. | User-cued speech recognition |
US6243076B1 (en) | 1998-09-01 | 2001-06-05 | Synthetic Environments, Inc. | System and method for controlling host system interface with point-of-interest data |
US6514201B1 (en) | 1999-01-29 | 2003-02-04 | Acuson Corporation | Voice-enhanced diagnostic medical ultrasound system and review station |
US6487530B1 (en) * | 1999-03-30 | 2002-11-26 | Nortel Networks Limited | Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models |
US6330540B1 (en) | 1999-05-27 | 2001-12-11 | Louis Dischler | Hand-held computer device having mirror with negative curvature and voice recognition |
DE10082416D2 (en) * | 1999-08-13 | 2001-11-22 | Genologic Gmbh | Device for converting voice commands and / or language texts into keyboard and / or mouse movements and / or texts |
US20010043234A1 (en) * | 2000-01-03 | 2001-11-22 | Mallik Kotamarti | Incorporating non-native user interface mechanisms into a user interface |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US7109970B1 (en) | 2000-07-01 | 2006-09-19 | Miller Stephen S | Apparatus for remotely controlling computers and other electronic appliances/devices using a combination of voice commands and finger movements |
US7035805B1 (en) * | 2000-07-14 | 2006-04-25 | Miller Stephen S | Switching the modes of operation for voice-recognition applications |
US6836759B1 (en) * | 2000-08-22 | 2004-12-28 | Microsoft Corporation | Method and system of handling the selection of alternates for recognized words |
US7120646B2 (en) | 2001-04-09 | 2006-10-10 | Health Language, Inc. | Method and system for interfacing with a multi-level data structure |
US7996232B2 (en) * | 2001-12-03 | 2011-08-09 | Rodriguez Arturo A | Recognition of voice-activated commands |
US6889191B2 (en) * | 2001-12-03 | 2005-05-03 | Scientific-Atlanta, Inc. | Systems and methods for TV navigation with compressed voice-activated commands |
US20050154588A1 (en) * | 2001-12-12 | 2005-07-14 | Janas John J.Iii | Speech recognition and control in a process support system |
US20040054538A1 (en) * | 2002-01-03 | 2004-03-18 | Peter Kotsinadelis | My voice voice agent for use with voice portals and related products |
KR20020023294A (en) * | 2002-01-12 | 2002-03-28 | (주)코리아리더스 테크놀러지 | GUI Context based Command and Control Method with Speech recognition |
US7548847B2 (en) * | 2002-05-10 | 2009-06-16 | Microsoft Corporation | System for automatically annotating training data for a natural language understanding system |
US20040107179A1 (en) * | 2002-08-22 | 2004-06-03 | Mdt, Inc. | Method and system for controlling software execution in an event-driven operating system environment |
US20050027539A1 (en) * | 2003-07-30 | 2005-02-03 | Weber Dean C. | Media center controller system and method |
US7389235B2 (en) * | 2003-09-30 | 2008-06-17 | Motorola, Inc. | Method and system for unified speech and graphic user interfaces |
US20050083300A1 (en) * | 2003-10-20 | 2005-04-21 | Castle Daniel C. | Pointer control system |
US20060044261A1 (en) * | 2004-09-02 | 2006-03-02 | Kao-Cheng Hsieh | Pointing input device imitating inputting of hotkeys of a keyboard |
US20060123220A1 (en) * | 2004-12-02 | 2006-06-08 | International Business Machines Corporation | Speech recognition in BIOS |
US8788271B2 (en) * | 2004-12-22 | 2014-07-22 | Sap Aktiengesellschaft | Controlling user interfaces with contextual voice commands |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8635073B2 (en) * | 2005-09-14 | 2014-01-21 | At&T Intellectual Property I, L.P. | Wireless multimodal voice browser for wireline-based IPTV services |
US8229733B2 (en) * | 2006-02-09 | 2012-07-24 | John Harney | Method and apparatus for linguistic independent parsing in a natural language systems |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8886540B2 (en) * | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Using speech recognition results based on an unstructured language model in a mobile communication facility application |
US20110054899A1 (en) * | 2007-03-07 | 2011-03-03 | Phillips Michael S | Command and control utilizing content information in a mobile voice-to-speech application |
US8838457B2 (en) * | 2007-03-07 | 2014-09-16 | Vlingo Corporation | Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility |
US8886545B2 (en) | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Dealing with switch latency in speech recognition |
US8635243B2 (en) * | 2007-03-07 | 2014-01-21 | Research In Motion Limited | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application |
US8949266B2 (en) * | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Multiple web-based content category searching in mobile search application |
US8949130B2 (en) * | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Internal and external speech recognition use with a mobile communication facility |
US20080221884A1 (en) | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile environment speech processing facility |
US10056077B2 (en) | 2007-03-07 | 2018-08-21 | Nuance Communications, Inc. | Using speech recognition results based on an unstructured language model with a music system |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8165886B1 (en) | 2007-10-04 | 2012-04-24 | Great Northern Research LLC | Speech interface system and method for control and interaction with applications on a computing system |
US8595642B1 (en) | 2007-10-04 | 2013-11-26 | Great Northern Research, LLC | Multiple shell multi faceted graphical user interface |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8849672B2 (en) * | 2008-05-22 | 2014-09-30 | Core Wireless Licensing S.A.R.L. | System and method for excerpt creation by designating a text segment using speech |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US8626511B2 (en) * | 2010-01-22 | 2014-01-07 | Google Inc. | Multi-dimensional disambiguation of voice commands |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8738377B2 (en) | 2010-06-07 | 2014-05-27 | Google Inc. | Predicting and learning carrier phrases for speech input |
US8660934B2 (en) | 2010-06-30 | 2014-02-25 | Trading Technologies International, Inc. | Order entry actions |
CN102541574A (en) * | 2010-12-13 | 2012-07-04 | 鸿富锦精密工业(深圳)有限公司 | Application program opening system and method |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
KR101295711B1 (en) * | 2011-02-15 | 2013-08-16 | 주식회사 팬택 | Mobile communication terminal device and method for executing application with voice recognition |
US9081550B2 (en) * | 2011-02-18 | 2015-07-14 | Nuance Communications, Inc. | Adding speech capabilities to existing computer applications with complex graphical user interfaces |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US8831955B2 (en) * | 2011-08-31 | 2014-09-09 | International Business Machines Corporation | Facilitating tangible interactions in voice applications |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9317605B1 (en) | 2012-03-21 | 2016-04-19 | Google Inc. | Presenting forked auto-completions |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
CN103577072A (en) * | 2012-07-26 | 2014-02-12 | 中兴通讯股份有限公司 | Terminal voice assistant editing method and device |
TW201409351A (en) * | 2012-08-16 | 2014-03-01 | Hon Hai Prec Ind Co Ltd | Electronic device with voice control function and voice control method |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
AU2014214676A1 (en) | 2013-02-07 | 2015-08-27 | Apple Inc. | Voice trigger for a digital assistant |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
WO2014144949A2 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | Training an at least partial voice command system |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
CN110442699A (en) | 2013-06-09 | 2019-11-12 | 苹果公司 | Operate method, computer-readable medium, electronic equipment and the system of digital assistants |
JP2016521948A (en) | 2013-06-13 | 2016-07-25 | アップル インコーポレイテッド | System and method for emergency calls initiated by voice command |
US9646606B2 (en) | 2013-07-03 | 2017-05-09 | Google Inc. | Speech recognition using domain knowledge |
AU2014306221B2 (en) | 2013-08-06 | 2017-04-06 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
EP3480811A1 (en) | 2014-05-30 | 2019-05-08 | Apple Inc. | Multi-command single utterance input method |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US20160225369A1 (en) * | 2015-01-30 | 2016-08-04 | Google Technology Holdings LLC | Dynamic inference of voice command for software operation from user manipulation of electronic device |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US20160328205A1 (en) * | 2015-05-05 | 2016-11-10 | Motorola Mobility Llc | Method and Apparatus for Voice Operation of Mobile Applications Having Unnamed View Elements |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 | 2016-06-08 | 2018-08-14 | Apple Inc. | Intelligent automated assistant for media exploration |
DK179588B1 (en) | 2016-06-09 | 2019-02-22 | Apple Inc. | Intelligent automated assistant in a home environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
DK179415B1 | 2016-06-11 | 2018-06-14 | Apple Inc. | Intelligent device arbitration and control |
DK179343B1 | 2016-06-11 | 2018-05-14 | Apple Inc. | Intelligent task discovery |
DK201670540A1 | 2016-06-11 | 2018-01-08 | Apple Inc. | Application integration with a digital assistant |
DK179049B1 | 2016-06-11 | 2017-09-18 | Apple Inc. | Data driven natural language event detection and classification |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10580405B1 (en) * | 2016-12-27 | 2020-03-03 | Amazon Technologies, Inc. | Voice control of remote device |
WO2018199913A1 (en) | 2017-04-25 | 2018-11-01 | Hewlett-Packard Development Company, L.P. | Machine-learning command interaction |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
CN115509627A (en) * | 2022-11-22 | 2022-12-23 | 威海海洋职业学院 | Electronic equipment awakening method and system based on artificial intelligence |
Citations (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3928724A (en) * | 1974-10-10 | 1975-12-23 | Andersen Byram Kouma Murphy Lo | Voice-actuated telephone directory-assistance system |
US4144582A (en) * | 1970-12-28 | 1979-03-13 | Hyatt Gilbert P | Voice signal processing system |
US4462080A (en) * | 1981-11-27 | 1984-07-24 | Kearney & Trecker Corporation | Voice actuated machine control |
US4627001A (en) * | 1982-11-03 | 1986-12-02 | Wang Laboratories, Inc. | Editing voice data |
US4677569A (en) * | 1982-05-11 | 1987-06-30 | Casio Computer Co., Ltd. | Computer controlled by voice input |
US4688195A (en) * | 1983-01-28 | 1987-08-18 | Texas Instruments Incorporated | Natural-language interface generating system |
US4704696A (en) * | 1984-01-26 | 1987-11-03 | Texas Instruments Incorporated | Method and apparatus for voice control of a computer |
US4726065A (en) * | 1984-01-26 | 1988-02-16 | Horst Froessl | Image manipulation by speech signals |
US4778016A (en) * | 1985-09-17 | 1988-10-18 | Tokyo Electric Co., Ltd. | Weighing method by multirange load cell balance |
US4783803A (en) * | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
US4785408A (en) * | 1985-03-11 | 1988-11-15 | AT&T Information Systems Inc. American Telephone and Telegraph Company | Method and apparatus for generating computer-controlled interactive voice services |
US4799144A (en) * | 1984-10-12 | 1989-01-17 | Alcatel Usa, Corp. | Multi-function communication board for expanding the versatility of a computer |
US4811243A (en) * | 1984-04-06 | 1989-03-07 | Racine Marsh V | Computer aided coordinate digitizing system |
US4821211A (en) * | 1987-11-19 | 1989-04-11 | International Business Machines Corp. | Method of navigating among program menus using a graphical menu tree |
US4827520A (en) * | 1987-01-16 | 1989-05-02 | Prince Corporation | Voice actuated control system for use in a vehicle |
US4829576A (en) * | 1986-10-21 | 1989-05-09 | Dragon Systems, Inc. | Voice recognition system |
US4874177A (en) * | 1984-05-30 | 1989-10-17 | Girardin Ronald E | Horse racing game |
US4907274A (en) * | 1987-03-13 | 1990-03-06 | Kabushiki Kaisha Toshiba | Intelligent work station |
US4914704A (en) * | 1984-10-30 | 1990-04-03 | International Business Machines Corporation | Text editor for speech input |
US4922538A (en) * | 1987-02-10 | 1990-05-01 | British Telecommunications Public Limited Company | Multi-user speech recognition system |
US4931950A (en) * | 1988-07-25 | 1990-06-05 | Electric Power Research Institute | Multimedia interface and method for computer system |
US4949382A (en) * | 1988-10-05 | 1990-08-14 | Griggs Talkwriter Corporation | Speech-controlled phonetic typewriter or display device having circuitry for analyzing fast and slow speech |
US4962535A (en) * | 1987-03-10 | 1990-10-09 | Fujitsu Limited | Voice recognition system |
US5022081A (en) * | 1987-10-01 | 1991-06-04 | Sharp Kabushiki Kaisha | Information recognition system |
US5027406A (en) * | 1988-12-06 | 1991-06-25 | Dragon Systems, Inc. | Method for interactive speech recognition and training |
US5036538A (en) * | 1989-11-22 | 1991-07-30 | Telephonics Corporation | Multi-station voice recognition and processing system |
US5054082A (en) * | 1988-06-30 | 1991-10-01 | Motorola, Inc. | Method and apparatus for programming devices to recognize voice commands |
US5086472A (en) * | 1989-01-12 | 1992-02-04 | Nec Corporation | Continuous speech recognition apparatus |
US5095508A (en) * | 1984-01-27 | 1992-03-10 | Ricoh Company, Ltd. | Identification of voice pattern |
US5133011A (en) * | 1990-12-26 | 1992-07-21 | International Business Machines Corporation | Method and apparatus for linear vocal control of cursor position |
US5157384A (en) * | 1989-04-28 | 1992-10-20 | International Business Machines Corporation | Advanced user interface |
US5208745A (en) * | 1988-07-25 | 1993-05-04 | Electric Power Research Institute | Multimedia interface and method for computer system |
US5231670A (en) * | 1987-06-01 | 1993-07-27 | Kurzweil Applied Intelligence, Inc. | Voice controlled system and method for generating text from a voice controlled input |
US5377303A (en) * | 1989-06-23 | 1994-12-27 | Articulate Systems, Inc. | Controlled computer interface |
US5386494A (en) * | 1991-12-06 | 1995-01-31 | Apple Computer, Inc. | Method and apparatus for controlling a speech recognition function using a cursor control device |
US5864819A (en) * | 1996-11-08 | 1999-01-26 | International Business Machines Corporation | Internal window object tree method for representing graphical user interface applications for speech navigation |
US6038534A (en) * | 1997-09-11 | 2000-03-14 | Cowboy Software, Inc. | Mimicking voice commands as keyboard signals |
US6684188B1 (en) * | 1996-02-02 | 2004-01-27 | Geoffrey C Mitchell | Method for production of medical records and other technical documents |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4776016A (en) * | 1985-11-21 | 1988-10-04 | Position Orientation Systems, Inc. | Voice control system |
US4984177A (en) * | 1988-02-05 | 1991-01-08 | Advanced Products And Technologies, Inc. | Voice language translator |
- 1990
  - 1990-06-25 JP JP2166537A (JPH03163623A), Pending
- 1993
  - 1993-12-09 US US08/165,014 (US5377303A), Expired - Lifetime
- 2001
  - 2001-02-14 US US09/783,725 (US20020010582A1), Abandoned
  - 2001-05-09 US US09/852,049 (US20020128843A1), Abandoned
- 2002
  - 2002-03-20 US US10/102,047 (US20020178009A1), Abandoned
Patent Citations (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4144582A (en) * | 1970-12-28 | 1979-03-13 | Hyatt Gilbert P | Voice signal processing system |
US3928724A (en) * | 1974-10-10 | 1975-12-23 | Andersen Byram Kouma Murphy Lo | Voice-actuated telephone directory-assistance system |
US4462080A (en) * | 1981-11-27 | 1984-07-24 | Kearney & Trecker Corporation | Voice actuated machine control |
US4766529A (en) * | 1982-05-11 | 1988-08-23 | Casio Computer Co., Ltd. | Operator guidance by computer voice synthesizer |
US4677569A (en) * | 1982-05-11 | 1987-06-30 | Casio Computer Co., Ltd. | Computer controlled by voice input |
US4627001A (en) * | 1982-11-03 | 1986-12-02 | Wang Laboratories, Inc. | Editing voice data |
US4688195A (en) * | 1983-01-28 | 1987-08-18 | Texas Instruments Incorporated | Natural-language interface generating system |
US4704696A (en) * | 1984-01-26 | 1987-11-03 | Texas Instruments Incorporated | Method and apparatus for voice control of a computer |
US4726065A (en) * | 1984-01-26 | 1988-02-16 | Horst Froessl | Image manipulation by speech signals |
US5095508A (en) * | 1984-01-27 | 1992-03-10 | Ricoh Company, Ltd. | Identification of voice pattern |
US4811243A (en) * | 1984-04-06 | 1989-03-07 | Racine Marsh V | Computer aided coordinate digitizing system |
US4874177A (en) * | 1984-05-30 | 1989-10-17 | Girardin Ronald E | Horse racing game |
US4799144A (en) * | 1984-10-12 | 1989-01-17 | Alcatel Usa, Corp. | Multi-function communication board for expanding the versatility of a computer |
US4914704A (en) * | 1984-10-30 | 1990-04-03 | International Business Machines Corporation | Text editor for speech input |
US4785408A (en) * | 1985-03-11 | 1988-11-15 | AT&T Information Systems Inc. American Telephone and Telegraph Company | Method and apparatus for generating computer-controlled interactive voice services |
US4778016A (en) * | 1985-09-17 | 1988-10-18 | Tokyo Electric Co., Ltd. | Weighing method by multirange load cell balance |
US4783803A (en) * | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
US4829576A (en) * | 1986-10-21 | 1989-05-09 | Dragon Systems, Inc. | Voice recognition system |
US4827520A (en) * | 1987-01-16 | 1989-05-02 | Prince Corporation | Voice actuated control system for use in a vehicle |
US4922538A (en) * | 1987-02-10 | 1990-05-01 | British Telecommunications Public Limited Company | Multi-user speech recognition system |
US4962535A (en) * | 1987-03-10 | 1990-10-09 | Fujitsu Limited | Voice recognition system |
US4907274A (en) * | 1987-03-13 | 1990-03-06 | Kabushiki Kaisha Toshiba | Intelligent work station |
US5231670A (en) * | 1987-06-01 | 1993-07-27 | Kurzweil Applied Intelligence, Inc. | Voice controlled system and method for generating text from a voice controlled input |
US5022081A (en) * | 1987-10-01 | 1991-06-04 | Sharp Kabushiki Kaisha | Information recognition system |
US4821211A (en) * | 1987-11-19 | 1989-04-11 | International Business Machines Corp. | Method of navigating among program menus using a graphical menu tree |
US5054082A (en) * | 1988-06-30 | 1991-10-01 | Motorola, Inc. | Method and apparatus for programming devices to recognize voice commands |
US4931950A (en) * | 1988-07-25 | 1990-06-05 | Electric Power Research Institute | Multimedia interface and method for computer system |
US5208745A (en) * | 1988-07-25 | 1993-05-04 | Electric Power Research Institute | Multimedia interface and method for computer system |
US4949382A (en) * | 1988-10-05 | 1990-08-14 | Griggs Talkwriter Corporation | Speech-controlled phonetic typewriter or display device having circuitry for analyzing fast and slow speech |
US5027406A (en) * | 1988-12-06 | 1991-06-25 | Dragon Systems, Inc. | Method for interactive speech recognition and training |
US5086472A (en) * | 1989-01-12 | 1992-02-04 | Nec Corporation | Continuous speech recognition apparatus |
US5157384A (en) * | 1989-04-28 | 1992-10-20 | International Business Machines Corporation | Advanced user interface |
US5377303A (en) * | 1989-06-23 | 1994-12-27 | Articulate Systems, Inc. | Controlled computer interface |
US5036538A (en) * | 1989-11-22 | 1991-07-30 | Telephonics Corporation | Multi-station voice recognition and processing system |
US5133011A (en) * | 1990-12-26 | 1992-07-21 | International Business Machines Corporation | Method and apparatus for linear vocal control of cursor position |
US5386494A (en) * | 1991-12-06 | 1995-01-31 | Apple Computer, Inc. | Method and apparatus for controlling a speech recognition function using a cursor control device |
US6684188B1 (en) * | 1996-02-02 | 2004-01-27 | Geoffrey C Mitchell | Method for production of medical records and other technical documents |
US5864819A (en) * | 1996-11-08 | 1999-01-26 | International Business Machines Corporation | Internal window object tree method for representing graphical user interface applications for speech navigation |
US6038534A (en) * | 1997-09-11 | 2000-03-14 | Cowboy Software, Inc. | Mimicking voice commands as keyboard signals |
Cited By (101)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7949371B1 (en) | 2001-10-18 | 2011-05-24 | Iwao Fujisaki | Communication device |
US8068880B1 (en) | 2001-10-18 | 2011-11-29 | Iwao Fujisaki | Communication device |
US8064964B1 (en) | 2001-10-18 | 2011-11-22 | Iwao Fujisaki | Communication device |
US7907963B1 (en) | 2001-10-18 | 2011-03-15 | Iwao Fujisaki | Method to display three-dimensional map on communication device |
US8200275B1 (en) | 2001-10-18 | 2012-06-12 | Iwao Fujisaki | System for communication device to display perspective 3D map |
US8024009B1 (en) | 2001-10-18 | 2011-09-20 | Iwao Fujisaki | Communication device |
US7996037B1 (en) | 2001-10-18 | 2011-08-09 | Iwao Fujisaki | Communication device |
US8290482B1 (en) | 2001-10-18 | 2012-10-16 | Iwao Fujisaki | Communication device |
US7904109B1 (en) | 2001-10-18 | 2011-03-08 | Iwao Fujisaki | Communication device |
US7945286B1 (en) | 2001-10-18 | 2011-05-17 | Iwao Fujisaki | Communication device |
US7945256B1 (en) | 2001-10-18 | 2011-05-17 | Iwao Fujisaki | Communication device |
US7907942B1 (en) | 2001-10-18 | 2011-03-15 | Iwao Fujisaki | Communication device |
US8538486B1 (en) | 2001-10-18 | 2013-09-17 | Iwao Fujisaki | Communication device which displays perspective 3D map |
US7945236B1 (en) | 2001-10-18 | 2011-05-17 | Iwao Fujisaki | Communication device |
US7945287B1 (en) | 2001-10-18 | 2011-05-17 | Iwao Fujisaki | Communication device |
US7853297B1 (en) | 2001-10-18 | 2010-12-14 | Iwao Fujisaki | Communication device |
US7865216B1 (en) | 2001-10-18 | 2011-01-04 | Iwao Fujisaki | Communication device |
US8538485B1 (en) | 2001-10-18 | 2013-09-17 | Iwao Fujisaki | Communication device |
US8498672B1 (en) | 2001-10-18 | 2013-07-30 | Iwao Fujisaki | Communication device |
US8086276B1 (en) | 2001-10-18 | 2011-12-27 | Iwao Fujisaki | Communication device |
US20030154077A1 (en) * | 2002-02-13 | 2003-08-14 | International Business Machines Corporation | Voice command processing system and computer therefor, and voice command processing method |
US7299187B2 (en) * | 2002-02-13 | 2007-11-20 | International Business Machines Corporation | Voice command processing system and computer therefor, and voice command processing method |
US8856093B2 (en) | 2002-09-03 | 2014-10-07 | William Gross | Methods and systems for search indexing |
US20040133564A1 (en) * | 2002-09-03 | 2004-07-08 | William Gross | Methods and systems for search indexing |
US7496559B2 (en) | 2002-09-03 | 2009-02-24 | X1 Technologies, Inc. | Apparatus and methods for locating data |
US7424510B2 (en) | 2002-09-03 | 2008-09-09 | X1 Technologies, Inc. | Methods and systems for Web-based incremental searches |
US7370035B2 (en) * | 2002-09-03 | 2008-05-06 | Idealab | Methods and systems for search indexing |
US20080114761A1 (en) * | 2002-09-03 | 2008-05-15 | Idealab | Methods and systems for search indexing |
US20090150363A1 (en) * | 2002-09-03 | 2009-06-11 | William Gross | Apparatus and methods for locating data |
US8498977B2 (en) * | 2002-09-03 | 2013-07-30 | William Gross | Methods and systems for search indexing |
US20080133487A1 (en) * | 2002-09-03 | 2008-06-05 | Idealab | Methods and systems for search indexing |
US20040143569A1 (en) * | 2002-09-03 | 2004-07-22 | William Gross | Apparatus and methods for locating data |
US8019741B2 (en) | 2002-09-03 | 2011-09-13 | X1 Technologies, Inc. | Apparatus and methods for locating data |
US9633139B2 (en) | 2002-09-03 | 2017-04-25 | Future Search Holdings, Inc. | Methods and systems for search indexing |
US20040143564A1 (en) * | 2002-09-03 | 2004-07-22 | William Gross | Methods and systems for Web-based incremental searches |
US10552490B2 (en) | 2002-09-03 | 2020-02-04 | Future Search Holdings, Inc. | Methods and systems for search indexing |
US8229512B1 (en) | 2003-02-08 | 2012-07-24 | Iwao Fujisaki | Communication device |
US8241128B1 (en) | 2003-04-03 | 2012-08-14 | Iwao Fujisaki | Communication device |
US8351984B1 (en) | 2003-09-26 | 2013-01-08 | Iwao Fujisaki | Communication device |
US8301194B1 (en) | 2003-09-26 | 2012-10-30 | Iwao Fujisaki | Communication device |
US8064954B1 (en) | 2003-09-26 | 2011-11-22 | Iwao Fujisaki | Communication device |
US8090402B1 (en) | 2003-09-26 | 2012-01-03 | Iwao Fujisaki | Communication device |
US8095182B1 (en) | 2003-09-26 | 2012-01-10 | Iwao Fujisaki | Communication device |
US8095181B1 (en) | 2003-09-26 | 2012-01-10 | Iwao Fujisaki | Communication device |
US8364201B1 (en) | 2003-09-26 | 2013-01-29 | Iwao Fujisaki | Communication device |
US8121641B1 (en) | 2003-09-26 | 2012-02-21 | Iwao Fujisaki | Communication device |
US8340720B1 (en) | 2003-09-26 | 2012-12-25 | Iwao Fujisaki | Communication device |
US8150458B1 (en) | 2003-09-26 | 2012-04-03 | Iwao Fujisaki | Communication device |
US8160642B1 (en) | 2003-09-26 | 2012-04-17 | Iwao Fujisaki | Communication device |
US8335538B1 (en) | 2003-09-26 | 2012-12-18 | Iwao Fujisaki | Communication device |
US8165630B1 (en) | 2003-09-26 | 2012-04-24 | Iwao Fujisaki | Communication device |
US8331983B1 (en) | 2003-09-26 | 2012-12-11 | Iwao Fujisaki | Communication device |
US8195228B1 (en) | 2003-09-26 | 2012-06-05 | Iwao Fujisaki | Communication device |
US8055298B1 (en) | 2003-09-26 | 2011-11-08 | Iwao Fujisaki | Communication device |
US8331984B1 (en) | 2003-09-26 | 2012-12-11 | Iwao Fujisaki | Communication device |
US8326355B1 (en) | 2003-09-26 | 2012-12-04 | Iwao Fujisaki | Communication device |
US8229504B1 (en) | 2003-09-26 | 2012-07-24 | Iwao Fujisaki | Communication device |
US8041371B1 (en) | 2003-09-26 | 2011-10-18 | Iwao Fujisaki | Communication device |
US8233938B1 (en) | 2003-09-26 | 2012-07-31 | Iwao Fujisaki | Communication device |
US8320958B1 (en) | 2003-09-26 | 2012-11-27 | Iwao Fujisaki | Communication device |
US8010157B1 (en) | 2003-09-26 | 2011-08-30 | Iwao Fujisaki | Communication device |
US8244300B1 (en) | 2003-09-26 | 2012-08-14 | Iwao Fujisaki | Communication device |
US8260352B1 (en) | 2003-09-26 | 2012-09-04 | Iwao Fujisaki | Communication device |
US8311578B1 (en) | 2003-09-26 | 2012-11-13 | Iwao Fujisaki | Communication device |
US7996038B1 (en) | 2003-09-26 | 2011-08-09 | Iwao Fujisaki | Communication device |
US8295880B1 (en) | 2003-09-26 | 2012-10-23 | Iwao Fujisaki | Communication device |
US8224376B1 (en) | 2003-11-22 | 2012-07-17 | Iwao Fujisaki | Communication device |
US8238963B1 (en) | 2003-11-22 | 2012-08-07 | Iwao Fujisaki | Communication device |
US7917167B1 (en) | 2003-11-22 | 2011-03-29 | Iwao Fujisaki | Communication device |
US8295876B1 (en) | 2003-11-22 | 2012-10-23 | Iwao Fujisaki | Communication device |
US8121635B1 (en) | 2003-11-22 | 2012-02-21 | Iwao Fujisaki | Communication device |
US7945914B2 (en) | 2003-12-10 | 2011-05-17 | X1 Technologies, Inc. | Methods and systems for performing operations in response to detecting a computer idle condition |
US20050149932A1 (en) * | 2003-12-10 | 2005-07-07 | Hasink Lee Z. | Methods and systems for performing operations in response to detecting a computer idle condition |
US20050204295A1 (en) * | 2004-03-09 | 2005-09-15 | Freedom Scientific, Inc. | Low Vision Enhancement for Graphic User Interface |
US8270964B1 (en) | 2004-03-23 | 2012-09-18 | Iwao Fujisaki | Communication device |
US8081962B1 (en) | 2004-03-23 | 2011-12-20 | Iwao Fujisaki | Communication device |
US8195142B1 (en) | 2004-03-23 | 2012-06-05 | Iwao Fujisaki | Communication device |
US8121587B1 (en) | 2004-03-23 | 2012-02-21 | Iwao Fujisaki | Communication device |
US8208954B1 (en) | 2005-04-08 | 2012-06-26 | Iwao Fujisaki | Communication device |
US20080072234A1 (en) * | 2006-09-20 | 2008-03-20 | Gerald Myroup | Method and apparatus for executing commands from a drawing/graphics editor using task interaction pattern recognition |
US20080262842A1 (en) * | 2007-04-20 | 2008-10-23 | Asustek Computer Inc. | Portable computer with speech recognition function and method for processing speech command thereof |
US7890089B1 (en) | 2007-05-03 | 2011-02-15 | Iwao Fujisaki | Communication device |
US8676273B1 (en) | 2007-08-24 | 2014-03-18 | Iwao Fujisaki | Communication device |
US8639214B1 (en) | 2007-10-26 | 2014-01-28 | Iwao Fujisaki | Communication device |
US8543157B1 (en) | 2008-05-09 | 2013-09-24 | Iwao Fujisaki | Communication device which notifies its pin-point location or geographic area in accordance with user selection |
US8340726B1 (en) | 2008-06-30 | 2012-12-25 | Iwao Fujisaki | Communication device |
US8452307B1 (en) | 2008-07-02 | 2013-05-28 | Iwao Fujisaki | Communication device |
US8538756B2 (en) * | 2010-01-12 | 2013-09-17 | Denso Corporation | In-vehicle device and method for modifying display mode of icon indicated on the same |
US20110173002A1 (en) * | 2010-01-12 | 2011-07-14 | Denso Corporation | In-vehicle device and method for modifying display mode of icon indicated on the same |
US8600748B2 (en) | 2010-04-26 | 2013-12-03 | Cyberpulse L.L.C. | System and methods for matching an utterance to a template hierarchy |
US8165878B2 (en) | 2010-04-26 | 2012-04-24 | Cyberpulse L.L.C. | System and methods for matching an utterance to a template hierarchy |
US9043206B2 (en) | 2010-04-26 | 2015-05-26 | Cyberpulse, L.L.C. | System and methods for matching an utterance to a template hierarchy |
US8954334B2 (en) * | 2011-10-15 | 2015-02-10 | Zanavox | Voice-activated pulser |
US20130093445A1 (en) * | 2011-10-15 | 2013-04-18 | David Edward Newman | Voice-Activated Pulser |
US8862476B2 (en) * | 2012-11-16 | 2014-10-14 | Zanavox | Voice-activated signal generator |
US20140142949A1 (en) * | 2012-11-16 | 2014-05-22 | David Edward Newman | Voice-Activated Signal Generator |
US9659058B2 (en) | 2013-03-22 | 2017-05-23 | X1 Discovery, Inc. | Methods and systems for federation of results from search indexing |
US9880983B2 (en) | 2013-06-04 | 2018-01-30 | X1 Discovery, Inc. | Methods and systems for uniquely identifying digital content for eDiscovery |
WO2015180231A1 (en) * | 2014-05-29 | 2015-12-03 | 中兴通讯股份有限公司 | Voice interaction method and apparatus |
US10346550B1 (en) | 2014-08-28 | 2019-07-09 | X1 Discovery, Inc. | Methods and systems for searching and indexing virtual environments |
US11238022B1 (en) | 2014-08-28 | 2022-02-01 | X1 Discovery, Inc. | Methods and systems for searching and indexing virtual environments |
Also Published As
Publication number | Publication date |
---|---|
US20020128843A1 (en) | 2002-09-12 |
US5377303A (en) | 1994-12-27 |
US20020010582A1 (en) | 2002-01-24 |
JPH03163623A (en) | 1991-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5377303A (en) | Controlled computer interface | |
US5748191A (en) | Method and system for creating voice commands using an automatically maintained log interactions performed by a user | |
US6308157B1 (en) | Method and apparatus for providing an event-based “What-Can-I-Say?” window | |
US6212541B1 (en) | System and method for switching between software applications in multi-window operating system | |
CA2115210C (en) | Interactive computer system recognizing spoken commands | |
US5818423A (en) | Voice controlled cursor movement | |
US5983179A (en) | Speech recognition system which turns its voice response on for confirmation when it has been turned off without confirmation | |
US6088671A (en) | Continuous speech recognition of text and commands | |
US6085159A (en) | Displaying voice commands with multiple variables | |
US7024363B1 (en) | Methods and apparatus for contingent transfer and execution of spoken language interfaces | |
US8140971B2 (en) | Dynamic and intelligent hover assistance | |
US5748841A (en) | Supervised contextual language acquisition system | |
US5890122A (en) | Voice-controlled computer simulateously displaying application menu and list of available commands | |
EP1485773B1 (en) | Voice-controlled user interfaces | |
US6388665B1 (en) | Software platform having a real world interface with animated characters | |
US7461352B2 (en) | Voice activated system and methods to enable a computer user working in a first graphical application window to display and control on-screen help, internet, and other information content in a second graphical application window | |
US5786818A (en) | Method and system for activating focus | |
US6499015B2 (en) | Voice interaction method for a computer graphical user interface | |
JP2001504610A (en) | Apparatus and method for indirectly grouping the contents of operation history stacks into groups | |
US6253177B1 (en) | Method and system for automatically determining whether to update a language model based upon user amendments to dictated text | |
JP2000500243A (en) | Document display system and document display method | |
JPH0580009B2 (en) | ||
JPH0876961A (en) | Method and apparatus for execution of context navigation to historical data | |
EP1190301A1 (en) | Method of interacting with a consumer electronics system | |
US6745165B2 (en) | Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECT Free format text: OFFICIAL COMMITTEE OF UNSECURED CREDITORS OF LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.'S PLAN OF LIQUIDATION FOR LERNOUT & HAUSPIE SPEECH PRODUCTS N.V. UNDER CHAPTER 11 OF THE BANKRUPTCY CODE;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.;REEL/FRAME:019047/0157 Effective date: 20030311 |
Owner name: FONIX/ASI CORPORATION, UTAH Free format text: CHANGE OF NAME;ASSIGNOR:ASI ACQUISITION CORPORATION;REEL/FRAME:019048/0536 Effective date: 19990105 |
Owner name: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECT Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.;REEL/FRAME:019047/0044 Effective date: 20030530 |
Owner name: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V., BELGIUM Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FONIX CORPORATION;REEL/FRAME:019056/0355 Effective date: 19990901 |
Owner name: ASI ACQUISITION CORPORATION, UTAH Free format text: MERGER;ASSIGNOR:ARTICULATE SYSTEMS, INC.;REEL/FRAME:019048/0561 Effective date: 19980902 |
Owner name: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECT Free format text: PLAN ADMINISTRATION AGREEMENT;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.;REEL/FRAME:019047/0224 Effective date: 20030530 |
Owner name: FONIX CORPORATION, UTAH Free format text: MERGER;ASSIGNOR:FONIX/ASI CORPORATION;REEL/FRAME:019048/0429 Effective date: 19990901 |
Owner name: ARTICULATE SYSTEMS, INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FIRMAN, THOMAS R.;REEL/FRAME:019047/0010 Effective date: 19891009 |
Owner name: ASI ACQUISITION CORPORATION, UTAH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ARTICULATE SYSTEMS, INC.;REEL/FRAME:019056/0364 Effective date: 19980902 |
|
AS | Assignment |
Owner name: MULTIMODAL TECHNOLOGIES, INC., PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECTIVE DATE L&H;REEL/FRAME:024823/0237 Effective date: 20100708 |
|
AS | Assignment |
Owner name: MULTIMODAL TECHNOLOGIES, LLC, PENNSYLVANIA Free format text: CHANGE OF NAME;ASSIGNOR:MULTIMODAL TECHNOLOGIES, INC.;REEL/FRAME:027061/0492 Effective date: 20110818 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |
|
AS | Assignment |
Owner name: ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT, ONT Free format text: SECURITY AGREEMENT;ASSIGNORS:MMODAL IP LLC;MULTIMODAL TECHNOLOGIES, LLC;POIESIS INFOMATICS INC.;REEL/FRAME:028824/0459 Effective date: 20120817 |
|
AS | Assignment |
Owner name: MULTIMODAL TECHNOLOGIES, LLC, PENNSYLVANIA Free format text: RELEASE OF SECURITY INTEREST;ASSIGNOR:ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT;REEL/FRAME:033459/0987 Effective date: 20140731 |
|
AS | Assignment |
Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT, NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNOR:MMODAL IP LLC;REEL/FRAME:034047/0527 Effective date: 20140731 |
|
AS | Assignment |
Owner name: CORTLAND CAPITAL MARKET SERVICES LLC, ILLINOIS Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:MULTIMODAL TECHNOLOGIES, LLC;REEL/FRAME:033958/0511 Effective date: 20140731 |
|
AS | Assignment |
Owner name: MULTIMODAL TECHNOLOGIES, LLC, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CORTLAND CAPITAL MARKET SERVICES LLC, AS ADMINISTRATIVE AGENT;REEL/FRAME:048210/0792 Effective date: 20190201 |
|
AS | Assignment |
Owner name: MEDQUIST OF DELAWARE, INC., TENNESSEE Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712 Effective date: 20190201 |
Owner name: MULTIMODAL TECHNOLOGIES, LLC, TENNESSEE Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712 Effective date: 20190201 |
Owner name: MEDQUIST CM LLC, TENNESSEE Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712 Effective date: 20190201 |
Owner name: MMODAL IP LLC, TENNESSEE Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712 Effective date: 20190201 |
Owner name: MMODAL MQ INC., TENNESSEE Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712 Effective date: 20190201 |