JP2021086510A - Electronic apparatus

Info

Publication number: JP2021086510A
Application number: JP2019216816A
Authority: JP
Inventors: 仁志松本; Hitoshi Matsumoto
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2021-06-03

Abstract

To provide an electronic apparatus that can prevent the content of input voice instructions from being erroneously recognized and reliably prevent processing not intended by the user from being executed.

SOLUTION: In an image forming apparatus 1, a control unit 10 extracts, from the voice input to voice input units 22A to 22C, the portion whose frequency characteristics are not irregular as a user voice, and causes a voice output unit 23 to output the extracted user voice. When a predetermined standby time elapses after the voice output unit 23 outputs the user voice without a cancellation instruction being received, the control unit 10 executes the processing indicated by the information stored in an HDD 17 in association with the user voice. When the cancellation instruction is received before the standby time elapses, the control unit 10 does not execute the processing.

SELECTED DRAWING: Figure 2

Description

The present invention relates to an electronic device, and more particularly to a technique for recognizing voice instructions.

Electronic devices having a voice recognition function are known. For example, Patent Document 1 discloses an image forming apparatus that recognizes a user's voice and, when the recognized voice matches a pre-stored combination pattern of single sounds, executes processing based on the device settings corresponding to that pattern.

Patent Document 2 discloses a multifunction peripheral that recognizes input voice and acts according to a score indicating the accuracy of the recognition. When the score is equal to or less than a first value, it requests voice input again. When the score is equal to or greater than a second value, it executes processing according to the recognized content. When the score is greater than the first value and less than the second value, it displays the recognized content on a display unit.
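The three-way decision attributed to Patent Document 2 can be illustrated with a minimal sketch. The function name, the return labels, and the example threshold values are assumptions made for illustration; the sketch also assumes the first value is smaller than the second.

```python
def decide_by_score(score, first_value, second_value):
    """Choose an action from a recognition-accuracy score.

    Hypothetical helper; assumes first_value < second_value.
    """
    if score <= first_value:
        return "request_input_again"    # recognition too unreliable
    if score >= second_value:
        return "execute"                # recognition reliable enough
    return "display_recognized_content" # in between: show result to the user


# Example with made-up thresholds 0.3 and 0.8
print(decide_by_score(0.5, 0.3, 0.8))
```

Note that this scheme depends only on the recognition score; it does not examine whether the input contains aperiodic noise.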

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-163096
Patent Document 2: Japanese Unexamined Patent Publication No. 2006-205497

If the voice input to an electronic device contains aperiodic noise, such as the sound of a door opening or closing or the sound of keyboard keystrokes, accurate voice recognition cannot be performed, and the content of the input instruction may be recognized erroneously. If the content of the input instruction is recognized erroneously, a process not intended by the user is executed. The techniques disclosed in Patent Document 1 and Patent Document 2 do not consider the influence of aperiodic noise and cannot solve this problem.

The present invention has been made in view of the above circumstances, and its object is to prevent the content of an input voice instruction from being erroneously recognized and to reliably prevent a process not intended by the user from being executed.

An electronic device according to one aspect of the present invention includes a control unit capable of executing a plurality of processes, a voice input unit to which voice is input, a voice output unit that outputs voice, and a storage unit that stores predetermined voices in association with information indicating one of the plurality of processes. The control unit extracts, from the voice input to the voice input unit, the portion whose frequency characteristics are not irregular as a user voice, and causes the voice output unit to output the extracted user voice. When a predetermined time elapses after the voice output unit outputs the user voice without a cancellation instruction for canceling the user voice being received, the control unit executes, among the plurality of processes, the process indicated by the information stored in the storage unit in association with the user voice. When the cancellation instruction is received before the predetermined time elapses after the voice output unit outputs the user voice, the control unit does not execute that process.

According to the present invention, the portion of the voice input to the voice input unit whose frequency characteristics are not irregular is extracted as the user voice. This prevents the content of an input voice instruction from being erroneously recognized because the recognized voice contains aperiodic noise. In addition, by listening to the voice output by the voice output unit, the user can check whether the user voice has been recognized accurately, which improves convenience for visually impaired users. Furthermore, when a cancellation instruction is received before the predetermined standby time elapses after the user voice is output, the process corresponding to the user voice is not executed, so execution of a process not intended by the user can be reliably prevented.

FIG. 1 is a front sectional view showing the configuration of an image forming apparatus according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the internal configuration of the image forming apparatus.
FIG. 3 is a flowchart showing the voice instruction recognition process.
FIG. 4 is a diagram showing an example of the setting screen.

An image forming apparatus as an electronic device according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a front sectional view showing the configuration of the image forming apparatus according to one embodiment of the present invention. Referring to FIG. 1, the image forming apparatus 1 is a multifunction peripheral having a plurality of functions such as a copy function, a scan function, a print function, and a facsimile function. The housing of the image forming apparatus 1 accommodates a plurality of devices for realizing the various functions of the image forming apparatus 1. For example, the housing accommodates an image reading unit 11, an image forming unit 12, a fixing unit 13, a paper feed unit 14, and the like.

FIG. 2 is a block diagram showing the internal configuration of the image forming apparatus. Referring to FIG. 2, the image forming apparatus 1 includes a control unit 100. The control unit 100 includes a processor, a RAM (Random Access Memory), a ROM (Read Only Memory), and the like. The processor is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or an ASIC (Application Specific Integrated Circuit).

The control unit 100 functions as the control unit 10 when the processor executes a control program stored in the ROM or an HDD (Hard Disk Drive) 17. The control unit 10 may instead be configured as a logic circuit rather than operating based on the control program.

The control unit 10 governs overall control of the image forming apparatus 1. More specifically, the control unit 10 is configured to be able to execute a plurality of processes, including a copy process, a scan process, and a print process, by controlling the operation of each part of the image forming apparatus 1. By operating according to a recognition program described later, the control unit 10 also executes a voice instruction recognition process, which recognizes the voice input to the image forming apparatus 1 and executes the process corresponding to the recognized voice.

The control unit 100 is electrically connected to a document transport unit 6, the image reading unit 11, the image forming unit 12, the fixing unit 13, the paper feed unit 14, a display unit 15, an operation unit 16, the HDD 17, a transport mechanism 18, an image processing unit 19, an image memory 20, a facsimile communication unit 21, voice input units 22A to 22C, a voice output unit 23, a communication unit 24, and the like.

The image reading unit 11 is an ADF (Auto Document Feeder) that includes the document transport unit 6, which transports a document placed on a document tray, and a scanner that optically reads the document transported by the document transport unit 6 or a document placed on a platen glass 7. The image reading unit 11 executes an image reading process in which a light irradiation unit illuminates the document and a CCD (Charge-Coupled Device) sensor receives the reflected light, thereby reading the document and generating image data.

The image forming unit 12 includes a photoconductor drum, a charging device, an exposure device, a developing device, and a transfer device. Based on the image data generated by the image reading unit 11 or the like, the image forming unit 12 executes an image forming process that forms an image composed of a toner image on recording paper P conveyed from the paper feed unit 14.

The fixing unit 13 fixes the toner image onto the recording paper P by heating and pressing the recording paper P on which the toner image has been formed by the image forming unit 12. The recording paper P on which the toner image has been fixed by the fixing unit 13 is discharged to a discharge tray 8.

The paper feed unit 14 includes a manual feed tray and a plurality of paper feed cassettes. The paper feed unit 14 draws out, one sheet at a time, the recording paper P stored in a paper feed cassette or the recording paper placed on the manual feed tray, and feeds it toward the image forming unit 12.

The display unit 15 is a display device such as a liquid crystal display or an organic EL (Organic Light-Emitting Diode) display. Under the control of the control unit 10, the display unit 15 displays various screens for the functions that the image forming apparatus 1 can execute.

The operation unit 16 is provided on the front side of the image forming apparatus 1. The operation unit 16 includes a plurality of hard keys, such as a cancel key 16A for canceling an input instruction. The operation unit 16 also includes a touch panel 16B arranged over the display unit 15. Through the operation unit 16, the user inputs various kinds of information, such as instructions for the functions that the image forming apparatus 1 can execute. The operation unit 16 is an example of the instruction input unit recited in the claims.

The HDD 17 is a large-capacity storage device for storing various data, such as the image data generated by the image reading unit 11. The HDD 17 stores various control programs for realizing the general operation of the image forming apparatus 1. As one of these control programs, the HDD 17 stores a recognition program for executing the voice instruction recognition process according to one embodiment of the present invention.

The HDD 17 stores text data representing predetermined character strings in association with information indicating one of the plurality of processes that the control unit 10 can execute. For example, the HDD 17 stores text data representing the character string "copy" in association with information indicating the copy process, text data representing the character string "scan" in association with information indicating the scan process, and text data representing the character string "print" in association with information indicating the print process.
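The association between character strings and processes can be pictured as a simple lookup table. The following is a minimal sketch; the names `Process`, `COMMAND_TABLE`, and `lookup_process` are hypothetical, and exact string matching is assumed because the description refers to "the same character string".

```python
from enum import Enum
from typing import Optional


class Process(Enum):
    """Processes named in the description (hypothetical identifiers)."""
    COPY = "copy process"
    SCAN = "scan process"
    PRINT = "print process"


# Character string -> information indicating a process
COMMAND_TABLE = {
    "copy": Process.COPY,
    "scan": Process.SCAN,
    "print": Process.PRINT,
}


def lookup_process(recognized_text: str) -> Optional[Process]:
    """Return the process stored for an exactly matching string, else None."""
    return COMMAND_TABLE.get(recognized_text)
```

A string with no entry in the table yields `None`; how the apparatus handles such a case is not stated in this section.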

The HDD 17 stores confirmation voice data representing a confirmation voice used to ask the user for confirmation. Here, the HDD 17 is assumed to store confirmation voice data representing the confirmation voice "Do you want to perform this?".

The HDD 17 also stores confirmation text data representing a confirmation character string used to ask the user for confirmation. Here, the HDD 17 is assumed to store confirmation text data representing the confirmation character string "Do you want to perform this?".

The transport mechanism 18 includes a transport roller pair 18A, a discharge roller pair 18B, and the like. The transport mechanism 18 transports the recording paper P along a transport path T toward the discharge tray 8 set as the discharge destination.

The image processing unit 19 performs image processing on the image data generated by the image reading unit 11 as necessary. The image memory 20 includes an area for temporarily storing the image data to be output that was generated by the image reading unit 11. The facsimile communication unit 21 connects to a public line and transmits and receives image data over the public line.

The voice input unit 22A is provided on the front side of the image forming apparatus 1, near the operation unit 16. The voice input unit 22B is provided on one side face of the image forming apparatus 1. The voice input unit 22C is provided on the other side face of the image forming apparatus 1. Each of the voice input units 22A to 22C includes a microphone to which voice is input and an A/D conversion circuit that converts an analog signal based on the voice input to the microphone into a digital signal.

The voice output unit 23 includes a D/A conversion circuit that converts the digital signal represented by voice data into an analog signal, a speaker that outputs voice based on the converted analog signal, an amplifier, and the like.

The communication unit 24 includes a communication module such as a LAN (Local Area Network) board. Through the communication unit 24, the image forming apparatus 1 performs data communication with a PC (Personal Computer) 25 and other devices connected via a network.

A power supply is connected to each part of the image forming apparatus 1, and each part operates on the power supplied from this power supply.

[Operation]
FIG. 3 is a flowchart showing the voice instruction recognition process. The operation of the image forming apparatus 1 is described below with reference to FIG. 3 and other figures.

Assume that the administrator of the image forming apparatus 1 has input, via the operation unit 16, a display instruction for displaying a setting screen used to set the operation mode of the image forming apparatus 1 to either a normal mode or a voice mode. When the normal mode is set as the operation mode of the image forming apparatus 1, the control unit 10 accepts instructions input via the operation unit 16 as user instructions. When the voice mode is set, the control unit 10 accepts instructions input by voice via the voice input units 22A to 22C as user instructions.

FIG. 4 is a diagram showing an example of the setting screen. Referring to FIG. 4, upon receiving the display instruction via the operation unit 16, the control unit 10 causes the display unit 15 to display a setting screen 40. At this time, the control unit 10 displays, on the setting screen 40, a radio button 41 for selecting the normal mode and a radio button 42 for selecting the voice mode. Assume that the administrator checks the setting screen 40 and touches the radio button 42. Upon detecting the touch operation on the radio button 42 via the touch panel 16B, the control unit 10 displays a check mark on the radio button 42.

The control unit 10 also displays, on the setting screen 40, a radio button 43 for selecting a setting in which operation guidance is given by voice only, and a radio button 44 for selecting a setting in which operation guidance is given by both voice and display. Assume that the administrator checks the setting screen 40 and touches the radio button 43. Upon detecting the touch operation on the radio button 43 via the touch panel 16B, the control unit 10 displays a check mark on the radio button 43.

Assume that, after touching the radio button 43, the administrator touches a soft key 45 for confirming the selection. Upon detecting the touch operation on the soft key 45 via the touch panel 16B, the control unit 10 stores the selection reflected on the setting screen 40 in the HDD 17 as the setting content for the operation mode. After touching the soft key 45, the administrator turns off the power of the image forming apparatus 1.

Assume that, after the administrator has turned off the power, a user of the image forming apparatus 1 turns the power on. Referring to FIG. 3, when the image forming apparatus 1 is powered on, the control unit 10 starts the voice instruction recognition process and determines, based on the setting content stored in the HDD 17, whether the voice mode is set (step S10).

In this case, the administrator has touched the radio button 42 and the voice mode is set, so the control unit 10 determines that the voice mode is set (YES in step S10) and activates the voice input units 22A to 22C (step S11).

When the voice input units 22A to 22C are activated, the A/D conversion circuit of the voice input unit 22A generates first voice data by converting the analog signal based on the voice input to its microphone into a digital signal, and inputs the generated first voice data to the control unit 100. Similarly, the voice input unit 22B generates second voice data based on the voice input to its microphone and inputs the generated second voice data to the control unit 100. The voice input unit 22C generates third voice data based on the voice input to its microphone and inputs the generated third voice data to the control unit 100.

After step S11, the control unit 10 determines whether the voice represented by the first to third voice data input to the control unit 100 contains a user voice (step S12). Specifically, in step S12, the control unit 10 first applies a Fourier transform to the voice waveform represented by each of the first to third voice data, thereby obtaining first to third frequency spectra corresponding to the first to third voice data, respectively. The first to third frequency spectra correspond to the frequency characteristics recited in the claims.

When a portion showing periodicity is present in common in the first to third frequency spectra, the control unit 10 determines that the voice represented by the first to third voice data contains a user voice (YES in step S12). On the other hand, when the first to third frequency spectra are irregular and show no periodicity, so that no portion showing periodicity is present in common in the first to third frequency spectra, the control unit 10 determines that the voice represented by the first to third voice data does not contain a user voice (NO in step S12).
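The determination in step S12 can be illustrated with a rough sketch. The description does not specify how periodicity is measured, so the criterion used here is an assumption: a spectrum counts as regular when its energy is concentrated in a few strong peaks, and a user voice is judged present when all three channels share at least one peak bin. The function names and the tuning values `ratio` and `max_fraction` are hypothetical.

```python
import cmath


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (O(n^2), enough for a sketch)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def peak_bins(spectrum, ratio=0.5, max_fraction=0.1):
    """Return bins holding strong peaks, or an empty set if energy is spread out.

    ratio and max_fraction are hypothetical tuning values.
    """
    peak = max(spectrum[1:])  # ignore the DC bin
    if peak == 0:
        return set()
    bins = {k for k in range(1, len(spectrum)) if spectrum[k] >= ratio * peak}
    if len(bins) > max_fraction * len(spectrum):
        return set()  # too many comparable bins: treat as irregular
    return bins


def common_voice_bins(channels):
    """Intersect the peak bins of every channel; empty means no user voice."""
    result = None
    for samples in channels:
        bins = peak_bins(magnitude_spectrum(samples))
        result = bins if result is None else result & bins
    return result or set()
```

With this criterion, an isolated click spreads its energy evenly across all bins and is rejected, while a sustained tone picked up by all three microphones produces a shared peak.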

Assume that, at this time, aperiodic noise arising around the image forming apparatus 1, such as the sound of a door opening or closing or the sound of keyboard keystrokes, is being input to each of the microphones of the voice input units 22A to 22C. In this case, the first to third frequency spectra are irregular and show no periodicity, so the control unit 10 determines that the voice represented by the first to third voice data does not contain a user voice (NO in step S12) and repeats step S12.

Assume that, in this situation, the user stands in front of the image forming apparatus 1 and says "Copy." At this time, peaks in a specific frequency band appear periodically in common in the first to third frequency spectra, so the control unit 10 determines that the voice represented by the first to third voice data contains a user voice (YES in step S12) and extracts the user voice from the voice represented by the first voice data (step S13).

Specifically, in step S13, the control unit 10 identifies and extracts the spectrum of the frequency band in which periodic peaks appear in common in the first to third frequency spectra, and applies an inverse Fourier transform to the extracted spectrum to generate voice data representing the user voice (hereinafter referred to as "extracted data"). In this way, the control unit 10 extracts, from the voice represented by the first voice data, the portion whose frequency characteristics are not irregular, that is, the portion whose frequency characteristics show periodicity. In step S13, the user voice is extracted from the first frequency spectrum only because the user voice is assumed to be input most clearly to the voice input unit 22A provided on the front side of the image forming apparatus 1.
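The extraction in step S13 amounts to keeping selected frequency bins of the first channel and transforming back to a waveform. The following is a minimal sketch under that reading; the function names are hypothetical, the set of bins to keep is assumed to have been determined beforehand, and a naive DFT is used in place of whatever transform implementation the apparatus actually employs.

```python
import cmath


def dft(samples):
    """Naive forward DFT returning all n complex bins."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * cmath.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def extract_band(samples, keep_bins):
    """Keep only the given bins (and their mirror bins) and transform back."""
    n = len(samples)
    spectrum = dft(samples)
    # A real signal has conjugate-symmetric bins k and n - k; keep both.
    mirrored = set(keep_bins) | {(n - k) % n for k in keep_bins}
    filtered = [c if k in mirrored else 0j for k, c in enumerate(spectrum)]
    return inverse_dft(filtered)
```

For example, if the first channel contains a tone at bin 16 plus a single click, `extract_band(samples, {16})` returns a waveform close to the tone alone, because almost all of the click's energy lies in the discarded bins.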

After step S13, the control unit 10 determines, based on the setting content stored in the HDD 17, whether operation guidance is to be given by voice only (step S14). In this case, the administrator has touched the radio button 43 and selected the setting in which operation guidance is given by voice only, so the control unit 10 determines that operation guidance is to be given by voice only (YES in step S14) and causes the voice output unit 23 to output the extracted user voice (step S15).

Specifically, in step S15, the control unit 10 inputs the extracted data and the confirmation voice data stored in the HDD 17 to the voice output unit 23 in this order. As a result, the voice output unit 23 outputs the user voice and the confirmation voice in this order. In this case, the voice output unit 23 outputs the user voice "Copy" followed by the confirmation voice "Do you want to perform this?".

On the other hand, when the administrator has touched the radio button 44 and selected the setting in which operation guidance is given by both voice and display, the control unit 10 determines that operation guidance is not to be given by voice only (NO in step S14), causes the voice output unit 23 to output the extracted user voice, and causes the display unit 15 to display the content represented by the extracted user voice (step S16).

Specifically, in step S16, the control unit 10 causes the voice output unit 23 to output the user voice and the confirmation voice in this order, in the same manner as in step S15. The control unit 10 also uses a general voice recognition technique to recognize the content of the voice from the waveform of the user voice represented by the extracted data, and converts the recognized content into text to generate output text data. The control unit 10 arranges the character string represented by the output text data and the confirmation character string represented by the confirmation text data stored in the HDD 17 in this order, and causes the display unit 15 to display them as the content represented by the user voice. In this case, the control unit 10 causes the display unit 15 to display the character string "Copy" followed by the confirmation character string "Do you want to perform this?".
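The branch in steps S14 to S16 can be summarized in a small sketch. The `Guidance` type, the `build_guidance` function, the placeholder clip names, and the English confirmation text are all assumptions for illustration; audio clips are represented by plain strings rather than real voice data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Guidance:
    audio: List[str]        # clips played in order by the voice output unit
    display: Optional[str]  # text shown on the display unit, or None


def build_guidance(extracted_clip: str,
                   recognized_text: str,
                   voice_only: bool,
                   confirmation_clip: str = "confirmation_voice",
                   confirmation_text: str = "Do you want to perform this?") -> Guidance:
    """Assemble the confirmation output for step S15 (voice only) or S16."""
    # Both branches play the extracted user voice, then the confirmation voice.
    audio = [extracted_clip, confirmation_clip]
    if voice_only:
        return Guidance(audio=audio, display=None)
    # S16 additionally shows the recognized text followed by the confirmation text.
    return Guidance(audio=audio, display=f"{recognized_text} {confirmation_text}")
```

In both branches the user hears their own extracted voice played back, which is what lets them judge whether the instruction was captured correctly.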

After step S15 or step S16, the control unit 10 determines whether a cancellation instruction for canceling the user voice has been received (step S17). When the voice represented by the first to third voice data input to the control unit 100 contains a new user voice, or when the cancel key 16A has been pressed, the control unit 10 determines that a cancellation instruction has been received (YES in step S17). On the other hand, when the voice represented by the first to third voice data does not contain a new user voice and the cancel key 16A has not been pressed, the control unit 10 determines that no cancellation instruction has been received (NO in step S17).

In this case, because the image forming apparatus 1 has accurately recognized the user voice "copy", the user neither makes a new utterance nor presses the cancel key 16A. The control unit 10 therefore determines that no cancellation instruction has been received (NO in step S17), and determines whether a predetermined standby time has elapsed since step S15 or step S16 was executed (step S18).

The standby time is not particularly limited; here, the predetermined standby time is assumed to be "10 seconds". The administrator has entered the standby time in advance via the operation unit 16, and the control unit 10 has stored the entered standby time in the HDD 17 as a setting. When no more than 10 seconds have passed since step S15 or step S16 was executed, the control unit 10 determines that the predetermined standby time has not elapsed (NO in step S18) and returns to step S17.

When 10 seconds elapse without the user making a new utterance or pressing the cancel key 16A, the control unit 10 determines that the predetermined standby time has elapsed (YES in step S18), reads the information stored in the HDD 17 in association with the user voice, and executes the process indicated by the read information (step S19).

Specifically, in step S19, the control unit 10 uses a general voice recognition technique to recognize the voice from the waveform of the user voice indicated by the extracted data, and converts the recognized voice into text to generate recognized text data. The control unit 10 reads from the HDD 17 the information stored in association with text data indicating the same character string as that indicated by the recognized text data. In this case, the control unit 10 reads from the HDD 17 the information indicating the copy process, which is stored in association with text data indicating the same character string as the character string "copy" indicated by the recognized text data.

The control unit 10 executes the processing required to carry out the process indicated by the read information. In this case, the control unit 10, for example, causes the voice output unit 23 to output a voice giving operation guidance for making various copy settings, or causes the display unit 15 to display a copy screen for making various copy settings. After step S19, the control unit 10 ends the voice instruction recognition process.
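The lookup performed in step S19 can be sketched as an exact-match table from recognized text to process information. This is only an illustrative sketch: the table contents, the identifier strings, and the function name `lookup_process` are assumptions made here and do not appear in the embodiment, which only states that the information is stored in the HDD 17 in association with text data.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the associations stored in the HDD 17:
# text data -> information indicating a process.
PROCESS_TABLE: Dict[str, str] = {
    "copy": "copy_process",
    "print": "print_process",
    "scan": "scan_process",
}


def lookup_process(recognized_text: str,
                   table: Dict[str, str] = PROCESS_TABLE) -> Optional[str]:
    """Return the process information stored for exactly the same
    character string as the recognized text, or None if there is none."""
    return table.get(recognized_text)
```

With this table, the recognized text "copy" yields the copy process information, while a string with no stored association yields `None`, which a caller could treat as "no process to execute".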

If the user makes a new utterance or presses the cancel key 16A before the predetermined standby time elapses, the control unit 10 determines that a cancellation instruction has been received (YES in step S17) and returns to step S12.
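The loop formed by steps S17 and S18 can be sketched as a polling loop that ends either on a cancellation instruction or on expiry of the standby time. The function name `wait_for_cancel`, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions introduced for illustration; the embodiment does not specify how the control unit 10 measures time or polls for input.

```python
import time
from typing import Callable


def wait_for_cancel(cancel_requested: Callable[[], bool],
                    standby_seconds: float = 10.0,
                    poll_interval: float = 0.05,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the standby time elapsed with no cancellation
    (proceed to step S19), or False if a cancellation instruction was
    received first (return to step S12)."""
    start = clock()
    while True:
        # Step S17: a new user utterance or a press of the cancel key.
        if cancel_requested():
            return False
        # Step S18: "within the standby time" counts as not yet elapsed,
        # so the comparison is strict.
        if clock() - start > standby_seconds:
            return True
        sleep(poll_interval)
```

A caller would pass a `cancel_requested` callback that reports whether a new user voice was detected or the cancel key was pressed, and would execute the stored process only when the function returns `True`.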

When the administrator has touched the radio button 41 and the normal mode is set, the control unit 10 determines that the voice mode is not set (NO in step S10) and ends the voice instruction recognition process. In this case, the control unit 10 causes the display unit 15 to display a home screen for selecting one of the plurality of processes executable by the image forming apparatus 1.

According to the above embodiment, the control unit 10 extracts, as the user voice, the voice of the portion whose frequency characteristics are not irregular from the voice input to the voice input units 22A to 22C, and causes the voice output unit 23 to output the extracted user voice. When the predetermined standby time elapses after the voice output unit 23 outputs the user voice without a cancellation instruction being received, the control unit 10 executes the process indicated by the information stored in the HDD 17 in association with the user voice. When a cancellation instruction is received before the standby time elapses, the control unit 10 does not execute that process.

Because the voice of the portion whose frequency characteristics are not irregular is thereby extracted as the user voice from the voice input to the voice input unit 22A, the content of an input voice instruction can be prevented from being misrecognized due to aperiodic noise in the recognized voice. In addition, the user can check whether the user voice has been accurately recognized by listening to the voice output by the voice output unit 23, which improves convenience for visually impaired users. Furthermore, when a cancellation instruction is received before the predetermined standby time elapses after the user voice is output, the process corresponding to the user voice is not executed, so execution of a process not intended by the user can be reliably prevented.

Also according to the above embodiment, the control unit 10 identifies and extracts, as the user voice, the voice of the portion that commonly exhibits periodicity in the frequency characteristics of the voices input to the respective voice input units 22A to 22C. Because the user voice can thereby be accurately extracted from the voice input to the voice input unit 22A, misrecognition of the content of an input voice instruction due to aperiodic noise in the recognized voice can be prevented even more reliably.

Also according to the above embodiment, the control unit 10 causes the voice output unit 23 to output the user voice and causes the display unit 15 to display the content indicated by the user voice. The user can thereby check whether the user voice has been accurately recognized not only by listening to the voice output by the voice output unit 23 but also by checking the content displayed on the display unit 15. This improves convenience not only for visually impaired users but also for hearing impaired users.

Also according to the above embodiment, the control unit 10 receives the cancellation instruction via the voice input units 22A to 22C or the operation unit 16. Even a user with a visual or hearing impairment can therefore easily input a cancellation instruction, which further improves user convenience.

(Other Modifications)
In the above embodiment, the control unit 10 sets only one standby time, but the present invention is not limited to such an embodiment. For example, the control unit 10 may set a different standby time for each process corresponding to a user voice, in accordance with an instruction from the administrator or the like input via the operation unit 16. An appropriate standby time that reflects the user's wishes can thereby be set for each process corresponding to a user voice, which further improves user convenience.

The control unit 10 may also be configured not to set a standby time for at least one of the copy process, the print process, and the scan process. In this case, when the process corresponding to the user voice is one of those processes, the control unit 10 executes the processing for carrying out that process without waiting for a standby time to elapse. As a result, time-consuming processes such as copying, printing, and scanning do not wait for the standby time, which prevents the user's wait from the input of the voice instruction to the completion of the corresponding process from becoming too long.
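The two modifications above (a separate standby time per process, and no standby time for certain processes) can be sketched as a settings table in which a missing standby time is represented by `None`. The table contents, the "fax" entry, the default value, and the name `standby_for` are hypothetical and serve only as an illustration.

```python
from typing import Dict, Optional

DEFAULT_STANDBY_SECONDS = 10.0

# Hypothetical administrator settings. None means that no standby time
# is set, so the process starts without waiting.
STANDBY_BY_PROCESS: Dict[str, Optional[float]] = {
    "copy": None,
    "print": None,
    "scan": None,
    "fax": 15.0,
}


def standby_for(process: str,
                settings: Dict[str, Optional[float]] = STANDBY_BY_PROCESS,
                default: float = DEFAULT_STANDBY_SECONDS) -> Optional[float]:
    """Return the standby time for the process, None if the process is
    configured to run without waiting, or the default if no per-process
    setting exists."""
    if process in settings:
        return settings[process]
    return default
```

A caller would skip the cancellation wait entirely when `standby_for` returns `None`, and otherwise wait for the returned number of seconds before executing the process.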

In the above embodiment, the control unit 10 extracts the user voice directly from the first frequency spectrum, but the present invention is not limited to such an embodiment. For example, the control unit 10 may store in the HDD 17, in advance, the frequency spectrum of periodic noise such as the sound of motors provided in the image reading unit 11, the image forming unit 12, the paper feed unit 14, and the like. When extracting the user voice, the control unit 10 may then first remove the frequency spectrum of the periodic noise from the first frequency spectrum and extract the user voice from the first frequency spectrum after the removal. Misrecognition of the content of an input voice instruction can thereby be prevented even when the recognized voice contains periodic noise.
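The embodiment does not state how the stored noise spectrum is removed. One common approach, used here purely as an assumption, is magnitude spectral subtraction with a floor at zero. The function name `remove_periodic_noise` and the representation of a spectrum as a list of per-bin magnitudes are likewise illustrative.

```python
from typing import List, Sequence


def remove_periodic_noise(first_spectrum: Sequence[float],
                          noise_spectrum: Sequence[float]) -> List[float]:
    """Subtract the stored periodic-noise magnitude from each frequency
    bin of the first spectrum, clamping negative results to zero."""
    if len(first_spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [max(s - n, 0.0) for s, n in zip(first_spectrum, noise_spectrum)]
```

The clamp keeps every bin non-negative, so a bin in which the stored noise exceeds the measured magnitude contributes nothing to the subsequent extraction of the user voice.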

In this case, the control unit 10 may also be configured to prompt the user to input the user voice again when a value indicating the degree of similarity between the frequency spectrum of the periodic noise and the first frequency spectrum is equal to or greater than a predetermined value, for example by causing the voice output unit 23 to output, or the display unit 15 to display, a message such as "Please give the voice instruction again." Misrecognition of the content of an input voice instruction can thereby be prevented even more reliably.
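The embodiment does not define the similarity value. As one possible reading, the sketch below uses cosine similarity between the two magnitude spectra and compares it with a threshold. The choice of cosine similarity, the threshold of 0.9, and the names `cosine_similarity` and `needs_reinput` are all assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two spectra; 0.0 if either is all zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def needs_reinput(first_spectrum: Sequence[float],
                  noise_spectrum: Sequence[float],
                  threshold: float = 0.9) -> bool:
    """Return True when the input is so similar to the stored periodic
    noise that the user should be asked to repeat the voice instruction."""
    return cosine_similarity(first_spectrum, noise_spectrum) >= threshold
```

When `needs_reinput` returns `True`, a caller would output or display the re-input prompt instead of proceeding to extraction.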

In the above embodiment, the control unit 10 extracts the user voice only from the first frequency spectrum, but the present invention is not limited to such an embodiment. For example, the control unit 10 may extract a user voice from each of the first to third frequency spectra. In this case, the control unit 10 may be configured to combine the three extracted user voices by a general voice synthesis technique and cause the voice output unit 23 to output the combined user voice.

In the above embodiment, the control unit 10 displays the setting screen 40 as the setting screen for setting the operation mode of the image forming apparatus 1 to either the normal mode or the voice mode, but the present invention is not limited to such an embodiment. For example, the control unit 10 may display a setting screen containing an image that users without a color vision deficiency can recognize and users with a color vision deficiency cannot. In this case, the control unit 10 sets the normal mode when the user inputs, via the operation unit 16, information indicating that the image can be recognized, and sets the voice mode when the user inputs information indicating that the image cannot be recognized.

In the above embodiment, the control unit 10 may also cause the voice output unit 23 to output a mosquito tone instead of displaying the radio button 43 and the radio button 44. In this case, when the user inputs, via the operation unit 16, information indicating that the mosquito tone can be heard, the control unit 10 makes the settings for giving operation guidance by voice only. When the user inputs information indicating that the mosquito tone cannot be heard, the control unit 10 makes the settings for giving operation guidance by both voice and display.

In the above embodiment, three voice input units are provided, but the present invention is not limited to such an embodiment. For example, four or five voice input units may be provided.

The present invention is not limited to the configuration of the above embodiment, and various modifications are possible. For example, the above embodiment uses a color multifunction peripheral as the electronic device, but this is merely an example. Another image forming apparatus such as a monochrome multifunction peripheral, a copier, or a facsimile machine may be used as the electronic device, or a PC may be used as the electronic device.

The configuration and processing of the above embodiment described with reference to FIGS. 1 to 4 are merely one embodiment of the present invention and are not intended to limit the present invention to that configuration and processing.

1 Image forming apparatus
10 Control unit
12 Image forming unit
15 Display unit
16 Operation unit
17 HDD
22A, 22B, 22C Voice input unit
23 Voice output unit

Claims

1. An electronic device comprising:
a control unit capable of executing a plurality of processes;
a voice input unit to which voice is input;
a voice output unit that outputs voice; and
a storage unit that stores a predetermined voice in association with information indicating one of the plurality of processes,
wherein the control unit:
extracts, as a user voice, the voice of a portion whose frequency characteristics are not irregular from the voice input to the voice input unit;
causes the voice output unit to output the extracted user voice;
executes, among the plurality of processes, the process indicated by the information stored in the storage unit in association with the user voice when a predetermined time elapses after the voice output unit outputs the user voice without a cancellation instruction for canceling the user voice being received; and
does not execute the process when the cancellation instruction is received before the predetermined time elapses after the voice output unit outputs the user voice.

2. The electronic device according to claim 1, wherein a plurality of the voice input units are provided, and
the control unit identifies and extracts, as the user voice, the voice of a portion that commonly exhibits periodicity in the frequency characteristics of the voices input to the respective voice input units.

3. The electronic device according to claim 1 or 2, further comprising a display unit,
wherein the control unit causes the voice output unit to output the user voice and causes the display unit to display the content indicated by the user voice.

4. The electronic device according to any one of claims 1 to 3, further comprising:
an input unit to which a user's instruction is input; and
an image forming unit for executing an image forming process that forms an image on a recording medium,
wherein, when the process is the image forming process, the control unit executes processing for causing the image forming unit to perform the image forming process without waiting for the predetermined time to elapse.

5. The electronic device according to any one of claims 1 to 4, further comprising an instruction input unit to which a user's instruction is input,
wherein the control unit receives the cancellation instruction via the voice input unit or the input unit.