JP2017514225A

JP2017514225A - Smart optical input/output (I/O) extension for context-sensitive workflows

Info

Publication number: JP2017514225A
Application number: JP2016562561A
Authority: JP
Inventors: Macciola, Anthony; Amtrup, Jan W.
Original assignee: Kofax, Inc.
Priority date: 2014-04-15
Filing date: 2015-04-15
Publication date: 2017-06-01
Also published as: EP3132381A1; EP3132381A4; WO2015160988A1; CN106170798A

Abstract

Disclosed are systems, methods, and computer program products for the smart, automatic capture of textual information using an optical sensor of a mobile device. The textual information is provided to a mobile application or workflow without requiring user intervention such as a copy/paste operation, and without requiring the user to manually enter or transfer the data. The capture and provision are context-aware and may normalize or confirm the captured textual information before it is entered into the workflow or mobile application. Other information required by the workflow and available to the mobile device's optical sensors may also be captured and provided in a single automated process. As a result, the overall process of capturing information from optical input using a mobile device is significantly simplified and improved in terms of data transfer/entry accuracy, workflow speed and efficiency, and user experience.

Description

Related Applications
This application claims the benefit of U.S. Application No. 14/686,644, filed April 14, 2015, which claims the benefit of U.S. Provisional Patent Application No. 61/979,949, filed April 15, 2014, the entirety of which is incorporated herein by reference.

Field of the Invention
The present disclosure relates to input/output (I/O) using the optical components of mobile devices. More specifically, the present concepts relate to integrating the optical input functions of a mobile device with its output functions, and more specifically still, to performing context-sensitive integration of optical input from a mobile device camera into textual output for mobile workflows or applications.

Background
Mobile devices occupy an increasingly prominent niche in an ever-evolving marketplace, serving as access points at various stages of conducting a seemingly countless array of activities. As this trend continues, mobile devices and the mobile network capabilities they provide are being leveraged in an ever-increasing number and breadth of scenarios. Recent examples include the extension of mobile technology to provide a host of financial services such as check deposit, bill payment, and account management. In addition, location data gathered by mobile devices is used in an increasing number of applications, e.g., to provide targeted advertising, situational awareness, etc.

As the mobile development community discovers new utility for these devices, users are given increasingly complex and specific opportunities to provide input that is required by, or advantageous to, the underlying processes the mobile device is used to perform. In addition, the contexts in which a user may interact with, or provide input to, these processes continue to diversify.

This diversification naturally includes expansion into niches where the implemented technology may not be an optimal, or even an acceptable, approach from the user's perspective. In a culture where the difference between an acceptable and an unacceptable solution to a given challenge is judged in an instant, developers pursue every feasible performance advantage to achieve superior technology.

For example, several inefficiencies are well known with respect to user input received via mobile devices. The first is the small screen size typical of mobile devices, particularly mobile phones. Because the conventional "smartphone" has eliminated the physical keyboard and pointing device and relies instead on touchscreen technology, the physical space allotted to a given key on the virtual "keyboard" displayed on the screen is far smaller than a human finger can accurately and reliably actuate. As a result, typographical errors are common in textual user input received via mobile devices.

To overcome this limitation, typical mobile devices employ powerful predictive analytics and dictionaries to "learn" a given user's input behavior. Where the user's actual input corresponds to text that does not conform to defined norms, patterns, etc., the mobile device can predict the user's intended input text based on the predictive model it has developed. The most prominent example of such predictive analytics and dictionaries is the conventional "autocorrect" feature available on most mobile devices.
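The dictionary-based correction described above can be illustrated with a deliberately simplified sketch. It uses the Python standard library's `difflib.get_close_matches` to snap a typed word to the most similar entry in a small, hypothetical dictionary. Real keyboards use far richer, per-user predictive models; the `autocorrect` helper, its `cutoff` default, and the word list are assumptions for illustration only.

```python
import difflib
from typing import Sequence

# Hypothetical vocabulary; a real keyboard would also learn from the user's own typing.
DICTIONARY = ["deposit", "payment", "account", "balance"]


def autocorrect(word: str, dictionary: Sequence[str] = DICTIONARY, cutoff: float = 0.6) -> str:
    """Replace `word` with its closest dictionary entry, if one is similar enough.

    `cutoff` is the minimum difflib similarity ratio (0.0 to 1.0) required for a
    replacement; if no entry reaches it, the word is returned unchanged.
    """
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    print(autocorrect("acount"))  # a typo that gets fixed
    print(autocorrect("accent"))  # a valid word "corrected" into a different one
    print(autocorrect("xyz"))     # nothing similar enough, so it is left as typed
```

With this word list, `"acount"` becomes `"account"` as intended, but the legitimate word `"accent"` is also rewritten to `"account"`, a small instance of the misprediction problem discussed in the next paragraph; `"xyz"` is returned unchanged.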

However, these "autocorrect" approaches are notorious in the mobile community for producing inaccurate (or inappropriate) predictions. In some contexts these inaccuracies are humorous, but the prevalence of erroneous predictions results in miscommunication and errors that frustrate the underlying processes and users alike. Ultimately, this undermines the utility of mobile devices by precluding their adoption in a wide variety of contexts where they could otherwise be leveraged to great benefit.

As a result, some developers have turned to alternative input sources and techniques for gathering input via mobile devices. For example, most solutions have focused on using audio input as an alternative or supplement to textual input (i.e., tactile input received via a virtual keyboard shown on the mobile device display). In practice, this technique has conventionally been embodied as an integration of the mobile device's speech recognition functionality (e.g., as provided by "virtual assistants" such as "Siri" on APPLE mobile devices running iOS 5.0 or later).

A specific embodiment of this audio input extension added to a mobile keyboard is shown in the accompanying drawings. Although the figure depicts an interface generated using APPLE's iOS mobile operating system, similar functionality may be found on other platforms, such as ANDROID®, MICROSOFT SURFACE RT, etc.

Audio input may be received by integrating an extension into the mobile virtual keyboard, enabling the user to provide input other than the typical tactile input received via the mobile device display. In one approach, the audio extension is displayed as a button bearing a microphone icon or symbol, immediately to the left of the space bar. The user may interact with a field configured to accept textual input (e.g., a field in an online form, a PDF, etc.). In response to detecting the user's interaction with the field, the mobile device leverages the operating system to invoke the mobile virtual keyboard user interface. The user may then either provide tactile input to enter the desired text, or interact with the audio extension to invoke an audio input interface. In the art, this technique is commonly known as "speech-to-text" functionality: it accepts audio input and converts the received audio input into textual information.

Upon invocation of the audio input interface, and optionally in response to receiving additional input from the user via the mobile device display (e.g., tapping the audio extension twice to indicate the start of audio input), the user provides audio input. The mobile device's speech recognition component analyzes this input, converts it to text, and enters the text into the field with which the user interacted to invoke the mobile virtual keyboard.

Integrating audio input into the textual I/O capabilities of mobile devices enables users to enter textual information hands-free, extending the device's usefulness to a host of contexts where it would otherwise be impractical. For example, under these approaches a user may compose a text message using audio input alone. However, these approaches are likewise plagued by the inaccuracies and inconsistencies well known in existing speech recognition technology, which similarly cause frustration and degrade performance. As a result, current speech recognition approaches to supplementing or replacing textual input are insufficient.

Currently available speech recognition is known to have shortcomings; speech recognition software often simply cannot recognize the distinctive utterances of a particular individual. Similarly, speech recognition is prone to "audiographical" errors (i.e., the audio analogue of typographical errors, such as misrecognizing a spoken word).

Moreover, speech recognition is inherently limited by its reliance on a predefined set of rules (e.g., a set of assumptions or conditions that may be defined based on the language being spoken). Furthermore, because usage within a single language often differs significantly between its spoken and written forms, it may be impossible to use audio input as a supplement or alternative to textual input. For example, audio input is often an impractical alternative to tactile input in environments where the expected expression and/or form of usage (which often defines the "rules" on which speech recognition relies) corresponds to the written form of the language.

Speech recognition is also a poor tool for obtaining or confirming user input that corresponds to information which is not commonly, or cannot be, expressed verbally. A basic example of this limitation arises with user input that includes symbols, such as those frequently used to label units. Even where such units have generally accepted pronunciations (e.g., the currency unit known as the "U.S. dollar" corresponds to the symbol "$"), those pronunciations are not necessarily unique to the corresponding term (e.g., "pound" may correspond either to a unit of weight, "lbs.", or to a unit of currency, "£", depending on context).
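The "pound" example shows that the word alone cannot determine the symbol; some outside context must. A minimal sketch, assuming the type of the destination field is known (a hypothetical `field_type` of `"weight"` or `"currency"`), resolves the ambiguity only when that context is available and otherwise declines to guess. The function name and phrase pattern are illustrative assumptions.

```python
import re
from typing import Optional

# Matches phrases such as "12 pounds" or "3.5 pound" (case-insensitive).
_POUND_PHRASE = re.compile(r"\s*(\d+(?:\.\d+)?)\s+pounds?\s*", re.IGNORECASE)


def normalize_pounds(phrase: str, field_type: str) -> Optional[str]:
    """Render a "<number> pound(s)" phrase using the symbol its context implies.

    Returns None when the phrase is not of that shape, or when the context
    (the hypothetical `field_type`) does not say which kind of pound is meant.
    """
    match = _POUND_PHRASE.fullmatch(phrase)
    if match is None:
        return None
    amount = match.group(1)
    if field_type == "weight":
        return f"{amount} lbs."
    if field_type == "currency":
        return f"£{amount}"
    return None  # ambiguous: no context to disambiguate
```

For example, `normalize_pounds("12 pounds", "weight")` yields `"12 lbs."`, while the same phrase in a `"currency"` field yields `"£12"`; for any other field type the function returns `None` rather than guessing.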

Speech recognition is likewise ill-suited to accepting and processing textual input that includes grammatical symbols (e.g., one or more "symbols" used to convey grammatical information, such as a comma ",", semicolon ";", or period "."), or formatting input that includes symbols which need not have a corresponding physical representation in the spoken language (e.g., carriage returns, tabs, spaces, particular text alignments, etc.).

Other existing approaches use optical input as a supplement to textual input, but these techniques merely provide the ability to combine textual input with an image or video clip and to distribute the combined input via the user's preferred communication format (e.g., SMS text message, email, video chat, etc.). These conventional approaches typically include a combined input interface that facilitates receiving tactile input via the mobile device's virtual keyboard and optical input via a separate button positioned on the input interface (although, as with the audio input functionality described above, that button need not necessarily be included in the virtual keyboard).

When the user interacts with this separate button, the device facilitates including previously captured optical input or, alternatively, invoking a capture interface to capture new optical input, so that the user can include the previously or newly captured optical input in addition to the textual information entered via tactile input on the mobile virtual keyboard.

As a result of the foregoing, existing integrations of optical and audio input on mobile devices are strictly limited to serving as supplemental or alternative approaches to receiving and processing user input. Existing strategies may allow audio input for speech recognition, or image input to supplement textual input, though often cumbersomely. However, these techniques fail to integrate these various input capabilities in a context-sensitive manner that provides an intelligent alternative and/or supplement to textual input on mobile devices.

Providing additional input capabilities in a productive manner that assists, rather than degrades, device performance and the user's interaction with the device is a complex task. It requires careful consideration of the various contexts in which optical input would be useful, as well as of the appropriate conditions for capturing and/or analyzing optical input, in order to realize the context-sensitive benefits of intelligently integrating the mobile device camera as a source of input for receiving textual information from the user.

Accordingly, it would be highly beneficial to provide new methods, systems, and/or computer program product technology configured to supplement and/or replace tactile and auditory input as mechanisms for receiving user input and generating output, in particular output determined wholly or partly based on the received input and on the context in which the input was received or the purpose for which it was provided.

The drawings include two figures each showing a mobile device user interface configured to receive user input, according to one embodiment, and two figures each showing a flowchart of a method, according to one embodiment.

Summary of the Invention
In one embodiment, a method includes: invoking a user input interface on a mobile device; invoking an optical input extension of the user input interface; capturing optical input via one or more optical sensors of the mobile device; determining textual information from the captured optical input; and providing the determined textual information to the user input interface.
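The five steps of this method can be sketched as follows. Every class and function name here is hypothetical, and the optical sensors and text recognizer are stubbed out as plain callables, since the disclosure does not specify a camera API or OCR engine. The sketch only shows how the steps chain together, not how any real implementation performs them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """Stand-in for image data captured by one optical sensor."""
    pixels: bytes
    embedded_text: str  # lets a stub recognizer "read" the frame


@dataclass
class TextField:
    name: str
    value: str = ""


@dataclass
class UserInputInterface:
    """Hypothetical keyboard-style input interface bound to one text field."""
    target: TextField
    active: bool = False

    def invoke(self) -> None:
        self.active = True

    def provide_text(self, text: str) -> None:
        if not self.active:
            raise RuntimeError("user input interface has not been invoked")
        self.target.value += text


@dataclass
class OpticalInputExtension:
    """Hypothetical optical extension of the input interface."""
    sensors: List[Callable[[], Frame]]  # one capture callable per optical sensor
    recognizer: Callable[[Frame], str]  # e.g. an OCR engine; stubbed in this sketch
    active: bool = False

    def invoke(self) -> None:
        self.active = True

    def capture(self) -> List[Frame]:
        if not self.active:
            raise RuntimeError("optical input extension has not been invoked")
        return [read_sensor() for read_sensor in self.sensors]

    def determine_text(self, frames: List[Frame]) -> str:
        # Keep the longest recognition result (after trimming whitespace) across sensors.
        results = [self.recognizer(frame).strip() for frame in frames]
        return max(results, key=len, default="")


def optical_text_entry(target: TextField, extension: OpticalInputExtension) -> str:
    ui = UserInputInterface(target)
    ui.invoke()                              # 1. invoke the user input interface
    extension.invoke()                       # 2. invoke its optical input extension
    frames = extension.capture()             # 3. capture optical input
    text = extension.determine_text(frames)  # 4. determine textual information
    ui.provide_text(text)                    # 5. provide it to the interface
    return text
```

For instance, with one stub sensor returning `Frame(b"", " 12345678 ")` and `recognizer=lambda f: f.embedded_text`, `optical_text_entry` fills the field with `"12345678"`. Picking the longest result across sensors is just one possible policy, assumed here for simplicity.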

In another embodiment, a method includes: receiving optical input via one or more optical sensors of a mobile device; analyzing the optical input using a processor of the mobile device to determine a context of the optical input; and automatically invoking a contextually appropriate workflow based on the context of the optical input.
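A minimal sketch of this second method, under two stated assumptions: the optical input has already been converted to text (the method itself analyzes the optical input and does not prescribe how), and context is inferred from simple keyword rules. The context labels, patterns, and workflow functions are hypothetical examples modeled on the financial services mentioned in the Background, not part of the disclosure.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rules: each context label paired with a pattern whose presence signals it.
CONTEXT_RULES: List[Tuple[str, re.Pattern]] = [
    ("check_deposit", re.compile(r"pay\s+to\s+the\s+order\s+of", re.IGNORECASE)),
    ("bill_payment", re.compile(r"amount\s+due", re.IGNORECASE)),
]

# Registry mapping a context label to the workflow started for it.
WORKFLOWS: Dict[str, Callable[[str], str]] = {}


def workflow(label: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that registers a workflow function under a context label."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        WORKFLOWS[label] = fn
        return fn
    return register


def determine_context(recognized_text: str) -> Optional[str]:
    """Return the first context whose pattern appears in the text, or None."""
    for label, pattern in CONTEXT_RULES:
        if pattern.search(recognized_text):
            return label
    return None


@workflow("check_deposit")
def start_check_deposit(text: str) -> str:
    return "check deposit workflow started"


@workflow("bill_payment")
def start_bill_payment(text: str) -> str:
    # Pre-fill the amount if one follows "Amount Due", e.g. "Amount Due: $1,234.56".
    match = re.search(r"amount\s+due[:\s]*\$?([\d,]+\.\d{2})", text, re.IGNORECASE)
    amount = match.group(1) if match else "unknown"
    return f"bill payment workflow started, amount={amount}"


def dispatch(recognized_text: str) -> str:
    """Automatically invoke the workflow appropriate to the input's context."""
    context = determine_context(recognized_text)
    if context is None:
        return "no matching workflow"
    return WORKFLOWS[context](recognized_text)
```

A registry keyed by context label keeps the dispatch step independent of the individual workflows, so new contexts can be added by registering one more function and rule.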

In yet another embodiment, a computer program product includes a computer-readable storage medium having program code embodied therewith. The program code is readable/executable by a processor to: invoke a user input interface on a mobile device; invoke an optical input extension of the user input interface; capture optical input via one or more optical sensors of the mobile device; determine textual information from the captured optical input; and provide the determined textual information to the user input interface.

Other aspects and embodiments of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrates by way of example the principles of the invention.

Detailed Description
The following description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.

Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation, including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc.

It must also be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless otherwise specified.

The present application refers to image processing of images (e.g., pictures, figures, graphical schematics, single frames of movies, videos, films, clips, etc.) captured by cameras, especially cameras of mobile devices. As understood herein, a mobile device is any device capable of receiving data without having power supplied via a physical connection (e.g., wire, cord, cable, etc.) and capable of receiving data without a physical data connection (e.g., wire, cord, cable, etc.). Mobile devices within the scope of the present disclosure include exemplary devices such as mobile telephones, smartphones, tablets, personal digital assistants, iPod® devices, iPad® devices, BLACKBERRY® devices, etc.

Of course, the various embodiments set forth herein may be implemented using hardware, software, or any desired combination thereof. For that matter, any type of logic capable of implementing the various functionality set forth herein may be used.

One benefit of using a mobile device is that, with a data plan, image processing and information processing based on captured images can be done in a much more convenient, streamlined, and integrated way than previous methods that relied on the presence of a scanner. However, the use of mobile devices as document capture and/or processing devices has heretofore been considered unfeasible for a variety of reasons.

In one approach, an image may be captured by a camera of a mobile device. The term "camera" should be broadly interpreted to include any type of device capable of capturing an image of a physical object external to the device, such as a piece of paper. The term "camera" does not encompass a peripheral scanner or multifunction device. Any type of camera may be used. Preferred embodiments may use cameras with higher resolution, e.g., 8 MP or more, ideally 12 MP or more. The image may be captured in color, grayscale, black and white, or with any other known optical effect. The term "image" as used herein is meant to encompass any type of data corresponding to the output of the camera, including raw data, processed data, etc.

As used herein, the term "speech recognition" should be considered equivalent to, or encompassing, the so-called "speech-to-text" functionality provided on some mobile devices (e.g., "Siri"), which enables conversion of audio input into textual output. By contrast, the inventive techniques described herein may be referred to as "image-to-text" or "video-to-text" functionality.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as "logic," a "circuit," a "module," or a "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.

Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, processor, or device.

A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave, such as an electrical connection having one or more wires, an optical fiber, etc. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java®, Smalltalk®, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order shown in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

An application may be installed on the mobile device, e.g., stored in a nonvolatile memory of the device. In one approach, the application includes instructions to perform image processing on the mobile device. In another approach, the application includes instructions to send the image to a remote server, such as a network server. In yet another approach, the application may include instructions to decide whether to perform some or all of the processing on the mobile device and/or to send the image to the remote site.

In one general embodiment, a method includes: invoking a user input interface on a mobile device; invoking an optical input extension of the user input interface; capturing optical input via one or more optical sensors of the mobile device; determining textual information from the captured optical input; and providing the determined textual information to the user input interface.

In another general embodiment, a method includes: receiving optical input via one or more optical sensors of a mobile device; analyzing the optical input using a processor of the mobile device to determine a context of the optical input; and automatically invoking a contextually appropriate workflow based on the context of the optical input.
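The context-driven embodiment above (analyze the optical input, determine its context, then invoke a matching workflow) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes OCR has already turned the optical input into a text string, and the pattern table, context names, and workflow registry (`CONTEXT_PATTERNS`, `determine_context`, `WORKFLOWS`, `invoke_workflow`) are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical context patterns; a real system would likely use richer classification.
CONTEXT_PATTERNS: Dict[str, re.Pattern] = {
    "browse": re.compile(r"https?://\S+", re.IGNORECASE),
    "call": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
    # 17 characters drawn from the VIN alphabet (letters I, O and Q excluded).
    "vehicle_claim": re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b"),
}

# Hypothetical registry mapping a detected context to a workflow handler.
WORKFLOWS: Dict[str, Callable[[str], str]] = {
    "browse": lambda text: "open browser",
    "call": lambda text: "start phone call",
    "vehicle_claim": lambda text: "start insurance claim",
}


def determine_context(ocr_text: str) -> Optional[str]:
    """Return the first context (in table order) whose pattern matches the text."""
    for context, pattern in CONTEXT_PATTERNS.items():
        if pattern.search(ocr_text):
            return context
    return None


def invoke_workflow(ocr_text: str) -> str:
    """Automatically dispatch to the workflow that fits the detected context."""
    context = determine_context(ocr_text)
    if context is None:
        return "no workflow"
    return WORKFLOWS[context](ocr_text)
```

For example, `invoke_workflow("Call (555) 123-4567")` dispatches to the phone-call handler. Table order decides which context wins when several patterns match; a fuller system would more plausibly rank candidates or ask the user to choose.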

In yet another general embodiment, a computer program product includes a computer-readable storage medium having program code embodied therewith. The program code is readable/executable by a processor to: invoke a user input interface on a mobile device; invoke an optical input extension of the user input interface; capture optical input via one or more optical sensors of the mobile device; determine textual information from the captured optical input; and provide the determined textual information to the user input interface.

In various embodiments, the methods, systems, and/or computer program products disclosed herein may optionally utilize and/or include any of the functionalities disclosed in related U.S. Pat. No. 8,855,375, filed Jan. 11, 2013; U.S. patent application Ser. No. 13/948,046, filed Jul. 22, 2013; U.S. Patent Publication No. 2014/0270349, filed Mar. 13, 2013; U.S. Patent Publication No. 2014/0270536, filed Mar. 13, 2014; U.S. Pat. No. 8,885,229, filed May 2, 2014; and/or U.S. patent application Ser. No. 14/220,029, filed Mar. 19, 2014. Each of the foregoing patent applications is herein incorporated by reference. For example, in some embodiments it may be advantageous to classify a document from which textual information is to be obtained, perform data extraction on the document, validate the document or the information extracted from it, and/or perform additional processing on the image data before, during, or after the capture operation (e.g., to improve image quality), as will be understood by one having ordinary skill in the art upon reading the present descriptions.

Digital images suitable for processing according to the algorithms disclosed herein may be subjected to any of the image processing operations disclosed in the aforementioned patent applications, such as page detection, rectangularization, detection of uneven illumination, illumination normalization, resolution estimation, blur detection, classification, data extraction, document validation, etc.

In further approaches, the methods, systems, and/or computer program products disclosed herein may be utilized with, implemented in, and/or include one or more user interfaces configured to facilitate performing any functionality disclosed herein and/or in the aforementioned related patent applications, such as an image processing mobile application, a case management application, a classification application, and/or a data extraction application, in multiple embodiments.

In still further approaches, the systems, methods, and/or computer program products disclosed herein may be advantageously applied to one or more of the methodologies and/or scenarios disclosed in the aforementioned related patent applications, as would be appreciated by one having ordinary skill in the art upon reading those descriptions.

It will be further appreciated that the embodiments presented herein may be provided in the form of a service deployed on behalf of a customer to offer service on demand.

The inventive concepts disclosed herein relate to integrating optical input into the I/O capabilities of a mobile device in an intelligent manner that facilitates accurate and easy input of textual information. Exemplary scenarios to which these concepts are most applicable include entering textual information into documents, forms, web pages, etc., as would be understood by one having ordinary skill in the art upon reading this specification. Advantageously, the presently disclosed techniques accomplish input of textual information without suffering the disadvantages inherent to voice input (e.g., inaccurate speech recognition) or to tactile input via a virtual mobile keyboard (e.g., inaccurate input due to small "key" size, or improper "correction" by a predictive dictionary or "autocorrect" function).

In particular, the present techniques provide the user with superior performance and convenience. Superior performance includes features such as improved accuracy and reduced input time when providing textual input via a mobile device, especially where the optical input depicts information suitable for use in multiple contexts or fields. The performance benefits arise in part because the inventive approaches disclosed herein are configured to capture, analyze, and provide textual information from optical input without relying on tactile input from the user. As a result, these techniques avoid the disadvantages common to input interfaces that rely on a miniaturized virtual keyboard, as described above.

At the same time, the present techniques outperform existing integrations of optical input for use in combination with textual input. For example, in the conventional scenario described above, which requires composing and sending a message that includes both textual input and optical input, the present techniques advantageously integrate the optical input capabilities of the mobile device with textual I/O, so the user need not provide tactile input to convey the textual information.

Furthermore, the optical input may be captured, analyzed, and converted into textual information in a context-sensitive manner. Context-sensitive invocation, capture, and analysis of optical input are described in further detail below.

Optical Input Extension for a Mobile Virtual Keyboard User Interface (UI)
The optical input functionality disclosed herein is provided by leveraging the native tools, procedures, calls, components, libraries, etc. for capturing optical and tactile input according to the particular mobile operating system in which the functionality is included. In this manner, the present techniques represent a seamless integration of optical input into contexts that are typically limited to capturing textual information via tactile or voice input.

This seamless integration offers advantages over the existing optical and tactile input capture capabilities native to mobile operating systems, because those existing capabilities are not intended to use optical input as an alternative or supplement to tactile input for the purpose of capturing and providing textual information.

Most notably, even though conventional mobile operating systems may provide independent optical input capture and tactile input capture capabilities, there are no currently known techniques for integrating optical input as a supplemental and/or alternative technique for receiving, determining, and/or utilizing textual information via a mobile device.

In rare cases, some mobile operating systems may further provide the ability to analyze captured image data (e.g., via optical character recognition (OCR) or other similar functionality as would be recognized by one having ordinary skill in the art) and to identify, locate, and/or translate the textual information depicted therein. However, these rare implementations do not provide any integration of native OS capabilities that would allow the user to leverage the combined ability to capture and analyze optical input so as to effectively accomplish entry of textual information simply by capturing optical input.

For example, currently known techniques do not enable a user to enter textual information, e.g., into a field of a form, directly by capturing optical input depicting an identifier that contains the desired textual information, or that contains other information that may be used to determine or obtain the desired textual information. The "other" information may include any type of information that one having ordinary skill in the art, upon reading this specification, would understand to be suitable or useful for obtaining or determining the desired textual information.

In general, identifiers suitable for extraction in the context of the present optical input extension and context-sensitive invocation applications may include any type of identifying information (preferably textual information) that may be useful in processes such as: business workflows, e.g., insurance claims or applications; accounts payable processes, e.g., invoicing; navigation, communication, or tracking processes; financial transactions or workflows, e.g., tax returns or account statement review; browsing processes; admissions or customer onboarding processes; etc., as will be understood by one having ordinary skill in the art upon reading the present descriptions. While suitable identifiers may include any type of identifying information suitable for processes such as the exemplary embodiments above, it should be understood that some types of information are particularly useful for select applications, e.g., unique identifiers that may be necessary to access a particular resource or to complete a particular workflow.

Thus, in various embodiments, the extracted identifier preferably includes any one or more of: a phone number; a complete or partial address; a uniform resource locator (URL); a tracking number; a vehicle identification number (VIN); a vehicle make/model and/or year; a social security number (SSN); a product name or code (such as a universal product code (UPC) or stock keeping unit (SKU)), or other similar textual information typically found on an invoice; an insurance group number and/or policy number; an insurance provider name; a person's name; a date (e.g., a date of birth or expiration date); a signature (preferably handwritten); etc., as will be understood by one having ordinary skill in the art upon reading the present descriptions.
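Some identifiers in the list above carry built-in redundancy that could be used to confirm an extraction before it is provided to a field. As one hedged example that is not taken from the disclosure, the standard UPC-A modulo-10 check digit can be verified as shown below; the function name `is_valid_upc_a` is hypothetical.

```python
def is_valid_upc_a(code: str) -> bool:
    """Check a 12-digit UPC-A code against its standard modulo-10 check digit."""
    if len(code) != 12 or not all(c in "0123456789" for c in code):
        return False
    digits = [int(c) for c in code]
    # Positions 1, 3, ..., 11 (indices 0, 2, ..., 10) are weighted by 3.
    odd_sum = sum(digits[0:11:2])
    # Positions 2, 4, ..., 10 (indices 1, 3, ..., 9) are weighted by 1.
    even_sum = sum(digits[1:10:2])
    expected_check = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return expected_check == digits[11]
```

For instance, `"036000291452"` passes, while changing its last digit makes the check fail; a failed check could be treated as a cue to ask the user to confirm or recapture the code rather than silently filling the field.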

Similarly, the "other information" may be obtained or determined using any suitable technique, including known techniques such as lookup operations, reverse lookups, authentication, etc., as will be understood by one having ordinary skill in the art upon reading this specification.

Rather, to achieve this result using currently available technology, a user would need to perform a series of separate steps by manually invoking separate functions of the native OS (e.g., as described below with respect to the multi-step conventional procedure that current technology requires).

As used in this disclosure, an "extension" refers to functionality included in an otherwise existing feature of the mobile device. For example, in an exemplary scenario where voice input is received together with or instead of tactile input, the microphone "button" shown in the figures described above may be considered a voice extension of the mobile virtual keyboard user interface. In contrast, a stand-alone application, function, or feature that requires independent invocation by the user (e.g., invoking the application, function, or feature without interacting with one of the standard user interfaces provided with the mobile operating system) should not be considered an "extension" of existing capabilities.

In preferred embodiments, the optical input extension is configured to facilitate seamless user navigation across multiple fields presented via a user interface (e.g., a web page, application, form, field, etc.) while optical input is being captured. In some approaches, this functionality may be embodied as a "next" or "finished" button, gesture, symbol, option, etc. included in the optical input capture interface.

In practice, according to one exemplary scenario, a user may wish to capture data corresponding to textual information intended for entry into multiple different fields of a form, web page, etc. Upon detecting that the user's "focus" is on a data entry field (which may be the first of several such data entry fields present on the user interface), e.g., as indicated by clicking, tapping, hovering over, selecting, or tabbing into the data entry field, the native user input/virtual keyboard interface including the optical input extension is invoked.

The user may interact with the first data entry field to invoke the optical input extension, e.g., by tapping a "camera" button displayed on the virtual keyboard. In response to invoking the optical input extension, the user may be presented with a capture interface that includes a "preview" of the optical input being captured (e.g., essentially displaying a "viewfinder" for the camera or other optical input device). Preferably, the "preview" and capture capabilities of the optical input extension may be used without switching the focus of the mobile device away from the browser, application, etc. displaying the data entry field with which the user is interacting.

In other words, the optical input extension of the virtual keyboard interface described herein is preferably a seamless integration of functionality that enables the user to locate a data entry field, invoke the optical input extension, capture optical input via the optical input extension, and populate the data entry field with textual information determined from the captured optical input. The entire process described above is preferably "seamless" in that the user can dispense with all the ancillary functions that would otherwise be required, e.g., relying on the multitasking capabilities of the mobile device, or using a clipboard configured to "copy and paste" data between independent applications executable on the mobile device, as will be understood by one having ordinary skill in the art upon reading the present descriptions.

In scenarios where the browser page, application, etc. with which the user is interacting includes multiple data entry fields, the user may preferably move among those data entry fields using additional functionality provided by the optical input extension. In this manner, the user may enter textual information by selectively using the optical input extension to capture optical input for a desired subset of the data entry fields presented. Similarly, the user may use the optical input extension to enter textual information into several of the data entry fields in sequence.

Preferably, user navigation among the data entry fields is accomplished via buttons or gestures provided by the optical input interface. Exemplary embodiments may employ, e.g., "next" and/or "previous" buttons, or may be configured to interpret one or more swipe or multi-touch gestures as commands to move among the data entry fields. Even more preferably, the optical input interface also includes functionality allowing the user to terminate, or indicate completion of, the optical input capture process. For example, in some embodiments the optical input interface may include a "last" button, a "finished" or "done" button, etc., enabling the user to end the optical input capture process and, preferably, resume interaction with the browser page, application interface, etc.
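The "next"/"previous"/"done" navigation described above can be pictured as a small piece of session state. The sketch below is a hypothetical model, not the disclosed interface: `CaptureSession` and its methods are assumed names, and the choice to stop at the first and last fields (rather than wrap around) is an illustrative design decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaptureSession:
    """Hypothetical state for moving among data entry fields during optical capture."""
    field_names: List[str]
    values: Dict[str, str] = field(default_factory=dict)
    index: int = 0
    finished: bool = False

    @property
    def current_field(self) -> str:
        return self.field_names[self.index]

    def fill_current(self, text: str) -> None:
        # Store text recognized from the latest capture in the focused field.
        self.values[self.current_field] = text

    def next_field(self) -> None:
        # "Next": stop at the last field instead of wrapping around.
        if self.index < len(self.field_names) - 1:
            self.index += 1

    def previous_field(self) -> None:
        # "Previous": stop at the first field.
        if self.index > 0:
            self.index -= 1

    def finish(self) -> Dict[str, str]:
        # "Done": end capture and hand the collected values back to the page or app.
        self.finished = True
        return dict(self.values)
```

Because fields can be skipped with `next_field()` without calling `fill_current()`, the user can populate only a subset of the presented fields, matching the selective use described above.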

Accordingly, in at least some embodiments, an important feature of the presently disclosed inventive concepts is that the optical input capability is integrated directly into an existing interface provided with the mobile operating system. Specifically, the optical input capability is integrated into the native virtual keyboard user interface provided with the mobile operating system, as an extension of that virtual keyboard user interface.

Accordingly, the present techniques should be distinguished from approaches that may require inefficiently "stitching" together existing capabilities, such as those provided by separate (i.e., non-integrated) camera and virtual keyboard user interface components of a mobile device. For example, techniques that merely combine tactile input and optical input received via entirely separate interfaces, functions, applications, etc. complicate, rather than promote, ease and accuracy of input.

For example, a stand-alone application or function configured to capture and analyze optical input to determine the presence of textual information (and optionally to determine and/or output the depicted text) cannot perform such capture and/or analysis of optical input in a context-sensitive manner. For instance, a stand-alone application, function, feature, etc. is not configured to deliver the desired textual information in the context of a particular field or form displayed on a website that the stand-alone application, function, feature, etc. was not configured to work with in the first place.

As a result, the user would have to invoke several independent processes and carry each of them through its full course of required operations. For example, absent the integration of optical and tactile input disclosed herein, a user attempting a similar process with conventional technology would have to engage in an unduly cumbersome and significantly inferior procedure involving separate operations performed using multiple independent processes, each individually installed, configured, invoked, and operated by the user.

An exemplary conventional process, following the procedure referenced above, proceeds substantially as follows:
(1) Invoke a mobile web browser application (e.g., Safari for iOS);
(2) Use the mobile device's web browser to navigate to a web page that requires textual information;
(3) Close or suspend the mobile browser application;
(4) Invoke a separate optical input function (e.g., a "camera" application);
(5) Capture optical input containing the desired textual information using the separate optical input application;
(6) Close or suspend the optical input application;
(7) Invoke a separate optical analysis application (e.g., an OCR application);
(8) Analyze the captured optical input with the separate optical analysis application to determine the textual information depicted therein;
(9) Locate the desired textual information within the determined textual information;
(10) Select the desired textual information from the determined textual information (or, equivalently, deselect, delete, or discard undesired textual information);
(11) Copy the desired textual information (e.g., to the mobile device's "clipboard", or simply by the user memorizing the desired textual information);
(12) Close or suspend the optical analysis application;
(13) Invoke or resume the closed/suspended web browser application (if the web browser was closed rather than suspended, navigation to the web page as in step (2) above must also be repeated); and
(14) Paste (or re-enter) the desired textual information into the appropriate field of the web page from step (2) above.

The above scenario, which involves the use of multiple independent processes, may not even be feasible, for example, if a particular mobile device does not support the required multitasking capabilities, or if it lacks sufficient system resources to effectively "switch" between the independent applications needed to achieve the desired result.

For comparison, an exemplary process that utilizes the integrated optical and tactile input functionality provided by an optical input extension to the virtual keyboard user interface, as illustrated according to one embodiment by method 200 of FIG. 2, would be substantially more efficient, both in terms of system resource consumption and in terms of user convenience and time.

Method 200 may be performed in any suitable environment, including those shown in FIGS. 1A and 1B, as well as any other suitable environment that would be recognized by those skilled in the art upon reading this description.

In operation 202, a user input user interface (UI) is invoked on the mobile device.

In operation 204, the optical input extension of the user input UI is invoked.

In operation 206, optical input is captured by one or more optical sensors of the mobile device.

In operation 208, text information is determined from the captured optical input.

In operation 210, the determined text information is provided to the user input UI.

Method 200 may also include any one or more of the additional or alternative features disclosed herein. In various approaches, method 200 may additionally and/or alternatively include functionality such as selectively identifying, normalizing and validating text information from the optical input, and providing that text information to the user input UI.
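The sequence of operations 202 through 210 can be pictured as a single in-process pipeline. The following is a minimal, hypothetical Python sketch, not the disclosed implementation: `UserInputUI`, `run_method_200`, and the injected `capture`, `recognize` and `postprocess` callables are illustrative names standing in for the platform camera, OCR and normalization services, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserInputUI:
    """Hypothetical stand-in for a user input UI such as a virtual keyboard."""
    field_name: str
    optical_extension_active: bool = False
    text: str = ""


def run_method_200(
    field_name: str,
    capture: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    postprocess: Callable[[str], str] = lambda s: s,
) -> UserInputUI:
    # Operation 202: invoke the user input UI for the field the user touched.
    ui = UserInputUI(field_name)
    # Operation 204: invoke the optical input extension of that UI.
    ui.optical_extension_active = True
    # Operation 206: capture optical input from the device's optical sensor(s).
    image = capture()
    # Operation 208: determine text information from the captured input (e.g., via OCR).
    text = recognize(image)
    # Operation 210: provide the text to the UI, after an optional
    # normalization/validation hook (identity by default).
    ui.text = postprocess(text)
    return ui
```

Because every step runs inside one call, there is no switching between separate browser, camera and OCR applications, which is the source of the efficiency gain argued above.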

The user input interface is preferably invoked in response to detecting a user interaction with a user interface element configured to receive text information. In such approaches, the method may advantageously include analyzing the optical input to determine the text information. The analyzing may include one or more of: performing optical character recognition (OCR); identifying desired text information from among the determined text information based on the OCR; and selectively providing the desired text information to the user input interface.

Preferably, the desired text information includes a plurality of identifiers, each identifier corresponding to one of a plurality of user interface elements configured to receive text information. In some embodiments, some or all of the identifiers include text information required by one of the user interface elements. It is therefore advantageous to determine which identifiers contain such required text information and to selectively provide each identifier to its corresponding user interface element, preferably in an appropriate format.
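Selectively routing identifiers to the UI elements that require them can be sketched as a keyed lookup plus an optional per-element formatter. This is a hypothetical illustration: the function name, the use of element names as dictionary keys, and the formatter mechanism are assumptions, not details from the description.

```python
from typing import Callable, Dict, Iterable, Optional


def route_identifiers(
    identifiers: Dict[str, str],
    ui_elements: Iterable[str],
    formatters: Optional[Dict[str, Callable[[str], str]]] = None,
) -> Dict[str, str]:
    """Map each UI element to the extracted identifier it requires, if one was found.

    Identifiers that no element requires are ignored, and elements with no
    matching identifier are left unfilled (e.g., for manual entry).
    """
    formatters = formatters or {}
    filled: Dict[str, str] = {}
    for element in ui_elements:
        if element in identifiers:
            # Apply the element-specific formatter, or pass the value through unchanged.
            fmt = formatters.get(element, lambda v: v)
            filled[element] = fmt(identifiers[element])
    return filled
```

For example, `route_identifiers({"city": "springfield", "zip": "62701"}, ["city", "zip", "phone"], {"city": str.title})` fills only the city and ZIP elements, with the city title-cased.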

Providing functionality that automatically corrects OCR errors, for example to ensure accurate reproduction of the captured content and proper formatting of the information in the manner expected by the workflow, is highly beneficial to the overall user experience. Thus, in some approaches, the method advantageously includes validating and/or normalizing at least one of the identifiers so that it matches an expected format of the desired text information and/or an expected range of values for the desired text information.

In various approaches, validating may include determining one or more business rules, and/or reference content from a supplemental document, applicable to at least one of the identifiers. This determination is preferably based on the user interface element corresponding to the identifier, and the validation itself is based on the reference content and/or the business rules. Similarly, normalizing may include determining the required formatting from a supplemental document, business rules and/or the element invoked by the user.

In further embodiments, the method may also include validating the desired text information (i.e., checking the accuracy of its content and/or format, e.g., against reference content) and/or normalizing it (i.e., changing its format or representation to conform to an expected format, other business rules, etc.), so that it matches the expected format of the desired text information, the expected range of values for it, or both. This facilitates correcting OCR errors, ensuring that the captured content is reproduced accurately and that the information is formatted properly in the manner expected by the workflow. In some embodiments, the validating and normalizing are based on one or more of business rules and reference content from supplemental documents. The method may therefore also include determining one or more of the supplemental documents and business rules based on the element with which the user interacted.
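A simple form of normalization and rule-based validation for a digits-only field can be sketched as follows. The confusion table below is an illustrative assumption (letters that OCR commonly mistakes for digits), not a list from the description, and `normalize_numeric`/`validate` are hypothetical names; in a real workflow the pattern and range would come from the business rules associated with the field.

```python
import re
from typing import Optional, Tuple

# Illustrative OCR confusion table for fields that accept digits only; not exhaustive.
DIGIT_CONFUSIONS = str.maketrans(
    {"O": "0", "o": "0", "D": "0", "I": "1", "l": "1", "|": "1", "S": "5", "B": "8", "Z": "2"}
)


def normalize_numeric(raw: str) -> str:
    """Map look-alike letters to digits, then drop every remaining non-digit."""
    return re.sub(r"\D", "", raw.translate(DIGIT_CONFUSIONS))


def validate(value: str, pattern: str, value_range: Optional[Tuple[int, int]] = None) -> bool:
    """Business-rule check: `value` must fully match `pattern` and optionally fall in a range.

    `value_range` assumes `pattern` only admits digits, so int() cannot fail.
    """
    if re.fullmatch(pattern, value) is None:
        return False
    if value_range is not None:
        low, high = value_range
        return low <= int(value) <= high
    return True
```

With this sketch, an OCR reading of `9O21O` for a ZIP code normalizes to `90210`, which then passes `validate(..., r"\d{5}")`; a month value of `13` fails a `(1, 12)` range rule.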

In some approaches, the optical input extension is presented at the same time as the invoked user input interface. Preferably, the user input interface includes a virtual keyboard displayed on the mobile device, with a camera button displayed on the virtual keyboard.

The method additionally and/or alternatively includes automatically invoking an optical input capture interface in response to detecting invocation of the optical input extension.

In various embodiments, the method may additionally and/or alternatively include pre-analyzing the optical input before capturing it. Pre-analyzing includes operations such as detecting an object depicted in the optical input; determining one or more characteristics of the object depicted in the optical input; and determining one or more analysis parameters based at least in part on the determined characteristics. The one or more analysis parameters preferably include OCR parameters.

Although the above examples are described in the context of a user interacting with a mobile browser, those skilled in the art will recognize that, in principle, the inventive concepts disclosed herein are applicable, in various embodiments, to any user interaction with any data entry field, whether that field is presented via a mobile browser, a mobile device operating system function, a third-party application, a native OS application, etc.

As demonstrated by the exemplary scenario described above, the techniques disclosed herein can reduce the number of individual operations required to achieve a superior result by at least half. In view of the additional benefits of context-sensitive invocation, capture and analysis of optical input described below, those skilled in the art will recognize the significant benefits afforded by the inventive techniques described herein. Conventional techniques may accomplish one or more of the constituent operations described in this disclosure, but they are entirely incapable of delivering the performance benefits achieved by integrating these functions into a unified procedure with superior capabilities and performance characteristics.

Context-Sensitive Invocation, Capture and Analysis of Optical Input
In preferred approaches, the inventive optical input techniques disclosed herein may leverage contextual information regarding the optical or text information, the data entry operation, and the form, field, etc. into which the data is to be entered, as will be understood by those skilled in the art upon reading this description.

It may be particularly advantageous to capture certain types of text information using optical input as opposed to typed text input. Contextually selecting optical input as the preferred form of input data may include preferentially capturing optical input whenever conventional tactile or audio input would be problematic.

For example, if the text information does not follow any established convention or rule set (e.g., such as those typically utilized by predictive dictionaries), attempting to enter it via tactile or audio input (e.g., "typing" on the mobile device's virtual keyboard interface, or reading aloud or spelling out the text information) is error-prone. A predictive dictionary or speech recognition function may improperly "correct" or transcribe the input provided by the user by applying one or more inapplicable conventions or rules.

In further embodiments, optical input may be preferred when large amounts of text information and/or complex collections of text information are required. For example, if a user engaged in a task via a mobile device wishes to complete a form having several fields that require various types of text information, and some or all of that text information is depicted on one or more documents, it may be advantageous to determine or obtain the text information by capturing optical input comprising an image of the document(s) depicting it, rather than requiring the user to manually enter each item of desired text information.

Similarly, it may be advantageous to analyze the optical input according to its context. For example, in one approach, a user may utilize a document as the source of the text information to be provided via optical input. The document may take any form and may exhibit unique characteristics indicating that it belongs to a predetermined class of documents (e.g., credit cards, credit reports, driver's licenses, financial statements, tax forms, etc.), as will be understood by those skilled in the art upon reading this description. Owing wholly or partly to these unique characteristics, it may be advantageous to analyze optical input depicting a document of a predetermined class using predetermined analysis parameters, settings, techniques, assumptions, etc. known to yield ideal analysis results for that class of documents.

For example, it may be advantageous to analyze the optical input using predetermined settings known to yield particularly favorable results for documents depicting text information with a particular color profile, or a background with a particular color profile, especially where that color profile is non-standard (e.g., not black-and-white).

Similarly, if a document class is characterized by known dimensions, a known orientation, a known layout or organization of its text information, etc., it may be advantageous to utilize analysis parameters, settings, etc. configured to produce superior analysis results for that layout, organization, orientation, etc.

In addition, it may be advantageous to utilize unique analysis parameters configured to analyze text information rendered in a particular font or style, e.g., by leveraging known characteristics of that font such as the average character width, height, and expected size of each possible character, as will be understood by those skilled in the art upon reading this disclosure.

In various embodiments, the predetermined analysis parameters, settings, techniques, etc. employed preferably include one or more OCR parameters, settings, techniques, etc.

Accordingly, in some scenarios it may be further preferable to include functionality configured to pre-analyze the optical input presented to the mobile device optical sensor before that input is captured. For example, in a preferred embodiment, when the capture interface is invoked (whether automatically or in response to user input directing its invocation), the mobile device may determine characteristics of the optical input, including but not limited to whether the optical input contains an identifiable object and, more ideally, the identity or classification of any such detected object.
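Selecting predetermined OCR parameters from the result of pre-analysis (detected document class, background color characteristics) can be sketched as a profile lookup. Everything here is a hypothetical illustration: the `OcrParams` fields, the class names, and the profile values are placeholders rather than tuned settings from the description, and a real system would determine them empirically for each document class.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class OcrParams:
    charset: Optional[str] = None            # None means no character restriction
    expected_orientation: str = "any"        # e.g., "landscape" for card-shaped documents
    binarize_color_background: bool = False  # pre-process non black-and-white backgrounds
    font_hint: Optional[str] = None          # placeholder for font-specific metrics


DEFAULT_PARAMS = OcrParams()

# Illustrative per-class profiles (placeholder values, not tuned settings).
CLASS_PROFILES: Dict[str, OcrParams] = {
    "credit_card": OcrParams(charset="0123456789 /", expected_orientation="landscape", font_hint="embossed"),
    "drivers_license": OcrParams(expected_orientation="landscape", binarize_color_background=True),
}


def select_ocr_params(detected_class: Optional[str], has_color_background: bool) -> OcrParams:
    """Choose OCR parameters from the pre-analysis result (detected class and color features)."""
    params = CLASS_PROFILES.get(detected_class, DEFAULT_PARAMS) if detected_class else DEFAULT_PARAMS
    # A non-standard (e.g., non black-and-white) background enables binarization regardless of class.
    if has_color_background and not params.binarize_color_background:
        params = replace(params, binarize_color_background=True)
    return params
```

Unrecognized or undetected classes fall back to unrestricted defaults, so pre-analysis can only narrow the analysis, never block it.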

In further approaches, the optical input may be analyzed based on contextual information determined from, or based on, the web page, application, form, field, etc. with which the user interacts to invoke the user input interface (e.g., in various embodiments, a virtual keyboard and/or its optical input extension). For example, as will be appreciated by those skilled in the art, existing techniques restrict the input a user may provide to a user interface by selectively invoking a restricted input interface (e.g., an interface consisting of numerals for entering a date of birth or Social Security number, or an interface consisting of alphabetic characters for entering a "name").

Similarly, the optical input extension described herein may influence, determine, or restrict the analysis parameters used to analyze optical input captured using the extension. In an exemplary scenario where the analysis includes optical character recognition, for example, the analysis parameters used for a field that accepts only numerals may include an OCR alphabet restricted to numerals or, conversely, an OCR alphabet restricted to letters for a field that accepts only alphabetic characters. In a preferred approach, the optical input extension may automatically and transparently define the analysis parameters based on the acceptable input type, format, etc. for a given data entry field, and this may be done directly in response to receiving an instruction identifying the acceptable input type for the particular field with which the user is interacting. Consider, for example, a scenario in which a user interacts with a fillable data entry field that expects a telephone number as input. Conventionally, a user interacting with this data entry field is presented with a keyboard consisting of the numerals 0-9, whereas according to the inventive concepts disclosed herein, a user interacting with the same data entry field via the optical input extension may employ analysis parameters that include an OCR alphabet restricted to the numerals 0-9.
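Restricting the OCR alphabet according to the field's accepted input type can be sketched as constrained decoding over per-character candidates. This assumes, hypothetically, an OCR engine that reports ranked (character, confidence) alternatives for each position; the `FIELD_ALPHABETS` keys and the `constrained_decode` name are illustrative and not part of the description.

```python
import string
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from a field's accepted input type to an OCR alphabet.
FIELD_ALPHABETS: Dict[str, str] = {
    "tel": string.digits,
    "numeric": string.digits,
    "alpha": string.ascii_letters + " -",
}


def constrained_decode(candidates: List[List[Tuple[str, float]]], field_type: str) -> str:
    """Pick, at each character position, the most confident candidate the field allows.

    `candidates` holds (character, confidence) alternatives per position. Positions
    with no allowed alternative are dropped. Unknown field types are unrestricted.
    """
    alphabet: Optional[str] = FIELD_ALPHABETS.get(field_type)
    decoded = []
    for alternatives in candidates:
        allowed = [(ch, conf) for ch, conf in alternatives if alphabet is None or ch in alphabet]
        if allowed:
            decoded.append(max(allowed, key=lambda pair: pair[1])[0])
    return "".join(decoded)
```

For a telephone field, an ambiguous reading whose top candidates are `O` and `l` resolves to the digits `0` and `1`, which is the benefit of tying the OCR alphabet to the field rather than to a generic keyboard.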

In an exemplary scenario according to the above embodiments, a user may navigate to a web page, form, mobile application, etc. using a mobile device. One or more fillable fields with which the user may interact may be presented on the web page, on the web browser's navigation bar, or on any other element of the medium with which the user is interacting that accepts text information as suitable input. In response to detecting this interaction and/or in response to input from the user, the mobile device may invoke an optical capture interface substantially representing a "camera" application, e.g., of the kind typically included in the native OS functionality of conventional mobile devices.

When the optical capture interface is invoked, the mobile device display shows a "viewfinder" depicting the field of view of the mobile device optical sensor, preferably in real time or near-real time. The mobile device may perform the pre-analysis described above using the optical input received by the mobile device optical sensor (e.g., the optical input used to generate the viewfinder display), either in response to user input or, preferably, automatically in a manner transparent to the user.

In particularly preferred approaches, the pre-analysis may include identifying any text information depicted within a portion of the optical sensor's field of view (e.g., a bounding box) and displaying a preview of any identified text information. Even more preferably, the identified text may be displayed in the data entry field with which the user interacted to invoke the user input interface and/or its optical input extension.
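Building the preview from the text inside a viewfinder bounding box can be sketched as a containment filter over OCR word boxes. The `Word` structure and `preview_text` function are hypothetical; the sketch assumes the OCR step already returns words with axis-aligned pixel boxes, and it orders words by exact top coordinate, which a production version would replace with tolerance-based line grouping.

```python
from typing import Iterable, NamedTuple, Tuple


class Word(NamedTuple):
    """An OCR word with its axis-aligned box in viewfinder pixel coordinates."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def preview_text(words: Iterable[Word], box: Tuple[float, float, float, float]) -> str:
    """Return the words lying fully inside `box`, ordered top-to-bottom, then left-to-right."""
    bx0, by0, bx1, by1 = box
    inside = [w for w in words if w.x0 >= bx0 and w.y0 >= by0 and w.x1 <= bx1 and w.y1 <= by1]
    inside.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in inside)
```

The returned string is what would be echoed into the data entry field as a live preview while the user aims the camera.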

In further approaches, the methods, systems, and/or computer program products disclosed herein may be utilized with, implemented in, and/or include one or more user interfaces (UIs) configured to facilitate receiving user input and generating corresponding output. The user input UI may take the form of a standard UI included with the mobile device operating system, e.g., a keyboard interface as used with standard SMS messaging functions and applications, browser applications, etc.; a numeric keypad interface as used with standard telephone functions and applications; or any other standard operating system UI configured to receive user input, particularly input that includes or corresponds to text information (i.e., user input including taps on various locations of the screen, or speech that will be converted to text information).

For example, as shown in FIG. 1A, the user input UI 100 includes a navigation UI 110, a form or page 120, and a keyboard UI 130. Each of UI 110, UI 120 and UI 130 may be a standard UI provided by the mobile device operating system, or by a standard browser or mobile application included with the mobile device operating system, or may be provided by a separately installed standalone application. Standalone application embodiments are preferred because of their ability to efficiently integrate context-sensitive functionality and capture/extraction functionality into a seamless workflow and user experience.

With continued reference to FIG. 1A, in the context of the present application, where a workflow is facilitated by the user input UI 100, the navigation UI 110 includes navigation components 112, such as a mobile browser address bar and forward and/or back buttons (not shown), to assist navigation through the various stages of the workflow, as will be understood by those skilled in the art upon reading this description.

The workflow form/page 120 preferably includes a plurality of fields 122-128 configured to receive as input a plurality of identifiers output from the workflow's capture and extraction operations (optionally normalized and/or validated as described herein). As shown in FIG. 1A, the fields include a city field 122, a postal code field 124, a telephone number field 126, and a state field 128. Of course, additional fields may be included in the form/page 120, and the user may navigate within the form/page 120 to selectively display its various fields using any suitable technique that would be recognized by those skilled in the art upon reading this description.

Further, each field may be associated with an expected format and/or an expected value or range of values for the text information it receives as input. For example, the city field 122 may expect a series of alphabetic characters beginning with an uppercase letter followed by a plurality of lowercase letters, optionally including one or more spaces or hyphens, but excluding numerals and other special characters. By contrast, the postal code field 124 may expect either a series of five numerals or a string of ten characters consisting of numerals and an optional hyphen or space. The postal code field 124 may further expect the ten-character string to follow a particular format such as "#####-####". Similarly, the telephone number field 126 may expect seven numerals, optionally with one or more spaces, parentheses, periods, commas and/or hyphens. The telephone number field 126 may also expect the text information entered therein to conform to a mask corresponding to one of several standard telephone number formats, such as "(XXX) ###-####" in the United States, or to other known conventions depending on the region in which the device is used. The state field 128 may expect a string of two uppercase letters. Of course, other fields may similarly be associated with expected formats and/or values or ranges of values according to known conventions, standards, etc. associated with the information they are intended to receive.
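The expected formats described for fields 122-128 translate naturally into per-field checks. The regular expressions below are one possible reading of those descriptions, and the field keys and function names are hypothetical. For the telephone field, accepting 10 digits in addition to the described seven is an assumption made so that the "(XXX) ###-####" mask can be honored.

```python
import re
from typing import Dict

# One possible encoding of the expected formats for fields 122, 124 and 128.
FIELD_PATTERNS: Dict[str, str] = {
    "city": r"[A-Z][a-z]+(?:[ -][A-Z]?[a-z]+)*",  # capitalized words, spaces/hyphens, no digits
    "postal_code": r"\d{5}(?:[- ]\d{4})?",         # "#####" or the 10-character "#####-####"
    "state": r"[A-Z]{2}",                          # two uppercase letters
}

# Characters the telephone field 126 tolerates besides digits.
PHONE_ALLOWED = set("0123456789 ().,-")


def is_valid_phone(text: str) -> bool:
    """Seven digits (per the description) or ten (assumed, for the US mask), plus separators."""
    if not text or not set(text) <= PHONE_ALLOWED:
        return False
    return sum(ch.isdigit() for ch in text) in (7, 10)


def format_phone(text: str) -> str:
    """Normalize a valid phone number to '###-####' or the US mask '(XXX) ###-####'."""
    digits = re.sub(r"\D", "", text)
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    if len(digits) == 7:
        return f"{digits[:3]}-{digits[3:]}"
    raise ValueError(f"unexpected digit count: {len(digits)}")


def is_valid(field: str, text: str) -> bool:
    if field == "phone":
        return is_valid_phone(text)
    return re.fullmatch(FIELD_PATTERNS[field], text) is not None
```

Checks like these are what allow OCR output to be rejected or reformatted before it reaches a field, rather than after the user submits the form.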

The user may interact with one of fields 122-128 using any suitable means, e.g., by tapping on the region of the mobile device display corresponding to the field, and in response the keyboard interface 130 may be invoked. Alternatively, where a field does not accept user-defined text information, as in the case of a drop-down menu field such as the state field 128, the keyboard interface may not be invoked. For fields that accept user-defined text information, the user's interaction with the field may be indicated by the presence of a cursor 121. User interaction with a particular field may also cause context-sensitive components of the workflow (e.g., components configured to apply particular business rules, perform validation, or perform document classification) to be invoked or scheduled, as described in further detail herein.

The keyboard interface 130 may selectively include an alphabetic character set (e.g., in response to user interaction with the city field 122, as shown in FIG. 1A) or a numeric/symbol character set (e.g., in response to user interaction with the postal code field 124, as shown in FIG. 1B), based on the context of the field with which the user was interacting (e.g., the expected value or range of values for text information entered into the field). Preferably, the keyboard interface 130 includes a plurality of keys 132 configured to facilitate the user "typing" text information into the field, as well as function buttons 134 configured to perform one or more operations using I/O components of the mobile device, such as its microphone and/or camera.

When the keyboard interface 130 is invoked, as shown in FIG. 1A, the user may interact with a function button 134 of the keyboard interface 130 (e.g., a button generally associated with voice capture or speech-to-text functionality, as shown in FIG. 1B) to invoke the optical input extension of the mobile application or workflow. In practice, the optical input extension invokes a capture interface and initiates the capture and extraction operations (optionally including validation, classification, etc.), as described in further detail below.

Additionally and/or alternatively, the optical input extension may be displayed separately from keyboard interface 130, for example as a separate button 136 within form/page 120, as shown schematically in FIG. 1B.

In one approach, an image of a document may be captured or received by the mobile device, and image processing operations such as optical character recognition (OCR) may be performed on the image. In a further approach, the user simply moves the mobile device over the document, and the identifier is extracted via OCR directly from the video feed, without the need to invoke a separate capture operation. An identifier, preferably a unique identifier, may be extracted from the image based in whole or in part on the OCR results.

Normalization and Validation
The extracted identifier may be compared with reference content, or analyzed in view of one or more business rules. The reference content and/or business rules are preferably stored locally on the mobile device to facilitate efficient comparison and/or analysis, and may be provided in any suitable form.

In numerous approaches, the reference content may take the form of a supplemental document relative to the document from which the identifier is to be extracted. A supplemental document may be a document, file, or any other suitable source of text information against which a simple comparison of the extracted identifier can be performed. For example, in a preferred approach, the mobile application includes a data store in which one or more supplemental documents are stored, each supplemental document corresponding to at least one identifier, or to a type of identifier used in one or more workflows of the mobile application.

The supplemental documents may include identifiers obtained from prior capture and extraction operations, for example using the mobile application, and stored in a data store. Advantageously, a supplemental document may include a processed image of a document depicting the identifier, where the processing is configured to improve image quality for data extraction purposes (e.g., by specialized binarization based on color profile, correction of projective effects, orientation correction, etc.). Such a document image may serve as a validation tool, ensuring the accuracy of identifiers extracted from imaged documents during subsequent invocations of the mobile application or a particular workflow thereof. Of course, similar functionality may be achieved when the supplemental document simply contains validated identifiers (e.g., a series of characters and symbols known to represent the identifier accurately).
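
A minimal sketch of validation against locally stored reference content, assuming that supplemental documents have already been reduced to sets of validated identifier strings keyed by identifier type. The class and method names are hypothetical and do not describe any particular product API.

```python
from collections import defaultdict


class SupplementalStore:
    """Hypothetical on-device store of validated identifiers, keyed by type."""

    def __init__(self) -> None:
        self._known: dict[str, set[str]] = defaultdict(set)

    def add(self, id_type: str, identifier: str) -> None:
        """Record an identifier confirmed by a prior capture/extraction operation."""
        self._known[id_type].add(identifier)

    def validate(self, id_type: str, extracted: str) -> bool:
        """Simple comparison: accept only an exact match to a confirmed identifier."""
        return extracted in self._known.get(id_type, set())
```

Keeping the store on the device mirrors the preference stated above for local comparison. A processed document image could replace the string set, but the exact-match check is the simplest instance of the idea.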

In additional and/or alternative embodiments, the business rules may indicate the expected format of the extracted identifier. They may further include rules on how to selectively extract the identifier (e.g., using OCR parameters based on a particular color profile of the document, or OCR parameters that restrict the locations within the document where the identifier is sought). They may also modify the extracted identifier to conform to the expected format, for example using a mask or regular expression, and/or modify OCR parameters, for example by changing the OCR alphabet to exclude certain symbols or character sets, as will be understood by one having ordinary skill in the art upon reading the present descriptions.
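
One way such a business rule might be represented, sketched under assumptions: a regular expression for the expected format plus the alphabet the OCR step may emit. `BusinessRule`, `ACCOUNT_NUMBER`, and both methods are hypothetical. `restrict` simply filters characters after the fact, standing in for configuring a real OCR engine's alphabet.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessRule:
    """Hypothetical rule: expected identifier format plus a permitted OCR alphabet."""
    name: str
    pattern: str       # regular expression the extracted identifier must fully match
    ocr_alphabet: str  # characters the OCR step may emit for this identifier

    def conforms(self, identifier: str) -> bool:
        return re.fullmatch(self.pattern, identifier) is not None

    def restrict(self, text: str) -> str:
        # Drop any character outside the permitted alphabet (a mask-like modification).
        return "".join(ch for ch in text if ch in self.ocr_alphabet)


ACCOUNT_NUMBER = BusinessRule(
    name="account_number",
    pattern=r"\d{4}-\d{4}-\d{4}-\d{4}",
    ocr_alphabet="0123456789-",
)
```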

In one approach, the business rules may indicate that, in the context of a particular workflow, only a portion of the information that would properly be considered an identifier within the scope of this disclosure is needed or desired. For example, a workflow may require only the postal code of an address; only the last four digits of a social security number or credit card number; only the month and year of a date; or only part of a line item on an invoice, such as either the price or the product code, as will be understood by one having ordinary skill in the art upon reading the present descriptions.
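
A short sketch of rules that retain only the portion of an identifier a workflow needs. The function names are hypothetical, and the MM/DD/YYYY input format for dates is an assumption made for illustration.

```python
import re


def last_four(number: str) -> str:
    """Keep only the last four digits of, e.g., an SSN or card number."""
    digits = re.sub(r"\D", "", number)  # discard separators and any non-digits
    return digits[-4:]


def month_and_year(date_mm_dd_yyyy: str) -> str:
    """Reduce an MM/DD/YYYY date to the MM/YY portion the workflow needs."""
    month, _day, year = date_mm_dd_yyyy.split("/")
    return f"{month}/{year[-2:]}"
```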

A particular advantage of employing business rules in accordance with the inventive concepts disclosed herein is that the business rules applied to a given extraction operation are context-sensitive, so the appropriate business rules for an extraction attempt can be determined automatically.

In some approaches, the extracted identifier may be corrected to account for, and automatically correct, OCR errors. Preferably, the extracted identifier is corrected using text information from the supplemental documents and/or predefined business rules.

In this context, the predefined business rules preferably include business-oriented criteria or conditions for processing data, such as a threshold defining how large a mismatch may be and still be corrected. For example, a correction may be applied only to mismatches involving fewer than a maximum threshold number of characters, or less than a maximum percentage of characters. A correction may also be applied only to mismatches falling within a predefined set of "acceptable" errors, such as the numeral "1" in place of the letter "l" (or vice versa), or a dash "―" in place of a hyphen "-", or only under other similar business-oriented criteria or conditions, as will be understood by one having ordinary skill in the art upon reading the present descriptions.
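
A minimal sketch of threshold-limited correction against a reference value, assuming the extracted and reference strings are the same length (no inserted or dropped characters). `ACCEPTABLE_CONFUSIONS` and the default threshold of two mismatches are illustrative assumptions, not values taken from the disclosure.

```python
from typing import Optional

# Hypothetical set of "acceptable" OCR confusions, stored as unordered pairs.
ACCEPTABLE_CONFUSIONS = {
    frozenset({"1", "l"}),       # numeral one vs. lowercase L
    frozenset({"-", "\u2015"}),  # hyphen vs. horizontal-bar dash
}


def correct_against_reference(extracted: str, reference: str,
                              max_mismatches: int = 2) -> Optional[str]:
    """Return `reference` if `extracted` differs from it only by a small number of
    acceptable confusions; otherwise return None (no correction is applied)."""
    if len(extracted) != len(reference):
        return None
    mismatches = [(a, b) for a, b in zip(extracted, reference) if a != b]
    if len(mismatches) > max_mismatches:
        return None
    if all(frozenset({a, b}) in ACCEPTABLE_CONFUSIONS for a, b in mismatches):
        return reference
    return None
```

Both criteria from the text are applied together here: a cap on the number of mismatches and a whitelist of mismatch kinds. A percentage-based cap could be substituted for the count.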

Additionally and/or alternatively, the extracted identifier may be modified. For example, discrepancies resulting from OCR errors may be handled automatically using this technique. In one embodiment, according to a business rule, the identifier is expected to be in a predetermined format. For instance, in the context of a sensitive document such as a credit card, the identifier may be an account number expected in a 16-digit numeric format substantially conforming to "####-####-####-####", as typically found on conventional credit and debit cards, or an expiration date in "MM/YY" format, etc., as will be understood by one having ordinary skill in the art upon reading the present descriptions.

In further embodiments, the identifier may be extracted accurately but nonetheless be presented in a format different from the expected one. For example, the identifier may include or omit expected symbols or formatting such as spaces or dashes, or may contain unacceptable characters. One example is a month designation such as "Jan" or "January", which contains alphabetic characters, where the expected format is strictly numeric, such as "01".

Discrepancies of this nature may be resolved automatically by leveraging data normalization functionality. For example, in some approaches where the extracted identifier comprises a date, there is a finite set of suitable formats in which date data may be expressed, such as "01 January, 2001", "January 01, 2001", "01/01/01", "Jan. 1, 01", etc. Other types of identifier data may similarly be expressed in a finite number of formats. These include account numbers (e.g., conventional 16-digit account numbers in formats such as ####-####-####-####, #### #### #### ####, ################, etc.), cardholder names (e.g., Last, First; Last, First, Middle Initial (MI); First Last; First MI Last; etc.), and security codes (e.g., three- or four-digit numbers, alphanumeric strings containing both letters and numbers, etc.), as will be understood by one having ordinary skill in the art upon reading the present descriptions.

Based on business rules defining the expected format, or a finite set of feasible formats, for identifier data, the techniques disclosed herein may be configured to automatically normalize data obtained (e.g., by extraction) from an imaged financial document. The normalized data then matches the expected format of the corresponding data contained in or indicated by, for example, the text information of a supplemental document. For example, upon determining that extracted data such as a date is in a particular format (e.g., Jan. 01, 2001) other than the expected format (e.g., MM/YY), it is advantageous to convert the extracted data to the expected format. This allows the identifier data derived from the image to be matched easily and accurately with the corresponding text information from the supplemental document.
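
A sketch of normalization to the expected formats, assuming a hypothetical finite list of accepted date formats (`DATE_FORMATS` is illustrative, not exhaustive) and the "####-####-####-####" account-number format described above. Dates are parsed with the standard `datetime.strptime` and re-emitted as MM/YY.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical finite set of date formats the extractor is prepared to accept.
DATE_FORMATS = ("%d %B, %Y", "%B %d, %Y", "%m/%d/%y",
                "%b. %d, %Y", "%b. %d, %y", "%m/%y")


def normalize_date(raw: str) -> Optional[str]:
    """Convert any recognized date format to the expected MM/YY format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m/%y")
        except ValueError:
            continue  # not this format; try the next one
    return None


def normalize_account_number(raw: str) -> Optional[str]:
    """Strip spaces/hyphens and re-emit a 16-digit number as ####-####-####-####."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{16}", digits):
        return None
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))
```

Once normalized, the values can be compared with supplemental-document text that uses the same canonical formats. Month names are parsed according to the current locale, which is English under the default C locale.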

In other cases, it may be advantageous to use an iterative approach to achieve data normalization. For example, in one embodiment a first iteration is performed substantially as described above: an identifier is extracted from an image of a document and compared with corresponding data from one or more data sources (e.g., text information from supplemental documents, database records, predefined business rules, etc.). However, the comparison in the first iteration yields no match between the extracted identifier and the corresponding data from the data sources. In some approaches, this mismatch may result from an OCR error rather than from a true mismatch between the identifier on the imaged document and the corresponding data from the one or more data sources.

In some approaches, OCR errors of this nature may be corrected by determining one or more characteristics of the data corresponding to the identifier. In one embodiment, the first OCR iteration may extract the identifier in an unacceptable format (e.g., the data are not properly normalized), and/or may perform OCR such that the extracted identifier contains one or more OCR errors. As a result, the extracted identifier fails to match any corresponding data in the one or more data sources, even though the "true" identifier depicted on the document does in fact match at least some of the corresponding data. False negatives of this kind may be mitigated or avoided by modifying the parameters, rules, and/or assumptions underlying the OCR operation based on the identifier characteristics.

For example, in one embodiment an identifier is extracted and compared with corresponding data from one or more data sources, but the string comprising the extracted identifier does not match any account number in the corresponding data. In response to the failure to identify any corresponding data in the data sources, the extracted identifier is further analyzed to determine its characteristics.

In one approach, the extracted identifier may be compared with a plurality of predefined identifier types (e.g., "First Name", "Last Name", "Account Number", "Expiration Date", "PIN", etc.) to determine whether it exhibits any characteristics corresponding to one of those types. For example, the extracted identifier may be compared with the predefined identifier types to determine whether any similarity exists with respect to data formatting and/or data values.

Exemplary identifier characteristics suitable for such comparisons include, in some approaches, the following: string length; the character set from which the identifier may be formed (i.e., "alphabetic", "numeric", "alphanumeric", etc.); the presence of one or more recognizable patterns common to a particular type of identifier; or any other characteristic that would be recognized by one having ordinary skill in the art upon reading these descriptions. In a preferred approach, the identifier characteristics may include any pattern recognizable using known pattern-matching tools (e.g., regular expressions).
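
A sketch of comparing an extracted string against predefined identifier types using the three characteristics named above: length, character set, and a regular-expression pattern. The profiles and the unweighted similarity count are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

DIGITS = "0123456789"


@dataclass(frozen=True)
class IdentifierProfile:
    """Hypothetical description of one predefined identifier type."""
    name: str
    lengths: tuple[int, ...]  # permissible string lengths
    charset: str              # characters the identifier may contain
    pattern: str              # regular expression for a well-formed value


PROFILES = (
    IdentifierProfile("account_number", (16,), DIGITS, r"\d{16}"),
    IdentifierProfile("expiration_date", (5,), DIGITS + "/", r"(0[1-9]|1[0-2])/\d{2}"),
    IdentifierProfile("pin", (3, 4), DIGITS, r"\d{3,4}"),
    IdentifierProfile("postal_code", (5, 10), DIGITS + "-", r"\d{5}(-\d{4})?"),
)


def similarity(extracted: str, profile: IdentifierProfile) -> int:
    """Count how many characteristics (length, character set, pattern) match."""
    return sum((
        len(extracted) in profile.lengths,
        all(ch in profile.charset for ch in extracted),
        re.fullmatch(profile.pattern, extracted) is not None,
    ))


def classify(extracted: str) -> Optional[str]:
    """Return the most similar identifier type, or None if nothing matches at all."""
    if not extracted:
        return None
    best = max(PROFILES, key=lambda profile: similarity(extracted, profile))
    return best.name if similarity(extracted, best) > 0 else None
```

Ties resolve to the earlier profile in `PROFILES`. A production system would likely weight the characteristics rather than count them equally. Note that a 16-character string containing a misread letter still classifies as an account number on length alone.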

Additionally and/or alternatively, the identifier type may be determined based in whole or in part on one or more document characteristics. Examples include the location in the document from which the identifier is extracted; a classification of the document from which the identifier is extracted, as disclosed in related U.S. Patent Application No. 13/802,226, filed March 13, 2013 and published September 18, 2014 as U.S. Patent Publication No. 2014/0270349, which is herein incorporated by reference; and/or characteristics of data located next to, above, below, or otherwise in spatial proximity to the identifier on the document, as will be understood by one having ordinary skill in the art upon reading the present descriptions. For example, in a preferred embodiment, an identifier characteristic may be determined based on the identifier being extracted from a location below data indicating related information. One such case is an identifier located below a street address line, which in documents bearing a mailing address typically corresponds to the city, state, and/or postal code. In another preferred embodiment, an identifier characteristic may be determined based on the identifier being extracted from a location horizontally adjacent to related data, as with the expiration date or account number, respectively, on typical credit and debit cards.
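
A sketch of the two spatial heuristics described above. The first looks for a postal code on the line directly below a street-address line. The second reads the value to the right of a label on a card. `TextLine`, the street-line regex, the "VALID THRU" label, and the 10-pixel row tolerance are all hypothetical; real OCR output layouts vary.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextLine:
    """One OCR'd text line with its bounding box (pixels; y grows downward)."""
    text: str
    x: int
    y: int
    width: int
    height: int


STREET_LINE = re.compile(r"^\d+\s+\w+")                # e.g. "123 Main St"
TRAILING_ZIP = re.compile(r"(\d{5}(?:-\d{4})?)\s*$")  # ZIP at end of the line


def zip_below_street_line(lines: list[TextLine]) -> Optional[str]:
    """Look for a postal code on the line immediately below a street-address line."""
    ordered = sorted(lines, key=lambda ln: ln.y)
    for upper, lower in zip(ordered, ordered[1:]):
        if STREET_LINE.match(upper.text):
            match = TRAILING_ZIP.search(lower.text)
            if match:
                return match.group(1)
    return None


def value_right_of(lines: list[TextLine], label: str, max_dy: int = 10) -> Optional[str]:
    """Return the nearest text to the right of `label` on roughly the same row."""
    for anchor in lines:
        if anchor.text == label:
            right = [ln for ln in lines
                     if ln is not anchor
                     and abs(ln.y - anchor.y) <= max_dy
                     and ln.x >= anchor.x + anchor.width]
            if right:
                return min(right, key=lambda ln: ln.x).text
    return None
```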

Once the identifier characteristics are determined, the extracted identifier may be analyzed to determine whether any convention or rule describing those characteristics has been violated. In various approaches, such a violation may indicate that the extracted identifier contains an OCR error, improper data normalization, or both. In one example, the extracted identifier fails to match any of the corresponding data in the one or more data sources based on an initial comparison. In response to the failed match, the extracted identifier is analyzed and determined to be of the identifier type "account number", based at least in part on the extracted string being 16 characters in length. The extracted identifier is then further analyzed and determined to violate the "account number" characteristics. The analysis shows that account number strings consist of numeric characters, whereas the extracted identifier contains non-numeric characters. This happens because, for example, a character in the extracted identifier string was improperly determined to be the letter "B" instead of the numeral "8", the letter "l" instead of the numeral "1", the letter "O" instead of the numeral "0", etc., as will be understood by one having ordinary skill in the art upon reading the present descriptions.
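
A sketch of the violation check in this example. A 16-character string is treated as an account number, and every non-numeral in it is reported. `LOOKALIKE_DIGITS` is a hypothetical table of common letter-for-digit confusions, used only to suggest the likely intended numeral.

```python
from typing import NamedTuple, Optional

DIGITS = "0123456789"
# Hypothetical table of letters that OCR commonly returns in place of digits.
LOOKALIKE_DIGITS = {"B": "8", "l": "1", "I": "1", "O": "0", "S": "5"}


class Violation(NamedTuple):
    position: int
    found: str
    likely_digit: Optional[str]


def account_number_violations(extracted: str) -> list[Violation]:
    """Report non-numeric characters in a string classified as an account number
    (here: by its 16-character length), since account numbers are all numerals."""
    if len(extracted) != 16:
        return []  # length characteristic not met, so the rule is not applied
    return [Violation(i, ch, LOOKALIKE_DIGITS.get(ch))
            for i, ch in enumerate(extracted) if ch not in DIGITS]
```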

The OCR error may be corrected using a second OCR iteration based at least in part on the established identifier characteristics. In the above example of an account number erroneously containing alphabetic characters instead of numerals, the OCR engine may be restricted to an alphabet of candidate characters consisting entirely of numerals. The decision to restrict the OCR alphabet is based on a predefined business rule applicable to the account number format (i.e., that account numbers consist of numerals). Accordingly, the second iteration properly recognizes the numeral "8" in the identifier, rather than the letter "B" erroneously determined in the first iteration. Preferably, the identifier conforms to at least one business rule as described above. More preferably, the business rule may be expressed as at least one logical expression (e.g., a rule, formula, pattern, convention, structure, organization, etc., or any number or combination thereof).
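
A sketch of the two-iteration scheme, under a stated assumption: the OCR engine exposes ranked per-position candidates with confidences, so restricting the alphabet means choosing the best candidate that lies in the allowed set. `decode`, `two_pass_extract`, and the candidate format are hypothetical and do not describe a specific OCR library's API.

```python
from typing import Optional, Sequence

# Hypothetical OCR output: for each character position, (char, confidence) candidates.
Candidates = Sequence[Sequence[tuple[str, float]]]

DIGITS = "0123456789"


def decode(candidates: Candidates, alphabet: Optional[str] = None) -> str:
    """Pick the highest-confidence character per position, optionally restricted
    to `alphabet` (a stand-in for constraining the OCR engine's alphabet)."""
    out = []
    for options in candidates:
        allowed = [(ch, conf) for ch, conf in options
                   if alphabet is None or ch in alphabet]
        if not allowed:
            raise ValueError("no candidate in the permitted alphabet")
        out.append(max(allowed, key=lambda opt: opt[1])[0])
    return "".join(out)


def two_pass_extract(candidates: Candidates, reference: set[str]) -> Optional[str]:
    """First pass unrestricted; on a mismatch with a 16-character result, retry with
    a numeric-only alphabet, because the business rule says account numbers are numerals."""
    first = decode(candidates)
    if first in reference:
        return first
    if len(first) != 16:
        return None  # not recognized as an account number; no rule to apply
    second = decode(candidates, alphabet=DIGITS)
    return second if second in reference else None
```

The same `decode` call accepts any alphabet. A rule such as the hyphen-only convention described below could pass `DIGITS + "-"` instead.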

Those having ordinary skill in the art will appreciate that similar business rules may inform the OCR process regarding how to define the extracted identifier string in a variety of situations beyond the numeral/character distinction exemplified above.

For example, in one embodiment, a business rule may indicate that a particular alphabet of symbols should be used, as opposed to a more inclusive or different alphabet. Here the business rule indicates that account numbers follow a convention that includes the hyphen character (i.e., "-") but excludes the dash character (i.e., "―"), the underscore character (i.e., "_"), and the space character (i.e., " "). Accordingly, if the first iteration fails to extract an identifier matching the corresponding data, e.g., in a supplemental document, a second iteration may be performed using the more restricted alphabet, normalizing the extraction results according to the expectations reflected in the business rule.

Exemplary Context-Sensitive Workflow Use Cases
For example, a user working within a mobile application or workflow may interact with a field of the application, web page, etc. Based on that particular field, specific business rules may then be applied to the subsequent capture and extraction tasks. For instance, a field requesting a postal code (e.g., field 124 of FIG. 1A) may indicate or invoke a business rule with three requirements: the extracted identifier should have a five- (or nine-) digit format; all characters should be numeric (or include a hyphen); and alphabetic characters adjacent to the five- (or nine-) digit string should not be included in the extracted identifier. In this way, user interaction with a particular field enables a context-sensitive determination of the appropriate business rules to apply when subsequently capturing and extracting identifiers from the document.

In this manner, the user may selectively capture only the postal code from a document depicting a complete street address and populate the postal code field of the corresponding mobile application or workflow. The user does not need to give the mobile application or workflow any instructions or enter any text information into the field.
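
A sketch of field-driven rule selection. The focused field selects a pattern, and only that identifier is pulled from the OCR text. `FIELD_RULES` and `extract_for_field` are hypothetical. The lookarounds keep the five- or nine-digit string from being adjacent to letters or digits, and taking the last match assumes U.S.-style addresses where the ZIP code comes last.

```python
import re
from typing import Optional

# Hypothetical mapping from a form field to the rule applied when it is focused.
FIELD_RULES = {
    # 5 or 9 digits (optional hyphen), not adjacent to letters, digits, or hyphens.
    "postal_code": r"(?<![A-Za-z0-9-])\d{5}(?:-\d{4})?(?![A-Za-z0-9-])",
    # MM/YY with a valid month.
    "expiration_date": r"\b(?:0[1-9]|1[0-2])/\d{2}\b",
}


def extract_for_field(field: str, ocr_text: str) -> Optional[str]:
    """Apply only the rule tied to the field the user interacted with."""
    pattern = FIELD_RULES.get(field)
    if pattern is None:
        return None
    matches = re.findall(pattern, ocr_text)
    return matches[-1] if matches else None  # last match: ZIP follows the street line
```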

Similarly, business rules may be based in part or in whole on the context of the document in view of the mobile application or workflow. For example, in a situation similar to that described above, a user may interact with a field of a form or web page that expects a postal code. However, the form or page also includes other fields requesting various information, such as a telephone number, the city and state of an address, a name, a social security number, an expiration date, a credit card number, etc. The field with which the user interacted is part of a form/page requesting other information that is likely to appear on a single document (e.g., a driver's license, utility bill, credit card, etc.). For that reason, a business rule may be invoked so that subsequent capture and extraction operations attempt to extract multiple identifiers and populate multiple fields of the form in a single process, even though the user may not have interacted with those other fields. More precisely, this constitutes a workflow context, as in the example above, rather than a document context.

To determine the document context, in one approach, the document in the viewfinder may be analyzed to determine its type when the capture interface is invoked. Based on this determination, a multiple-identifier extraction and field population process may be performed, e.g., when the document is of a type likely to include multiple identifiers corresponding to multiple fields. Alternatively, the process may be avoided, e.g., when the document is not an appropriate type for multiple extraction attempts because documents of that type typically do not depict information corresponding to the other fields of the form/page.
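
A sketch of the decision just described, assuming document classification has already produced a type label. `DOCUMENT_FIELDS` is a hypothetical table of the form fields each document type can plausibly supply. Multi-field population is attempted only when the document covers the focused field and at least one other field on the form.

```python
# Hypothetical: which form fields each document type is likely to supply.
DOCUMENT_FIELDS = {
    "drivers_license": {"name", "license_number", "street", "city", "state", "postal_code"},
    "credit_card": {"name", "card_number", "expiration_date"},
    "utility_bill": {"name", "street", "city", "state", "postal_code"},
}


def fields_to_populate(document_type: str, form_fields: set[str],
                       focused_field: str) -> set[str]:
    """Return every form field the document can supply when it covers more than the
    focused field; otherwise fall back to the focused field alone."""
    supplied = DOCUMENT_FIELDS.get(document_type, set()) & form_fields
    if len(supplied) > 1 and focused_field in supplied:
        return supplied
    return {focused_field}
```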

In this manner, the context of the mobile application/workflow, as indicated for example by the user's interaction with a field, can be leveraged together with the context of the document from which the identifier is extracted. Advantageously, this dual-context approach enables optical-input-based autofill functionality without relying on any prior data entry. Autofill can be performed in near real time upon the first capture.

In a preferred approach, the user may capture images of one or more documents. The images are preferably captured using a capture component of the mobile device (e.g., a "camera" as described above) by invoking the capture interface via the optical I/O extension (e.g., extension 134 or 136 of FIGS. 1A and 1B, respectively). The captured images may optionally be stored in memory, e.g., memory of the mobile device, to allow future use and/or reuse as described herein. Notably, other embodiments of this disclosure also encompass scenarios in which a document image is not captured by the device. Instead, the image is received at a device (preferably a device having a processor, such as a mobile device) for later use in extracting and/or validating information depicted on or associated with the document (e.g., corresponding identifiers depicted on various documents).

The image of the document is analyzed by performing OCR on it. OCR may be used substantially as described above to identify and/or extract characters, particularly text characters, from the image. Even more preferably, the extracted characters include an identifier that uniquely identifies the document. The identifier may take any suitable form known in the art. In some approaches it may be embodied as an alphanumeric string of characters, such as an account number of a sensitive document (e.g., the 16-digit account number typically associated with a credit/debit card account); a security code (e.g., a CCV code on a debit/credit card, a scratch-card verification code, a personal identification number (PIN), etc.); or an expiration date (e.g., in the format "MM/YY"); etc., as will be understood by one having ordinary skill in the art upon reading the present descriptions.

The techniques disclosed herein may provide document owners with useful information and/or services relating to their documents by leveraging several advantageous features. For example, taking into account context information such as the mobile application executing on the mobile device, data may be provided to the mobile application automatically, without requiring the user to enter any text information. This avoids the time-consuming process, user errors, predictive-dictionary bias, and other problems common to conventional user-based text entry on mobile devices.

For example, in one embodiment, a mobile application, which may be a standard browser displaying a particular web page, a standalone application, etc., includes a workflow configured to help a user apply for automobile insurance. The workflow may include fields requesting information such as the applicant's name, driver's license number, vehicle make, model, and/or year, state of residence, etc.

A capture interface, such as a viewfinder, is invoked on the mobile device in response to the user invoking one of the fields of the mobile application, and/or in response to the user invoking an optical input extension (e.g., extension 134 as shown in FIG. 1A or extension 136 as shown in FIG. 1B) of a keyboard or other user input interface displayed via the mobile device (e.g., UI 110, UI 120, or UI 130 as shown in FIG. 1A).

The capture interface may immediately guide the user to capture images of one or more documents, e.g., a driver's license and a vehicle registration, that show some or all of the information required by the fields of the workflow. Preferably, the capture interface is configured to automatically detect a document shown in the viewfinder and capture its image once optimal capture conditions (e.g., illumination, field of view, and zoom/resolution) are achieved. The viewfinder may include a reticle, such as four corners arranged in a rectangle to facilitate capturing an image of an entire document, or a rectangular box to facilitate capturing a line, field, etc. of textual information shown on the document, as would be understood by one of ordinary skill in the art upon reading this description. The reticle is preferably configured to help the user orient the device and/or the document to achieve optimal capture conditions.
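The auto-capture behavior can be pictured as a gate over per-frame measurements. In this minimal Python sketch, the `FrameStats` fields and every threshold value are hypothetical; how a device actually measures illumination, document coverage, or text size is outside the scope of this description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameStats:
    mean_brightness: float     # average luminance of the preview frame, 0-255
    document_coverage: float   # fraction of the frame occupied by the detected document, 0-1
    min_char_height_px: int    # estimated height of the smallest text line, in pixels


# Hypothetical thresholds; a real capture interface would tune these per device and document.
MIN_BRIGHTNESS, MAX_BRIGHTNESS = 60.0, 220.0
MIN_COVERAGE, MAX_COVERAGE = 0.5, 0.95
MIN_CHAR_HEIGHT_PX = 20


def capture_problems(stats: FrameStats) -> List[str]:
    """Return guidance for the user; an empty list means conditions allow auto-capture."""
    problems = []
    if stats.mean_brightness < MIN_BRIGHTNESS:
        problems.append("Too dark: add light")
    elif stats.mean_brightness > MAX_BRIGHTNESS:
        problems.append("Too bright: reduce glare")
    if stats.document_coverage < MIN_COVERAGE:
        problems.append("Move closer so the document fills the reticle")
    elif stats.document_coverage > MAX_COVERAGE:
        problems.append("Move back so all four corners are visible")
    if stats.min_char_height_px < MIN_CHAR_HEIGHT_PX:
        problems.append("Text too small: zoom in")
    return problems


def should_auto_capture(stats: FrameStats) -> bool:
    return not capture_problems(stats)
```

Returning messages rather than a bare boolean lets the same check drive the reticle guidance described above.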

More preferably, the capture operation is context-aware, to facilitate accurate and precise extraction of the identifier from the document and accurate and precise output of the corresponding textual information into the fields of the workflow. In various approaches, the corresponding textual information may be identical to the extracted identifier, or may be normalized according to an expected format and/or to correct OCR errors. In further approaches, the identifier may be validated against reference content or business rules, as described in more detail herein, to facilitate precise and accurate extraction and output.
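One way to picture normalization and validation of an extracted identifier is the following sketch for a card number and an expiration date. It assumes a hypothetical table of OCR look-alike characters and uses the standard Luhn checksum as an example business rule; the disclosure does not specify which rules or reference content are used.

```python
import re
from datetime import date
from typing import Optional

# Assumed OCR look-alike corrections, applied only to fields known to be numeric.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "D": "0", "I": "1", "l": "1",
                             "|": "1", "Z": "2", "S": "5", "B": "8"})


def normalize_numeric(raw: str) -> str:
    """Replace look-alike letters with digits, then drop whitespace and dashes."""
    return re.sub(r"[\s-]", "", raw.translate(DIGIT_FIXES))


def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) checksum used by payment card numbers."""
    if not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_card_number(raw: str) -> Optional[str]:
    """Expected format: 16 digits passing Luhn. Returns the normalized value or None."""
    n = normalize_numeric(raw)
    return n if len(n) == 16 and luhn_valid(n) else None


def validate_expiry(raw: str, today: date) -> Optional[str]:
    """Expected format MM/YY; expected range: a real month that has not already passed."""
    m = re.fullmatch(r"(\d{2})/(\d{2})", normalize_numeric(raw))
    if not m:
        return None
    month, year = int(m.group(1)), 2000 + int(m.group(2))
    if not 1 <= month <= 12 or (year, month) < (today.year, today.month):
        return None
    return f"{month:02d}/{year % 100:02d}"
```

Returning `None` rather than raising leaves the caller free to re-prompt capture or fall back to manual entry.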

In some approaches, the document may be analyzed and classified, as further described herein, to determine the context of the document and/or to determine whether a multi-field extraction operation should be attempted.

Context-Dependent Process Invocation
In further embodiments, it may be advantageous to automatically invoke one or more contextually appropriate processes based on optical input received via an optical sensor of the mobile device.

In general, such a process may be represented graphically as shown in method 300 of FIG. 3, in accordance with various embodiments. Method 300 may be performed in any suitable environment, including those shown in FIGS. 1A and 1B, as well as any other suitable environment that would be recognized by one of ordinary skill in the art upon reading this description.

As shown in FIG. 3, method 300 includes operations 302-306. In operation 302, optical input is received via one or more optical sensors of the mobile device, as occurs, for example, when a viewfinder interface is invoked and a video feed depicting the field of view of the mobile device's optical sensor is displayed.

In operation 304, the optical input is analyzed using a processor of the mobile device to determine a context of the optical input.

In operation 306, a contextually appropriate workflow is invoked based on the context of the optical input.
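Operations 302-306 can be outlined as a small pipeline. In this hedged Python sketch the optical input is assumed to have already been converted to text, and a simple keyword test stands in for whatever analysis determines context in operation 304; the workflow registry `workflows` and all function names are hypothetical.

```python
from typing import Callable, Dict, Optional


def determine_context(ocr_text: str) -> Optional[str]:
    """Operation 304 (simplified): infer a document type from keywords in the recognized text."""
    text = ocr_text.lower()
    if "invoice" in text:
        return "invoice"
    if "vehicle registration" in text:
        return "vehicle registration"
    if "driver license" in text or "driver's license" in text:
        return "identification document"
    return None


def run_method_300(frame_text: str, workflows: Dict[str, Callable[[str], str]]) -> Optional[str]:
    """Operations 302-306; the optical input (302) is assumed to be already OCR'd to text."""
    context = determine_context(frame_text)          # operation 304
    workflow = workflows.get(context) if context else None
    if workflow is None:
        return None                                   # no contextually appropriate workflow
    return workflow(frame_text)                       # operation 306
```

Keeping the registry separate from the context analysis means new workflows can be added without touching operation 304.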

The context may include any suitable information relevant to performing operations within the corresponding workflow, and preferably includes one or more of: the type of document represented in the optical input; and the content of the document represented in the optical input.

Where the context includes a document type, the type of document is preferably selected from the group consisting of: a contract, a sensitive document, an identification document, an insurance document, a title, a quote, and a vehicle registration. Where the context includes document content, the content is preferably selected from: a phone number, a social security number, a signature, an invoice line item, a partial or complete address, a universal resource locator, an insurance group number, a credit card number, an inquiry number, a photo, and a distribution of fields shown on the document.
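A naive way to determine the document type from the group listed above is keyword scoring, sketched below. The keyword lists and the `min_hits` threshold are invented for illustration and cover only some of the listed types; a production classifier would likely also use layout and image features.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword cues for some of the document types listed above.
TYPE_KEYWORDS: Dict[str, List[str]] = {
    "contract": ["agreement", "party", "hereby", "terms"],
    "identification document": ["driver license", "dob", "class", "expires"],
    "insurance document": ["policy", "insured", "group", "coverage"],
    "vehicle registration": ["registration", "vin", "plate", "make"],
    "quote": ["quote", "estimate", "valid until"],
}


def classify_document_type(ocr_text: str, min_hits: int = 2) -> Optional[str]:
    """Return the type with the most whole-word keyword hits, if it reaches min_hits."""
    text = ocr_text.lower()
    scores = {
        doc_type: sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", text))
        for doc_type, keywords in TYPE_KEYWORDS.items()
    }
    best = max(scores, key=lambda t: scores[t])
    return best if scores[best] >= min_hits else None
```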

In one approach, a user may position a document bearing a signature, such as a driver's license, a personal check, a business check, or a contract, within the field of view of the mobile device's optical sensor. The mobile device may detect the presence of the signature, preferably in combination with one or more other identifying features of the document (e.g., a photo on a driver's license, a particular font such as a magnetic ink character recognition (MICR) font on a check, a distribution of fields on a form, etc.), and may automatically or semi-automatically invoke an appropriate mobile application on the mobile device. Additionally and/or alternatively, a context-dependent business process or workflow may likewise be invoked within a particular mobile application.

Workflows that may be appropriate to invoke include an insurance quote, a health care authorization process, a signing ceremony, a deposit, or any combination thereof, and various pieces of information may indicate which is appropriate. A driver's license number and a vehicle identification number may indicate that an auto insurance quote is appropriate. Alternatively, a health insurance provider name, a policyholder (patient) name, and/or a group number may indicate that a health care authorization workflow or a health insurance quote workflow is appropriate. A document that includes textual information common to loan agreements, such as a mortgage or loan application, together with a signature or signature line may indicate that a signing ceremony workflow is appropriate. A document that includes a signature and an account number or deposit amount may indicate that a deposit workflow is appropriate. Of course, the inventive concepts disclosed herein may be applied to other workflows, as would be understood by one of ordinary skill in the art upon reading this disclosure, without departing from the scope of this description.

For example, in response to detecting a signature and a photo, the mobile application may invoke an insurance quote workflow to help the user obtain auto insurance. In response to detecting a signature and a particular font, a mobile check deposit workflow may be invoked. In response to detecting a signature and a distribution of fields, a mortgage application process or a document signing ceremony process may be invoked. Similarly, if the operation is not already being performed from within a mobile application, the mobile device may, in various embodiments, invoke an application configured to facilitate a contextually appropriate operation as described above.
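The feature-combination examples above can be expressed as a rule table. In this sketch the feature labels (e.g., `micr_font`, `field_distribution`) and workflow identifiers are hypothetical, and preferring the rule with the most required features, with ties going to the earlier rule, is one possible policy rather than one stated in the disclosure.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Each rule pairs the features a detector must report with the workflow to invoke.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"signature", "photo"}), "insurance_quote"),
    (frozenset({"signature", "micr_font"}), "mobile_check_deposit"),
    (frozenset({"signature", "field_distribution"}), "signing_ceremony"),
    (frozenset({"drivers_license_number", "vin"}), "auto_insurance_quote"),
    (frozenset({"health_insurer_name", "group_number"}), "health_care_authorization"),
    (frozenset({"signature", "account_number"}), "deposit"),
]


def select_workflow(detected: Set[str]) -> Optional[str]:
    """Pick the rule whose required features are all present, preferring the most specific one."""
    matches = [(len(required), workflow) for required, workflow in RULES if required <= detected]
    if not matches:
        return None
    # max() returns the first maximal element, so ties go to the earlier rule in RULES.
    return max(matches, key=lambda m: m[0])[1]
```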

Other examples of context-dependent process invocation may include any one or more of the following. In response to detecting that the document within the view of the mobile device's optical sensor is an invoice (e.g., by detecting the presence of the term "invoice," an invoice number, a known service provider entity name, an address, etc.), a systems, applications, and products (SAP) application or another similar enterprise application may be invoked to automatically display the status of the invoice.

In response to detecting that textual information within the view of the mobile device's optical sensor is a phone number, the phone application of the mobile device's operating system may be invoked, and the number may be automatically entered and/or dialed.

In response to detecting that textual information within the view of the mobile device's optical sensor is a universal resource locator (URL), the mobile device's web browser application may be invoked, the URL may be entered into the navigation or address bar, and/or the browser may be automatically directed to the resource indicated by the URL.

In response to detecting that textual information within the view of the mobile device's optical sensor is a credit card number, a financial services application or the credit card company's website (via the browser, if a website is invoked) may be invoked, and an account statement, balance, due date, etc. may be displayed to the user.

In response to detecting that textual information within the view of the mobile device's optical sensor is a social security number, a tax return preparation application or website may be invoked.
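The content-triggered examples above (phone number, URL, credit card number, social security number) can be sketched as ordered pattern detectors mapped to target applications. The regular expressions are simplified U.S.-centric heuristics and the application names are placeholders; actually launching a dialer or browser is platform-specific and not shown.

```python
import re
from typing import Optional, Tuple

# Simplified detectors, tried in order so that stricter formats win.
DETECTORS = [
    ("ssn", re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    ("credit_card", re.compile(r"^(?:\d[ -]?){15}\d$")),
    ("phone", re.compile(r"^\+?1?[ .-]?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$")),
    # Requires an alphabetic top-level domain, so dotted digit runs are not taken as URLs.
    ("url", re.compile(r"^(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}(?:/\S*)?$", re.IGNORECASE)),
]

# Placeholder names for the application each content type would invoke.
ACTIONS = {
    "ssn": "tax_preparation_app",
    "credit_card": "financial_services_app",
    "phone": "dialer",
    "url": "web_browser",
}


def route_text(snippet: str) -> Optional[Tuple[str, str]]:
    """Return (content type, target app) for the first matching detector, else None."""
    s = snippet.strip()
    for kind, pattern in DETECTORS:
        if pattern.match(s):
            return kind, ACTIONS[kind]
    return None
```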

Of course, as one of ordinary skill in the art will appreciate upon reading this specification, the inventive concepts disclosed herein may be used in any suitable scenario, implementation, application, etc. that calls for optical input as a source of textual information. In a particularly preferred approach, the user input UI of a workflow may be invoked contextually based on optical input in the mobile device's field of view, and any relevant information within that field of view may be automatically captured and output to the appropriate fields of the invoked UI, properly formatted and with any OCR errors already corrected.
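Outputting captured values to the appropriate fields, already formatted, might look like the following sketch, which reuses the auto insurance form described earlier. The field names, extracted keys, and formatters are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def title_case(s: str) -> str:
    return " ".join(w.capitalize() for w in s.split())


def upper_alnum(s: str) -> str:
    return "".join(ch for ch in s.upper() if ch.isalnum())


# Hypothetical auto insurance form: field -> (extracted key, formatter enforcing the field's format).
FORM_FIELDS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "applicant_name": ("name", title_case),
    "license_number": ("dl_number", upper_alnum),
    "state": ("state", lambda s: s.strip().upper()),  # assumes a two-letter postal code
}


def fill_form(extracted: Dict[str, str]) -> Dict[str, str]:
    """Write each extracted value into its field after formatting; missing values stay blank."""
    return {
        field: fmt(extracted[key]) if key in extracted else ""
        for field, (key, fmt) in FORM_FIELDS.items()
    }
```

Blank fields are left for the user to complete, so a partial capture still saves typing.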

Although several exemplary scenarios have been described above to illustrate the concepts and features of the inventive subject matter disclosed herein, those skilled in the art will recognize that these concepts are equally applicable to any number of similar scenarios, implementations, practical applications, etc. For example, while some examples herein have been presented from the perspective of a user who wishes to interact with a web page and enter textual information shown on a document into fillable fields of that web page, the inventive subject matter described above is equally applicable to any similar or equivalent scenario that would be recognized by one of ordinary skill in the art upon reading this disclosure. For instance, the subject matter may likewise be applied to any situation that requires a user to enter textual information via a virtual keyboard user interface, such as composing an email or interacting with an application.

Although this description has been presented primarily in relation to methods, those skilled in the art will recognize that the inventive concepts described herein may equally be implemented in, or as, a system and/or a computer program product.

For example, a system within the scope of this description may include a processor and logic integrated with and/or executable by the processor, the logic being configured to cause the processor to perform the steps of the methods described herein.

Similarly, a computer program product within the scope of this description may be a computer-readable storage medium having program code embodied therewith, the program code being readable/executable by a processor to cause the processor to perform the steps of the methods described herein.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. For example, any equivalents of the embodiments disclosed herein that would be recognized by one of ordinary skill in the art upon reading this disclosure should be understood as falling within the scope of the inventive concepts described herein. Similarly, the disclosed inventive concepts may be combined in any suitable manner, and permutations, combinations, and modifications thereof would be recognized by those skilled in the art upon reading this description.

Thus, the breadth and scope of embodiments of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the appended claims and their equivalents.

Claims

A method, comprising:
invoking a user input interface on a mobile device;
invoking an optical input extension of the user input interface;
capturing optical input via one or more optical sensors of the mobile device;
determining textual information from the captured optical input; and
providing the determined textual information to the user input interface.

The method of claim 1, wherein the user input interface is invoked in response to detecting user interaction with a user interface element configured to receive textual information.

The method further comprising analyzing the optical input to determine the textual information, wherein the analyzing comprises:
performing optical character recognition (OCR);
identifying desired textual information among the determined textual information based on the OCR; and
selectively providing the desired textual information to the user input interface.

The method of claim 3, wherein the desired textual information includes a plurality of identifiers, each identifier corresponding to one of a plurality of user interface elements configured to receive the textual information; and
wherein the providing further comprises selectively providing each of the plurality of identifiers to the corresponding user interface element.

The method of claim 4, further comprising one or more of validating and normalizing at least one of the identifiers to match one or more of an expected format of the desired textual information and an expected range of values for the desired textual information.

The method of claim 5, further comprising determining one or more of business rules applicable to at least one of the identifiers and reference content from supplemental documents;
wherein the determining is based on the element corresponding to the identifier; and
wherein one or more of the validating and the normalizing are based on one or more of the reference content and the business rules.

The method of claim 3, further comprising one or more of validating and normalizing the desired textual information to match one or more of an expected format of the desired textual information and an expected range of values for the desired textual information.

The method of claim 7, wherein one or more of the validating and the normalizing are based on one or more of business rules and reference content from supplemental documents.

The method of claim 8, further comprising determining one or more of the supplemental documents and the business rules based on the element with which the user interacted.

The method of claim 1, wherein the optical input extension is provided simultaneously with the invoked user input interface.

The method of claim 1, wherein the user input interface includes a virtual keyboard displayed on the mobile device; and
wherein the optical input extension includes a camera button displayed on the virtual keyboard.

The method of claim 1, further comprising automatically invoking an optical input capture interface in response to detecting invocation of the optical input extension.

The method further comprising pre-analyzing the optical input prior to capturing the optical input, wherein the pre-analyzing comprises:
detecting an object depicted in the optical input;
determining one or more characteristics of the object depicted in the optical input; and
determining one or more analysis parameters based at least in part on the determined characteristics.

The method of claim 13, wherein the one or more analysis parameters include OCR parameters.

A method, comprising:
receiving optical input via one or more optical sensors of a mobile device;
analyzing the optical input using a processor of the mobile device to determine a context of the optical input; and
automatically invoking a contextually appropriate workflow based on the context of the optical input.

The method of claim 15, wherein the context includes one or more of:
a type of document represented in the optical input; and
content of the document represented in the optical input.

The method of claim 16, wherein the document type is selected from the group consisting of a contract, a sensitive document, an identification document, an insurance document, a title, a quote, and a vehicle registration.

The method of claim 16, wherein the content is selected from: a phone number, a social security number, a signature, an invoice line item, a partial or complete address, a universal resource locator, an insurance group number, a credit card number, an inquiry number, a photo, and a distribution of fields shown on the document.

The method of claim 15, wherein the workflow includes one or more of an insurance quote, a health care authorization process, a signing ceremony, and a deposit.

A computer program product comprising a computer-readable storage medium having program code embodied therewith, the program code being readable/executable by a processor to:
invoke a user input interface on a mobile device;
invoke an optical input extension of the user input interface;
capture optical input via one or more optical sensors of the mobile device;
determine textual information from the captured optical input; and
provide the determined textual information to the user input interface.