
EP1221160A2 - Information retrieval system - Google Patents

Information retrieval system

Info

Publication number
EP1221160A2
Authority
EP
European Patent Office
Prior art keywords
user
information
speech
users
application software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00921017A
Other languages
English (en)
French (fr)
Inventor
Benjamin Te-Eni
Gil Israeli
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spintronics Ltd
Original Assignee
Spintronics Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spintronics Ltd
Publication of EP1221160A2

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2452: Query translation
    • G06F16/24522: Translation of natural language queries to structured queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63: Querying
    • G06F16/635: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/64: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • The present invention relates generally to computer-based information and commerce systems and wireless communications systems and, more particularly, to a platform for organizing, accessing, and interacting with information according to users' individual interests over the cellular network.
  • the present invention thus transcends the boundaries of radio, Internet, and more traditional information formats.
  • the present invention provides a platform that enables access to information, advertising, and special features and services via the cellular network in an audio format.
  • the platform is represented by the concept of a Mobile Agent (a virtual window through which users may access all available information and services) and of a Virtual Companion (a representation of the system's audio interface by a user-selected persona).
  • The Companion, whose voice and personality are selected by the user, serves as a guide to the many services and information retrieval options available to the user via the Mobile Agent.
  • a virtual private hard disk which is the user's personal information storage space and database, via a cellular switching center;
  • the invention makes use of the cellular phone as a remote input/output (I/O) device, and employs a speech-user-interface enabling speech activation and data retrieval in a convenient audio format; and
  • The system enables the information dissemination center to add hyperlinks relevant to radio and TV broadcasts; the hyperlinks are available via the cellular phone, without modifying the actual radio or TV broadcast.
  • users of the system can obtain information regarding media broadcasts via the cellular phone.
  • the Mobile Agent functions as a value-added service (VAS) to cellular telephone systems.
  • Value-added services help effect a virtual transmutation of the cellular phone, extending the range of functions from telephony to voice-mail and faxing capabilities, and ultimately far beyond, as the cellular phone morphs into a mobile personal computer (or personal digital assistant).
  • the present invention advances this transmutation by means of a unique speech user-interface (SUI) utilizing speech recognition technologies and innovative audio services, thereby allowing the cellular phone a range of functions restricted heretofore to personal computers.
  • SUI unique speech user-interface
  • Wireless communications technologies, together with the "Information Superhighway" and World Wide Web, are revolutionizing the ways in which people access personal data, news, and information.
  • the tools now available to consumers are already rapidly changing the ways in which news and information are disseminated, and are altering people's personal habits of data storage and retrieval.
  • Even as people are becoming increasingly mobile, the importance of maintaining a viable link with personal data, together with access to multiple forms of news and information, is becoming increasingly critical.
  • a hybridization of Internet technologies and wireless network capabilities is required
  • Traditional news media such as print media, radio, and television have specific disadvantages that can be best mitigated by complementary use of newer technologies.
  • Traditional news media are neither interactive nor personalized to accommodate user preferences, and so lend themselves to inefficiency; time, money, and resources are wasted.
  • Each of these traditional media is restrictive, subjecting users to inconvenient physical and temporal constraints: all users receive the same broadcast at the same time. This requires the media consumer's presence at a particular time or at a specified location, or both (e.g., near the television at 8 o'clock), and limits the consumer's experience to the boundaries of a broadcast planned for the entire audience, with virtually no possibility of personalization or interaction with the broadcast.
  • traditional media update information slowly and infrequently.
  • The present invention addresses these problems, rendering the ironies of information retrieval in cyberspace obsolete by allowing users to free themselves from their PCs and workstations without compromising their unfettered access to news and information in cyberspace or the advantages of existing mass media.
  • An innovative speech user interface (SUI) is most congenial to a fast-paced, mobile lifestyle. Users need not be present at any particular location or at any particular time in order to access news and information items.
  • a goal of this invention is to integrate multiple forms of access to electronic data, information, and advertising into a single, easily negotiable format - a Mobile Agent. Users are provided with a wide range of services and information retrieval formats via the cellular network using an audio interface.
  • Another goal of the present invention is to make all processes and services simple and easy to use.
  • the speech-user-interface is coupled with audio-formatted data files, and then endowed with specific user-defined characteristics. Having selected a specific persona for the interface, the user interacts with the resultant Companion, which serves as a guide to all services and forms of information available via the Mobile Agent.
  • The Companion adapts to each user's habits and preferences, and recognizes each user's voice and speech patterns. Users select a persona for their Companion, set preferences for all system functions and services, and create a self-profile that helps determine which advertising items are played to the user, and in what manner and conditions. The system adapts to patterns of typical use.
  • Another goal of this invention is to provide users with customization options such that the user may determine which categories of news and information items the user would like to receive and not receive.
  • users are able to set and change preferences either over the wireless network using a cellular handset, or alternatively, over the World Wide Web using a PC.
  • Another goal of this invention is to provide information access in an audio format.
  • the user is not required to refer to a visual display or GUI in order to access the desired information during the information retrieval process.
  • Another goal of the present invention is to provide access to news and information such that access to information is completely “hands-free.”
  • the user is not required to "key-in" input commands, touch or manipulate an input device in order to retrieve information.
  • the user interacts with information such as broadcast radio and TV by selecting from a list of links respective to a specific media program.
  • The user is able to receive additional information regarding such programs or items on the program, and to purchase the information or commercial items related to the program, all using voice or WAP menu commands.
  • Another goal of the present invention is to assure each subscriber timely information upon request without delays for downloading files. Upon hearing a song the subscriber likes, he can immediately download the song to his mobile phone using speech or WAP commands to facilitate the download.
  • Another goal of the present invention is to present information and news items to users in formats of varying lengths so that information is disseminated at a rate that suits each user's preferences.
  • Another goal of the present invention is to provide quick, specific answers to straightforward questions (e.g., "What is the song played on Capital Radio right now?" and "Where can I buy the product being advertised right now on the radio?").
  • Another goal of the present invention is to provide road directions to mobile phone users, integrating up-to-the-minute data such as traffic and road-condition information with the directions.
  • Another goal of the present invention is to augment the utility of information services by combining personal assistant and organizer functions with news and information dissemination.
  • the user is able to keep track of media bookmarks.
  • The system stores a time stamp upon receiving the user's 'bookmark' command.
  • The bookmark then enables the user to return to that time frame on a specific radio or TV station, thereby relating to a playlist of news items, music items, or advertisements broadcast at that time on the selected station.
  • Another goal of the present invention is to actively facilitate commerce and financial transactions via the cellular network.
  • Another goal of the present invention is to provide remote access to applications, files, and data, together with extensive storage space, via the Virtual PC.
  • System based application settings, files, and data are synchronized with information on the user's PC such that copies of user files may be obtained via the cellular system in audio format or via the user's PC.
  • Another goal of this invention is to conserve time, money, and the resources involved in communications and data access and retrieval, while providing news, information, and advertisements in a dynamic, efficient, entertaining format.
  • The present invention describes an information system for organizing, accessing, and interacting with information and media according to users' individual interests over the cellular network, comprising: a speech recognition engine for converting speech received from the subscriber's cellular telephone handset to a plurality of commands for operations to be performed according to that speech; a Natural Language Engine for compiling the required speech interface and sending it to the recognition engine; a session management system for directing commands from the recognition engine to the appropriate application and database, typically via an application server, enabling entering and retrieving data from the respective application database; a Profile database for storing, updating, and retrieving personal data regarding the preferences of the respective subscriber regarding content and speech interface; and a Content database for storing data as required by the subscriber, enabling such data to be retrieved as desired by the subscriber.
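
As a rough illustration of how the claimed components could fit together, the following Python sketch wires a speech recognition engine, a Natural Language Engine, a session management system, and the Profile and Content databases into one call-handling path. All class names, method signatures, and data fields are hypothetical; the patent does not specify an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberProfile:
    subscriber_id: str
    content_preferences: dict = field(default_factory=dict)   # e.g. {"news": 3, "sports": 1}
    speech_interface: dict = field(default_factory=dict)      # persona, language, verbosity


class SpeechRecognitionEngine:
    """Converts digitized speech from the handset into system commands."""
    def recognize(self, audio_segment: bytes, grammar: dict) -> list:
        raise NotImplementedError   # a real engine would return e.g. ["PURCHASE", "ITEM=PIZZA"]


class NaturalLanguageEngine:
    """Compiles the required speech interface (grammar and prompts) for the recognizer."""
    def compile_interface(self, profile: SubscriberProfile) -> dict:
        return {"commands": ["GET", "PURCHASE", "PLAY"], "persona": profile.speech_interface}


class SessionManager:
    """Directs recognized commands to the appropriate application and database."""
    def __init__(self, profile_db: dict, content_db: dict):
        self.profile_db = profile_db    # subscriber preferences (Profile database)
        self.content_db = content_db    # stored information items (Content database)

    def handle_utterance(self, caller_id: str, audio_segment: bytes,
                         recognizer: SpeechRecognitionEngine,
                         nle: NaturalLanguageEngine) -> list:
        profile = self.profile_db[caller_id]
        grammar = nle.compile_interface(profile)
        commands = recognizer.recognize(audio_segment, grammar)
        # In a full system the commands would now be dispatched, typically via an
        # application server, to the application that reads or writes the content database.
        return commands
```
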
  • Both the session management system and the application server may be implemented on the same server computer and commercial application server software platform, such as Oracle Application Server or BEA WebLogic.
  • the application server integrates:
  • The system functions as an "agent" empowered to buy products according to the user's commands. These operations may be executed according to a profile of pre-defined user preferences or system options, and in online, real-time transactions from radio stations, TV, etc. Via the cellular network, users may access travel agent services, order a pizza using only a cellular phone, buy a product that is being advertised on the radio or TV, or buy an in-depth interview that was broadcast on the radio or TV.
  • The information made available to the users is retrieved from radio, TV, and newspapers, and is aggregated and tagged with properties and relevant links by a team of writers, editors, and media experts. Where required, the information or links are then recorded in an audio format by voice and sound specialists. Human-voice audio clips are then stored on the server's database.
  • Subscriber profile data is stored in the database. This profile data represents user preferences, which are indications of the categories of information items to which the user would like access.
  • the preference controls function as a filter for excluding those items inconsistent with the user profile data from the information items played over the cellular network.
  • Profile data is used to select advertisements appropriate to each user according to the chosen items. Users may set or change these settings using the cellular handset or alternatively, over the Internet using a PC.
  • The system also provides a system for answering user-initiated queries via the cellular network. To activate these general information services, users may ask questions on a wide range of topics using speech commands. Using advanced query applications, human speech is converted to a query with standardized formulation by a natural language server. Once converted, queries are processed and answered in a convenient audio format. The user can request that a copy of the response be sent to an email address or fax number.
  • The Virtual PC consists of the user's private virtual hard disk (data storage space designated for each user), together with access to audio-enabled applications such as word processing. Users may create directories and subdirectories in which to store, organize, and receive information items. Information (data, files, and application settings) on the Virtual PC is synchronized with information on the user's own PC, such that subscribers are granted remote access to the information stored on their PC, together with access to audio-enabled files via the cellular network.
  • A road-directions transportation application: based on natural language server capabilities, the system uses segmented audio clips of road directions stored in a local database. In response to users' requests, individual clips of audio directions are linked together and played for the user via the cellular network. Road directions are customized to suit each user's mode of transportation, listening preferences, chosen route, and personal profile.
  • a General Information database containing reference information, tools for calculation, games, and other general information, to be supplied to subscribers according to the respective command.
  • the invention also provides an integrated audio interface called Virtual Companion, a conceptual representation of the user-interface in the form of a persona with specific attributes and characteristics.
  • The interface is based on speech commands as the primary mode of data input and audio files as the primary mode of data output. Communication with the user is made possible as the system learns users' typical responses and requests.
  • Using both text-to-speech and speech-to-text engines, together with advanced voice-recognition technologies, the Virtual Companion enables users to make requests using natural speech commands, and allows users to hear information in a convenient, easy-to-use format.
  • An audio Mobile Agent system stores and updates a database of information items.
  • The information items are classified according to a precise categorization system whereby each item is tagged according to multiple properties, such as source and time of broadcast. This tagging allows the system to match articles and advertisements to users, to cater to user preferences most effectively, and to effect real-time retrieval of recent articles, advertisements, music items, etc.
  • Tagging information items may also include links to related information items or commercial offerings enabling users to access additional information relevant to specific news items or broadcasts, and conduct pertinent commerce transactions.
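
A possible record layout for such tagged items is sketched below; the field names (source, broadcast time, per-category relevance, links, audio versions) are assumptions drawn from the properties mentioned above, not a format defined by the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class InformationItem:
    item_id: str
    source: str                                          # e.g. "Capital Radio"
    broadcast_time: datetime
    relevance: dict = field(default_factory=dict)        # category structure -> value (1-100)
    links: list = field(default_factory=list)            # related items or commercial offerings
    audio_versions: dict = field(default_factory=dict)   # "headline" / "short" / "full" -> audio file


def matches_profile(item: InformationItem, preferences: dict, cutoff: int = 60) -> bool:
    """True if any of the item's categories is relevant enough for this subscriber."""
    return any(category in preferences and value >= cutoff
               for category, value in item.relevance.items())
```
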
  • system further comprises interfaces for external devices, including an audio player, an audio disk storage, a fax server, and an e-mail client, all controlled by the session management system.
  • external devices including an audio player, an audio disk storage, a fax server, and an e-mail client, all controlled by the session management system.
  • the natural language engine may be used independently to enhance and cache an existing speech interface used for any application.
  • The Natural Language Engine pre-loads the requested voice interface data or document from a document site or database, performs a series of rule-based scripts and processing functions on the document, and then provides the subscriber with the speech interface to the requested site, enhanced by the processing functions, all controlled by the session management system.

4. BRIEF DESCRIPTION OF THE DRAWINGS
  • Fig. 1 is a block diagram illustrating a preferred system architecture in accordance with the present invention
  • Fig. 2 is a block diagram illustrating the system structure
  • Fig. 3 is a diagram illustrating the processes that generate Cellcast summary reports
  • Fig. 4 is a block diagram illustrating the operations of the Virtual PC
  • Fig. 5 is a flowchart illustrating the operations of the Adaptive Interface
  • FIG. 6 is an expanded illustration of FIG. 5, illustrating an example of system adaptive processes
  • Fig. 7 is a tree-diagram illustrating an example of user-created directories
  • Figs. 8a and 8b are flowcharts illustrating the operations performed in the course of a sample phone call
  • Fig. 9 is a block diagram illustrating one form of Natural Language Engine that may be provided to improve speech recognition and understanding performance.
  • Fig. 10 is a diagram illustrating how the Natural Language Engine interacts with the input source types.
  • The representative embodiment illustrated in Fig. 1 is a system in which the speech recognition engine comprises two tiers of computers.
  • the first tier includes computers with telephony cards including hardware and software for interfacing with a telephony network (e.g. Dialogic DTI 300SC-E1 telephony card).
  • the second tier of computers includes speech recognition servers (102), running the recognition software itself.
  • The computers in this first tier, interfacing directly with the telephone network on one end and employing the speech recognition servers on the other, are referred to hereinafter as "Client Computers" (101).
  • this architecture is not mandatory for the implementation of the application and the speech recognition server may also reside on a DSP located on the telephony card itself or on the same computer running the telephony interface (e.g. Aculab Prosody cards).
  • the Client Computers are connected via communication lines to a telephony switching center.
  • The connection to the cellular operator may be via a dedicated connection such as an E1 trunk that carries multiple circuit calls. It is also possible to use a packet switching network for connection to the switching center in order to facilitate access to packet-switching-enabled cellular phones.
  • Mobile subscribers dial a dedicated access number (e.g., *99) and their call is routed via said trunk to one of the client computers.
  • Each client computer utilizes a trunk termination card that provides the interface between the lines and the client computer's resources.
  • the client computer may also incorporate an echo cancellation card to improve speech quality, enable better recognition rates and cut speech segments in accordance with speaker pauses.
  • All client computers connect to a plurality of servers via a Local Area Network (103) by utilizing open and proprietary protocols such as TCP/IP and Microsoft networking. Digitized speech segments are transferred to a speech recognition server (102), which applies various well-known algorithms for conversion into textual information.
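
The transfer of a digitized speech segment from a client computer to a recognition server over TCP/IP might look roughly like the following sketch; the length-prefixed framing, host name, and port are purely illustrative, since the patent does not define a wire protocol.

```python
import socket
import struct


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket (TCP may deliver data in smaller chunks)."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("recognition server closed the connection")
        data += chunk
    return data


def send_speech_segment(segment: bytes, host: str = "recognition-server", port: int = 9000) -> str:
    """Forward one digitized speech segment and return the recognizer's textual result."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack("!I", len(segment)) + segment)   # length-prefixed payload
        length = struct.unpack("!I", _recv_exact(sock, 4))[0]
        return _recv_exact(sock, length).decode("utf-8")          # e.g. "PURCHASE ITEM=PIZZA"
```
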
  • The Natural Language Engine (104) retrieves from the database the data to be accessed by users using a speech interface, implements several processing modules in order to construct the dialog file, pre-compiles the Recognition Server 102 grammar, and sends the files to the Recognition Server 102, as more particularly described below with respect to Figs. 9 and 10.
  • The session management system (105) directs commands from the recognition engine to the appropriate application and database, typically via an application server.
  • the session management system (105) tracks all ongoing sessions and provides the context for each user service request.
  • the session management system (105) also interfaces with a mechanism for billing each cellular subscriber for its access to system resources.
  • the session management system has access to all applications and resources available on the LAN and provides means for linking all system functions and resources so as to supply a seamless flow of information to and from the user while minimizing unnecessary delays.
  • an application server (106), which runs dedicated software modules, each designed to handle a subset of the embodiments in accordance with the invention.
  • Further resources include interfaces, generally designated (107), to external devices or computer systems, including a messaging gateway (fax, e- mail, SMS, WAP etc.), a web gateway to facilitate Internet surfing capabilities, and various other value-added gateways.
  • a plurality of database structures (108) is maintained in order to provide access to subscriber profiles and content records.
  • the subscriber dials the access number (e.g., *99) and is routed via one of the available trunks to a client computer 101;
  • the client computer software identifies an incoming call and answers it;
  • the session management system 105 handles the call from now on by loading the user's profile from the profile database 108 according to its caller ID data;
  • the natural language engine 104 compiles the dialog interface and sends it to the recognition server 102.
  • the corresponding prompt is played by the audio player 109;
  • the user says "get me a Pizza”;
  • said speech segment is sent to the speech recognition server 102, which in turn responds with a set of system commands, such as PURCHASE ITEM, where ITEM=PIZZA, according to the grammar definitions received from the natural language engine 104;
  • the session manager 105 transfers the command to the application server 106, while appending user identity code;
  • the application server searches the user profile database 108 for a definition of the purchase item, such as a pizza, and in case no such item exists, a response is played back to the user, who can select to connect to a pizza shop via the Internet through the XML web agent 107.
  • If a valid purchase item is located, further environment variables are examined. These may include the user's home address (or current location, if the pizza is to be delivered to his current location, which may be provided by the mobile switching center or by the user in speech form). Other implied information elements, such as the preferred pizza shop, preferred type of pizza, etc., may be completed through the user's profile database or through alternative environment variables.
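
The purchase flow described above could be handled along the lines of the following sketch. The command structure, profile fields, and fallback behavior are assumptions made for illustration only.

```python
def handle_purchase(command: dict, caller_id: str, profile_db: dict, locate_caller) -> str:
    """command is e.g. {"action": "PURCHASE", "item": "PIZZA"}; locate_caller queries the switching center."""
    profile = profile_db.get(caller_id, {})
    item = command["item"].lower()
    definition = profile.get("purchase_items", {}).get(item)
    if definition is None:
        # No stored definition: play a response offering to connect to a shop via the XML web agent.
        return "NO_DEFINITION: offer connection via web agent"
    # Fill in implied environment variables from the profile or the mobile switching center.
    address = definition.get("delivery_address") or locate_caller(caller_id)
    shop = definition.get("preferred_shop", "nearest shop")
    return f"ORDER {item} from {shop}, deliver to {address}"
```
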
  • NLE Natural Language Engine
  • The Natural Language Engine interacts with an Automatic Speech Recognition (ASR) engine of choice, a Voice Extensible Markup Language (VXML) engine, or any other Voice Markup Language (VML) recognition engine of choice, such as the Motorola VoXML engine (all engines also referred to hereinafter as the "Recognition Engine").
  • ASR Automatic Speech Recognition
  • VXML Voice Extensible Markup Language
  • VML Voice Markup Language
  • the Natural Language Engine retrieves the data or document that is to be accessed with a speech interface, i.e. any Voice Markup Language (VML), XML, or other data residing in a database, then implements several processing modules, pre-compiles the required Recognition Engine Grammar and saves it in a grammar cache storage unit.
  • When the data or document is accessed by a user via the recognition engine, the recognition engine is fed with the preprocessed speech recognition data and grammar files, and with the recorded audio prompt files, which make up the system's audible response in the Dialogue ("prompts").
  • the grammar and prompt files are preprocessed in accordance with predetermined rules and artificial intelligence algorithms.
  • the Natural Language Engine as illustrated in figure 9, consists of several modules:
  • the Document Processing Engine (902) is responsible for receiving the voice Dialogue data from the application server or external XML agents, enhancing the Dialogue characteristics and transforming the data to a Dialogue data format understandable by the speech recognition server such as the recognition engine Application Programming Interface (API), VXML or other Dialogue definition language.
  • API Application Programming Interface
  • the Document Processing Engine includes several sub-modules:
  • The Data Retrieval Unit (905) receives data requests from the NLE Manager and executes them, i.e., loads the required document or information element from the database or from external agents such as the XML agent.
  • the Data Retrieval Unit also receives data requests from subsequent layers of the Document Processing Engine (i.e. the Parsing Unit and the Dialogue Enhancement Unit) to fetch external documents and executes them as well.
  • The Parsing Unit receives new documents from the Data Retrieval Unit and converts each document to its own respective internal representation. It consists of several pluggable document parsers; each parser supports a specific input format (for example VXML or JSGF), and converts it to an internal XML tree structure representing components of the source document.
  • The parser separates complex document structures into their basic components, to be later processed (and cached) by different modules. Such structure separation occurs, for example, in large VXML files, where an inline grammar is transformed to an external JSGF grammar file.
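
A pluggable-parser arrangement of this kind might be sketched as follows, with each parser keyed by its input format and returning an internal XML tree; the class names and the simplified VXML handling (no namespace support) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


class VXMLParser:
    """Handles one input format; namespace handling and full VXML semantics are omitted."""
    format = "vxml"

    def parse(self, document: str) -> ET.Element:
        source = ET.fromstring(document)
        internal = ET.Element("dialogue")
        # Separate inline grammars into their own components so they can be cached independently.
        for grammar in source.iter("grammar"):
            ET.SubElement(internal, "grammar", {"src": grammar.get("src", "inline")})
        return internal


class ParsingUnit:
    def __init__(self, parsers):
        self.parsers = {p.format: p for p in parsers}    # pluggable parsers, keyed by input format

    def parse(self, document: str, fmt: str) -> ET.Element:
        return self.parsers[fmt].parse(document)
```
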
  • the Dialogue Enhancement Unit processes the XML components supplied by the Parsing Unit. It consists of several enhancing modules, each adapted to enhance a specific format or data source, such as grammar, Dialogue, prompt, etc.
  • the XML components are processed by separate enhancing modules according to their type. Each enhancing module runs a different set of algorithms or scripts designed specifically to its known component type. The enhancing module functions are detailed later in the functional description section.
  • An output format type specifier is attached to each processed XML component by its respective enhancing module, to indicate the possible output formats that can be used by the Format Wrapper. Note that some XML components might be created, modified, and destroyed in the enhancing process.
  • After each component has passed through the enhancement process, it is passed to the Format Wrapper (908), which transforms it to an output format compatible with the speech recognition server technology.
  • The Format Wrapper consists of several Formatting Modules, each supporting a specific type of output format. Some of the key formatting modules are: a Grammar Compiler, which compiles grammars to the proprietary ASR API/format in order to speed up recognition time; a VXML compiler, which generates standard VXML documents for use by a VXML engine; compilers for other proprietary voice dialogue standards, such as the proprietary Motorola VoXML format; and a Prompt Compiler, which prepares voice prompts using a Text-To-Speech engine or from a recorded source, etc. Note that this process might include several stages.
  • a grammar file could be converted from JSGF format to a proprietary grammar format which is compatible with the grammar compiler used by the ASR, and then converted from this proprietary format to a compiled form.
  • the formatting process might invoke external components (for example an external grammar compiler for the ASR platform).
  • Each compiled component that is ready to be sent to the Speech Recognition Server is stored in the NLE Caching Server (903), with its respective creation timestamp and expiry timestamp.
  • the NLE Caching Server acts in a similar way to any caching/proxy server, serving cached documents to its client and managing document invalidation/refresh.
  • the overall NLE response time and performance is improved by the storage of precompiled grammar files.
  • the NLE Cache receives the compiled Dialogue files from the Dialogue file compiler and stores the complex grammars and prompts thereby relieving a VXML engine from slow and lengthy grammar compilation and speeding up recognition performance in the implementation platform.
  • The Caching Server acts as the one and only interface to the Speech Recognition Server. For each document requested from the ASR platform, the Caching Server checks whether a valid copy of the document exists in the document cache database. If so, the document is fetched and transferred to the ASR platform (via TCP/IP or any other API required by the ASR). If no valid document is found (i.e., a "cache miss"), the Caching Server triggers the NLE Manager, which in turn loads the required data from its respective source, runs it through the Document Processing Engine, and stores a valid copy of the document in the Caching Server. The Caching Server then immediately delivers this document to the requesting party.
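
A minimal sketch of this caching behavior is shown below: serve a compiled document while its timestamps are still valid, otherwise rebuild it through the Document Processing Engine and store the fresh copy. The TTL value and interface are assumptions.

```python
import time


class NLECachingServer:
    """Serves compiled Dialogue components, rebuilding them on a cache miss or expiry."""

    def __init__(self, rebuild):           # rebuild: callable that runs the Document Processing Engine
        self.rebuild = rebuild
        self.cache = {}                     # doc_id -> (compiled, created_at, expires_at)

    def fetch(self, doc_id: str, ttl: float = 3600.0):
        now = time.time()
        entry = self.cache.get(doc_id)
        if entry is not None and entry[2] > now:     # valid cached copy: serve it directly
            return entry[0]
        compiled = self.rebuild(doc_id)              # cache miss: recompile and store
        self.cache[doc_id] = (compiled, now, now + ttl)
        return compiled
```
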
  • The NLE Manager (904) invokes the enhancement process, or any part thereof, periodically or subsequent to a content update in a database or a change in one of the VXML or other documents to which the system is required to enhance access.
  • Each access to the NLE Caching Server is also registered in the NLE Manager, which uses this information to predict document requests by the ASR and prefetch these documents before they are actually requested, in order to speed up document fetching time.
  • the system can receive a variety of input source types and process them in a variety of forms, as depicted in figure 10.
  • the output format could be VXML, VoXML, or any other proprietary format required by the Speech Recognition platform.
  • VXML source as follows:
  • the NLE receives the actual Option list available to the user at any given stage of the Dialogue, adds possible grammar options so a variety of voice commands and syntaxes can be recognized with improved performance.
  • grammar enhancements include:
  • The NLE may add suitable or generic prefixes such as: "Hmm", "I want", "I will go for the", "I'll take the", "I would like the", etc.
  • The NLE can add several descriptions that would be acceptable as a reference to an item on the option list. For example, in a "Get me" interactive media application that enables the user to obtain shows or songs played on the radio, the prompt reads out the recent track list from which the user is expected to select a song that was played on the radio. Where the basic grammar would be the name of the song and the performer (e.g., "Rain by Madonna"), the NLE adds grammar that enables the user to say "the song played by Madonna" or "the first song on the list".
  • The NLE can add grammar making it possible for the user to answer several questions in the Dialogue at once. For example, instead of requiring the user to say the name of the song first (e.g., "Rain by Madonna"), then the medium ("MP3 file" or "CD"), and then the form of delivery ("email" or "FedEx"), the user can say "MP3 file, CD, email" all at once.
  • The NLE can add grammar to enable complete-sentence syntax understanding, where in the GETME-a-song example the user can say: "I would like to buy Madonna's song as an MP3 file, please send it to my email" instead of answering directed questions or using specific grammar and syntax.
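
One way such grammar enhancement could be realized is sketched below: a bare option list is expanded with spoken prefixes and alternative descriptions before being assembled into a grammar rule. The prefix strings follow the examples above; everything else is an assumption.

```python
PREFIXES = ["", "Hmm", "I want", "I will go for the", "I'll take the", "I would like the"]


def enhance_options(options):
    """options: e.g. [{"title": "Rain", "artist": "Madonna", "position": 1}] -> spoken phrases."""
    phrases = []
    for opt in options:
        descriptions = [
            f"{opt['title']} by {opt['artist']}",             # basic grammar: title and performer
            f"the song played by {opt['artist']}",            # alternative description
            f"song number {opt['position']} on the list",     # positional reference
        ]
        for prefix in PREFIXES:
            for desc in descriptions:
                phrases.append(f"{prefix} {desc}".strip())
    return phrases

# The resulting phrases could then be assembled into a single grammar alternative,
# e.g. a JSGF-style rule of the form: <choice> = phrase1 | phrase2 | ... ;
```
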
  • the NLE detects ambiguous items in option lists, system-level commands and document-level commands (for example: when the user says 'help' he could select the command "help” or the song item "help”).
  • The system applies state-specific rules, preferring state-specific grammar, and adds a clarification sub-form in the current document for handling ambiguous commands, so that the user is asked to clarify her selection (system: "Did you mean the command 'help' or the song 'Help' by The Beatles?").
  • the NLE detects the Dialogue structure and adds helpful navigation commands, especially (but not limited to) the "back" command. Additional navigation commands include next, skip, and generic 'goto' commands.
  • the NLE retrieves data in certain language and processes the speech commands to phonetic representation in the speech commands original language, transforms the phonetic representation to foreign language accent, and rebuilds the dictionary and grammar of the speech commands.
  • the NLE converts TTS (text to speech) prompts to natural (recorded) prompts on the fly, without requiring the VML content editor to change all its VML documents.
  • The system keeps a local database of matching recorded prompts, which replace the TTS prompts or are inserted as part of the TTS prompts.
  • The NLE can precompile TTS prompts and store them in a local database as recorded prompts.
  • the system automatically adds pauses and intonation tags (as defined in Java Speech Markup Language, for example) according to predefined rules; thereby making the system prompts more fluent and natural sounding.
  • the system can automatically convert between different types of audio formats, grammar formats and other external file formats according to the file types required by the target platform.
  • The Natural Language Engine can be used in order to enable improved speech access to any content site, and can be installed as part of the user site or the VML Engine, or as an independent server that the service provider utilizes in the same manner that an Internet Service Provider uses a proxy server for Internet access.
  • the user typically connects to the Mobile Agent system by dialing a specified toll-free phone number.
  • The user's identity is established. Certain resources can be accessed only after user identity is established using a voiced user name, voice-recognition technologies, and a secret password.
  • the audio transmission commences. In the preferred embodiment, the user is guided through the personal audio transmission with a series of voice prompts; or the user may initiate communications using spoken commands. In alternate embodiments, the user communicates using the keypad of the cellular handset.
  • The speech-based user interface is coordinated with the specific functions of the user handset, such that information may be displayed on the handset's screen and such that users may key in commands when appropriate, or scroll through available options. These options are used in conjunction with spoken commands, using WAP (Wireless Application Protocol).
  • users select a persona from a list of predefined settings for the speech-based user interface.
  • the selection of the persona determines the following: (1) voice; (2) tone; (3) content and style of the system voice-prompts.
  • a wise old man, a sexy woman, or a cowboy may be selected as a persona, each with appropriate greetings, accents, and comments. Personas may also be constructed using clips of recordings of celebrity voices or famous characters. Pre-recorded audio-clips corresponding to the user-selected persona setting constitute the platform's primary mode of data output; this Virtual Companion serves as a guide to all system functions and services. Persona settings may include motivational messages, sayings, jokes, or slogans as part of the persona format. Other persona options include daily Bible readings or readings from other religious texts, or other messages which change daily. These messages are interspersed throughout the audio transmission at regular intervals, or during idle time.
  • Persona settings may be configured to active mode.
  • In active mode, the system will ring the user to give messages, or will prompt the user after a pre-defined period of inactivity (e.g., if the subscriber has not accessed the system for three days, he receives a call from his "Companion" asking how he is, what is new, etc.).
  • the Companion may have mood changes from day to day, or may have "needs" which the subscriber is asked to provide for. For example, the sexy woman Companion may "need" to be “complimented” as a condition for providing help to the user. In this way, the user forms a virtual relationship with the Companion, which serves to bond the user to the system, as the system serves social functions above and beyond the information management functions and services that the system performs.
  • the Companion reacts to changes in user habits from day to day.
  • the system is sensitive to such changes by virtue of the adaptive interface.
  • The Companion asks about recognized changes in user habits and responds to them. For example, the Companion may note that the user has not called Mary in one week, whereas he had previously placed daily calls to Mary. The system may inquire, "What is going on with Mary? Would you like to call her?"
  • Users configure persona settings using the cellular handset or via the Internet using a PC or workstation.
  • The persona settings are user-defined. Users may determine which words are uttered for any given command. For example, users may choose a special greeting for themselves. Messages may also be included as part of a user-defined persona. Corporations may use this feature to transmit corporate audio messages to their users.

5.2.4 Adaptive Interface
  • the system adapts to user habits by analyzing statistics based on users' listening habits and habits of use. As shown in Fig. 5, the system identifies patterns of regular use according to predefined formulae, and measures the occurrence of identified patterns; if the rate of recurrence of the pattern is above a predefined level, the system "adapts" by setting the identified pattern as the default mode of operations.
  • Users evaluate the relevance of an information item and the duration of the item, and are able to indicate the relative level of depth or understanding which the user brings to an item. For example, a user interested in law requests stories related to family law.
  • The evaluative features of the Mobile Agent enable the user to determine at which level of complexity he would like to receive stories (beginner, experienced user, advanced, professional). The initial level definition results in a set of default interest-rating user properties. Users also provide feedback on specific stories. The feedback is registered in the user's database, and the matching software adds properties to the initial level definition provided by the user, making the appropriate changes.
  • An embodiment of the adaptive processes of this invention includes first, recording the subscriber usage data in terms of information item requests from a subscriber. The process compiles the usage data to give a complete usage picture for a given subscriber during a given period of time. Finally, the process compiles the usage data, compares the result with the subscriber's original profile and then adjusts the subscriber profile accordingly. This process assumes that records are tracked by day and by category structure; information item retrievals are tracked by subscriber and by time period; and profile category structure priorities are tracked for each subscriber.
  • The retrieval system of this invention also includes a process for adjusting subscriber profiles through the introduction of peripheral category structures into their profiles from time to time. Subscribers initially create their own profiles by selecting their relevant areas of interest. As time passes, they refine their profiles directly through relevance feedback, and through usage feedback by ordering full-text records from delivered briefs. Through each method the subscriber indicates what they like or dislike about what they have received. However, no such feedback is available about records subscribers did not receive.
  • the automatic retrieval system of this invention provides a process for occasionally introducing, at defined times or randomly, peripheral category structures into a subscriber's profile to determine if the subscriber's interests are expanding into these peripheral areas. In this way, subscribers get to sample, on a limited basis, emerging fields and have their profiles adjusted automatically.
  • User profile adjustment includes ranking a subscriber's category structures in order of the number of information items retrieved to determine a usage rank.
  • the usage rank is compared with the original rank of the category structure.
  • a new profile rank is determined for each of N category structures by assigning various rates to the different category structures.
  • The new ranking for each category structure is determined by summing the different ranks for that category structure to determine its new priority value. Rules can be applied to avoid wild swings in profile contents by, for example, preventing a category structure from moving more than one place in priority for a given usage.
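
A sketch of such a re-ranking rule follows: the original rank and the usage rank are combined, but a category is never allowed to move more than one place per adjustment cycle. The averaging formula is only one plausible way of combining the two ranks; the patent does not fix one.

```python
def adjust_priorities(original_rank: dict, usage_rank: dict) -> dict:
    """Ranks use 1 for the highest priority; returns the new rank per category structure."""
    new_rank = {}
    for category, old in original_rank.items():
        used = usage_rank.get(category, old)
        proposed = round((old + used) / 2)          # one plausible way of summing the two ranks
        step = max(-1, min(1, proposed - old))      # rule: move at most one place per adjustment
        new_rank[category] = old + step
    return new_rank
```
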
  • the system is capable of "learning" preferred orders of operation as well as specific data items. That is, the system learns to perform operations in the order the user prefers, and also learns to interpret the user's use of specific terms. For example, if a user repeatedly requests a summary report of CNN upon accessing the Mobile Agent, the system will suggest to the user that a summary report of CNN be set as the default mode of operations. In this way, the user will hear the report without having to request it specifically. An example of this operation is illustrated in Fig. 6. For another example, if a user repeatedly requests "Ben and Jerry's" upon asking the system to "get me” ice cream, the system will learn to procure Ben and Jerry's without being asked to specifically.
  • the system prompts the user to register his preference with the system. For instance, if the user repeatedly requests Ben and Jerry's, but also requests other brands of ice cream from time to time, the system will ask the user whether or not Ben and Jerry's should be set as the default definition for "ice cream.”
  • the system also adapts to user speech patterns. If the user tends to use commands other than the system-defined commands to perform certain functions, the system will learn the commands. Advanced forms of speech-recognition technologies may be used to accommodate colloquial language, irregular speech patterns, and background noise.
  • System adaptations are also enabled by means of analysis of subscribers' evaluations of articles, advertising, and services. Users are periodically asked to evaluate information items, advertising, or services, either in the form of a questionnaire or series of questionnaires, or in the form of a short prompt after a given segment of play. For example, after hearing a Premium report from Cellcast, a user is asked to rank the material as "very relevant", "somewhat relevant", or "not at all relevant", and to rank the form of the material as "too long", "good", or "too short". Answers to such system-initiated queries are stored together with user profile information, and are used to enable the system to adapt to user preferences.
  • Embodiments of the retrieval system of this invention can also include enhanced customization and duplicate elimination based upon information item properties.
  • a subscriber can define certain properties that the subscriber always wants to see, or always wants to discard. Through attribute selections, different subscribers can receive different treatment of the same news event. In other cases, a subscriber may want to see all treatments of a particular event or related to a particular party from all sources (e.g., where a public relations department may want to track all treatments of a particular client by the press).
  • Additional system adaptive capabilities include but are not limited to the following: users may set preferences for the style of interface by selecting a Virtual Companion; users determine the content of all forms of Cellcast reports, including categories of information items to be received or not received; and advertising information is selected based on a user profile.
  • users complete a profile report upon registration to the Mobile Agent.
  • Data is stored in the system database, (8a, Fig. 1) relating to users' personal details, such as name, contact information, age, gender, place of residence, occupation, income, and areas of interest. Users are also requested to complete a questionnaire, which relates to fields of user interest and affiliation. This information is used to select appropriate advertising information, and supports adaptive and customization options.
  • the step of supplying profile information includes providing the system with a user selection of category records (e.g., selecting Cellcast channels, per Fig. 3)
  • the selected category records are weighted to indicate not only priority among categories, but also degrees of preference.
  • The user may select default weights. If default weights are selected, the system assigns successively decreasing values for the weights based on his or her preferences. Alternatively, the user may enter weights for the various categories. The final weight determination is then performed using a pre-defined formula.
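
The weighting step might be sketched as follows, either assigning successively decreasing default weights in priority order or normalizing weights the user entered. The specific formula is an assumption, since the patent only states that a pre-defined formula is used.

```python
def category_weights(categories, user_weights=None):
    """categories: list in priority order; user_weights: optional dict of user-entered weights."""
    if user_weights:
        total = sum(user_weights.values())
        return {c: user_weights.get(c, 0.0) / total for c in categories}
    # Default: successively decreasing weights by selection order (1, 1/2, 1/3, ...), normalized.
    raw = {c: 1.0 / (i + 1) for i, c in enumerate(categories)}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}
```
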
  • the user profile includes information relating to all system functions and settings.
  • User profile information includes but is not limited to user preferences relating to Mobile Agent functions and services, Cellcast channels and preferences, settings for the Virtual Companion, and user favorites.
  • User profile data also includes statistics related to subscribers' listening habits, such as the average duration of calls and the operations performed during each call. Records of all user activities are included, as well.
  • Users may change their profile information using a cellular phone or via the Internet using a PC.
  • advertising information is disseminated to users via the Mobile Agent. Advertising items may be played one item at a time or with several separate segments linked together, played sequentially.
  • Audio advertising information is ranked for relevancy, for best advertising segmentation and performance, according to several parameters:
  • Topic, i.e., relevancy to user interest topics;
  • Target user profile information;
  • Duration, i.e., the time needed to play each item;
  • Mobile Agent functions to which items are best suited. When to play which advertising items is determined using a calculus of the above factors.
  • users have a choice of service plans.
  • the different plans are classified as Basic, Premium, and Super-premium.
  • Super-premium services contain no advertising.
  • Premium users may make use of an advertising filter in order to request that certain topics of advertising be included or not included in the Mobile Agent services.
  • Basic service users receive advertising at a higher ratio of advertising time to time of play than Premium service users.
  • audio transmissions play advertising segments whose total length of time of play is determined according to a pre-defined ratio of advertising to airtime. For example, advertising may comprise 20%, 10%, or 0% of total airtime, according to the subscriber's service plan.
  • advertising segments are played in between tasks, so that the audio transmission is as smooth as possible.
  • Lengthy tasks may be interrupted by advertising. A task, clip, or procedure of no more than 2 minutes is immediately followed by advertising items with a total playing time not exceeding the predefined ratio of play to advertising (e.g., 2 minutes of audio transmission is followed by 12 seconds of advertising). Longer tasks are interrupted at the most convenient possible interval. All audio transmissions in the basic service plan begin with a five-second advertising clip.
  • The ratio of broadcast time to advertising time may be configured on a sliding scale such that the more active time a subscriber logs with the system, the fewer advertising items she hears. That is, the advertising algorithm is such that as the airtime increases, the ratio of advertising to airtime decreases. In this way users are "rewarded" for increased airtime.
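
A sliding scale of this kind could be expressed roughly as below. The base ratios per service plan follow the figures above (20%, 10%, 0%), while the discount curve is an illustrative assumption.

```python
PLAN_BASE_RATIO = {"basic": 0.20, "premium": 0.10, "super_premium": 0.0}


def advertising_ratio(plan: str, monthly_airtime_minutes: float) -> float:
    """Advertising share of airtime; decreases as the subscriber's active airtime grows."""
    base = PLAN_BASE_RATIO[plan]
    discount = min(0.5, monthly_airtime_minutes / 1000.0)   # illustrative sliding scale
    return base * (1.0 - discount)


def advertising_seconds(plan: str, monthly_airtime_minutes: float, segment_seconds: float) -> float:
    """Advertising time owed after a content segment, e.g. 120 s at a 10% ratio -> 12 s of ads."""
    return segment_seconds * advertising_ratio(plan, monthly_airtime_minutes)
```
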
  • Incentives may be offered to users in order to persuade users to listen to advertisements without interruption, or to persuade users to participate in surveys. Incentives include:
  • Frequent-flier miles. The system monitors subscribers' advertising listening habits, such that the appropriate rewards may be given and the appropriate incentives offered to each user. Advertising revenues may be used to fund or subsidize program costs.
  • each segment of the audio transmission is coded to support an advertising item of a specified length.
  • Articles are also coded for content, such that a sports article supports advertising items related only to topics of sports, health, and fitness.
  • Interactive advertising services are provided.
  • Interactive advertising items may include surveys or providing coupons to users, and enable purchase during the advertisement.
  • Advertisements are segmented and woven into audio transmissions.
  • CELLCAST: An information editor is used to select stories and information items, and to edit and format these items such that they are represented in a form suitable for dissemination to users using the present invention.
  • the selected and edited stories are then voice-recorded and stored in an information database on the Mobile Agent system.
  • the information editor categorizes each information item according to a predetermined set of criteria.
  • the information editor maintains a list of currently defined categories and sub-categories.
  • the personnel operating the Mobile Agent system may add and delete categories and sub-categories so as to accommodate major media events or special features.
  • the list of category definitions is thus relatively constant but subject to change.
  • As new information items are received, they are each assigned a relevance value of 1-100 against each category structure by the system editing crew. As the information items are accumulated, they are ranked based on the assigned relevance values. A cutoff threshold determined for each category structure is then applied to information items with respect to each category structure. If the relevance value for an information item exceeds the cutoff threshold for a given category structure (e.g., 60 points), a pointer identifying the information item is included in the category structure. The cutoff threshold is different for different categories and is generally empirically determined. As a result of the above operations, the system maintains a ranked list of information items received for each category structure.
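
The editorial ranking step described above might look roughly like this sketch: each item carries a 1-100 relevance value per category structure, and only items above that category's cutoff threshold are linked into its ranked list. Field names and the default 60-point cutoff follow the example above; the rest is assumed.

```python
def build_category_lists(items, cutoffs, default_cutoff=60):
    """items: [{"id": "a1", "relevance": {"sports": 72, "health": 40}}]; cutoffs: per-category thresholds."""
    ranked = {}
    for item in items:
        for category, value in item["relevance"].items():
            if value >= cutoffs.get(category, default_cutoff):
                ranked.setdefault(category, []).append((value, item["id"]))
    # Keep a pointer list per category structure, ordered by relevance value, highest first.
    return {c: [item_id for _, item_id in sorted(entries, reverse=True)]
            for c, entries in ranked.items()}
```
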
  • a full sequence of assembling operations may be successively repeated (e.g., daily for a daily Cellcast report).
  • Fig. 3, briefly referred to earlier, illustrates an example of the process for generating Cellcast summary reports.
  • the individual assembling operations are performed for each profile. Since each profile may include a different scheme of prioritization, the relevance values are separately tailored for each profile. These adjusted values are used to re-rank the information items for each category structure in the profile. The information items are then selected based on a priority scheme to create the final set.
  • the information database stores three recorded versions of each story. These are:
  • Headline: a derivative or nominal version of approximately 3-15 words, used primarily for organizing purposes;
  • the information database includes statistics relating to the user's listening habits, storing information pertaining to the number of times any given item has been played, and the duration of each audio transmission.
  • the database also includes advertising play statistics indicating how many times each advertisement has been played to each user.
  • the information items selected for Cellcast play consist of news and feature articles, entertainment features, music, and audio books.
  • Cellcast provides a wide range of musical selections to users, which are played on demand. Users may program personalized "radio shows” by choosing a play program for music selections. Cellcast also provides preset play programs. Music is organized according to channels by genre: jazz, rock, classical, country, local or regional music, and so forth. Some items are available at a paid premium only.
  • Cellcast also provides audio books for users to listen to. Users may listen to audio books in segments of a predetermined length, or may stop and start play as desired. Some items are available at a paid premium only.
  • Personalization of services is via the Internet, or via cellular handset, depending on each user's preference.
  • The user defines the profile of audio content that may be of interest to the user (e.g., specific stock quotes, or scores of specific sports teams). This information is stored as part of the user-profile information database.
  • users select categories and sub-categories of information items. Each category may be set according to user preferences. Users determine:
  • This process begins with each subscriber creating a profile by choosing relevant category structures (their "primary category structures") for their own interests. Each subscriber determines the maximum number of information items and the maximum number of articles they want to receive each day. Next, the process continues with the system determining "secondary category structures" and "neighboring category structures" for use by the system on days when the volume of records received from information providers is low (e.g., a slow news day). Secondary category structures are user-defined, lower-priority categories in a user's profile, and neighboring categories are system-defined categories of related subject matter. Both contain records that, while not of primary interest to the subscriber, are still relevant to the subscriber. Finally, the process distributes information items according to the limits set by the subscriber and the availability of information items in each of the primary, secondary, and neighboring category structures.
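
The distribution step could be sketched as follows: the subscriber's daily quota is filled from primary category structures first, falling back to secondary and then neighboring ones when too few items are available. The function signature and data shapes are hypothetical.

```python
def assemble_daily_items(available, primary, secondary, neighboring, max_items):
    """available: {category: [item ids]}; the three category lists come from the subscriber profile."""
    selected = []
    for tier in (primary, secondary, neighboring):          # primary first, then the fallbacks
        for category in tier:
            for item_id in available.get(category, []):
                if len(selected) >= max_items:
                    return selected
                if item_id not in selected:
                    selected.append(item_id)
    return selected
```
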
  • Each of the category managers includes a profiler procedure for defining the subscriber's interest in receiving news items within each information category. For example, an "all" command can be used to select all sub-categories, and a “none" command can be used to indicate that the subscriber does not want to receive any news items for any given category.
  • the category manager profile procedure generates a category profile data structure that represents the subcategories of interest to the subscriber as well as any associated filters that have been defined.
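One possible shape for the category profile data structure produced by the profiler procedure is sketched below; the field names and the example filters are assumptions made for illustration only:

    category_profile = {
        "sports": {
            "subcategories": ["basketball", "soccer"],   # explicit selection
            "filters": [{"field": "team", "equals": "Washington Redskins"}],
        },
        "finance": {
            "subcategories": "all",                       # the "all" command
            "filters": [{"field": "ticker", "in": ["IBM", "NOK"]}],
        },
        "weather": {
            "subcategories": "none",                      # the "none" command
            "filters": [],
        },
    }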
  • Configuration for audio transmissions may occur at any time, including during the course of any given audio transmission. Any preferences that have been set automatically or that function as the default mode of operation may be overridden with speech commands.
  • Users may pause the audio transmission at any point, and continue play when they so designate. Similarly, if the user hangs up in the middle of an audio transmission, she may return to the same point in the audio transmission when she calls back.
  • the Personal Summary report represents the set of data and information items to be received by the user upon activation of Report mode, either by spoken command, or set as default.
  • the personal summary report is read as a continuous audio transmission but in fact comprises a series of short audio files of articles played in sequence. A short pause follows each story; commands uttered during these pauses apply to the previous story.
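The pause-and-command behavior could be implemented along the lines of the sketch below; play_audio and listen_for_command stand in for the actual audio playback and speech-recognition interfaces, which the patent does not specify, and the story fields are assumed:

    def play_summary_report(stories, play_audio, listen_for_command, pause_seconds=2):
        """Play the summary report as a sequence of short audio files.

        A command spoken during the pause after a story (e.g., "full story")
        applies to the story that has just been played.
        """
        for story in stories:
            play_audio(story["short_audio"])
            command = listen_for_command(timeout=pause_seconds)  # the short pause
            if command == "full story":
                play_audio(story["full_audio"])  # comprehensive version of the same story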
  • the individual user determines the content, length, and style of the report.
  • each of the user's pre-selected channels is represented by an average of three stories, subject to user preference. Users may opt to exclude any selected channel from the standard summary reports, or to reduce or increase the number of stories to be included in any given summary report.
  • each short story corresponds to a "full story,” a longer and more detailed version of the short story, as well as a shorter "title” version, which is used mainly for sorting, browsing, and organizational purposes.
  • users may choose to include either of these versions in the personal summary report.
  • the user may access these corresponding versions of stories by uttering the appropriate commands even when the summary report is being played. For example, during the summary report a short version of the audio report regarding a basketball match is transmitted to the user. During the transmission the user utters the command "Full Story". The system immediately starts transmitting the "Full Story" version of that same short story.
  • the matching software is responsible for matching the items according to relevancy and other record and user properties.
  • the user profile and preferences are dynamically updated by the system. Tuning and redefining subscriber profiles is based on the subscriber's usage feedback, which is developed by tracking the data requests issued by the subscriber, and usage statistics. In this manner, the usage feedback acts as an implicit, non-intrusive way for subscribers to let the system know which types of records they consider the most relevant.
  • by requesting a particular record, a subscriber is implicitly stating the relevance of that record to his or her interests. When several records of the same type (i.e., from the same category) are ordered, the statement of that category's relevance to the subscriber becomes all the more powerful. If the particular category in question was originally placed low in the subscriber's profile priority, the automatic profile tuning and redefinition process of this invention raises the category structure in priority to give it more prominence in the records or briefs delivered each day; a sketch of this tuning step follows below.
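A sketch, under assumed names, of the automatic tuning: categories the subscriber keeps requesting are nudged upward in priority so they gain more prominence in subsequent daily sets:

    def tune_profile(profile_priority, usage_counts, promote_after=3):
        """Raise the weight of categories the subscriber repeatedly requests.

        profile_priority: category -> weight originally set by the subscriber.
        usage_counts: category -> number of records the subscriber has requested,
                      taken from the usage-feedback statistics.
        """
        tuned = dict(profile_priority)
        for category, requests in usage_counts.items():
            if requests >= promote_after:
                # each block of repeated requests nudges the category upward
                tuned[category] = tuned.get(category, 0.0) + requests // promote_after
        return tuned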
  • Reports may also include search results or specially requested programs, features, or information items.
  • Article Browser
  • the article browser is a program for listening to news items that the user specifically wants to hear.
  • the browser can be launched at the user's explicit command. It can also be launched from the personal summary report with the appropriate user commands, such as "full story" or "more details," indicating that the user wants to hear the full version of the story.
  • the user may hear headlines (the shortest recorded versions of each story) and may use appropriate commands to hear fuller versions, including both the executive version used in the summary report and the comprehensive version.
  • Users may either use voice commands to navigate between categories, sub- categories, and individual information items, or may direct the program to function in continuous mode within a given category or sub-category.
  • users may listen to any category or subcategory of data or information items using the continuous mode. This is an option to hear news stories or data read in sequence, one item after the other, without requiring a new voice command before each story is read.
  • the continuous mode functions only within each given story category or subcategory and not between categories.
  • a Premium information retrieval service is available for business users and professionals.
  • the Cellcast media team locates information on specialized topics of interest, customized to suit each user's field or specialty. Users may request news, information, press releases, research, reports, and reviews on a vast array of topics. For example, a user may request information about mergers and acquisitions in the telecommunications industry in Europe, data relating to recent fluctuations in crude oil prices in the Middle East, or reviews of recently released books on Internet-related law.
  • Premium service uses the broadest possible range of Internet-based sources in order to obtain the most pertinent information, including news feeds from a number of information transmission services, hundreds of information databases, and full access to the World Wide Web.
  • a computerized tagging, mapping, and filtering system combines forces with a human editing team to determine which sources are the most accurate and relevant to the users' chosen topic(s). Articles are prioritized and categorized according to relevance and user preference.
  • Premium service users may also track a specified set of named entities, such as companies.
  • Embodiments of the retrieval system of the invention can include a process whereby the subscriber selects a collection of records containing relevant information about any of a specified set of named entities from a larger set of records whose content may be either relevant or non-relevant to the set.
  • the relevant information can include the full set of information items relevant to the companies or named entities in the set, or a subset of those records determined by additional subject matter criteria.
  • the tracking process includes a multi-stage, rule-based system that attaches one or more tags to an information item corresponding to each company or named entity specified as a member of the set.
  • the information items are collected and rule- base tags are attached to them corresponding to each company or named entity that is part of the specified set.
  • Information items are then sorted and evaluated accordingly.
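A simplified sketch of the rule-based tagging stage: each tracked company or named entity is associated with matching phrases, and items containing a matching phrase receive the corresponding tag (the rule contents below are illustrative only):

    def tag_items(items, entity_rules):
        """Attach a tag for each tracked named entity that an item mentions.

        items: list of information-item texts.
        entity_rules: entity name -> list of lowercase phrases indicating relevance.
        """
        tagged = []
        for text in items:
            lowered = text.lower()
            tags = [entity for entity, phrases in entity_rules.items()
                    if any(phrase in lowered for phrase in phrases)]
            tagged.append({"text": text, "tags": tags})
        return tagged

    # Illustrative rule set for a tracked-entity list
    example_rules = {"Acme Corp": ["acme corp", "acme corporation"],
                     "Globex": ["globex"]}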
  • the resultant set of information items may be played as part of the subscriber's summary report, or stored on the subscriber's personal hard disk for future reference.
  • users may access a Personal Research Assistant function via the application server 6.
  • This function allows the user to obtain answers to specific queries in addition to a compilation of articles on relevant topics.
  • the Personal Research function combines the specialization and broad-based information access of the Premium service with full human capabilities.
  • Personal Research Assistant functions are tailored to high-level, complex queries.
  • An example of such a query might be, "How does the real estate market in California change directly after earthquakes?" or "How has the quality of women's lives in Afghanistan changed over the last ten years?"
  • Answers to questions can be represented by special reports, recorded specifically in response to the user's question; a series of articles edited and filtered by a human research assistant; or a combination thereof.
  • the Personal Research Assistant will locate the requested information when data is otherwise unavailable.
  • user-initiated queries are answered via the cellular network.
  • Users access a selection of short-answer tools through the Mobile Agent, which are together classified as General Information Services of the application server 6. These services include, but are not limited to, reference information, tools for calculation, and short entertainment features.
  • the General Information Services are designed to obtain information and to perform specific functions as well.
  • Natural speech queries are converted to the required data forms using advanced query applications and a natural language engine. Based on the data form content, the Mobile Agent system database is searched for the required action, process, or information item. The user's request is duly processed, and the results of the query are relayed to the user in an audio format.
  • Requesting a specific tool serves as a cue to the system as to which operations are to be executed using the user-input data. For example, if a user wants to convert five dollars to yen, the user may ask for the exchange rate using natural language, or alternatively, the user may access the currency exchange tool and then directly convert the sum.
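For the currency example, the natural language engine might reduce the utterance to a small data form like the one below; the field names and the exchange-rate table are assumptions made for illustration, not details given in the patent:

    def handle_query(data_form, rates):
        """Execute a General Information Services request from a parsed data form.

        data_form: e.g. {"tool": "currency_exchange", "amount": 5,
                         "from": "USD", "to": "JPY"}
        rates: mapping of (from-currency, to-currency) -> exchange rate.
        """
        if data_form["tool"] == "currency_exchange":
            rate = rates[(data_form["from"], data_form["to"])]
            return data_form["amount"] * rate
        raise ValueError("unsupported tool: " + data_form["tool"])

    # handle_query({"tool": "currency_exchange", "amount": 5,
    #               "from": "USD", "to": "JPY"}, {("USD", "JPY"): 150.0})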
  • the General Information Services include but are not limited to the following services:
  • a dictionary, a thesaurus, translation tools, and grammar tools;
  • a calculator, calendar information, clock information, weather information, currency exchange, and conversion tables;
  • a medical information guide
  • Classified advertisements.
  • Some of the General Information service functions are location-sensitive, providing information and services particular to the user's location. Yellow and white page directories, TV and movie listings, and classified advertisements are among the information categories that are organized according to the user's location.
  • the system enables phone users to interact with traditionally two-dimensional and one-way media sources, such as radio, television and print media.
  • cellular media hyperlinks allow phone users to hyperlink to radio and TV broadcasts and billboard and magazine adverts.
  • radio listeners will be able to buy any song that was recently played or find out more information about any product which they have just heard advertised, and interactive radio commercials are made possible.
  • the system includes hyperlinks relevant to each of the broadcast tracks.
  • Upon hearing a song or advert which s/he liked, the listener will access Cellcast's WAP or Web site and select the type of recent item the user is interested in: Music, Commercials, or News. The user is then presented with the recent music tracks or commercials played on the radio, and may obtain additional information on the item or buy a related product (e.g., a music CD).
  • Cellcast will be able to connect the listener directly with the advertiser.
  • users may make purchases and conduct financial transactions via the "Get Me” services of the application server 6.
  • the Companion functions as an agent empowered to buy and sell products according to the user's commands.
  • "Get me” services are usually but not necessarily commercially oriented; some "get me” functions do not involve financial transactions.
  • a subscriber may request, "Get me a copy of tonight's news audio transmission.”
  • the system might be able to provide a video-file of the news free of charge.
  • a subscriber may request a copy of today's Howard Stern radio show, or a copy of the song played on the show, or one of the items mentioned in the show or the advertisement played on the show.
  • a natural speech engine in the appropriate speech recognition server 3 processes users' requests. Human voice commands are converted to data forms recognized by the system, and the user's command is then extracted and processed accordingly. The task is executed after user confirmation without further human intervention.
  • "Get me” services may connect users to sales representatives instead making purchases. Users may choose to grant agency to the system, or may choose to be conduct transactions personally after having been connected to the appropriate number.
  • users are billed for purchases and transactions via the cellular operator.
  • users are billed via the Credit Company of the user's choice.
  • Get me operations may be executed according to a profile of pre-defined user preferences or system options. For example, if a user requests, "get me a pizza," the system may "know" to order from a specific establishment according to the user's preset preferences. Alternatively, the system may "remember" that the user always orders pizza from a specific establishment, and will proceed with the operation accordingly (a sketch of this resolution follows the next item).
  • Advertising items linked to this function may offer services that put the means of carrying out the transaction at the user's immediate disposal. For example, if the user requests, "get me a pizza," the user hears an advertisement suggesting that the order be from a particular establishment. If the appropriate voice commands are used, the operation will be executed at once.
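The preference-driven execution of a "get me" request could be resolved along these lines; the vendor-selection logic and all names are hypothetical:

    def resolve_get_me(request, preset_preferences, order_history):
        """Choose the vendor for a "get me" request.

        Precedence: the user's preset preference, then the establishment the user
        has ordered from most often (the "remembered" behavior), then None,
        which falls back to prompting the user or playing a linked advertisement.
        """
        if request in preset_preferences:
            return preset_preferences[request]
        past_orders = order_history.get(request, [])
        if past_orders:
            return max(set(past_orders), key=past_orders.count)
        return None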
  • Possible transactions include, but are not limited to the following:
  • Files are synchronized with those on the user's own personal computer 412, and server-side application settings may be synchronized with those on the user's PC 413, as well.
  • (2) synchronization occurs according to a pre-defined schedule (e.g., every two hours) using a dial-up Internet connection, leased line, frame relay, or any other form of Internet connectivity;
  • the Virtual PC allows mobile users to access files and run applications on the server side using files from the user's personal computer, once synchronization has occurred. Access to the synchronized file servers is made via the cellular phone and the Mobile Agent system, or via the web directly to the Mobile Agent file servers.
  • the application server platform enables the addition of third-party applications, so that they may be run or accessed using the cellular handset via the Mobile Agent.
  • Users may access any information item by category, specific properties, name or location, and by topic. Users create their own directories and sub-directories for storing and organizing information (see FIG. 7). Users can receive specifically requested articles or other information items directly into specified folders. Information is stored in the user database as pointers to stories in the relevant database. Information includes personal data (contacts, appointments), saved pointers to Cellcast articles, saved pointers to road directions, saved answers to user-initiated queries, and the results of user-initiated searches. Each separate information item is tagged, coded, sorted, and stored on the system database.
  • Sites using the VXML protocol can be played via the seamlessly operating natural language engine 4. Access to other sites is enabled using a text-to-speech engine of speech recognition server 4, converting the text to synthesized digital audio signals, which are sent to a mobile station via the cellular network.
  • Web interface includes the following options: 1. "Browse" the Mobile Agent indices of web sites by topic heading.
  • Indices of available topics are read as lists upon user request. Users may browse categories and subcategories of web sites rather than conduct a search.
  • a search may be set as a default mode of operation.
  • a user seeking stories, news, or information items about the Washington Redskins may conduct a search any time she is on the line; or, she may set a search as a default mode such that search results are a standard part of her personal audio transmissions.
  • users may access a search engine using voice commands.
  • the search interface comprises the following options: (1) search the system database; (2) search the World Wide Web; (3)
  • Searches may be conducted by keyword or by topic. If by keyword, files are searched for all occurrences of the specified word. If by topic, then files are searched according to tags. Human editors assign the tags as part of the categorization and filtration process; all articles are tagged according to topic. Each article generally has a number of tags assigned to it.
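A sketch contrasting the two search modes: keyword search scans the article text, while topic search matches the editor-assigned tags (the record layout is assumed for illustration):

    def search(articles, query, mode="keyword"):
        """articles: list of {"text": str, "tags": [str, ...]} records."""
        query = query.lower()
        if mode == "keyword":
            # all occurrences of the specified word in the article text
            return [a for a in articles if query in a["text"].lower()]
        if mode == "topic":
            # match against the tags assigned by human editors
            return [a for a in articles
                    if any(query == tag.lower() for tag in a["tags"])]
        raise ValueError("mode must be 'keyword' or 'topic'")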
  • the Mobile Agent features book-marking capacities, tracks changes users make, records history, and saves user favorites.
  • the same services are available using packet switching such as in GPRS systems whereby the user downloads data packets that are assembled at the user-side, thus comprising full data files.
  • saving information both on the user handset and at the switch would be possible, and certain information could be downloaded more efficiently by sending information packets to the mobile handset while the user is in network cells whose capacity load does not exceed a predetermined threshold, thus not creating a burden on frequency reuse.
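The load-aware delivery policy might look like the sketch below; the threshold value and the cell-load lookup are assumptions, not figures from the patent:

    def deliver_packets(packets, current_cell_load, send, load_threshold=0.7):
        """Send queued data packets only while the user's current cell is below
        the capacity-load threshold; defer the rest so that frequency reuse is
        not burdened."""
        deferred = []
        for packet in packets:
            if current_cell_load() <= load_threshold:
                send(packet)
            else:
                deferred.append(packet)  # retried on a later pass
        return deferred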
  • Users may obtain location-sensitive road directions to and from any point (within designated areas) by connecting to the road direction application of the application server 6 and the relevant database.
  • Directions are comprised of a series of incremental pre-recorded audio segments corresponding to segments of the selected route, linked together sequentially.
  • Road directions are based on the user's present location and the user's desired destination. Using this data, a search is automatically conducted for directions.
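The segment-based composition of a set of directions could look like the sketch below, where each segment of the computed route maps to a pre-recorded audio clip (segment identifiers and file names are illustrative only):

    def compile_directions(route_segments, segment_audio):
        """Link the pre-recorded audio clips for each segment of the selected route.

        route_segments: ordered segment identifiers returned by the route search.
        segment_audio: segment identifier -> path of the corresponding audio file.
        """
        return [segment_audio[segment] for segment in route_segments]

    # compile_directions(["main_st_north", "turn_left_elm", "elm_to_hospital"],
    #                    {"main_st_north": "audio/main_st_north.wav",
    #                     "turn_left_elm": "audio/turn_left_elm.wav",
    #                     "elm_to_hospital": "audio/elm_to_hospital.wav"})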
  • users input their present location using simple speech commands, following voice-prompts.
  • the user's location information is looked up on the server's map and cross-referenced with a database on the Mobile Agent system. If any information is incomplete or inconsistent, the user will be prompted to provide additional details relating to the present location.
  • the user's location is determined using a combination of voice-input by the user and network information. For example, the user may indicate that he is presently located on Main Street, and the network information services will indicate in which city this particular Main Street is located.
  • the cellular network may determine the user's present location using GPS reports or any other tracking systems or techniques such as triangulation.
  • the user enters the desired destination information, still using voice prompts.
  • Users enter specific destination information, such as a specific street address, or general destination information, such as "the nearest hospital.” Users may enter multiple destinations, and may specify an order in which to reach those destinations.
  • Directions are suitable for pedestrians as well as drivers. Information may be customized to suit drivers of vehicles with special requirements, such as trucks, vehicles carrying hazardous materials, or lightweight vehicles such as bicycles and mopeds. Directions for using available public transportation systems are available, as well, including relevant timetables. Road directions may accommodate combinations of various forms of transportation, such as walking, biking, and subway riding, for example.
  • the road direction function is linked to a constantly updating traffic information database. Users' requests for directions are accompanied by relevant information relating to traffic congestion, road construction, and driving conditions.
  • Once directions are found and compiled, users may listen to directions all at one time or in segments en route to the destination. The user may save road directions on his personal hard disk for future reference.
  • advertisements targeted to the user and relevant to the user's location are played.
  • the advertisements are targeted to users based on their individual profiles, and are also targeted to users based on their location, destination, route, and mode of transportation. For example, a driver may hear an advertisement referring him to a car wash on his route, while a pedestrian may hear an advertisement for a restaurant on his way.
  • news and information items requested by the user are also played.
  • the user may also listen to road directions in conjunction with personal audio transmissions or while retrieving any other information.
  • the road direction function works in conjunction with a location information function.
  • the location information feature allows users to obtain specific information related to either their present location or intended destination.
  • Location information is linked to local directory assistance functions, as well. Location information includes but is not limited to the locations of: hospitals, pharmacies, restaurants, movie theatres, airports, train and bus stations, museums, and shopping facilities.
  • Road directions may be used to plan trips, to find alternate routes to clogged or congested ones, and in emergencies.
  • FIGS. 8a and 8b illustrate an example of an overall operation involving accessing the system to obtain both personal cellular assistant (PCA) services and road direction services.
  • Help Functions are available from any of the functions and services accessed via the Mobile Agent. Upon request for help, users are told which commands are relevant or appropriate at present; users may request help for a specific topic; or users may receive explanations of specific features and functions. The help received depends on both the specific commands given by the user and the point in the audio transmission from which help is entered.
  • Human help is also available. Users may request the system help desk in order to access human help.
  • Reports
  • In the preferred embodiment, users may obtain reports of all activity conducted via the Mobile Agent. Reports contain all relevant information pertaining to Cellcast reports, general inquiries, road direction inquiries, Personal Cellular Assistant functions, and "get me" functions. Reports provide an accurate record of activities including but not limited to calls, transactions, and information sent and received via the Mobile Agent.
  • Reports may be delivered as text or as audio files.
  • a user might request a textual report of all messages sent via the Mobile Agent over a period of 14 days, to be sent as an e-mail to the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Telephonic Communication Services (AREA)
EP00921017A 1999-04-29 2000-04-30 Informations-wiedergewinnungs-system Withdrawn EP1221160A2 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13149199P 1999-04-29 1999-04-29
US131491P 1999-04-29
PCT/IL2000/000246 WO2000067091A2 (en) 1999-04-29 2000-04-30 Speech recognition interface with natural language engine for audio information retrieval over cellular network

Publications (1)

Publication Number Publication Date
EP1221160A2 true EP1221160A2 (de) 2002-07-10

Family

ID=22449700

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00921017A Withdrawn EP1221160A2 (de) 1999-04-29 2000-04-30 Informations-wiedergewinnungs-system

Country Status (3)

Country Link
EP (1) EP1221160A2 (de)
AU (1) AU4141400A (de)
WO (1) WO2000067091A2 (de)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20001918A (fi) 2000-08-30 2002-03-01 Nokia Corp Monimodaalinen sisältö ja automaattinen puheen tunnistus langattomassa tietoliikennejärjestelmässä
EP1217542A1 (de) * 2000-12-21 2002-06-26 Motorola, Inc. Kommunikationssystem, Kommunikationseinheit und Verfahren zum Personalisieren von Kommunikationsdiensten
SE0101937L (sv) * 2001-06-01 2003-01-30 Newmad Technologies Ab Ett system och en metod för hantering av data mellan en serverinfrastruktur och mobila användarklienter
JP4711099B2 (ja) * 2001-06-26 2011-06-29 ソニー株式会社 送信装置および送信方法、送受信装置および送受信方法、並びにプログラムおよび記録媒体
EP1271345A1 (de) * 2001-06-27 2003-01-02 econe AG Telekommunikationssystem zum Bereitstellen vertonter Artikel in einem Datennetz
US7613636B2 (en) * 2003-03-03 2009-11-03 Ipdev Co. Rapid entry system for the placement of orders via the Internet
DE10344347A1 (de) * 2003-09-24 2005-05-12 Siemens Ag Verfahren und eine Vorrichtung zum Aktualisieren von Reise- und/oder Positionsdaten eines mobilen Endgeräts auf einem Reise-Agenten-Server
US10635723B2 (en) 2004-02-15 2020-04-28 Google Llc Search engines and systems with handheld document data capture devices
US7990556B2 (en) 2004-12-03 2011-08-02 Google Inc. Association of a portable scanner with input/output and storage devices
US9116890B2 (en) 2004-04-01 2015-08-25 Google Inc. Triggering actions in response to optically or acoustically capturing keywords from a rendered document
US9143638B2 (en) 2004-04-01 2015-09-22 Google Inc. Data capture from rendered documents using handheld device
US9008447B2 (en) 2004-04-01 2015-04-14 Google Inc. Method and system for character recognition
US8874504B2 (en) 2004-12-03 2014-10-28 Google Inc. Processing techniques for visual capture data from a rendered document
US8620083B2 (en) 2004-12-03 2013-12-31 Google Inc. Method and system for character recognition
EP1751916A1 (de) 2004-05-21 2007-02-14 Cablesedge Software Inc. Fernzugriffssystem und verfahren und intelligenter agent dafür
US8346620B2 (en) 2004-07-19 2013-01-01 Google Inc. Automatic modification of web pages
US20060067497A1 (en) * 2004-09-27 2006-03-30 Avaya Technology Corp Dialog-based content delivery
WO2010105244A2 (en) 2009-03-12 2010-09-16 Exbiblio B.V. Performing actions based on capturing information from rendered documents, such as documents under copyright
US8447066B2 (en) 2009-03-12 2013-05-21 Google Inc. Performing actions based on capturing information from rendered documents, such as documents under copyright
US9081799B2 (en) 2009-12-04 2015-07-14 Google Inc. Using gestalt information to identify locations in printed information
US9323784B2 (en) 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US10255914B2 (en) 2012-03-30 2019-04-09 Michael Boukadakis Digital concierge and method
WO2013187610A1 (en) * 2012-06-15 2013-12-19 Samsung Electronics Co., Ltd. Terminal apparatus and control method thereof
KR102304052B1 (ko) 2014-09-05 2021-09-23 엘지전자 주식회사 디스플레이 장치 및 그의 동작 방법

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5283833A (en) * 1991-09-19 1994-02-01 At&T Bell Laboratories Method and apparatus for speech processing using morphology and rhyming
DE69232407T2 (de) * 1991-11-18 2002-09-12 Kabushiki Kaisha Toshiba, Kawasaki Sprach-Dialog-System zur Erleichterung von Rechner-Mensch-Wechselwirkung
US5386494A (en) * 1991-12-06 1995-01-31 Apple Computer, Inc. Method and apparatus for controlling a speech recognition function using a cursor control device
US5511213A (en) * 1992-05-08 1996-04-23 Correa; Nelson Associative memory processor architecture for the efficient execution of parsing algorithms for natural language processing and pattern recognition
US5465401A (en) * 1992-12-15 1995-11-07 Texas Instruments Incorporated Communication system and methods for enhanced information transfer
US5335276A (en) * 1992-12-16 1994-08-02 Texas Instruments Incorporated Communication system and methods for enhanced information transfer
US5594779A (en) * 1995-01-12 1997-01-14 Bell Atlantic Mobile audio program selection system using public switched telephone network
US5874954A (en) * 1996-04-23 1999-02-23 Roku Technologies, L.L.C. Centricity-based interface and method
GB2320595B (en) * 1996-12-21 2001-02-21 Int Computers Ltd Network access control
US5897616A (en) * 1997-06-11 1999-04-27 International Business Machines Corporation Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0067091A2 *

Also Published As

Publication number Publication date
AU4141400A (en) 2000-11-17
WO2000067091A3 (en) 2002-05-02
WO2000067091A2 (en) 2000-11-09

Similar Documents

Publication Publication Date Title
WO2000067091A2 (en) Speech recognition interface with natural language engine for audio information retrieval over cellular network
US7844215B2 (en) Mobile audio content delivery system
KR100798574B1 (ko) 위치 기반 서비스 시스템을 위한 광고 캠페인 및 비즈니스 목록
CA2400073C (en) System and method for voice access to internet-based information
US8868589B2 (en) System and method for the transformation and canonicalization of semantically structured data
EP1269732B1 (de) Interaktion mit einem datennetzwerk mittels eines telephonapparates
US7103563B1 (en) System and method for advertising with an internet voice portal
KR100585347B1 (ko) 위치 기반의 서비스 제공 방법 및 위치 기반의 서비스 시스템
US7028252B1 (en) System and method for construction, storage, and transport of presentation-independent multi-media content
US8271331B2 (en) Integrated, interactive telephone and computer network communications system
US20080154870A1 (en) Collection and use of side information in voice-mediated mobile search
US20110166860A1 (en) Spoken mobile engine
US20100174544A1 (en) System, method and end-user device for vocal delivery of textual data
US20130142321A1 (en) Enhanced Directory Assistance Services in a Telecommunications Network
US20020080927A1 (en) System and method for providing and using universally accessible voice and speech data files
US20120166202A1 (en) System and method for funneling user responses in an internet voice portal system to determine a desired item or servicebackground of the invention
EP2165437A2 (de) Darstellung von inhalten einer mobilen kommunikationseinrichtung auf basis von kontext- und verhaltensdaten mit bezug auf einen teil eines mobilen inhalts
WO2008083173A2 (en) Local storage and use of search results for voice-enabled mobile communications devices
US20070208564A1 (en) Telephone based search system
WO2008083175A2 (en) On a mobile device tracking use of search results delivered to the mobile device
CA2596456C (en) Mobile audio content delivery system
EP2130359A2 (de) Integrierte sprachsuchkommandos für mobile kommunikationsvorrichtungen
JP2004504654A (ja) ウェブに基づく情報の変換に使用されるルールのプログラミングによらない開発システムおよび方法

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010129

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030402