CN115150501A

CN115150501A - Voice interaction method and electronic equipment

Info

Publication number: CN115150501A
Application number: CN202110343786.7A
Authority: CN
Inventors: 黄益贵; 乔登龙
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2022-10-04
Also published as: WO2022206704A1

Abstract

The application provides a voice interaction method and an electronic device, relates to the field of terminal technologies, and enables a voice continuation function to keep working after a voice application jumps to another application, thereby improving the user experience during voice interaction. The method comprises the following steps: the electronic device displays a session interface of the voice application, the session interface being used to display dialog content between a user and the voice application; subsequently, when the electronic device detects a first voice input by the user, it converts the first voice into first dialog content in text form; the electronic device then acquires a corresponding first link according to the first dialog content and jumps from the session interface of the voice application to an interface of a first application according to the first link; when the electronic device exits the interface of the first application, it can jump back to the session interface according to the session identifier corresponding to the first dialog content.

Description

Voice interaction method and electronic equipment

Technical Field

The present application relates to the field of terminal technologies, and in particular, to a voice interaction method and an electronic device.

Background

With the development of voice recognition technology, voice assistant APPs (e.g., Siri, Xiao Ai, Xiao E, etc.) have been added to many electronic devices to help users interact with those devices. Generally, after the user wakes up the voice assistant APP in the electronic device, the voice assistant APP can answer or execute each voice instruction issued by the user.

In some scenarios, the voice assistant APP supports voice continuation when interacting with the user, so that multiple rounds of dialog between the user and the voice assistant APP are possible. For example, after a user wakes up a voice assistant APP in a mobile phone and the voice assistant APP receives a first voice instruction of "what is the weather like today", it can display the queried first information in a display interface of the mobile phone, for example in the form of a card. Subsequently, if the voice assistant APP receives a second voice instruction of "what about tomorrow", it can go on to query tomorrow's weather information (namely, second information) and display the queried second information in the display interface of the mobile phone, thereby realizing the voice continuation function.

In other scenarios, when answering a voice instruction of the user, the voice assistant APP may provide the corresponding service by jumping to an application interface of another application. For example, if the voice assistant APP detects a third voice instruction of "I want to order takeout", the voice assistant APP may jump to a search interface of a takeout APP so that the user can complete the related operations in the takeout APP. Generally, after the voice assistant APP jumps to another application, the voice assistant APP ends the session with the user. At this point, the voice assistant APP may be closed or switched to run in the background of the mobile phone, so that it can no longer interact with the user and the voice continuation function cannot be realized, which degrades the user experience during voice interaction.

Disclosure of Invention

The application provides a voice interaction method and electronic equipment, which can continuously realize a voice continuing function after a voice assistant APP skips to other applications, and improve the use experience of a user during voice interaction.

To achieve the above purpose, the application adopts the following technical solutions:

In a first aspect, the present application provides a voice interaction method, including: the electronic device displays a session interface of the voice application, wherein the session interface is used to display dialog content between a user and the voice application; subsequently, when the electronic device detects a first voice input by the user, it can convert the first voice into first dialog content in text form; further, the electronic device can acquire a corresponding first link according to the first dialog content, and jump from the session interface of the voice application to an interface of a first application according to the first link; when the electronic device exits the interface of the first application, it can jump back to the session interface based on the session identifier corresponding to the first dialog content. In this way, the user can continue the dialog with the voice application in the previous session interface, so that the dialog content is continued during voice interaction and the user experience is improved.

In one possible implementation manner, after the electronic device converts the first voice into the first dialog content, the method further includes: the electronic device sends a first request message to the first server, wherein the first request message can include the first dialog content, so that the first server determines, in response to the first request message, the session identifier of the first dialog content and the first link; in this case, the electronic device acquiring the first link according to the first dialog content includes: the electronic device receives a first response message sent by the first server, wherein the first response message includes the first link and the session identifier. For example, the first server may splice the first link (i.e., a link to a certain interface in the first application) with the session identifier of the first dialog content and send the result to the electronic device as the first response message. In this way, the voice application in the electronic device can obtain both the session identifier of the first dialog content and the first link.
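By way of illustration only, the splicing performed by the first server can be sketched in Python as follows. The application does not specify a wire format, so the `takeout://search` link, the `sessionId` query parameter and the `firstLink` field name are all assumptions made for this sketch.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SESSION_PARAM = "sessionId"  # hypothetical parameter name, not taken from the application


def build_first_response(first_link: str, session_id: str) -> dict:
    """Splice the session identifier onto the first link and wrap both in a response."""
    separator = "&" if "?" in first_link else "?"
    spliced = first_link + separator + urlencode({SESSION_PARAM: session_id})
    return {"firstLink": spliced, "sessionId": session_id}


def session_id_from_link(link: str) -> Optional[str]:
    """Recover the session identifier that was spliced onto a link, if any."""
    values = parse_qs(urlparse(link).query).get(SESSION_PARAM)
    return values[0] if values else None


# Hypothetical deep link into the first application's search interface.
response = build_first_response("takeout://search?q=noodles", "session-1")
print(response["firstLink"])
```

Carrying the identifier inside the link is one way for the first application to receive it without a separate channel; other encodings would serve equally well.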

In one possible implementation manner, the electronic device jumps back to the session interface according to the session identifier corresponding to the first session content, including: a first application running in the electronic device may first pull up a voice application; furthermore, after the voice application is pulled up, the voice application in the electronic device can display the conversation content corresponding to the conversation identifier according to the conversation identifier, so that the conversation interface displayed by the voice application before jumping to the first application is recovered.

Illustratively, the first application in the electronic device pulling up the voice application includes: when the electronic device exits the interface of the first application, the first application can splice the link of the voice application and the session identifier into a second link; the first application can then pull up the voice application according to the second link and, at the same time, pass the session identifier to the voice application through the second link, so that the voice application can restore, according to the session identifier, the session interface it displayed before jumping to the first application.

Illustratively, a voice application of the electronic device displays dialog content corresponding to the session identification, including: after the voice application acquires the session identifier through the second link, whether the electronic equipment stores the session content corresponding to the session identifier or not can be inquired; if the electronic equipment stores the conversation content corresponding to the conversation identifier, the voice application can display the conversation content corresponding to the conversation identifier in a conversation interface; if the electronic device does not store the conversation content corresponding to the conversation identifier, the voice application can acquire the conversation content corresponding to the conversation identifier from the first server and display the conversation content corresponding to the conversation identifier in the conversation interface.
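The two illustrative steps above (splicing the second link, then looking up the dialog content locally before falling back to the first server) can be sketched as follows. This is a minimal sketch under stated assumptions: the `voiceassistant://session` link, the `sessionId` parameter, and the `local_store` and `fetch_from_server` names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

VOICE_APP_LINK = "voiceassistant://session"  # hypothetical link of the voice application


def build_second_link(session_id):
    """First application side: splice the voice application's link and the session identifier."""
    return VOICE_APP_LINK + "?" + urlencode({"sessionId": session_id})


def restore_dialog(second_link, local_store, fetch_from_server):
    """Voice application side: use locally stored dialog content, else ask the first server."""
    session_id = parse_qs(urlparse(second_link).query)["sessionId"][0]
    if session_id in local_store:
        return local_store[session_id]
    return fetch_from_server(session_id)


link = build_second_link("session-1")
local_store = {"session-1": ["I want to order takeout"]}
# The lambda stands in for a request to the first server.
print(restore_dialog(link, local_store, lambda sid: []))
```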

In a possible implementation manner, the first request message may further include a device identifier (e.g., UUID), where the device identifier is used to confirm whether the electronic device logs in the first application.

For example, if the electronic device is not logged in the first application, the first link may include a link to a login page in the first application. At this time, the login page of the first application is service content (or service resource) corresponding to the first dialog content. For another example, if the electronic device does not log in the first application, the first response message returned by the first server may include, in addition to the first link (i.e., the link of the login page), a search result based on the first dialog content when the electronic device does not log in the first application. For another example, if the electronic device logs in the first application, the first server may query a search result corresponding to the first session content based on the logged-in account information, and then return the search result to the electronic device by carrying the search result in the first response message.

In one possible implementation manner, if the electronic device does not log in the first application, the electronic device jumps from the session interface of the voice application to the interface of the first application according to the first link, including: and the voice application of the electronic equipment pulls up the first application according to the first link and displays a login page of the first application. At this time, the electronic device jumps from the conversation interface of the voice application to the interface of the first application.

In one possible implementation manner, after the electronic device displays the login page of the first application, the method further includes: the electronic device receives a login operation input by the user in the login page, wherein the login operation is used to authorize the electronic device to log in to a user account of the first application; in this case, the electronic device jumping back to the session interface according to the session identifier corresponding to the first dialog content includes: if a login success message corresponding to the login operation is received, the electronic device can be triggered to exit the interface of the first application, and at this time the electronic device may jump back from the interface of the first application to the session interface according to the session identifier.

In a possible implementation manner, after the electronic device jumps back to the session interface according to the session identifier, the method further includes: the electronic equipment can request the first server to obtain the first service content corresponding to the first conversation content, at the moment, because the user logs in the first application, the first service content which can be obtained by the first server is related to the account information of the user after the user logs in the first application, and then the electronic equipment can display the first service content in a conversation interface, so that more targeted and more accurate service resources are recommended for the user, and the use experience of the user during voice interaction is improved.

In a possible implementation manner, if the electronic device is not logged in to the first application, the method further includes: the electronic device acquires second service content corresponding to the first dialog content, wherein the second service content is the service content corresponding to the first dialog content when the first application is not logged in; in this case, before the electronic device jumps from the session interface of the voice application to the interface of the first application according to the first link, the method further includes: the electronic device displays, in the session interface, the second service content and a link to the login page of the first application. In this way, the user can obtain the corresponding second service content in the session interface even when not logged in.

In this scenario, the electronic device jumps from the conversation interface of the voice application to the interface of the first application according to the first link, including: if the link of the login page selected by the user in the session interface is detected, the electronic equipment can jump to the login page of the first application from the session interface according to the link of the login page; or, if the second voice input by the user is used for indicating to log in the first application, the electronic device may jump from the session interface to the login page of the first application according to the link of the login page.

In one possible implementation manner, after the electronic device jumps from the conversation interface of the voice application to the interface of the first application according to the first link, the method further includes: the electronic equipment can start a preset timer; when the timer times out, the electronic device jumps back from the interface of the first application to the conversation interface. That is, after the electronic device jumps from the voice application to the first application, the electronic device may stay in the first application for a preset time, and when the timer times out, the electronic device may automatically jump back to the session interface of the voice application, and the user may continue voice interaction in the session interface.
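A minimal sketch of the timer-driven jump-back, using Python's standard `threading.Timer`. The function name `schedule_jump_back` and the callback `show_session_interface` are hypothetical; on a real device the callback would bring the voice application back to the foreground.

```python
import threading


def schedule_jump_back(session_id, timeout_s, show_session_interface):
    """Start a preset timer; when it times out, restore the session interface."""
    timer = threading.Timer(timeout_s, show_session_interface, args=(session_id,))
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()
    return timer


restored = []
done = threading.Event()


def show_session_interface(session_id):
    # Stand-in for jumping back from the first application to the voice application.
    restored.append(session_id)
    done.set()


# A very short timeout is used here only so that the sketch finishes quickly.
schedule_jump_back("session-1", 0.01, show_session_interface)
done.wait(2.0)
```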

In one possible implementation manner, after the electronic device jumps from the conversation interface of the voice application to the interface displaying the first application according to the first link, the method further includes: the electronic device may switch the voice application to background operation.

In a second aspect, the present application provides a voice interaction method, including: the first server receives a first request message sent by the electronic device, wherein the first request message includes first dialog content; in response to the first request message, the first server may obtain a session identifier of the first dialog content and a first link; furthermore, the first server sends a first response message to the electronic device, wherein the first response message includes the session identifier and the first link, so that the electronic device can, according to the session identifier, jump back from the interface of the first application corresponding to the first link to the interface of the voice application that displays the first dialog content, thereby realizing the voice continuation function.

In a possible implementation manner, the obtaining, by the first server, the session identifier and the first link of the first session content includes: the first server distributes session identification for the first session content; the first server obtains a first link corresponding to the first dialog content from a second server of the first application.

In one possible implementation manner, the allocating, by the first server, a session identifier for the first session content includes: the first server identifies semantics of the first dialogue content, for example, slot position information and intention of the first dialogue content are extracted; further, the first server may assign a session identification to the first conversational content according to semantics of the first conversational content. In some embodiments, the electronic device may also assign a session identifier to the first session content, and in this case, the electronic device does not need to obtain the session identifier of the current session content from the first server.

In a possible implementation manner, the acquiring, by the first server, of the first link corresponding to the first dialog content from the second server of the first application includes: the first server sends a first message to the second server, wherein the first message can include the session identifier and the semantics of the first dialog content, so that the second server determines the first link according to the semantics of the first dialog content and establishes a correspondence between the first link and the session identifier; furthermore, the first server may receive a second message sent by the second server, wherein the second message includes the session identifier and the first link. That is, the first server obtains the first link through interaction with the second server (the server of the first application).
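The exchange between the two servers can be sketched as follows. The message fields (`sessionId`, `semantics`, `intent`, `slots`, `firstLink`), the `SecondServer` class and the `takeout://` link format are assumptions made for illustration; the application only requires that the second server derive the first link from the semantics and record its correspondence with the session identifier.

```python
from urllib.parse import urlencode


class SecondServer:
    """Stand-in for the server of the first application (hypothetical)."""

    def __init__(self):
        # Correspondence between session identifiers and first links.
        self.link_by_session = {}

    def handle_first_message(self, message):
        """Determine the first link from the semantics and bind it to the session identifier."""
        semantics = message["semantics"]
        link = "takeout://" + semantics["intent"] + "?" + urlencode(semantics["slots"])
        self.link_by_session[message["sessionId"]] = link
        # The second message returned to the first server.
        return {"sessionId": message["sessionId"], "firstLink": link}


second_server = SecondServer()
first_message = {
    "sessionId": "session-1",
    "semantics": {"intent": "search", "slots": {"food": "noodles"}},
}
second_message = second_server.handle_first_message(first_message)
print(second_message)
```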

In a possible implementation manner, the first request message may include a device identifier of the electronic device; at this time, a first message from the first server to the second server may also carry the device identifier, so that the second server determines whether the electronic device logs in the first application according to the device identifier; if the electronic device does not log in the first application, the second server can determine the link of the log-in page in the first application as the first link.
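As a minimal sketch, the second server's choice of first link based on the device identifier might look like this. The `takeout://login` link and the set of logged-in device identifiers are hypothetical, and the logged-in branch is an assumption, since the paragraph above only specifies the case where the device is not logged in.

```python
LOGIN_PAGE_LINK = "takeout://login"  # hypothetical link to the first application's login page


def determine_first_link(device_id, logged_in_devices, service_link):
    """Return the login page link when the device is not logged in to the first application."""
    if device_id not in logged_in_devices:
        return LOGIN_PAGE_LINK
    # Assumption: a logged-in device is sent directly to the requested service interface.
    return service_link


logged_in_devices = {"uuid-aaaa"}
print(determine_first_link("uuid-bbbb", logged_in_devices, "takeout://search?food=noodles"))
```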

In a possible implementation manner, after the first server sends the first response message to the electronic device, the method further includes: the first server receives a session recovery message sent by the electronic device, wherein the session recovery message includes the session identifier of the first dialog content and the device identifier; in response to the session recovery message, the first server may query the second server, according to the device identifier, as to whether the electronic device is logged in to the first application; if the electronic device is logged in to the first application, the first server can acquire, from the second server, first service content corresponding to the first dialog content, the first service content being associated with the account information of the user logged in to the first application; the first server sends the first service content to the electronic device for display.

In a possible implementation manner, if the electronic device does not log in the first application, the first response message may further include second service content, where the second service content is service content corresponding to the first session content when the electronic device does not log in the first application. In this way, the user can acquire the corresponding second service content in the session interface even in a state where the user is not logged in.

In a possible implementation manner, the first response message may further include a timeout period of the first link. In this way, when the electronic device displays the interface of the first application, it can set a corresponding timer according to the timeout period, and when the timer expires, the electronic device is triggered to automatically jump back to the session interface of the voice application.

In a third aspect, the present application provides a voice interaction method, including: the electronic equipment can display a conversation interface of the voice application, wherein the conversation interface is used for displaying conversation content between a user and the voice application; after detecting a first voice input by a user, the electronic equipment can convert the first voice into first conversation content; furthermore, the electronic equipment sends a first request message to the first server, wherein the first request message comprises the first conversation content; in response to the first request message, the first server may obtain a session identification and a first link of the first session content; the session identifier and the first link are carried in a first response message and returned to the electronic equipment; furthermore, the electronic device can jump from the session interface to the interface of the first application according to the first link; after displaying the interface of the first application, the electronic device may jump back to the session interface according to the session identifier.

That is to say, when the user performs voice interaction with the voice application, and the voice application, the first application, and each server perform interaction, the session identifier of the current session content can be transmitted as a carried parameter in each interaction process. Therefore, when the server provides the service resource corresponding to the conversation content for the voice application, the corresponding relation between the service resource and the corresponding conversation identifier can be established, so that when the voice application jumps to a third party application providing the service resource, the third party application can also obtain the service resource and the corresponding conversation identifier, the voice application can be pulled up again according to the conversation identifier and a corresponding conversation interface can be recovered when the third party application exits, the voice application can still jump back to the conversation interface after jumping to the third party application to realize a voice continuing function, and the use experience of a user is improved.

In one possible implementation, the obtaining, by the first server, the first link includes: the first server acquires the first link from the second server according to the first conversation content, and the second server is a server corresponding to the first application.

In a possible implementation manner, the first request message further includes a device identifier of the electronic device; at this time, the first server may also send the device identifier to the second server; the second server can determine whether the electronic equipment logs in the first application or not according to the equipment identification; if the electronic equipment does not log in the first application, the first link comprises a link of a log-in page in the first application.

In one possible implementation manner, after the electronic device jumps from the session interface to the interface of the first application according to the first link, the method further includes: in response to a login operation input by the user on the interface of the first application, the electronic device sends a login request to the first server, wherein the login request includes the device identifier; in response to the login request, the first server requests the second server to mark the device identifier as logged in; further, the first server sends a login success message to the electronic device; in this case, the electronic device jumping back to the session interface according to the session identifier includes: in response to the login success message, the electronic device jumps back from the interface of the first application to the session interface according to the session identifier.

In one possible implementation manner, after the electronic device jumps back to the session interface from the interface of the first application according to the session identifier, the method further includes: the electronic equipment sends a session recovery message to the first server, wherein the session recovery message comprises a session identifier and an equipment identifier; if the electronic equipment logs in the first application, the first server can respond to the session recovery message, obtain first service content corresponding to the first session content from the second server, and send the first service content to the electronic equipment, at this time, the first service content is associated with account information of the user after logging in the first application; subsequently, the electronic device can display the first service content in the session interface.
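The first server's handling of a session recovery message can be sketched as below. The field names, the in-memory dictionaries and the `query_second_server` callable are hypothetical stand-ins for whatever storage and server-to-server interface an implementation actually uses.

```python
def handle_session_recovery(message, accounts_by_device, dialog_by_session, query_second_server):
    """Return account-related first service content if the device is logged in, else None."""
    account = accounts_by_device.get(message["deviceId"])
    if account is None:
        return None  # not logged in: there is no account-specific content to return
    dialog_content = dialog_by_session[message["sessionId"]]
    return {
        "sessionId": message["sessionId"],
        "firstServiceContent": query_second_server(account, dialog_content),
    }


accounts_by_device = {"uuid-aaaa": "alice"}
dialog_by_session = {"session-1": "I want to order takeout"}


def query_second_server(account, dialog_content):
    # Stand-in for the second server's account-aware search.
    return "results for " + account + ": " + dialog_content


recovery_message = {"sessionId": "session-1", "deviceId": "uuid-aaaa"}
print(handle_session_recovery(recovery_message, accounts_by_device, dialog_by_session, query_second_server))
```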

In a possible implementation manner, if the electronic device does not log in the first application, the method further includes: the first server acquires second service content from the second server, wherein the second service content is service content corresponding to the first conversation content when the first application is not logged in; before the electronic device jumps from the session interface to the interface of the first application according to the first link, the method further includes: the electronic device can display the second service content in the session interface.

In one possible implementation manner, the electronic device jumping back to the session interface according to the session identifier includes: the first application in the electronic device pulls up the voice application according to a second link, wherein the second link includes the link of the voice application and the session identifier; after the voice application is pulled up, the voice application in the electronic device may display the dialog content corresponding to the session identifier.

In a fourth aspect, the present application provides an electronic device comprising: a memory, a display screen, and one or more processors; the memory, the display screen and the processor are coupled. Wherein the memory is to store computer program code, the computer program code comprising computer instructions; the processor is configured to execute the one or more computer instructions stored by the memory when the electronic device is operating to cause the electronic device to perform the voice interaction method as described in any one of the above.

In a fifth aspect, the present application provides a server, comprising: a processor, memory, a communication module, and one or more computer programs; wherein the processor is coupled to both the communication module and the memory, the one or more computer programs being stored in the memory, and when the server is running, the processor executes the one or more computer programs stored in the memory to cause the server to perform any of the voice interaction methods described above.

In a sixth aspect, the present application provides a voice interaction system, which includes the above electronic device and the above server; when the electronic device and the server interact with each other, the voice interaction method according to any implementation of the third aspect may be executed.

In a seventh aspect, the present application provides a computer storage medium, which includes computer instructions; when the computer instructions are run on an electronic device (or a server), the electronic device (or the server) is caused to perform any one of the above voice interaction methods.

In an eighth aspect, the present application provides a computer program product, which when run on an electronic device (or server), causes any one of the above-mentioned voice interaction methods to be performed on the electronic device (or server).

It can be understood that the electronic device, the server, the voice interaction system, the computer storage medium and the computer program product provided above are all configured to execute the corresponding methods provided above, and therefore, the beneficial effects achieved by the electronic device, the server, the voice interaction system, the computer storage medium and the computer program product may refer to the beneficial effects in the corresponding methods provided above, and are not described herein again.

Drawings

Fig. 1 is a schematic architecture diagram of a voice interaction system according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of voice interaction between an electronic device and a server in the prior art;

Fig. 3 is a schematic flowchart of voice interaction between an electronic device and a server according to an embodiment of the present application;

Fig. 4 is a first schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 5 is an interaction flow diagram of a voice interaction method according to an embodiment of the present application;

Fig. 6 is a first schematic view of an application scenario of a voice interaction method according to an embodiment of the present application;

Fig. 7 is a second schematic view of an application scenario of a voice interaction method according to an embodiment of the present application;

Fig. 8 is a third schematic view of an application scenario of a voice interaction method according to an embodiment of the present application;

Fig. 9 is a fourth schematic view of an application scenario of a voice interaction method according to an embodiment of the present application;

Fig. 10 is a fifth schematic view of an application scenario of a voice interaction method according to an embodiment of the present application;

Fig. 11 is a sixth schematic view of an application scenario of a voice interaction method according to an embodiment of the present application;

Fig. 12 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

To facilitate a clear understanding of the following embodiments, a brief description of relevant terms in speech recognition technology is first given:

Intent: each voice input by the user generally corresponds to one intent of the user. An intent is a set of one or more expressions; for example, "I want to watch a movie" and "I want to watch an action movie that Andy Lau (Liu Dehua) made in 2001" may both belong to the same video playback intent.

Skill: one skill can cover one or more intents. For example, a developer of a takeout APP may create skill 1 named "order takeout" in a voice platform (also called a voice open platform) provided by an electronic device vendor. Skill 1 may cover multiple intents such as search, payment, and navigation. Subsequently, after the electronic device receives voice input 1, if voice input 1 is recognized to contain a keyword related to "takeout", it can be determined that voice input 1 is associated with skill 1; this can also be described as voice input 1 hitting skill 1.
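Keyword-based skill hitting, as described above, can be sketched as follows. The skill table, its keywords and the function name `hit_skill` are hypothetical; a production voice platform would normally use a trained classifier rather than substring matching.

```python
from typing import Optional

# Hypothetical skill registry: each skill lists trigger keywords and the intents it covers.
SKILLS = {
    "order takeout": {
        "keywords": ("takeout", "take-away", "delivery"),
        "intents": ("search", "payment", "navigation"),
    },
    "weather": {
        "keywords": ("weather",),
        "intents": ("query_weather",),
    },
}


def hit_skill(voice_text: str) -> Optional[str]:
    """Return the name of the first skill whose keyword appears in the recognized text."""
    text = voice_text.lower()
    for name, skill in SKILLS.items():
        if any(keyword in text for keyword in skill["keywords"]):
            return name
    return None


print(hit_skill("I want to order takeout"))
```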

Slot information: slot information is the key information in the user's voice input that expresses the intent, and it directly determines whether the electronic device (or the server) can match the correct intent. One slot corresponds to keywords of one type of attribute, and the slot can be filled with keywords of that type (i.e., slot information). For example, the query pattern corresponding to the play-song intent may be "I want to listen to {song} by {singer}", where {singer} is the slot for the singer and {song} is the slot for the song. Then, if a voice input of "I want to listen to Red Bean by Faye Wong" is received from the user, the electronic device (or the server) may extract from the voice input the slot information of the {singer} slot, namely Faye Wong, and the slot information of the {song} slot, namely Red Bean. From these two pieces of slot information, the electronic device (or the server) can recognize that the user intent of the voice input is to play the song.
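Slot filling against a query pattern can be sketched with named regular-expression groups. The English pattern and sample utterance below are illustrative renderings chosen for this sketch, and the helper names `compile_pattern` and `extract_slots` are hypothetical; real systems generally rely on a trained language-understanding model instead of a single regular expression.

```python
import re


def compile_pattern(query_pattern):
    """Turn a pattern such as "... {song} by {singer}" into a regex with named groups."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", query_pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2:  # odd positions are slot names
            regex += "(?P<" + part + ">.+?)"
        else:  # even positions are literal text
            regex += re.escape(part)
    return re.compile("^" + regex + "$", re.IGNORECASE)


def extract_slots(pattern, utterance):
    """Return the slot information as a dict, or None if the utterance does not match."""
    match = pattern.match(utterance.strip())
    return match.groupdict() if match else None


PLAY_SONG = compile_pattern("I want to listen to {song} by {singer}")
print(extract_slots(PLAY_SONG, "I want to listen to Red Bean by Faye Wong"))
```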

Session identification: a session process may include one or more rounds of dialog between a user and an electronic device. Each session may be identified by a corresponding session identifier, for example, a sessionId or a dialogId. In general, one session process may correspond to one skill. For example, after a voice input by the user hits the order-takeout skill, the electronic device may assign sessionId 1 to the session process; subsequently, all dialogs between the user and the electronic device within the order-takeout skill belong to session 1, and their session identifiers are all sessionId 1.

When the user performs voice interaction with the electronic equipment, after the electronic equipment receives the voice input of the user every time, the specific skills related to the voice input can be determined through interaction with the server. Further, based on the determined specific skill, the server may extract the intention and the slot position information corresponding to the current voice input, and instruct the electronic device to execute an operation instruction corresponding to the intention and the slot position information. For example, the operation instruction may be an instruction to display a card, play a voice, jump to another application, or control the smart home device, so as to complete a response to the voice input of the user.

Embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Illustratively, a voice interaction method provided by the embodiment of the present application may be applied to the voice interaction system 100 shown in fig. 1. The voice interaction system 100 may include an electronic device 101 and at least one server 102.

A voice assistant APP (also referred to as a voice APP, a voice assistant, or a smart voice, etc.) may be installed in the electronic device 101 for voice interaction with a user. For example, the user may wake up the voice assistant APP in the electronic device 101 by inputting a preset wake-up word (e.g., "Hello, Little E", "Xiaoyi Xiaoyi", "Hi Siri", etc.). After the voice assistant APP is woken up, it can conduct a session with the user. During a session, the user may have one or more rounds of dialog with the voice assistant APP. In each round of dialog, the user can provide a voice input (also called a voice command, a voice instruction, etc.) to the voice assistant APP, triggering the voice assistant APP to recognize the voice input and provide the corresponding service resource, thereby completing one round of dialog.

For example, as shown in fig. 2, in step S201, the voice assistant APP may receive a voice input of a user. In step S202, the voice assistant APP may convert the voice input into corresponding text (i.e. dialog content) through an ASR (Automatic Speech Recognition) technology. Further, in step S203, the voice assistant APP may send the conversation content to the server 102. Alternatively, the electronic device 101 may directly transmit the received voice input to the server 102, and the server 102 may convert the voice input into the corresponding dialog content.

As shown in fig. 2, after the server 102 receives the dialog content sent by the electronic device 101, in step S204, the server 102 may extract the intent and slot information from the current dialog content by using a preset Natural Language Understanding (NLU) algorithm. Further, in step S205, the server 102 may send a service resource (also referred to as service content), or a link to the service resource, corresponding to the extracted intent and slot information to the voice assistant APP. For example, when the intent is to query the weather, the server 102 may send the corresponding weather information as a service resource to the voice assistant APP of the electronic device 101. For another example, when the intent is to play a song and the slot information includes singer A and song B, the server 102 may send the URL corresponding to singer A and song B to the electronic device 101. For another example, when the intent is to order takeout, the server 102 may send a deeplink to a search page in the takeout APP to the electronic device 101.
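As a minimal sketch of step S205, the mapping from an extracted intent and its slot information to a service resource or a link can be written as a simple dispatch. The intent labels, the ServiceResponse type, the example.com URL, and the takeoutapp:// deeplink scheme below are all assumptions made for illustration; they are not defined by this application.

```python
from dataclasses import dataclass


@dataclass
class ServiceResponse:
    kind: str     # "content", "url", or "deeplink"
    payload: str  # the service resource itself, or a link to it


def resolve_service_resource(intent, slots):
    """Map an extracted intent and its slots to a service resource (S205)."""
    if intent == "query_weather":
        # Placeholder text; a real server would query a weather service.
        return ServiceResponse("content", "Sunny, 25 degrees Celsius")
    if intent == "play_song":
        # Hypothetical URL scheme for the audio resource of singer A, song B.
        url = "https://music.example.com/play?singer={singer}&song={song}"
        return ServiceResponse("url", url.format(**slots))
    if intent == "order_takeout":
        # Hypothetical deeplink to the search page of a takeout APP.
        return ServiceResponse("deeplink", "takeoutapp://search")
    raise ValueError(f"no service resource for intent {intent!r}")


song = resolve_service_resource("play_song", {"singer": "A", "song": "B"})
print(song.kind, song.payload)
```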

It should be noted that there may be more than one server 102 in the voice interaction system 100. For example, a first server may be used to extract the intent and slot information from the dialog content, and the first server can then obtain the corresponding service resource from a second server according to the extracted intent and slot information.

In some scenarios, after the server 102 sends the deeplink of another application (e.g., a takeout APP) to the voice assistant APP of the electronic device 101, as described in step S206, the voice assistant APP may jump to the takeout APP according to the deeplink. At this time, the electronic device 101 closes the voice assistant APP or switches it to run in the background, which ends the voice interaction process (i.e., step S207). Subsequently, after the user finishes operating in the takeout APP, the electronic device 101 cannot jump back to the voice assistant APP to continue the previous session, which reduces the continuity of multi-round dialog during voice interaction.

In the embodiment of the present application, when a user has a conversation with the electronic device 101, as shown in fig. 3, in step S301, the voice assistant APP of the electronic device 101 may receive a voice input of the user. Further, in step S302, the voice assistant APP may convert the received voice input into corresponding dialog content. In step S303, the voice assistant APP may send each received voice input (or dialog content in text form) and a session identifier of the current session to the server 102. Subsequently, in step S304, the server 102 may extract corresponding intention and/or slot information from the dialog content, so as to query a service resource corresponding to the current voice input. If the server 102 inquires that the service resource corresponding to the current voice input is a deeplink of another application (for example, a takeaway APP), in step S305, the server 102 may send the session identifier and the deeplink (i.e., the service resource) to the voice assistant APP in the electronic device 101. Furthermore, in step S306, the voice assistant APP may jump to an interface of the takeaway APP according to the received deeplink, and send the corresponding session identifier to the takeaway APP. Thus, after the user finishes the interface operation of the takeout APP on the electronic device 101, as described in step S307, the electronic device 101 may jump back to the voice assistant APP from the displayed interface of the takeout APP, and transmit the corresponding session identifier to the voice assistant APP, so that the voice assistant APP may display a corresponding session interface according to the session identifier. Therefore, the user can continue to have a conversation with the voice assistant APP in the previous conversation interface, and the continuing function of conversation content during voice interaction is realized, so that the use experience of the user is improved.
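The jump in step S306 and the return in step S307 can be sketched as follows. This is only an illustrative model under stated assumptions: the classes VoiceAssistant and TakeoutApp, their method names, the session identifier "session-1", and the takeoutapp:// deeplink are hypothetical, and the real hand-over between applications would go through the operating system rather than direct method calls.

```python
class VoiceAssistant:
    """Toy model of the voice assistant APP on the electronic device."""

    def __init__(self):
        self.history = {}      # session identifier -> dialog content so far
        self.displayed = None  # session currently shown, or None

    def add_dialog(self, session_id, text):
        self.history.setdefault(session_id, []).append(text)
        self.displayed = session_id

    def jump_to(self, app, deeplink, session_id):
        # S306: open the other application and hand over the session identifier.
        self.displayed = None
        app.open(deeplink, session_id)

    def resume(self, session_id):
        # S307: redisplay the session interface identified by session_id.
        self.displayed = session_id
        return self.history[session_id]


class TakeoutApp:
    """Toy model of the application reached through the deeplink."""

    def __init__(self):
        self.page = None
        self.session_id = None

    def open(self, deeplink, session_id):
        self.page = deeplink
        self.session_id = session_id  # kept so it can be passed back later

    def exit_to(self, assistant):
        # Return the stored session identifier when the user leaves the app.
        return assistant.resume(self.session_id)


assistant = VoiceAssistant()
takeout = TakeoutApp()
assistant.add_dialog("session-1", "I want to order takeout")
assistant.jump_to(takeout, "takeoutapp://search", "session-1")
restored = takeout.exit_to(assistant)
print(restored)
```

The point of the sketch is that the session identifier travels with the deeplink and comes back on exit, so the voice assistant can look up the earlier dialog content instead of starting a new session.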

The electronic device 101 may specifically be a mobile phone, a smart speaker, a vehicle-mounted device (also referred to as a head unit), a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a wearable electronic device, a virtual reality device, or another electronic device having a voice interaction function, which is not limited in this embodiment of the present application.

For example, fig. 4 shows a schematic structural diagram of the electronic device 101.

The electronic device 101 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a microphone 170B, a sensor module 180, and the like.

Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processor (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), among others. The different processing units may be separate devices or may be integrated into one or more processors.

A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.

In embodiments of the invention, the processor 110 may process one or more of the following operations: converting speech to text, recognizing speech input by a user, responding to the user's speech, sending content retrieved from a server to a display screen for display, jumping from one application to another according to a link, and the like.

In some embodiments, processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.

The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 101. The mobile communication module 150 may include one or more filters, switches, power amplifiers, low noise amplifiers (LNAs), and the like. The mobile communication module 150 may receive electromagnetic waves via the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves that are radiated via the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.

The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 101, including wireless local area networks (WLANs) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices that integrate one or more communication processing modules. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert it into electromagnetic waves that are radiated via the antenna 2.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the electronic device 101. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.

The internal memory 121 may be used to store one or more computer programs, which include instructions. The processor 110 may execute the instructions stored in the internal memory 121, so as to enable the electronic device 101 to execute the voice interaction method provided in some embodiments of the present application, as well as various functional applications, data processing, and the like. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, and may also store one or more application programs (e.g., gallery, contacts, etc.). The data storage area may store data (such as photos, contacts, etc.) created during use of the electronic device 101. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, universal flash storage (UFS), and the like. In other embodiments, the processor 110 causes the electronic device 101 to execute the voice interaction method provided in the embodiments of the present application, as well as various functional applications and data processing, by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The electronic device 101 may implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the microphone 170B, the application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device 101.

The microphone 170B, also referred to as a "mic" or "mouthpiece", is used to convert a sound signal into an electrical signal. When making a call or sending voice information, the user can input a sound signal to the microphone 170B by speaking close to it. The electronic device 101 may be provided with one or more microphones 170B. In other embodiments, the electronic device 101 may be provided with two microphones 170B to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 101 may further include three, four, or more microphones 170B to collect sound signals, reduce noise, identify sound sources, and perform directional recording.

In embodiments of the present invention, the speaker 170A, the microphone 170B, and the audio module 170 may be used to implement voice interaction with a user, such as receiving the user's voice or responding to the user's operations by voice.

The sensor module 180 may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like, which is not limited in this embodiment.

It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 101. In other embodiments of the present application, the electronic device 101 may include more or fewer components than illustrated, or combine certain components, or split certain components, or arrange different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

For example, when the electronic device 101 is a smart speaker, the electronic device 101 may further include one or more devices such as a GPU, a display screen, a camera, and a key, which is not limited in this embodiment of the present application.

For another example, when the electronic device 101 is a mobile phone, the electronic device 101 may further include one or more devices such as a GPU, a display screen, a camera, an earphone interface, a key, a battery, a motor, an indicator, and a SIM card interface, which is not limited in this embodiment.

Hereinafter, a voice interaction method provided by the embodiments of the present application will be described in detail with reference to the accompanying drawings. In the following embodiments, a mobile phone is exemplified as the electronic device 101.

Fig. 5 is a schematic flowchart of a voice interaction method according to an embodiment of the present application. As shown in fig. 5, the voice interaction method may include:

S501, a voice assistant APP of the mobile phone receives a first voice input by a user.

For example, when a user needs to use the voice assistant APP in a mobile phone, the user may wake up the voice assistant APP by inputting a wake-up word or the like. For example, when the mobile phone detects that the user has input the correct wake-up word, it may start running the voice assistant APP and, as shown in fig. 6, display a session interface 601 between the voice assistant APP and the user. Alternatively, when the mobile phone detects a preset operation by the user for waking up the voice assistant APP (for example, long-pressing the power key or speaking a voice wake-up word), the mobile phone may run the voice assistant APP and display the session interface 601 shown in fig. 6.

After the voice assistant APP in the mobile phone is woken up, the voice assistant APP may invoke a microphone (or a microphone array) of the mobile phone to detect a voice input (also referred to as a voice, a voice command, or a corpus, etc.) of the user, so as to conduct one or more rounds of dialog with the user. Illustratively, the first voice input entered by the user may be the speech "I want to order takeout".

S502, a voice assistant APP of the mobile phone converts the first voice input into first conversation content.

For example, an ASR service may be provided in the mobile phone. After the voice assistant APP detects the first voice input entered by the user, it may invoke the ASR service to convert the first voice input into corresponding text content (i.e., the first dialog content). For example, when the first voice input is the speech "I want to order takeout", the voice assistant APP may obtain the corresponding first dialog content "I want to order takeout" in text form through the ASR service in the mobile phone. At this point, as also shown in fig. 6, the voice assistant APP may display the first dialog content 602 in the form of a dialog in the session interface 601. It is understood that the ASR service may also be provided on another device, such as a server or another terminal device, and the mobile phone may obtain the speech-to-text result, such as the first dialog content, by interacting with the device that has the ASR service. This is not specifically limited in the embodiments of the present invention.

In some embodiments, after the voice assistant APP converts the first voice input into the first dialog content in text form, it may also acquire the current location information, camera data, or health data of the mobile phone. For example, if the first dialog content hits a skill such as navigation or ordering takeout, the mobile phone may also acquire its current location information. For another example, if the first dialog content hits a skill such as video shooting, the mobile phone may also acquire current camera data.

S503, the mobile phone sends a first request message to the first server, where the first request message includes the first session content.

For example, after the voice assistant APP of the mobile phone converts the first voice input into the first session content in text form, it may obtain the session identifier of the first session content, for example, the sessionId of the first session content, by interacting with a Dialog Manager (DM), so as to determine the session to which the first session content belongs. After the mobile phone subsequently jumps from the voice assistant APP to another application, the session identifier may be used to jump back to the voice assistant APP and restore the first session content.

For example, the dialog manager described above may be provided in the first server. The first server may be one or more servers providing network services to the voice assistant APP. After obtaining the first session content, the voice assistant APP of the mobile phone may send a first request message to the first server, where the first request message includes the first session content. The first server can then identify the semantics of the first session content by using a preset NLU algorithm, and allocate a corresponding session identifier to the first session content according to those semantics.

For example, the DM in the first server may determine the skill (skill) hit by the first session content according to the semantics of the first session content, and then assign a session identifier according to the hit skill. For example, the DM may determine the specific skill hit by each piece of dialog content by extracting keywords from the dialog content. If dialog content 1 and dialog content 2 hit the same skill, they belong to the same session, and their session identifiers are the same.
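A minimal sketch of skill-based session identifier allocation is given below. It assumes simple substring keyword matching and a hypothetical registry SKILL_KEYWORDS; the class name DialogManager, its methods, and the "session-N" identifier format are illustrative choices, not part of this application.

```python
import itertools

# Hypothetical skill registry: skill name -> keywords that hit the skill.
SKILL_KEYWORDS = {
    "order_takeout": ["takeout", "delivery"],
    "play_music": ["song", "music"],
}


class DialogManager:
    """Assigns one session identifier per hit skill."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._session_by_skill = {}

    def hit_skill(self, text):
        """Return the first skill whose keyword occurs in the text, else None."""
        lowered = text.lower()
        for skill, keywords in SKILL_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                return skill
        return None

    def session_id_for(self, text):
        skill = self.hit_skill(text)
        if skill is None:
            return None
        # Dialog content hitting the same skill shares one session identifier.
        if skill not in self._session_by_skill:
            self._session_by_skill[skill] = f"session-{next(self._counter)}"
        return self._session_by_skill[skill]


dm = DialogManager()
first = dm.session_id_for("I want to order takeout")
second = dm.session_id_for("Is there any takeout nearby?")
third = dm.session_id_for("Play a song for me")
print(first, second, third)
```

A production dialog manager would also need a policy for ending a session (for example, a timeout), which this sketch omits.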

Alternatively, the dialog manager may be provided in the handset. At this time, after the voice assistant APP of the mobile phone acquires the first session content, the preset NLU algorithm may be used to identify the semantics of the first session content, and then the DM in the mobile phone is requested to allocate a corresponding session identifier to the first session content according to the semantics of the first session content. Furthermore, the mobile phone may send the first session content and the session identifier of the first session content to the server, for example, the mobile phone carries both the first session content and the session identifier of the first session content in the first request message and sends the first request message to the first server.

In some embodiments, the voice assistant APP of the mobile phone may also carry the device identifier of the mobile phone in the first request message and send it to the first server. For example, the device identifier may be a UUID (Universally Unique Identifier), a UDID (Unique Device Identifier), an IMEI (International Mobile Equipment Identity), an SN (Serial Number), a UID (User Identification), an openID, or the like. The device identifier can be used to determine whether the user has logged in to an account of the related application when the service resource (also called service content) corresponding to the first session content is subsequently acquired.

In some embodiments, the voice assistant APP of the mobile phone may also carry data such as current location information, camera data, or health data of the mobile phone in the first request message and send the first request message to the first server.
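For illustration, the first request message with its mandatory and optional fields might be modeled as below. The field names, the JSON encoding, and the sample values are assumptions; this application does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FirstRequestMessage:
    """Hypothetical layout of the first request message sent in S503."""

    dialog_content: str                # always present
    session_id: Optional[str] = None   # present when the DM runs on the phone
    device_id: Optional[str] = None    # e.g. a UUID, used for login checks
    location: Optional[str] = None     # optional context data

    def to_json(self):
        # Only carry the fields that were actually set.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body)


message = FirstRequestMessage(
    dialog_content="I want to order takeout",
    device_id="uuid-123",  # placeholder device identifier
)
encoded = message.to_json()
print(encoded)
```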

S504, in response to the first request message, the first server determines a first service resource corresponding to the first session content.

For example, after the first server acquires the first dialog content in the first request message, it may extract the intent and/or slot information from the first dialog content through a preset NLU algorithm, that is, identify the semantics of the first dialog content. For example, taking the first dialog content "I want to order takeout" as an example, the first server may extract, through the preset NLU algorithm, that the intent of the first dialog content is ordering takeout (order-takeout); in this case, the first dialog content contains no explicit slot information. For another example, when the first dialog content is "I want to listen to a song by Jay Chou", the first server may extract, through the preset NLU algorithm, that the intent of the first dialog content is playing a song, and that the slot information corresponding to the singer is Jay Chou. Subsequently, the first server may query the first service resource (i.e., the first service content) corresponding to the first dialog content according to the extracted intent and/or slot information.

For example, the first service resource (or the first service content) may be a specific video file or audio file, or may be a search result for certain content, a page (e.g., a login page), a function in an application, or the like, which is not limited in this embodiment of the present application.

In some embodiments, some device vendors may set up their own service open platform, through which services provided by developers of third-party applications are made available to the vendor's own devices. For example, Huawei provides the HAG (Huawei Ability Gallery, Huawei's open platform for intelligent services), and the process by which services provided by third-party application developers are accessed on Huawei devices is uniformly managed by the HAG. Optionally, the device vendor may also notify the third-party application developer of the link of the voice assistant APP in the vendor's devices through its own service open platform, or through another setting or agreement. In this way, the third-party application developer may preset the link in the third-party application. After a device of the device vendor, such as a mobile phone, jumps from the voice assistant APP to the interface of the third-party application, the third-party application may cause the device to pull up the voice assistant APP via the link.

Still taking the first dialog content "I want to order takeout" as an example, after the first server extracts the intent and/or slot information of the first dialog content, the first server can send a first message to the service open platform of the mobile phone to trigger the service open platform to determine the third-party application associated with the first dialog content. For example, the first server may send a first message to the HAG, the first message including the intent extracted from the first dialog content: order-takeout. In this way, the HAG may determine, according to the intent carried in the first message, that the third-party application associated with the current first dialog content is a takeout APP (e.g., the Meituan APP). That is, the first service resource corresponding to the first dialog content is provided by a server of the takeout APP (e.g., a second server). Further, the HAG may notify the first server to obtain the first service resource corresponding to the first dialog content from the second server of the takeout APP. Subsequently, the first server may obtain the corresponding first service resource from the second server by performing the following step S505 according to the intent and/or slot information of the first dialog content and the session identifier of the first dialog content.

Alternatively, still taking the first dialog content "I want to order takeout" as an example, after the first server sends the first message to the HAG, the HAG may determine, according to the intent carried in the first message, that the third-party application associated with the first dialog content is a takeout APP; the HAG may then directly interact with the second server of the takeout APP and obtain, from the second server, the first service resource corresponding to the first dialog content.

It should be noted that the foregoing embodiment takes, as an example, the case in which the first server determines the third-party application associated with the current first session content through interaction with the HAG. It can be understood that the first server may also determine the third-party application associated with the current first session content in other ways, which is not limited in this embodiment of the application.

S505, when the first service resource is provided by a server (i.e. a second server) of the first application, the first server sends a second request message to the second server, where the second request message includes a session identifier to which the first session content belongs.

For example, when the first server acquires, from the HAG, that the third-party application associated with the first session content is a takeout APP, the first server may send a second request message to a second server of the takeout APP to request to acquire the first service resource (or the link of the first service resource) corresponding to the first session content. The second request message may include the intention and/or slot information extracted from the first session content, and a session identifier of the first session content.

Or after determining that the third-party application associated with the first session content is a takeaway APP, the HAG may directly send the second request message to a second server of the takeaway APP to request to acquire the first service resource corresponding to the first session content. Likewise, the intent and/or slot information extracted from the first session content, as well as the session identification of the first session content, may be included in the second request message.

No matter which way is adopted to send the second request message, the second server may obtain the session identifier of the current session content (i.e., the first session content) through the second request message, so that the second server of the first application may subsequently establish a correspondence between the session identifier of the first session content and the service resource of the first session content.
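The correspondence that the second server establishes between a session identifier and a service resource can be sketched as a simple lookup table. The class SecondServer, its RESOURCES catalogue, the request field names, and the takeoutapp:// link are hypothetical and serve only to illustrate the bookkeeping.

```python
class SecondServer:
    """Toy stand-in for the server of the first application."""

    # Hypothetical catalogue: intent -> link of the matching service resource.
    RESOURCES = {"order_takeout": "takeoutapp://search"}

    def __init__(self):
        # Correspondence between session identifiers and service resources.
        self.resource_by_session = {}

    def handle_second_request(self, request):
        resource = self.RESOURCES[request["intent"]]
        # Remember which session this service resource was issued for.
        self.resource_by_session[request["session_id"]] = resource
        # Echo the session identifier back together with the link.
        return {"session_id": request["session_id"], "link": resource}


server = SecondServer()
reply = server.handle_second_request(
    {"intent": "order_takeout", "slots": {}, "session_id": "session-1"}
)
print(reply)
```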

S506, responding to the second request message, the second server sends the first service resource (or the link of the first service resource) to the voice assistant APP of the mobile phone through the first server.

For example, after receiving the second request message, the second server may determine the corresponding service resource, that is, the first service resource, according to the intent and/or the slot information carried in the second request message. For example, when the intent is to listen to a song and the slot information is "Nunchucks", the second server may determine that the corresponding service resource is the audio resource of the song "Nunchucks". For another example, when the intent is to order takeout, the second server may determine that the corresponding service resource is a real-time takeout search result in the takeout APP. The second server may then send the first service resource directly to the voice assistant APP of the mobile phone through the first server, or may send a link to the first service resource to the voice assistant APP of the mobile phone through the first server, so that the voice assistant APP obtains the corresponding first service resource according to the link.

In some embodiments, the second request message received by the second server may further include a device identifier of the mobile phone, for example, a UUID. In this case, after receiving the second request message, the second server may first determine, according to the UUID carried in the second request message, whether the user has logged in to the takeaway APP.

Generally, after a user logs in to the takeaway APP by using an account, a password and the like, the takeaway APP may send the corresponding UUID to the second server, and the second server may mark the UUID as being in a logged-in state. For example, the second server may assign a token corresponding to the UUID, so that the UUID is marked as logged in. Then, after receiving the second request message, the second server may query whether a token corresponding to the UUID carried in the second request message is stored. If the corresponding token is stored, it indicates that the user has logged in to the takeaway APP; if the corresponding token is not stored, it indicates that the user has not logged in to the takeaway APP. The embodiment of the present application does not specifically limit how the second server determines, according to the device identifier, whether the user has logged in to the takeaway APP.
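The token-based login check described above can be sketched as follows. This is a minimal illustration only: the class name `LoginRegistry`, its method names, and the in-memory dictionary are assumptions made for the example and do not appear in the embodiment, which leaves the exact mechanism open.

```python
import secrets


class LoginRegistry:
    """Hypothetical in-memory stand-in for the second server's login-state store."""

    def __init__(self):
        # Maps a device identifier (UUID) to the token assigned at login.
        self._tokens = {}

    def mark_logged_in(self, uuid: str) -> str:
        # Assigning a token to the UUID marks that UUID as logged in.
        token = secrets.token_hex(16)
        self._tokens[uuid] = token
        return token

    def is_logged_in(self, uuid: str) -> bool:
        # The user counts as logged in only if a token is stored for the UUID.
        return uuid in self._tokens
```

In this sketch, the second server would call `is_logged_in` with the UUID carried in the second request message before deciding which service resource to return.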

If the user has logged in to the takeaway APP, the second server can obtain account information such as coupons, historical orders and taste preferences in the user account. At this time, the second server may determine, based on this information, the first service resource corresponding to the intention of ordering takeaway in the first dialog content. For example, if the user has a KFC coupon, KFC search results may be preferentially included in the first service resource. For another example, if the user has a KFC coupon, the second server may also carry information such as the discounted price in the first service resource. Therefore, the second server can determine more targeted and accurate service resources for the user by combining the account information after the user logs in, and the use experience of the user during voice interaction is improved.

If the user has not logged in to the takeaway APP, the second server may use a login page in the takeaway APP as the first service resource corresponding to the first session content. At this time, the second server may send the link of the login page in the takeaway APP (i.e., the link of the first service resource) to the voice assistant APP of the mobile phone, so that the user can obtain more accurate and richer service resources through the voice assistant APP after subsequently logging in to the takeaway APP.
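The two branches above (logged in versus not logged in) can be sketched as a single selection function. This is an illustrative sketch under stated assumptions: the function name `choose_first_service_resource`, the dictionary layout of the returned resource, and the coupon-first ranking rule are hypothetical, and the login page link simply reuses the example address given in this description.

```python
from typing import Dict, List, Set

# Example login-page link taken from this description; treated as a constant here.
LOGIN_PAGE_LINK = "https://www.huawei.com/hag/accountLogin"


def choose_first_service_resource(
    logged_in: bool, search_results: List[str], coupon_brands: Set[str]
) -> Dict[str, object]:
    """Pick the first service resource according to the user's login state."""
    if not logged_in:
        # Not logged in: the login page itself serves as the first service resource.
        return {"kind": "link", "value": LOGIN_PAGE_LINK}
    # Logged in: list results for which the user holds a coupon first.
    # sorted() is stable, so the original order is kept within each group.
    ranked = sorted(search_results, key=lambda name: name not in coupon_brands)
    return {"kind": "results", "value": ranked}
```

For instance, with results `["Noodle House", "KFC", "Pizza Place"]` and a coupon for `"KFC"`, the logged-in branch places `"KFC"` first and leaves the other entries in their original order.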

For example, the second server may first send the first service resource or the link of the first service resource to the first server, and then the first server sends it to the voice assistant APP of the mobile phone. The link of the first service resource may be a link of an H5 (HyperText Markup Language 5, HTML5) page, a link of a related page in a quick application, or a link of a related page in a third-party application, and these links may be collectively referred to as deeplinks. In addition, the link of the first service resource may further include the session identifier of the first session content, so as to establish a correspondence between the first service resource of the first session content and the session identifier of the first session content.

For example, the deeplink of the first service resource determined by the second server is https://www.huawei.com/hag/accountLogin, and the session identifier of the first session content is sessionId=xxxyyy. The second server may then splice the session identifier of the first session content onto the deeplink of the first service resource, and finally obtain link1 of the first service resource as https://www.huawei.com/hag/accountLogin&sessionId=xxxyyy, where link1 includes both the deeplink of the first service resource and the session identifier of the first session content. Further, the second server may send link1 to the mobile phone via the first server, and the mobile phone may pass link1 to the voice assistant APP.
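The splicing step can be sketched in a few lines. The function name `splice_session_id` is an assumption for illustration; the `&sessionId=` separator follows the example in this description, which appends the parameter with `&` directly rather than using a standard `?` query string.

```python
def splice_session_id(deeplink: str, session_id: str) -> str:
    """Append the session identifier to a deeplink, as in the link1 example."""
    # The description joins the parameter with "&"; that format is kept as-is.
    return f"{deeplink}&sessionId={session_id}"


link1 = splice_session_id("https://www.huawei.com/hag/accountLogin", "xxxyyy")
```

Because the result is not a conventional query string, a receiver in this sketch would need to split on the `&sessionId=` marker itself rather than rely on a generic URL query parser.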

In other embodiments, the link1 of the first service resource may further include a timeout time of the link1, for example, 2 s. The timeout time can be used for triggering the mobile phone to jump back to the voice assistant APP from the interface of the application corresponding to the link1 after the timer reaches or exceeds the timeout time.

S507, the voice assistant APP of the mobile phone jumps to the first application according to the link.

Taking the case where the first service resource is the login page in the takeaway APP as an example, in step S507, the link acquired by the voice assistant APP of the mobile phone may include the deeplink of the login page. The voice assistant APP may then pull up the takeaway APP according to the deeplink of the login page and run the takeaway APP in the foreground of the mobile phone.

At this time, the mobile phone may switch the voice assistant APP to run in the background. For example, when the voice assistant APP switches to the background, the conversation content between the current user and the voice assistant APP may be stored in a preset database, so that the current conversation content is not lost after the voice assistant APP is switched to the background or killed.

For example, as shown in table 1, the preset database may store the correspondence between the conversation content, the conversation time, and the session identifier for conversations between the user and the voice assistant APP in the most recent period of time (e.g., 10 minutes). Of course, table 1 may further include context information such as a device identifier and a skill ID corresponding to the session content, which is not limited in this embodiment. Subsequently, the voice assistant APP can recover the conversation content between the user and the voice assistant APP according to a specific session identifier through table 1. Of course, the DM of the first server may also maintain context information between different users and their voice assistant APPs, where the context information may include information such as the session content and session identifier of each round of the session, and the voice assistant APP may also recover the conversation content between the user and the voice assistant APP according to the session identifier and the context information recorded in the DM.

TABLE 1
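A local store of the kind described for table 1 can be sketched as follows. This is a hypothetical illustration, not the table itself: the names `DialogRecord`, `DialogStore`, and `RETENTION_SECONDS` are assumptions, the records are kept in memory instead of a real database, and the 10-minute retention window follows the example value given above.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

RETENTION_SECONDS = 10 * 60  # example retention window of 10 minutes


@dataclass
class DialogRecord:
    session_id: str   # session identifier of the dialog content
    content: str      # dialog content in text form
    timestamp: float  # conversation time, in seconds


class DialogStore:
    """Hypothetical in-memory stand-in for the preset database."""

    def __init__(self):
        self._records: List[DialogRecord] = []

    def save(self, session_id: str, content: str, timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        self._records.append(DialogRecord(session_id, content, ts))

    def prune(self, now: Optional[float] = None) -> None:
        # Drop records that are older than the retention window.
        now = time.time() if now is None else now
        self._records = [r for r in self._records if now - r.timestamp <= RETENTION_SECONDS]

    def restore(self, session_id: str) -> List[str]:
        # Return the stored dialog content for one session, in insertion order.
        return [r.content for r in self._records if r.session_id == session_id]
```

The voice assistant APP in this sketch would call `save` when it is switched to the background and `restore` with the session identifier when it is pulled up again.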

S508, the mobile phone displays an interface of the first application corresponding to the link.

Still taking the case where the first service resource is the login page in the takeaway APP as an example, in step S508, after the voice assistant APP jumps to the takeaway APP according to the deeplink in the link, as shown in fig. 7, the takeaway APP may display its login page 701. At this time, the user may input, in the login page 701, an authorization operation that authorizes the takeaway APP to log in using the account information of the voice assistant APP or of another logged-in APP. For example, the authorization operation may be an operation of clicking an authorization button 702 in the login page 701. Alternatively, the login page 701 of the takeaway APP may prompt the user to log in to the takeaway APP by inputting an account, a password, and the like, which is not limited in this embodiment of the present application.

After detecting that the user inputs an authorization operation, the takeaway APP may send an authorization message to the second server of the takeaway APP, where the authorization message may include a device identifier of the mobile phone, such as a UUID. Further, the second server may mark the UUID as logged in, in response to the authorization message. For example, the second server may assign a token corresponding to the UUID, so that the UUID is marked as logged in. Further, the second server may send a login success response message to the takeaway APP of the mobile phone. At this point, as shown in fig. 8, the takeaway APP may display a login success message 703 in the login page 701 in response to the response message. Of course, the takeaway APP may also not display the login success message 703.

Of course, if the link of the first service resource includes a deeplink of another application or another page, the mobile phone may also trigger the voice assistant to pull up and display a display interface of another application according to the above method, which is not limited in this embodiment of the present application.

S509, after displaying the interface of the first application, the mobile phone may jump back from the interface of the first application to the session interface of the voice assistant APP.

After the interface of the first application is displayed, the mobile phone can exit displaying the interface of the first application. For example, when the user switches the takeaway APP (i.e., the first application) to run in the background, the mobile phone may exit displaying the interface of the first application. For another example, if it is detected that the user clicks the return button while the mobile phone is displaying the interface of the takeaway APP, the mobile phone may exit displaying the interface of the first application. For another example, when the takeaway APP receives a message that the user login is successful, the mobile phone may also automatically exit displaying the interface of the first application.

Alternatively, a timeout time may be set in link1 of the first service resource acquired by the takeaway APP. For example, the takeaway APP may start a timer whose duration is the timeout time. In this case, after the mobile phone jumps from the voice assistant APP to the takeaway APP, if no operation input by the user on the takeaway APP is received within the timeout time, the mobile phone may automatically exit displaying the interface of the takeaway APP. Or, if a timeout time is set in link1 of the first service resource, the mobile phone may automatically exit the takeaway APP once the timeout time elapses, regardless of whether an operation input by the user on the takeaway APP is received. The embodiment of the application does not limit the conditions under which the mobile phone exits displaying the interface of the first application.
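The two timeout variants above can be contrasted in a small decision function. This is a sketch under stated assumptions: the function name `should_exit`, its parameters, and the `require_idle` flag that selects between the two variants are hypothetical and are not part of the embodiment.

```python
from typing import Optional


def should_exit(
    elapsed: float,
    timeout: float,
    last_input_at: Optional[float],
    require_idle: bool,
) -> bool:
    """Decide whether the first application's interface should be exited.

    elapsed:        seconds since the jump to the first application
    timeout:        timeout time carried in link1
    last_input_at:  time of the last user operation, or None if there was none
    require_idle:   True for the "no user operation" variant,
                    False for the unconditional variant
    """
    if elapsed < timeout:
        # The timer has not yet reached the timeout time.
        return False
    if require_idle:
        # Exit only if no user operation was received within the timeout time.
        return last_input_at is None
    # Unconditional variant: exit regardless of user operations.
    return True
```

With a 2 s timeout, `should_exit(3.0, 2.0, 1.5, True)` keeps the interface open because the user interacted, while the unconditional variant `should_exit(3.0, 2.0, 1.5, False)` exits.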

Because link1 of the first service resource acquired by the takeaway APP includes the session identifier of the first session content, when the mobile phone exits displaying the interface of the takeaway APP, the takeaway APP can trigger the voice assistant APP to display the corresponding session interface according to the session identifier.

For example, deeplink1 of the session interface of the voice assistant APP is hivoice://com.huawei.vassistant/diaglog. The first application, for example the takeaway APP, may be preset with deeplink1 of the voice assistant APP to implement the interface jump. For example, the takeaway APP may append the session identifier of the first session content, sessionId=xxxyyy, to deeplink1, and generate, after splicing, deeplink2 of the voice assistant APP carrying the session identifier: hivoice://com.huawei.vassistant/diaglog&sessionId=xxxyyy. Further, the takeaway APP may pull up the voice assistant APP by calling deeplink2, and pass the session identifier of the first session content to the voice assistant APP through deeplink2. For another example, when the takeaway APP displays a message that the login is successful, or when the interface displayed by the takeaway APP times out, the mobile phone may be triggered to exit displaying the interface of the takeaway APP. At this time, the takeaway APP can pull up the voice assistant APP through deeplink2, so that the session identifier carried in deeplink2 is passed to the voice assistant APP.
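The round trip of the session identifier through the return link can be sketched as follows. The function names `build_return_link` and `parse_return_link` are assumptions for illustration, the deeplink string is the example value from this description, and the platform-specific call that actually pulls up the voice assistant APP is deliberately left out.

```python
from typing import Optional, Tuple

# Example deeplink of the voice assistant session interface, as given above.
VOICE_ASSISTANT_DEEPLINK = "hivoice://com.huawei.vassistant/diaglog"
SESSION_MARKER = "&sessionId="


def build_return_link(session_id: str) -> str:
    """First application side: splice the session identifier onto deeplink1."""
    return f"{VOICE_ASSISTANT_DEEPLINK}{SESSION_MARKER}{session_id}"


def parse_return_link(link: str) -> Tuple[str, Optional[str]]:
    """Voice application side: split the link into base deeplink and session id."""
    base, marker, session_id = link.partition(SESSION_MARKER)
    if not marker or not session_id:
        # No session identifier was carried in the link.
        return base, None
    return base, session_id
```

In this sketch the first application calls `build_return_link` when it exits, and the voice application calls `parse_return_link` on the link it was launched with to recover the session identifier.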

It can be understood that the link of the application to be jumped to (e.g., deeplink1 of the voice assistant APP described above) preset in the first application may be a link of the voice application preset in an H5 page, a quick application, or a locally installed application corresponding to the first application, which is not specifically limited in the embodiment of the present application. In an alternative manner, the first application may not preset the link of the application to be jumped to (e.g., deeplink1 of the voice assistant APP described above). Instead, the first application may obtain the link from a server corresponding to the first application before the jump, or obtain it from the application to be jumped to (e.g., obtain deeplink1 of the voice assistant APP from the voice assistant APP when or after jumping from the voice assistant APP to the first application). The embodiment of the present application does not specifically limit this.

The voice assistant APP can operate in the foreground of the mobile phone after being pulled up, and at the moment, the voice assistant APP can obtain corresponding conversation contents according to the conversation identification transmitted by the takeaway APP. For example, the voice assistant APP may query a database in the handset shown in table 1 for the dialog content corresponding to the session identification described above. If the corresponding conversation content is not inquired in the database in the mobile phone, the voice assistant APP can send the conversation identification to the first server, and the conversation content corresponding to the conversation identification is inquired in the first server. Certainly, the voice assistant APP may also directly query the first server for the session content corresponding to the session identifier, which is not limited in this embodiment of the present invention.
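The lookup order described above (local database first, then the first server) can be sketched as follows. The function name `load_dialog_content` and the use of a plain dictionary and a callable in place of the real database and server request are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


def load_dialog_content(
    session_id: str,
    local_db: Dict[str, List[str]],
    query_first_server: Callable[[str], Optional[List[str]]],
) -> Optional[List[str]]:
    """Return the dialog content for a session, preferring the local database."""
    content = local_db.get(session_id)
    if content:
        # Found locally: no request to the first server is needed.
        return content
    # Not found locally: fall back to querying the first server.
    return query_first_server(session_id)
```

Passing the server query as a callable keeps the sketch independent of any particular network interface; a variant that always queries the first server directly would simply skip the local lookup.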

S510, the voice assistant APP of the mobile phone displays the session interface corresponding to the session identifier.

After the voice assistant APP acquires the session content corresponding to the session identifier, the voice assistant APP may display the acquired session content (the acquired session content includes the first session content) in the session interface, so as to resume displaying the session interface 601 shown in fig. 6, that is, resume the session interface 601 before the voice assistant APP jumps to the takeaway APP.

Therefore, when the user has a conversation with the voice assistant APP, the voice assistant APP can jump to other applications according to the conversation content input by the user to provide corresponding service resources for the user, and the mobile phone can also jump back to the voice assistant APP from other applications and recover the previous conversation interface. In this way, the user can continue the conversation with the voice assistant APP, the continuity of multiple rounds of conversation during voice interaction is improved, and the use experience of the user is improved.

In addition, after the mobile phone jumps back to the voice assistant APP, the mobile phone may also send a session recovery message to the first server, so that the first server knows that the voice assistant APP has resumed the session interface 601 that was displayed before the jump to the takeaway APP.

For example, the voice assistant APP may carry the session identifier of the first session content and the device identifier of the mobile phone in the session recovery message. In this way, after the first server acquires the session recovery message, the session identifier and the device identifier in the session recovery message can be sent to the second server of the takeaway APP. For example, the first server may first send the session identifier and the device identifier in the session recovery message to the HAG, and the HAG determines that the third-party application corresponding to the current session is the takeaway APP. Further, the HAG may send the session identifier and the device identifier to the second server of the takeaway APP. The second server may determine whether the user has logged in to the takeaway APP based on the received device identifier (e.g., UUID). Since the second server has marked the UUID of the mobile phone as logged in in step S508, the second server can determine, based on the device identifier in the session recovery message, that the user has logged in to the takeaway APP.

In step S505, the second server has already acquired the intention and/or slot information of the first session content and the session identifier of the first session content through the second request message. Therefore, after determining that the user has logged in to the takeaway APP, the second server may determine the corresponding intention and/or slot information according to the currently received session identifier of the first session content. Furthermore, the second server may determine the first service resource corresponding to the intention and/or the slot information based on account information such as coupons, historical orders, and taste preferences in the user account. For example, when the intention of the first dialog content is to order takeaway, the second server may recommend to the user, based on the account information of the user, search results such as the price, location, and rating of one or more nearby restaurants (or food). Furthermore, the second server can send the obtained search result to the voice assistant APP of the mobile phone through the first server.
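The way the second server ties the session identifier to the earlier intention and slot information, and then uses it on session recovery, can be sketched as follows. The class `SecondServer`, its method names, and the returned dictionary are hypothetical; the sketch only models the bookkeeping, not the actual search.

```python
from typing import Dict, Optional, Set, Tuple


class SecondServer:
    """Hypothetical model of the second server's session bookkeeping."""

    def __init__(self):
        # session identifier -> (intention, slot information) from step S505
        self._intents: Dict[str, Tuple[str, Dict[str, str]]] = {}
        # device identifiers (UUIDs) currently marked as logged in
        self._logged_in: Set[str] = set()

    def on_second_request(self, session_id: str, intent: str, slots: Dict[str, str]) -> None:
        # Remember which intention and slots belong to this session.
        self._intents[session_id] = (intent, slots)

    def on_authorization(self, uuid: str) -> None:
        # Mark the device as logged in after the authorization message.
        self._logged_in.add(uuid)

    def on_session_recovery(self, session_id: str, uuid: str) -> Optional[Dict[str, object]]:
        # Only a logged-in device with a known session gets a personalised lookup.
        if uuid not in self._logged_in:
            return None
        record = self._intents.get(session_id)
        if record is None:
            return None
        intent, slots = record
        return {"intent": intent, "slots": slots}
```

In a fuller implementation, the value returned by `on_session_recovery` would drive a search that also takes the user's coupons, historical orders, and taste preferences into account.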

At this time, as shown in fig. 9, the voice assistant APP of the mobile phone may present the received search result in the session interface 601 in the form of a card 901. The search result in the card 901 is associated with the account information of the user after logging in the takeout APP, so that more targeted and accurate service resources can be recommended for the user, and the use experience of the user during voice interaction is improved.

Or the second server can also send the link of the search result obtained based on the account information of the user to the voice assistant APP of the mobile phone through the first server. At this time, the voice assistant APP may obtain the search result according to the link, and display the search result in the session interface 601 in the form of a card 901.

Subsequently, the user can select corresponding options in the card 901 displayed by the voice assistant APP by voice, touch, or other means. In this way, after jumping from the voice assistant APP to the takeaway APP, the mobile phone can jump back to the voice assistant APP from the takeaway APP and continue to provide the user with the voice connection function for the session.

In other embodiments, after the first server receives the first conversation content of "I want to order takeaway", if the user has not logged in to an account of the takeaway APP, the first server may also obtain, from the second server, the first service resource of the user in a not-logged-in state, for example, a search result of nearby restaurants (or food) in the not-logged-in state. The first server can also obtain the link of the login page in the takeaway APP from the second server. That is, in step S506, the second server may send the search result in the not-logged-in state and the link of the login page to the voice assistant APP. Of course, the second server may also send the link of the search result in the not-logged-in state (i.e., the link of the first service resource) and the link of the login page to the voice assistant APP, which is not limited in this embodiment of the present application. The link of the login page sent by the second server can still carry parameters such as the session identifier and the UUID.

If the voice assistant APP receives a search result in the not-logged-in state and a link of the login page, as shown in fig. 10, the search result in the not-logged-in state may be displayed in the conversation interface 601 in the form of a card 1001. If the voice assistant APP receives the link of the search result in the not-logged-in state, the voice assistant APP may first obtain the search result according to the link, and then display it in the session interface 601 in the form of a card 1001. Since the search result obtained by the voice assistant APP at this time is obtained by the second server while the user is not logged in, the search result in the card 1001 may differ from the search result in the logged-in state shown in fig. 9. Also, as shown in fig. 10, the voice assistant APP may display a link 1002 of the login page in the session interface 601, reminding the user that a more accurate search result can be obtained after logging in to the takeaway APP.

Subsequently, the user can jump to the login page of the takeaway APP to log in by clicking the link 1002, or by voice input. For example, if it is detected that the user clicks the link 1002 in the session interface 601, then, similar to steps S507-S510 above, the voice assistant APP may jump to the login page of the takeaway APP and pass the corresponding session identifier to the takeaway APP. In this way, the mobile phone may subsequently jump back to the voice assistant APP directly from the takeaway APP and redisplay the session interface of the voice assistant APP shown in fig. 10 according to the session identifier.

For another example, if a voice input (i.e., a second voice input) of "login takeaway APP" is detected from the user, the voice assistant APP may transmit the second session content of "login takeaway APP" to the first server, similar to the above-described steps S501 to S510. When the first server extracts that the intention of the second dialogue content is login (login), the second server can be requested to login the account number of the user in the takeout APP. And then, the second server can send the login page and the session identifier of the takeaway APP to the voice assistant APP as the link, and the voice assistant APP jumps to the takeaway APP according to the link. Subsequently, similar to the above embodiment, the mobile phone may jump back to the interface of the voice assistant APP from the displayed interface of the takeout APP, and redisplay the session interface of the voice assistant APP as shown in fig. 10 according to the session identifier.

For example, after the mobile phone jumps back to the voice assistant APP, similar to the above embodiment, a session recovery message may also be sent to the first server, which triggers the first server to obtain a search result of the user for a nearby restaurant (or food) in a login state from the second server, and send the search result to the voice assistant APP. At this time, as shown in fig. 11, the voice assistant APP may display the received search result in the form of a card 1101 in the conversation interface 601. Because the search result received by the voice assistant APP at this time is obtained by searching the second server in the state of user login, the search result in the card 1101 is associated with the account information after the user logs in the takeout APP, and compared with the search result in the card 1001, the search result can recommend more targeted and accurate service resources for the user, and the user experience during voice interaction is improved.

It can be seen that, in the voice interaction method provided in the embodiment of the present application, when the voice assistant APP, the server of the voice assistant APP (i.e., the first server), and the server of the third-party application (i.e., the second server) interact with each other, the session identifier of the current session content may be transmitted as a carried parameter in each interaction. Therefore, when providing the service resource corresponding to the conversation content for the voice assistant APP, the second server can establish the correspondence between the service resource and the corresponding session identifier. When the voice assistant APP jumps to the third-party application providing the service resource, the third-party application can also obtain the service resource and the corresponding session identifier, so that when the third-party application exits, the voice assistant APP can be pulled up again according to the session identifier and the corresponding session interface can be recovered. In this way, after jumping to the third-party application, the mobile phone can still jump back to the voice assistant APP to realize the voice connection function, and the use experience of the user is improved.

It should be noted that, in the above embodiment, the example is illustrated by installing the voice assistant APP in a mobile phone, and it may be understood that, in a voice interaction scene, the electronic device where the voice assistant APP is installed may also be a vehicle-mounted device, a tablet computer, a watch, and the like, which may all be used to implement the voice interaction method in the above embodiment, and the embodiment of the present application does not limit this.

As shown in fig. 12, an embodiment of the present application discloses an electronic device, which may be the above-mentioned mobile phone. The electronic device may specifically include: a touch screen 1201, the touch screen 1201 comprising a touch sensor 1206 and a display screen 1207; one or more processors 1202; a memory 1203; one or more applications (not shown); and one or more computer programs 1204, which may be connected by one or more communication buses 1205. Wherein the one or more computer programs 1204 are stored in the memory 1203 and configured to be executed by the one or more processors 1202, the one or more computer programs 1204 comprising instructions that can be used to perform the steps associated with the mobile phone implementation of the embodiments described above.

Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.

Each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application essentially, or the part thereof contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a flash memory, a removable hard drive, a read-only memory, a random-access memory, a magnetic disk, or an optical disk.

The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of voice interaction, comprising:

the electronic equipment displays a conversation interface of a voice application, wherein the conversation interface is used for displaying conversation content between a user and the voice application;

the electronic equipment detects first voice input by a user and converts the first voice into first conversation content;

the electronic equipment acquires a first link according to the first conversation content;

the electronic equipment jumps to an interface of a first application from the conversation interface of the voice application according to the first link;

after the interface of the first application is displayed, the electronic equipment jumps back to the session interface according to the session identification corresponding to the first session content.

2. The method of claim 1, after the electronic device converts the first speech into first dialog content, further comprising:

the electronic equipment sends a first request message to a first server, wherein the first request message comprises the first conversation content, so that the first server determines a conversation identifier and the first link of the first conversation content in response to the first request message;

wherein the electronic equipment acquiring the first link according to the first conversation content comprises:

and the electronic equipment receives a first response message sent by the first server, wherein the first response message comprises the first link and the session identification.

3. The method of claim 1 or 2, wherein the electronic device jumps back to the session interface according to the session identifier corresponding to the first session content, comprising:

the first application in the electronic device pulls up the voice application;

after the voice application is pulled up, the voice application in the electronic equipment displays the conversation content corresponding to the conversation identification.

4. The method of claim 3, wherein the first application in the electronic device pulls up the voice application, comprising:

the first application in the electronic equipment pulls up the voice application according to a second link, wherein the second link comprises a link of the voice application and the session identification.

5. The method of claim 3, wherein the voice application of the electronic device displays dialog content corresponding to the session identification, comprising:

the voice application inquires whether conversation content corresponding to the conversation identification is stored in the electronic equipment or not;

if the electronic equipment stores the conversation content corresponding to the conversation identifier, the voice application displays the conversation content corresponding to the conversation identifier in the conversation interface;

and if the electronic equipment does not store the conversation content corresponding to the conversation identification, the voice application acquires the conversation content corresponding to the conversation identification from the first server and displays the conversation content corresponding to the conversation identification in the conversation interface.

6. The method of claim 2, wherein the first request message further comprises a device identifier, and the device identifier is used to determine whether the electronic device is logged in to the first application.

7. The method of claim 6, wherein, if the electronic device is not logged in to the first application, the first link comprises a link to a login page of the first application.

8. The method of claim 7, wherein the electronic device jumping from the conversation interface of the voice application to the interface of the first application according to the first link comprises:

the voice application in the electronic device pulls up the first application according to the first link and displays the login page of the first application.

9. The method of claim 8, further comprising, after the electronic device displays the login page of the first application:

the electronic device receives a login operation input by a user on the login page, wherein the login operation authorizes the electronic device to log in to a user account of the first application;

wherein the electronic device jumping back to the conversation interface according to the session identifier corresponding to the first conversation content comprises:

if a login success message corresponding to the login operation is received, the electronic device jumps back to the conversation interface from the interface of the first application according to the session identifier.

10. The method of claim 8 or 9, further comprising, after the electronic device jumps back to the conversation interface according to the session identifier:

the electronic device displays first service content corresponding to the first conversation content in the conversation interface, wherein the first service content is associated with account information of the user after the user logs in to the first application.

11. The method of claim 7, wherein if the electronic device is not logged in to the first application, the method further comprises:

the electronic device acquires second service content corresponding to the first conversation content, wherein the second service content is service content corresponding to the first conversation content when the first application is not logged in to;

before the electronic device jumps from the conversation interface of the voice application to the interface of the first application according to the first link, the method further comprises:

the electronic device displays, in the conversation interface, the second service content and the link to the login page of the first application.

12. The method of claim 11, wherein the electronic device jumping from the conversation interface of the voice application to the interface of the first application according to the first link comprises:

if it is detected that the user selects the link to the login page in the conversation interface, the electronic device jumps from the conversation interface to the login page of the first application according to the link to the login page; or,

if a second voice input by the user indicates logging in to the first application, the electronic device jumps from the conversation interface to the login page of the first application according to the link to the login page.

13. The method of any of claims 1-12, further comprising, after the electronic device jumps from the conversation interface of the voice application to the interface of the first application according to the first link:

the electronic device starts a preset timer;

when the timer times out, the electronic device jumps back to the conversation interface from the interface of the first application.
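
By way of non-limiting illustration, the timeout behaviour of claim 13 can be sketched with a timer that is started when the interface of the first application is displayed and checked against a clock. The class `ReturnTimer`, the function `current_interface` and the interface labels are hypothetical; the clock is injectable so that the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class ReturnTimer:
    """Preset timer started when the first application's interface is displayed."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.started_at: Optional[float] = None

    def start(self) -> None:
        self.started_at = self.clock()

    def expired(self) -> bool:
        # A timer that was never started cannot time out.
        if self.started_at is None:
            return False
        return self.clock() - self.started_at >= self.timeout_s


def current_interface(timer: ReturnTimer) -> str:
    # Once the timer has timed out, the device shows the conversation interface again.
    return "conversation_interface" if timer.expired() else "first_application"
```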

14. The method of any of claims 1-13, further comprising, after the electronic device jumps from the conversation interface of the voice application to the interface of the first application according to the first link:

the electronic device switches the voice application to run in the background.

15. A method of voice interaction, comprising:

a first server receives a first request message sent by an electronic device, wherein the first request message comprises first conversation content;

in response to the first request message, the first server acquires a session identifier and a first link corresponding to the first conversation content;

the first server sends a first response message to the electronic device, wherein the first response message comprises the session identifier and the first link, and the session identifier is used by the electronic device to jump back, from an interface of a first application corresponding to the first link, to an interface of a voice application that displays the first conversation content.

16. The method of claim 15, wherein the first server acquiring the session identifier and the first link corresponding to the first conversation content comprises:

the first server assigns the session identifier to the first conversation content;

the first server obtains the first link corresponding to the first conversation content from a second server of the first application.

17. The method of claim 16, wherein the first server assigning the session identifier to the first conversation content comprises:

the first server recognizes the semantics of the first conversation content;

and the first server assigns the session identifier to the first conversation content according to the semantics of the first conversation content.

18. The method of claim 16, wherein the first server obtaining the first link corresponding to the first conversation content from the second server of the first application comprises:

the first server sends a first message to the second server, wherein the first message comprises the session identifier and the semantics of the first conversation content, so that the second server determines the first link according to the semantics of the first conversation content and establishes a correspondence between the first link and the session identifier;

and the first server receives a second message sent by the second server, wherein the second message comprises the session identifier and the first link.

19. The method of claim 18, wherein the first request message includes a device identification of the electronic device;

the first message comprises the device identifier, so that the second server determines, according to the device identifier, whether the electronic device is logged in to the first application; if the electronic device is not logged in to the first application, the first link is a link to a login page of the first application.
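
By way of non-limiting illustration, the exchange between the first server and the second server in claims 16 to 19 can be sketched with in-memory stand-ins. Everything named here is hypothetical: the `firstapp://` links, the keyword-based `recognize_semantics` placeholder (a real system would use a semantic recognition model), and the choice to derive the session identifier from the semantics plus a random suffix, which is only one possible reading of assigning an identifier "according to the semantics".

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Set

LOGIN_LINK = "firstapp://login"  # hypothetical link to the login page


@dataclass
class SecondServer:
    """In-memory stand-in for the server of the first application."""
    logged_in_devices: Set[str] = field(default_factory=set)
    link_by_session: Dict[str, str] = field(default_factory=dict)

    def handle_first_message(self, session_id: str, semantics: str, device_id: str) -> Dict[str, str]:
        # Choose the link according to the login state of the device.
        if device_id in self.logged_in_devices:
            link = f"firstapp://service/{semantics}"
        else:
            link = LOGIN_LINK
        # Establish the correspondence between the first link and the session identifier.
        self.link_by_session[session_id] = link
        return {"session_id": session_id, "first_link": link}


def recognize_semantics(content: str) -> str:
    # Placeholder for semantic recognition (assumption, keyword match only).
    return "order_takeout" if "takeout" in content.lower() else "unknown"


def handle_first_request(request: Dict[str, str], second_server: SecondServer) -> Dict[str, str]:
    """First server: assign a session identifier, then obtain the first link."""
    semantics = recognize_semantics(request["content"])
    session_id = f"{semantics}-{uuid.uuid4().hex}"
    second_message = second_server.handle_first_message(session_id, semantics, request["device_id"])
    # The first response message carries the session identifier and the first link.
    return {"session_id": second_message["session_id"], "first_link": second_message["first_link"]}
```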

20. The method of claim 19, further comprising, after the first server sends the first response message to the electronic device:

the first server receives a session recovery message sent by the electronic device, wherein the session recovery message comprises the session identifier of the first conversation content and the device identifier;

in response to the session recovery message, the first server queries the second server, according to the device identifier, as to whether the electronic device is logged in to the first application;

if the electronic device is logged in to the first application, the first server acquires first service content corresponding to the first conversation content from the second server, wherein the first service content is associated with account information of a user after the user logs in to the first application;

the first server sends the first service content to the electronic device.
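
By way of non-limiting illustration, the handling of the session recovery message in claim 20 can be sketched as a single function on the first server. The message keys and the callables `is_logged_in` and `fetch_service_content` are hypothetical stand-ins for queries to the second server; returning `None` when the device is not logged in is an assumption, since the claim only recites the logged-in case.

```python
from typing import Callable, Dict, Optional


def handle_session_recovery(
    message: Dict[str, str],
    is_logged_in: Callable[[str], bool],
    fetch_service_content: Callable[[str], str],
) -> Optional[str]:
    """Return the first service content if the device is logged in, else None."""
    session_id = message["session_id"]
    device_id = message["device_id"]
    # Query the login state from the second server according to the device identifier.
    if not is_logged_in(device_id):
        return None  # assumption: the claim is silent on the not-logged-in case
    # Acquire the account-related service content for the conversation.
    return fetch_service_content(session_id)
```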

21. The method of claim 19, wherein the first response message further comprises second service content, and the second service content is service content corresponding to the first conversation content when the first application is not logged in to.

22. The method of any of claims 15-21, wherein the first response message further comprises a timeout period for the first link.

23. A method of voice interaction, comprising:

the electronic device sends a first request message to a first server, wherein the first request message comprises the first conversation content;

the first server sends a first response message to the electronic device, wherein the first response message comprises the session identifier and the first link;

the electronic device jumps from the conversation interface to an interface of a first application according to the first link;

after the interface of the first application is displayed, the electronic device jumps back to the conversation interface according to the session identifier.

24. The method of claim 23, wherein the first server obtaining the first link comprises:

the first server acquires the first link from a second server according to the first conversation content, wherein the second server is a server corresponding to the first application.

25. The method of claim 24, wherein the first request message further comprises a device identifier of the electronic device; the method further comprises:

the first server sends the device identifier to the second server;

the second server determines, according to the device identifier, whether the electronic device is logged in to the first application; if the electronic device is not logged in to the first application, the first link comprises a link to a login page of the first application.

26. The method of claim 25, further comprising, after the electronic device jumps from the conversation interface to the interface of the first application according to the first link:

in response to a login operation input by a user on the interface of the first application, the electronic device sends a login request to the first server, wherein the login request comprises the device identifier;

in response to the login request, the first server requests the second server to mark the device identifier as being in a logged-in state;

the first server sends a login success message to the electronic device;

wherein the electronic device jumping back to the conversation interface according to the session identifier comprises:

in response to the login success message, the electronic device jumps back to the conversation interface from the interface of the first application according to the session identifier.

27. The method of claim 25 or 26, further comprising, after the electronic device jumps back to the conversation interface from the interface of the first application according to the session identifier:

the electronic device sends a session recovery message to the first server, wherein the session recovery message comprises the session identifier and the device identifier;

if the electronic device is logged in to the first application, in response to the session recovery message, the first server acquires first service content corresponding to the first conversation content from the second server and sends the first service content to the electronic device, wherein the first service content is associated with account information of a user after the user logs in to the first application;

the electronic device displays the first service content in the conversation interface.

28. The method of claim 25, wherein if the electronic device is not logged in to the first application, the method further comprises:

the first server acquires second service content from the second server, wherein the second service content is service content corresponding to the first conversation content when the first application is not logged in to;

before the electronic device jumps from the conversation interface to the interface of the first application according to the first link, the method further comprises:

the electronic device displays the second service content in the conversation interface.

29. The method of any of claims 23-28, wherein the electronic device jumping back to the conversation interface according to the session identifier comprises:

the first application in the electronic device pulls up the voice application according to a second link, wherein the second link comprises a link of the voice application and the session identifier.

30. An electronic device, comprising:

a touch screen comprising a touch sensor and a display screen;

one or more processors;

a memory;

a communication module;

wherein the memory has stored therein one or more computer programs comprising instructions which, when executed by the electronic device, cause the electronic device to perform the voice interaction method of any of claims 1-14.

31. A server, comprising:

one or more processors;

a memory;

a communication module;

wherein the memory has stored therein one or more computer programs comprising instructions which, when executed by the server, cause the server to perform the method of voice interaction of any of claims 15-22.

32. A voice interaction system, comprising an electronic device as claimed in claim 30, and a server as claimed in claim 31.

33. A computer-readable storage medium having instructions stored therein, which when run on an electronic device, cause the electronic device to perform the voice interaction method of any one of claims 1-14; or, when run on a server, cause the server to perform the voice interaction method of any of claims 15-22.

34. A computer program product comprising instructions for causing an electronic device to perform the method of voice interaction according to any one of claims 1-14 when the computer program product is run on the electronic device; alternatively, the computer program product, when run on a server, causes the server to perform the voice interaction method of any of claims 15-22.