
CN117648411A - Expression generating method and device - Google Patents

Expression generating method and device

Info

Publication number
CN117648411A
CN117648411A CN202210998977.1A
Authority
CN
China
Prior art keywords
text
emotion
expression
computer
speculative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210998977.1A
Other languages
Chinese (zh)
Inventor
邢诗萍
俞雨
邵凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210998977.1A priority Critical patent/CN117648411A/en
Priority to PCT/CN2023/103053 priority patent/WO2024037196A1/en
Publication of CN117648411A publication Critical patent/CN117648411A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an expression generating method and device. In the method, after a terminal device obtains a first emotion code, it matches a corresponding first text, generates a second text through a speculative neural network, determines a second emotion code according to the correspondence between texts and emotion codes, determines a first expression according to the correspondence between emotion codes and expressions, and displays the first expression. Because the expression displayed by the terminal device changes according to the speculation of the speculative neural network rather than being fixed in advance, the complexity of the digital human's emotion display can be increased.

Description

Expression generating method and device
Technical Field
The embodiment of the application relates to the field of communication, in particular to an expression generating method and device.
Background
Virtual digital humans are widely applied in many fields such as entertainment, education, services and sales. At the current stage of development, digital humans already have good "shells", ranging from cartoon-style to hyper-realistic. However, in the field of digital human driving, current driving strategies still rely heavily on manual work or large amounts of data; voice/text-driven digital human technology, for example, is a popular research topic in academia. As digital human applications expand, emotional companionship has become an important part of the digital human, and there is an urgent need for digital humans to move from simple emotion expression to full-dimensional, natural expression.
The most common practice is to have an animator design expression patterns that are then displayed according to rules. The whole process of displaying the digital human's expression is prefabricated in advance, so the digital human's expression is monotonous.
Disclosure of Invention
The application provides an expression generating method and device which are used for increasing the complexity of digital human emotion display.
The first aspect of the present application provides an expression generating method, which includes: acquiring a first emotion code; matching the first text according to the first emotion encoding; inputting the first text into a speculative neural network to generate a second text; matching a second emotion encoding according to the second text; determining a corresponding first expression according to the second emotion code; the first expression is displayed.
In the above aspect, the execution subject of the application is a terminal device. After obtaining the first emotion code, the terminal device can match the corresponding first text, generate a second text through the speculative neural network, determine a second emotion code according to the correspondence between texts and emotion codes, determine a first expression according to the correspondence between emotion codes and expressions, and display the first expression. Because the expression displayed by the terminal device changes according to the speculation of the speculative neural network rather than being fixed in advance, the complexity of the digital human's emotion display can be increased.
In one possible embodiment, the first emotion encoding is randomly generated.
In the possible implementation manner, the terminal device can generate the initial first emotion code locally at random, so that the flexibility of the scheme is improved.
In one possible embodiment, the obtaining the first emotion encoding includes: receiving a message of a user; a first emotion encoding is determined from the message.
In the possible implementation manner, the first emotion encoding may also be generated by a message input by a user, so as to improve flexibility of the scheme.
In a possible embodiment, the method further comprises: receiving voice data of a user, wherein the voice data is used for requesting a text corresponding to a first expression; and displaying the second text according to the voice data.
In this possible implementation, the user can also query the text corresponding to the current expression by voice, which improves the human-computer interaction experience.
In a possible embodiment, the method further comprises: inputting the second text to the speculative neural network to generate a third text; matching a third emotion encoding according to the third text; determining a second expression according to the third emotion encoding; the second expression is shown.
In the possible implementation manner, the expression displayed by the terminal device can be continuously changed, so that the viewing experience of the user is improved.
In a possible embodiment, the method further comprises: and displaying the emotion label corresponding to the first expression.
In the possible implementation manner, the user can directly determine the emotion of the current expression through the emotion tag, so that the user experience is improved.
In one possible implementation, the speculative neural network is generated based on sample text and emotion tag training corresponding to the sample text.
In this possible implementation, training on sample text and the corresponding emotion labels improves the accuracy of the speculative neural network.
A second aspect of the present application provides an expression generating apparatus, which may implement the method of the first aspect or any of the possible implementation manners of the first aspect. The apparatus comprises corresponding units or modules for performing the above-described methods. The units or modules included in the apparatus may be implemented in a software and/or hardware manner. The device may be, for example, a network device, a chip system, or a processor that supports the network device to implement the method, or a logic module or software that can implement all or part of the functions of the network device.
A third aspect of the present application provides a computer device comprising: a processor coupled to a memory for storing instructions that when executed by the processor cause the computer device to implement the method of the first aspect or any of the possible implementations of the first aspect. The computer device may be, for example, a network device, or a chip system supporting the network device to implement the above method.
A fourth aspect of the present application provides a computer readable storage medium having instructions stored therein which, when executed by a processor, implement a method as provided by the foregoing first aspect or any one of the possible implementation manners of the first aspect.
A fifth aspect of the present application provides a computer program product comprising computer program code for implementing the method of the first aspect or any one of the possible implementation manners of the first aspect, when the computer program code is executed on a computer.
Drawings
Fig. 1 is a schematic diagram of a system structure of virtual digital human interaction according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an expression generating method according to an embodiment of the present application;
fig. 3 is a schematic diagram of an expression generating flow provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an expression generating apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an expression generating method and device which are used for increasing the complexity of digital human emotion display.
Embodiments of the present application will now be described with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some, but not all embodiments of the present application. As one of ordinary skill in the art can appreciate, with the development of technology and the appearance of new scenes, the technical solutions provided in the embodiments of the present application are applicable to similar technical problems.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits have not been described in detail as not to unnecessarily obscure the present application.
Referring to fig. 1, a schematic structural diagram of an artificial intelligence main framework is shown in fig. 1. The artificial intelligence main framework is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement process from "data" to "information" to "knowledge" to "wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the provision and processing of technical implementations) to the industrial ecology of the system.
(1) Infrastructure of
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and provides support through a base platform. Communication with the outside is performed through sensors; computing power is provided by smart chips, such as hardware acceleration chips like a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA); the base platform includes a distributed computing framework, networks and other related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and the like. For example, a sensor communicates with the outside to obtain data, and the data is provided to a smart chip in the distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, ranking and prediction.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to products and applications of the artificial intelligence system in various fields; they are the encapsulation of the overall artificial intelligence solution and turn intelligent information decision-making into deployed, practical applications. The main application fields include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
Embodiments of the present application relate to neural networks and to related applications of natural language processing (NLP). For a better understanding of the solutions in the embodiments of the present application, related terms and concepts of neural networks that the embodiments may involve are described below.
A virtual digital person has the following three characteristics: first, it has an appearance, with features such as a specific look, gender and personality; second, it has behavior, with the ability to express itself through language, facial expressions and body movements; third, it has thought, with the ability to recognize the external environment and to communicate and interact with people. With the progress of converging technologies such as computer graphics, deep learning, speech synthesis and brain-inspired science, virtual digital humans are gradually evolving into a new species and a new medium, and more and more virtual digital humans are being designed, produced and operated.
The expression generating method provided by the embodiments of the present application can be executed on a server, and can also be executed on a terminal device based on artificial intelligence. The terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart television, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), an autonomous vehicle, or the like, which is not limited in this embodiment. The terminal device may be a device running any of various operating systems; for example, it may be a device running an Android system, an iOS system, or a Windows system.
Virtual digital humans are widely applied in many fields such as entertainment, education, services and sales. At the current stage of development, digital humans already have good "shells", ranging from cartoon-style to hyper-realistic. However, in the field of digital human driving, current driving strategies still rely heavily on manual work or large amounts of data; voice/text-driven digital human technology, for example, is a popular research topic in academia. As digital human applications expand, emotional companionship has become an important part of the digital human, and there is an urgent need for digital humans to move from simple emotion expression to full-dimensional, natural expression.
The most common practice is to have an animator design expression patterns that are then displayed according to rules. The whole process of displaying the digital human's expression is prefabricated in advance, so the digital human's expression is monotonous.
In order to solve the above-mentioned problems, the embodiments of the present application provide an expression generating method, which is described below.
Referring to fig. 2, fig. 2 is a flowchart of an expression generating method according to an embodiment of the present application, where the method includes:
step 201, the terminal equipment acquires a first emotion code.
In this embodiment, the terminal device may locally obtain a first emotion code, where the first emotion code is used as an initial emotion of the virtual digital person, that is, an expression corresponding to the first emotion code may be used as an initial expression of the virtual digital person. The first emotion encoding may be a vector of values (e.g., a 1x256 dimension vector).
The first emotion code stored locally can be randomly generated by the terminal equipment, namely the terminal equipment randomly selects one emotion as the emotion of the virtual digital person, and then determines the emotion code corresponding to the emotion.
The locally stored first emotion encoding may also be determined by an input of the user, e.g. the terminal device may receive a message of the user, and then match the first emotion encoding according to the message, wherein the message may be a voice input or a text input of the user, which is not limited in the embodiment of the present application.
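As a minimal illustration of these two acquisition paths (a sketch only: the 1x256 size follows the example above, and the message-to-emotion encoder is a hypothetical stand-in that the embodiment does not prescribe), step 201 might be realized as follows:

```python
import numpy as np

EMOTION_DIM = 256  # assumed: the embodiment mentions a 1x256-dimensional emotion code

def random_emotion_code(rng=None) -> np.ndarray:
    """Randomly generate an initial emotion code for the virtual digital person."""
    rng = rng or np.random.default_rng()
    return rng.normal(size=(1, EMOTION_DIM)).astype(np.float32)

def emotion_code_from_message(message: str, message_encoder) -> np.ndarray:
    """Determine the first emotion code from a user's message (voice transcript or text).

    `message_encoder` is a hypothetical model mapping a message into the same
    1x256 emotion-code space; the embodiment does not specify a concrete model.
    """
    return np.asarray(message_encoder(message), dtype=np.float32).reshape(1, EMOTION_DIM)
```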
Step 202, the terminal equipment matches the first text according to the first emotion code.
In this embodiment of the application, each emotion code corresponds to one expression, each emotion code also corresponds to one or more texts, and the content of each text is related to the corresponding expression. When the terminal device obtains the first emotion code, the first text can be determined according to this association.
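One plausible form of this association (a sketch, not the claimed implementation) is a table of stored emotion codes, each linked to one or more candidate texts, queried by nearest-neighbour similarity:

```python
import numpy as np

def match_first_text(emotion_code: np.ndarray,
                     stored_codes: np.ndarray,        # shape (n, d): one code per entry
                     stored_texts: list[list[str]],   # candidate texts for each entry
                     rng=None) -> str:
    """Return a first text associated with the stored emotion code closest to the query."""
    rng = rng or np.random.default_rng()
    q = emotion_code.ravel()
    # cosine similarity between the query code and every stored code
    sims = stored_codes @ q / (np.linalg.norm(stored_codes, axis=1) * np.linalg.norm(q) + 1e-8)
    best = int(np.argmax(sims))
    # several texts may correspond to one emotion code; pick one of them
    return str(rng.choice(stored_texts[best]))
```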
Step 203, the terminal device inputs the first text into the speculative neural network to generate a second text.
In this embodiment, each expression has a name indicating the expression; for example, a smiling face corresponds to happiness and a crying face corresponds to pain. The first text corresponds to an emotion label; if the first text is an initial text and the initial text does not show emotion, the emotion label may be assigned randomly. The terminal device may input the first text and the corresponding emotion label into the speculative neural network, which may decode the words or phrases of the next text (the second text) and the corresponding emotion label.
The speculative neural network is generated by training on sample text and the emotion labels corresponding to the sample text. Specifically, when the speculative neural network is trained, the database construction module holds a large amount of collected text describing psychological activities (diaries, monologues and so on) together with aligned labels entered manually or produced by a text-understanding algorithm. The text information is digitally encoded, several conventional network preprocessing operations are performed, and the emotion labels are then combined with the text to obtain the text training data. Through a common neural network structure, the text at the next moment can be speculated from the text obtained at the previous moment. The output of the neural network is a new combination based on the learned sentence structure; the generated content is new and is not a reproduction of text in the database.
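The embodiment leaves the network structure open ("a common neural network structure"). A minimal sketch of the idea, predicting the next text segment and its emotion label from the current ones, is given below; the layer sizes, the fixed output length and the name SpeculativeNet are assumptions for illustration, not the disclosed architecture:

```python
import torch
import torch.nn as nn

class SpeculativeNet(nn.Module):
    """Toy speculative network: given the token ids of the current text and its emotion
    label, predict the tokens of the next text segment and the next emotion label."""
    def __init__(self, vocab_size=8000, n_emotions=8, emb=128, hidden=256, max_out=32):
        super().__init__()
        self.vocab_size, self.max_out = vocab_size, max_out
        self.tok_emb = nn.Embedding(vocab_size, emb)
        self.emo_emb = nn.Embedding(n_emotions, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.next_tokens = nn.Linear(hidden, max_out * vocab_size)  # flattened token logits
        self.next_emotion = nn.Linear(hidden, n_emotions)

    def forward(self, token_ids, emotion_id):
        # token_ids: (batch, seq_len), emotion_id: (batch,)
        x = self.tok_emb(token_ids) + self.emo_emb(emotion_id).unsqueeze(1)
        _, h = self.encoder(x)                        # h: (1, batch, hidden)
        h = h.squeeze(0)
        tok_logits = self.next_tokens(h).view(-1, self.max_out, self.vocab_size)
        emo_logits = self.next_emotion(h)
        return tok_logits, emo_logits

def training_loss(model, token_ids, emotion_id, next_token_ids, next_emotion_id):
    """Training pairs (text_t, label_t) -> (text_t+1, label_t+1) come from diaries and
    inner-monologue corpora whose emotion labels were aligned manually or by a
    text-understanding algorithm, as described above."""
    ce = nn.CrossEntropyLoss()
    tok_logits, emo_logits = model(token_ids, emotion_id)
    return ce(tok_logits.transpose(1, 2), next_token_ids) + ce(emo_logits, next_emotion_id)
```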
Step 204, the terminal equipment matches a second emotion encoding according to the second text.
In this embodiment, the terminal device may determine, according to an association relationship between a text and an emotion code, a second emotion code closest to the second text, where the association relationship between the text and the emotion code may be stored locally or may be obtained through networking, and this is not limited herein.
Text information can be encoded in forms such as character codes or term frequency-inverse document frequency (TF-IDF); speech information can also be encoded for cross-modal use, in data formats such as mel spectrograms or mel-frequency cepstral coefficients (MFCC); expressions can be encoded with expression groups and their coefficients, and if a mesh representation is used, mesh encoding can be used. These codes all take different forms and dimensions. The cross-modal retriever can convert these different codes into the same numeric representation, which is a common implementation of cross-modal retrieval algorithms.
Illustratively, a text code consisting of Chinese character codes [00, 12, 3, 55 …] is output as a 1x218 vector by a trained network; by learning, the audio code corresponding to the same text is made to approach the same 1x218 vector as closely as possible, and the same applies to the expression and emotion codes. In this way, data in the expression, text, audio and emotion modalities can all be retrieved using this 1x218 vector.
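A sketch of how differently shaped raw codes can end up as one comparable representation (only the principle, one projection head per modality into a shared space, comes from the description; the 218-dimensional size and all layer shapes below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

SHARED_DIM = 218  # the example above uses a 1x218 vector

class ModalityProjector(nn.Module):
    """Projects one modality's raw code (padded character codes or TF-IDF for text,
    flattened MFCC frames for speech, expression coefficients, an emotion code, ...)
    into the shared retrieval space."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, SHARED_DIM))

    def forward(self, x):                        # x: (batch, in_dim)
        return F.normalize(self.net(x), dim=-1)  # unit norm, so modalities are directly comparable

# one projector per modality; the input sizes are placeholders
text_proj = ModalityProjector(in_dim=300)
audio_proj = ModalityProjector(in_dim=39 * 50)
expr_proj = ModalityProjector(in_dim=52)
```

Because every projector outputs a unit-norm vector of the same size, retrieval across modalities reduces to a simple dot-product comparison in the shared space.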
The database construction module in this embodiment of the application is a module that constructs a certain amount of time-aligned text, speech and expression data as training data. The most common approach is to train an encoder and a decoder for each modality, where the decoder can restore the encoded content to the original content; the decoders are then crossed, so that the content output by the encoder of another modality can also be decoded back to the restored content. For example, an expression code can also be used to decode a similar text, and a loss is formed to supervise the consistency of the coding space.
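A compact sketch of that training recipe, with per-modality encoders and decoders, a cross-decoding term, and a consistency term that supervises the shared coding space (the function name and the choice of mean-squared-error losses are assumptions):

```python
import torch.nn as nn

def cross_modal_training_loss(enc_a, dec_a, enc_b, dec_b, x_a, x_b):
    """x_a and x_b are time-aligned samples of the same content in two modalities,
    e.g. a text code and the matching expression coefficients."""
    mse = nn.MSELoss()
    z_a, z_b = enc_a(x_a), enc_b(x_b)
    recon = mse(dec_a(z_a), x_a) + mse(dec_b(z_b), x_b)    # each decoder restores its own modality
    cross = mse(dec_a(z_b), x_a) + mse(dec_b(z_a), x_b)    # crossed decoders: e.g. expression code -> text
    consistency = mse(z_a, z_b)                            # keep the two codes in one coding space
    return recon + cross + consistency
```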
Step 205, the terminal device determines a corresponding first expression according to the second emotion encoding.
In this embodiment, the terminal device may determine, according to the correspondence between emotion codes and expressions, the first expression corresponding to the second emotion code. The first expression may be obtained locally, and the locally stored expressions may include multiple expression groups, where each expression group corresponds to a local expression. For example, the locally stored expressions may include a plurality of expression groups corresponding to a plurality of common local expressions (these expressions can cover the eyebrows, eyes, nose, mouth, chin, cheeks and other parts of the face). The plurality of local expressions may include some expressions common to the human face, such as blinking, opening the mouth, frowning and raising the eyebrows. In addition, the expressions may further include expressions obtained by subdividing some common facial expressions; for example, the plurality of local expressions may include expressions such as an upward movement of the inner side of the left eyebrow, a lifting of the lower eyelid of the right eye, and an eversion of the upper lip, which is not limited here. The first expression may also be obtained by the terminal device through networked matching, which is not limited here.
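As a sketch of this lookup (the local-expression names below are illustrative placeholders, not coefficients from the embodiment), the correspondence can be held as a small library keyed by stored emotion codes:

```python
import numpy as np

def retrieve_first_expression(second_emotion_code: np.ndarray,
                              prototype_codes: dict[str, np.ndarray],
                              expression_library: dict[str, dict[str, float]]) -> dict[str, float]:
    """Return the expression-group coefficients whose stored emotion code is closest to the query."""
    q = second_emotion_code.ravel()
    q = q / (np.linalg.norm(q) + 1e-8)
    best = max(prototype_codes,
               key=lambda k: float(q @ (prototype_codes[k] / (np.linalg.norm(prototype_codes[k]) + 1e-8))))
    return expression_library[best]

# illustrative coefficients over local expressions covering different parts of the face
expression_library = {
    "happy": {"mouth_corner_up": 0.9, "cheek_raise": 0.6, "eye_squint": 0.3},
    "sad":   {"mouth_corner_down": 0.8, "inner_brow_raise": 0.7, "upper_lid_droop": 0.4},
}
```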
Step 206, the terminal equipment displays the first expression.
In this embodiment, after obtaining the first expression, the terminal device may display the first expression on the display screen, so as to convey the digital person's current emotion through the expression. Illustratively, when the first expression is a smiling face, the current emotion of the digital person is expressed as happy; when the first expression is a crying face, the current emotion of the digital person is expressed as sad. In an example, the terminal device may also directly display the emotion, for example by displaying an emotion label representing the emotion on the display screen; the specific display position may be anywhere around the digital person.
The digital person's expression can also change gradually over time. That is, after the terminal device displays the first expression, it can further input the second text into the speculative neural network to speculate the digital person's subsequent psychological activity, namely to generate a third text; it then determines the third emotion code corresponding to the third text according to the relation between texts and emotion codes, determines the second expression corresponding to the third emotion code according to the relation between emotion codes and expressions, and displays the second expression on the display screen. Correspondingly, the third text, and even the texts obtained subsequently, are later input into the speculative neural network to speculate new expressions for display. Specifically, the terminal device reads the last expression state, or randomly generates a new expression state, as the initial value (including content and emotion) of the psychological-activity text. Starting from this, the encoding network writes new subjects, predicates, objects, adverbials and the like according to the diary and inner-monologue sentence structures learned from a large amount of data, and continuously generates the corresponding emotion. After the emotion labels and the text information are passed through the cross-modal retrieval network to generate codes, the expression coefficients are retrieved with these codes and displayed on the display screen to form expressions, namely the final idle (IDLE) expressions. Even when the user is not directly communicating with the digital person, the changing expressions can make the user feel, while watching, that the digital person is thinking, and the digital person may even proactively initiate communication with the user, which is not limited here.
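Read as control flow, the idle-expression behaviour described above amounts to a loop: seed state, speculate the next inner-monologue text and emotion label, encode into the shared space, retrieve an expression, display it, then feed the new text back in. A sketch of that loop, with the component functions standing in for the modules described in this embodiment:

```python
import time

def idle_expression_loop(initial_text: str, initial_label,
                         speculate, to_shared_code, retrieve_expression, display,
                         period_s: float = 2.0, steps: int = 10):
    """Drive the digital person's idle (IDLE) expressions for a number of cycles.

    speculate(text, label)      -> (next_text, next_label)   # speculative neural network
    to_shared_code(text, label) -> shared-space vector        # cross-modal encoding
    retrieve_expression(code)   -> expression coefficients    # cross-modal retrieval
    display(expr, label, text)  -> None                       # rendering / UI layer
    """
    text, label = initial_text, initial_label        # e.g. the first text matched in step 202
    for _ in range(steps):
        text, label = speculate(text, label)          # new inner-monologue text and emotion label
        code = to_shared_code(text, label)
        expr = retrieve_expression(code)
        display(expr, label, text)                    # the text itself is shown only on request
        time.sleep(period_s)                          # the expression changes gradually over time
```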
In one example, the user may also view the text corresponding to the current expression; for example, if the current expression is the first expression, the user may view the second text corresponding to the first expression. Specifically, the user may ask for the second text corresponding to the first expression by voice, for example "what is the digital person thinking" or "why does the digital person make this expression", which is not limited here. After receiving the voice data, if analysis shows that the voice data requests the current text, the terminal device can directly display the current text, for example display the second text.
Specifically, for the expression generation process in this embodiment of the application, reference may be made to fig. 3, which is a schematic diagram of an expression generation flow provided in an embodiment of the present application. As shown in fig. 3, the first emotion code serving as the initial emotion is input into the speculative neural network to generate the second text and an emotion label; the second text and the emotion label are encoded into the unified modal space (such as the foregoing 1x218 vector) through cross-modal retrieval, and the corresponding first expression is then matched according to this code.
According to the method and the device of this application, after the terminal device obtains the first emotion code, it can match the corresponding first text, generate the second text through the speculative neural network, determine the second emotion code according to the correspondence between texts and emotion codes, determine the first expression according to the correspondence between emotion codes and expressions, and display the first expression. Because the expression displayed by the terminal device changes according to the speculation of the speculative neural network rather than being fixed in advance, the complexity of the digital human's emotion display can be increased.
For the structure of the terminal device in this embodiment of the application, reference may be made to the schematic structural diagram of a terminal device shown in fig. 4. The terminal device includes: an input module 401, a database establishing module 402, a psychological activity generating module 403, a digital human expression cross-modal retrieval module 404 and an output module 405; the functions of these modules may also be performed by a single module, which is not limited here.
The input module 401: this module can receive various kinds of input, such as emotion labels, voice, text or expression base coefficients, and generate the first emotion code from them; it can also take no input and generate the first emotion code randomly. Because what is generated is an idle (IDLE) expression, whether interaction data is needed can be chosen based on the characteristics of the subsequent modules: data such as emotion labels, voice, text data and expression base coefficients (usually saved from a preceding interaction) may or may not be provided as input. The input module 401 may perform steps 201 and 202 in the method embodiment of fig. 2.
Database establishing module 402: this part mainly generates data for the subsequent psychological activity generation module 403 and the expression cross-modal retrieval module 404. The psychological activity generation module needs text training data and the corresponding emotion labels. The cross-modal retrieval module needs time-aligned data such as emotion labels, expression base coefficients and texts.
Psychological activity generation module 403: this module is responsible for continuously generating the inner-monologue text in a self-encoding manner. The database module holds a large number of texts such as diaries and psychological monologues; sentence-component analysis is performed on these texts using natural language understanding (NLU), and after automatic labeling, new text content conforming to the sentence structures is generated using a generative network capable of producing new content, such as a variational autoencoder (VAE). The psychological activity generation module 403 may perform step 203 in the method embodiment of fig. 2.
Digital human expression cross-modal retrieval module 404: using a machine learning algorithm, this module learns to represent the common digital-human-related modal codes, such as aligned expressions, texts, voices and emotion codes, in the same modality space; that is, different forms of data can obtain the same encoded representation in the form of an n x m matrix or a 1 x m vector. Although the data come in different forms, they express the same object, and the shared representation can therefore be used to search databases of different modalities to obtain the best matching result. The digital human expression cross-modal retrieval module 404 may perform steps 204 and 205 in the method embodiment of fig. 2.
Output module 405: this module provides the output of the system of the present invention: digital human expressions conforming to the psychological activity (four expressions are shown in the figure by way of example), the corresponding emotion labels (not shown in the figure), and the corresponding brand-new psychological-activity text information. For example, when the user's voice requests psychological-activity text feedback ("what is the digital person thinking" or "why does the digital person make this expression"), the output module 405 can output the psychological-activity text by voice, or present it directly, for example: "I'm so happy today, I passed the test; I've lived in this city for many years but still don't know where the vehicle administration office is, so I got up very early, made a simple breakfast and took … …". The output module 405 may or may not display the expression and the psychological-activity text simultaneously, which is not limited here. The output module 405 may perform step 206 in the method embodiment of fig. 2.
For a scenario in which the above-described functional modules are executed by only one unit, reference may be made to a schematic structural diagram of an expression generating apparatus shown in fig. 5, the apparatus 50 including:
the processing unit 501 is configured to obtain a first emotion code, match a first text according to the first emotion code, input the first text into the speculative neural network to generate a second text, match a second emotion code according to the second text, determine a corresponding first expression according to the second emotion code, and display the first expression.
The processing unit 501 is configured to execute steps 201 to 206 in the method embodiment of fig. 2.
Optionally, the first emotion encoding is randomly generated.
Optionally, the apparatus 50 further includes a transceiver unit 502, where the transceiver unit 502 is specifically configured to: receiving a message of a user;
the processing unit 501 is further configured to: a first emotion encoding is determined from the message.
Optionally: the transceiver 502 is further configured to receive voice data of a user, where the voice data is used to request a text corresponding to the first expression; the processing unit 501 is further configured to: and displaying the second text according to the voice data.
Optionally, the processing unit 501 is further configured to: inputting the second text to the speculative neural network to generate a third text; matching a third emotion encoding according to the third text; determining a second expression according to the third emotion encoding; the second expression is shown.
Optionally, the processing unit 501 is further configured to: and displaying the emotion label corresponding to the first expression.
Optionally, the speculative neural network is generated according to sample text and emotion label training corresponding to the sample text.
Fig. 6 is a schematic diagram of a possible logic structure of a computer device 60 according to an embodiment of the present application. The computer device 60 includes: processor 601, communication interface 602, storage system 603, and bus 604. The processor 601, the communication interface 602, and the storage system 603 are connected to each other through a bus 604. In the embodiment of the present application, the processor 601 is configured to control and manage the actions of the computer device 60, for example, the processor 601 is configured to perform the steps performed by the terminal device in the method embodiment of fig. 2. The communication interface 602 is used to support communication by the computer device 60. A storage system 603 for storing program code and data for the computer device 60.
The processor 601 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and it may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 601 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 604 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus.
The transceiving unit 502 in the arrangement 50 corresponds to the communication interface 602 in the computer device 60 and the processing unit 501 in the arrangement 50 corresponds to the processor 601 in the computer device 60.
The computer device 60 of the present embodiment may correspond to the terminal device in the embodiment of the method of fig. 2, and the communication interface 602 in the computer device 60 may implement the functions and/or the steps implemented by the terminal device in the embodiment of the method of fig. 2, which are not described herein for brevity.
It should be understood that the division of the units in the above apparatus is merely a division of a logic function, and may be fully or partially integrated into a physical entity or may be physically separated when actually implemented. And the units in the device can be all realized in the form of software calls through the processing element; or can be realized in hardware; it is also possible that part of the units are implemented in the form of software, which is called by the processing element, and part of the units are implemented in the form of hardware. For example, each unit may be a processing element that is set up separately, may be implemented as integrated in a certain chip of the apparatus, or may be stored in a memory in the form of a program, and the functions of the unit may be called and executed by a certain processing element of the apparatus. Furthermore, all or part of these units may be integrated together or may be implemented independently. The processing element described herein may in turn be a processor, which may be an integrated circuit with signal processing capabilities. In implementation, each step of the above method or each unit above may be implemented by an integrated logic circuit of hardware in a processor element or in the form of software called by a processing element.
In one example, the units in any of the above apparatuses may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms. For another example, when the units in the apparatus are implemented in the form of a scheduler of processing elements, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In another embodiment of the present application, there is further provided a computer readable storage medium, where computer executable instructions are stored, and when executed by a processor of a device, the device performs a method performed by the terminal device in the above method embodiment.
In another embodiment of the present application, there is also provided a computer program product comprising computer-executable instructions stored in a computer-readable storage medium. When the processor of the device executes the computer-executable instructions, the device performs the method performed by the terminal device in the method embodiment described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (17)

1. An expression generating method, comprising:
acquiring a first emotion code;
matching a first text according to the first emotion encoding;
inputting the first text into a speculative neural network to generate a second text;
matching a second emotion encoding according to the second text;
determining a corresponding first expression according to the second emotion code;
and displaying the first expression.
2. The method of claim 1, wherein the first emotion encoding is randomly generated.
3. The method of claim 1, wherein the obtaining a first emotion encoding comprises:
receiving a message of a user;
and determining the first emotion encoding according to the message.
4. A method according to any one of claims 1-3, wherein the method further comprises:
receiving voice data of a user, wherein the voice data is used for requesting a text corresponding to the first expression;
and displaying the second text according to the voice data.
5. The method according to any one of claims 1-4, further comprising:
inputting the second text to the speculative neural network to generate a third text;
matching a third emotion encoding according to the third text;
determining a second expression according to the third emotion encoding;
the second expression is displayed.
6. The method according to any one of claims 1-5, further comprising:
and displaying the emotion label corresponding to the first expression.
7. The method of any of claims 1-6, wherein the speculative neural network is generated from sample text and emotion tag training corresponding to the sample text.
8. An expression generating apparatus, comprising:
the processing unit is used for acquiring a first emotion code, matching the first emotion code with a first text, inputting the first text into a speculative neural network to generate a second text, matching the second emotion code with the second text, determining a corresponding first expression according to the second emotion code, and displaying the first expression.
9. The apparatus of claim 8, wherein the first emotion encoding is randomly generated.
10. The apparatus according to claim 8, further comprising a transceiver unit, the transceiver unit being specifically configured to:
receiving a message of a user;
the processing unit is further configured to determine the first emotion encoding according to the message.
11. The apparatus according to any of the claims 8-10, further comprising a transceiver unit, in particular for:
receiving voice data of a user, wherein the voice data is used for requesting a text corresponding to the first expression;
the processing unit is further configured to display the second text according to the voice data.
12. The apparatus according to any one of claims 8-11, wherein the processing unit is further configured to:
inputting the second text to the speculative neural network to generate a third text;
matching a third emotion encoding according to the third text;
determining a second expression according to the third emotion encoding;
the second expression is displayed.
13. The apparatus according to any one of claims 8-12, wherein the processing unit is further configured to:
and displaying the emotion label corresponding to the first expression.
14. The apparatus of any of claims 8-13, wherein the speculative neural network is generated from sample text and emotion tag training corresponding to the sample text.
15. A computer device, comprising: a processor and a memory, wherein the processor is configured to,
the processor is configured to execute instructions stored in the memory to cause the computer device to perform the method of any one of claims 1 to 7.
16. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on the computer, causes the computer to perform the method according to any of claims 1 to 7.
17. A computer program product, characterized in that the computer performs the method according to any of claims 1 to 7 when the computer program product is executed on a computer.
CN202210998977.1A 2022-08-19 2022-08-19 Expression generating method and device Pending CN117648411A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210998977.1A CN117648411A (en) 2022-08-19 2022-08-19 Expression generating method and device
PCT/CN2023/103053 WO2024037196A1 (en) 2022-08-19 2023-06-28 Communication method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210998977.1A CN117648411A (en) 2022-08-19 2022-08-19 Expression generating method and device

Publications (1)

Publication Number Publication Date
CN117648411A true CN117648411A (en) 2024-03-05

Family

ID=89940589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210998977.1A Pending CN117648411A (en) 2022-08-19 2022-08-19 Expression generating method and device

Country Status (2)

Country Link
CN (1) CN117648411A (en)
WO (1) WO2024037196A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10628985B2 (en) * 2017-12-01 2020-04-21 Affectiva, Inc. Avatar image animation using translation vectors
JP7351745B2 (en) * 2016-11-10 2023-09-27 ワーナー・ブラザース・エンターテイメント・インコーポレイテッド Social robot with environmental control function
WO2019144542A1 (en) * 2018-01-26 2019-08-01 Institute Of Software Chinese Academy Of Sciences Affective interaction systems, devices, and methods based on affective computing user interface
CN112330780A (en) * 2020-11-04 2021-02-05 北京慧夜科技有限公司 Method and system for generating animation expression of target character
CN114357135B (en) * 2021-12-31 2024-11-01 科大讯飞股份有限公司 Interaction method, interaction device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2024037196A1 (en) 2024-02-22

Similar Documents

Publication Publication Date Title
KR102130750B1 (en) Method for providing bigdata and artificial intelligence based counseling and psychological service using bidirectional virtual reality contents
CN107728780B (en) Human-computer interaction method and device based on virtual robot
JP2021514514A (en) Affective computing Sensitive interaction systems, devices and methods based on user interfaces
CN108942919B (en) Interaction method and system based on virtual human
CN114298121B (en) Multi-mode-based text generation method, model training method and device
CN111414506B (en) Emotion processing method and device based on artificial intelligence, electronic equipment and storage medium
CN111104512A (en) Game comment processing method and related equipment
Zhang Voice keyword retrieval method using attention mechanism and multimodal information fusion
CN111967334A (en) Human body intention identification method, system and storage medium
CN114724224A (en) Multi-mode emotion recognition method for medical care robot
CN115497510A (en) Speech emotion recognition method and device, electronic equipment and storage medium
CN111062207B (en) Expression image processing method and device, computer storage medium and electronic equipment
CN117540703A (en) Text generation method, model training method, device and electronic equipment
CN117648411A (en) Expression generating method and device
CN116580691A (en) Speech synthesis method, speech synthesis device, electronic device, and storage medium
CN116543798A (en) Emotion recognition method and device based on multiple classifiers, electronic equipment and medium
CN116740237A (en) Bad behavior filtering method, electronic equipment and computer readable storage medium
CN116721449A (en) Training method of video recognition model, video recognition method, device and equipment
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN115858816A (en) Construction method and system of intelligent agent cognitive map for public security field
CN117034133A (en) Data processing method, device, equipment and medium
CN114519999A (en) Speech recognition method, device, equipment and storage medium based on bimodal model
CN115862794A (en) Medical record text generation method and device, computer equipment and storage medium
CN118155214B (en) Prompt learning method, image classification method and related devices
US20230077446A1 (en) Smart seamless sign language conversation device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication