CN118800274A - Automatic AI digital human expression generating system based on voice drive - Google Patents
Automatic AI digital human expression generating system based on voice drive
- Publication number: CN118800274A
- Application number: CN202410973676.2A
- Authority: CN
- Country: China
- Prior art keywords: voice, expression, model, emotion, dimensional
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a voice-driven automatic expression generation system for AI digital humans, which comprises a voice expression generation module, a facial expression database, an expression feature extraction module and a face three-dimensional reconstruction module. Speech synthesis parameters are controlled by manually formulated rules to generate speech with a specific expression, and a model is trained on a large amount of emotion-labeled speech data to learn the relation between speech and emotion and to generate expressive speech for new text. Driven by such expressive speech and supported by the facial expression database, the expression feature extraction module and the face three-dimensional reconstruction module, the digital human can be driven more smoothly, the amount of computation is reduced, and the digital human's expressions can be rendered in finer detail, which makes the system convenient to use.
Description
Technical Field
The invention relates to the technical field of digital human expression generation, and in particular to a voice-driven automatic expression generation system for AI digital humans.
Background
With the development of society, the AI industry has also grown, and digital humans are part of it. A digital human is a digital character created from a real person or designed from scratch, whose three-dimensional or two-dimensional image data is generated and converted by computer technology and stored and applied in the form of computer code; depending on the connected AI algorithms, knowledge graphs, driving systems and other capabilities, it partially or fully performs human behaviors such as conveying information, expressing emotion, interacting with others and solving problems.
However, current digital humans cannot smoothly generate expressions automatically when driven by speech carrying a specific expression; the computation involved is heavy, and the driven expressions cannot be rendered in fine detail. A voice-driven automatic expression generation system for AI digital humans is therefore proposed to solve these problems.
Disclosure of Invention
The invention aims to provide a voice-driven automatic expression generation system for AI digital humans so as to solve the problems raised in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A voice-driven automatic expression generation system for AI digital humans comprises a voice expression generation module, a facial expression database, an expression feature extraction module and a face three-dimensional reconstruction module. In the voice expression generation module, speech synthesis technology is used to convey emotion or attitude through acoustic features such as pitch, speaking rate and rhythm. The facial expression database collects a large number of facial expressions and classifies them according to the semantics they convey, so that a set of images is obtained for each expression. The expression feature extraction module extracts facial features with the active shape model method and the active appearance model method. The face three-dimensional reconstruction module reconstructs the three-dimensional information of the model from images of a single viewpoint or of multiple viewpoints.
As a further scheme of the invention: the specific method for generating the voice expression in the voice expression generation module is to manually formulate rules that control the speech synthesis parameters and to generate speech with a specific expression, the rules being designed with emotional and semantic factors in mind; in addition, a model is trained on a large amount of emotion-labeled speech data to learn the relation between speech and emotion and to generate expressive speech for new text.
As a further scheme of the invention: the voice expression generation module comprises an emotion feature extraction unit and an emotion perception model unit. The emotion feature extraction unit uses speech signal analysis techniques to extract prosody, intonation, acoustic parameters and other emotion-related features from the speech, models and classifies the extracted features with statistical models and machine learning algorithms to recognize different emotion categories, and combines lexical and semantic analysis of the text to improve the accuracy of emotion recognition. The emotion perception model unit explores a multimodal emotion perception model that fuses audio, text and visual information to perceive the speaker's emotional state comprehensively; it builds the emotion perception model with advanced machine learning methods such as deep neural networks and variational autoencoders to improve recognition accuracy and generalization capability, and improves the model's adaptability to different speakers, emotional contexts and noise backgrounds by training and optimizing the model parameters.
As a further scheme of the invention: the active shape model method is implemented in two stages, training and searching. During training, n face image samples are manually annotated; for each face image, 68 feature points are used to fit the face shape model, and the positions of these feature points form the shape vector of that image. The training set of n shape vectors is normalized and aligned by solving for a transformation matrix, which removes the influence of external factors in the face images such as pose changes, differing angles and distance. Principal component analysis (PCA) is then applied to the aligned shape vectors of the training set, so that any training shape vector can be expressed by the mean shape vector together with the parameters obtained from PCA; local features are built for each feature point, and a new position is searched for each feature point in every iteration.
As a further scheme of the invention: the searching stage comprises a local texture model and a global statistical model, which realize local search and global constraint respectively; when certain feature points fall into a local extremum or deviate significantly during the local search, the global statistical model corrects them.
As a further scheme of the invention: the active appearance model method jointly analyzes the shape information and texture information of the face and builds a hybrid model; it is divided into modeling and feature matching. Modeling means building a hybrid model that carries both shape information and texture information. Feature matching means expressing an energy function as the mean square error between the hybrid model and the input image, updating the model parameters by computation to generate new feature point positions, and iterating this process until the final feature point positions are obtained.
As a further scheme of the invention: the face three-dimensional reconstruction module comprises a three-dimensional face reconstruction unit based on multi-view information, a three-dimensional face reconstruction unit based on a deformable statistical model, and a three-dimensional face reconstruction unit based on shape from shading.
As a further scheme of the invention: in the three-dimensional face reconstruction unit based on multi-view information, the camera-viewpoint recovery stage first uses computer vision techniques to estimate the camera parameters of each captured face image and recovers the three-dimensional coordinates of the facial feature points of the input face; the scattered-point interpolation stage then computes the three-dimensional coordinates of the remaining points from the estimated feature-point coordinates with an interpolation algorithm; finally, in the shape repositioning stage, with the camera viewpoint held fixed, additional correspondences between facial feature points and image coordinates are defined to improve the accuracy of the shape fit.
As a further scheme of the invention: in the three-dimensional face reconstruction unit based on the deformable model, once a new face image is given, the deformable model is matched and combined with the image; the corresponding model parameters are modified so that the model deforms until the difference between the model and the face image is minimized, while the texture is optimized and adjusted, thereby completing the face modeling.
As a still further aspect of the invention: in the three-dimensional face reconstruction unit based on shape from shading, the shading variations of the object surface in a single image or in multiple images are used to recover parameter values such as the relative height, surface normal direction, surface gradient and tilt of every point on the object surface, thereby reconstructing the object model.
Compared with the prior art, the invention has the beneficial effects that:
By providing the voice expression generation module, speech synthesis parameters are controlled through manually formulated rules to generate speech with a specific expression, and a model is trained on a large amount of emotion-labeled speech data to learn the relation between speech and emotion and to generate expressive speech for new text. Driven by such expressive speech and supported by the facial expression database, the expression feature extraction module and the face three-dimensional reconstruction module, the digital human can be driven more smoothly, the amount of computation is reduced, and the digital human's expressions can be rendered in finer detail, which makes the system convenient to use.
Drawings
Fig. 1 is a schematic structural diagram of an AI digital human automatic expression generating system based on voice driving in the present invention.
Fig. 2 is a schematic structural diagram of a speech expression generating module in the present invention.
Fig. 3 is a schematic structural diagram of a three-dimensional face reconstruction module according to the present invention.
Detailed Description
In one embodiment, as shown in figs. 1-3, a voice-driven automatic expression generation system for AI digital humans comprises a voice expression generation module, a facial expression database, an expression feature extraction module and a face three-dimensional reconstruction module. In the voice expression generation module, speech synthesis technology is used to convey emotion or attitude through acoustic features such as pitch, speaking rate and rhythm. The facial expression database collects a large number of facial expressions and classifies them according to the semantics they convey, so that a set of images is obtained for each expression. The expression feature extraction module extracts facial features with the active shape model method and the active appearance model method. The face three-dimensional reconstruction module reconstructs the three-dimensional information of the model from images of a single viewpoint or of multiple viewpoints.
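For orientation, the following Python sketch outlines how the four modules could fit together in code; all class and function names here (VoiceExpressionGenerator, FacialExpressionDatabase, and so on) are illustrative assumptions introduced for this example, not names taken from the invention.

```python
# Minimal structural sketch of the pipeline described above (hypothetical names).
from dataclasses import dataclass

@dataclass
class ExpressiveSpeech:
    waveform: list          # synthesized audio samples
    emotion: str            # emotion label carried by the speech

class VoiceExpressionGenerator:
    def synthesize(self, text: str, emotion: str) -> ExpressiveSpeech:
        # rule-based or learned control of synthesis parameters (see the later sketches)
        return ExpressiveSpeech(waveform=[], emotion=emotion)

class FacialExpressionDatabase:
    def lookup(self, emotion: str) -> list:
        # return the set of face images collected for this expression class
        return []

class ExpressionFeatureExtractor:
    def extract(self, face_image) -> list:
        # ASM/AAM-style landmark features (see the later sketches)
        return []

class Face3DReconstructor:
    def reconstruct(self, images: list):
        # multi-view / deformable-model / shape-from-shading reconstruction
        return None

def drive_digital_human(text: str, emotion: str):
    """Illustrative end-to-end flow: expressive speech -> expression features -> 3D face."""
    speech = VoiceExpressionGenerator().synthesize(text, emotion)
    exemplars = FacialExpressionDatabase().lookup(speech.emotion)
    features = [ExpressionFeatureExtractor().extract(img) for img in exemplars]
    mesh = Face3DReconstructor().reconstruct(exemplars)
    return speech, features, mesh
```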
The specific method for generating the voice expression in the voice expression generation module is to manually formulate rules that control the speech synthesis parameters and to generate speech with a specific expression, the rules being designed with emotional and semantic factors in mind; in addition, a model is trained on a large amount of emotion-labeled speech data to learn the relation between speech and emotion and to generate expressive speech for new text.
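As an illustration of the rule-based control of synthesis parameters, the following minimal Python sketch maps an emotion label to pitch, speaking-rate and energy settings; the specific emotions and scaling values are assumptions made for the example only, not values specified by the invention.

```python
# Each emotion maps to scaling factors for synthesis parameters such as pitch,
# speaking rate and energy. The table values are illustrative assumptions.
PROSODY_RULES = {
    # emotion: (pitch_scale, rate_scale, energy_scale)
    "happiness": (1.15, 1.10, 1.10),
    "sadness":   (0.90, 0.85, 0.80),
    "anger":     (1.10, 1.15, 1.30),
    "neutral":   (1.00, 1.00, 1.00),
}

def synthesis_parameters(emotion: str, base_pitch_hz: float = 200.0,
                         base_rate_wps: float = 3.0, base_energy: float = 1.0) -> dict:
    """Return speech-synthesis parameters adjusted for the requested emotion."""
    pitch_scale, rate_scale, energy_scale = PROSODY_RULES.get(emotion, PROSODY_RULES["neutral"])
    return {
        "pitch_hz": base_pitch_hz * pitch_scale,
        "rate_wps": base_rate_wps * rate_scale,
        "energy": base_energy * energy_scale,
    }

print(synthesis_parameters("anger"))   # pitch raised, rate faster, energy higher for anger
```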
The voice expression generation module comprises an emotion feature extraction unit and an emotion perception model unit. The emotion feature extraction unit uses speech signal analysis techniques to extract prosody, intonation, acoustic parameters and other emotion-related features from the speech, models and classifies the extracted features with statistical models and machine learning algorithms to recognize different emotion categories, and combines lexical and semantic analysis of the text to improve the accuracy of emotion recognition. The emotion perception model unit explores a multimodal emotion perception model that fuses audio, text and visual information to perceive the speaker's emotional state comprehensively; it builds the emotion perception model with advanced machine learning methods such as deep neural networks and variational autoencoders to improve recognition accuracy and generalization capability, and improves the model's adaptability to different speakers, emotional contexts and noise backgrounds by training and optimizing the model parameters.
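A minimal sketch of the emotion feature extraction and classification step is given below, assuming the librosa and scikit-learn libraries are available; the chosen features (pitch, energy, MFCC statistics) and the SVM classifier are one plausible realization of the statistical-model approach, not the only one contemplated.

```python
# Prosodic and acoustic features are extracted with librosa and classified
# with a conventional statistical/ML model (an SVM here).
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def emotion_features(wav_path: str) -> np.ndarray:
    """Extract a fixed-length prosodic/acoustic feature vector from one utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # pitch contour (intonation)
    rms = librosa.feature.rms(y=y)[0]                     # energy contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # spectral envelope
    return np.concatenate([
        [f0.mean(), f0.std(), rms.mean(), rms.std()],      # prosodic statistics
        mfcc.mean(axis=1), mfcc.std(axis=1),               # acoustic statistics
    ])

def train_emotion_classifier(wav_paths, labels):
    """Fit a statistical model (SVM) on emotion-labeled speech data."""
    X = np.vstack([emotion_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, labels)
    return clf
```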
The facial expression database contains 4824 pictures of 67 subjects; each subject gazes straight ahead, to the left and to the right, and shows 8 facial expressions, namely anger, contempt, disgust, fear, happiness, sadness, surprise and neutral.
Each picture carries 40 attribute labels, including the expression-related labels smiling and non-smiling; the classification of smiling versus non-smiling expressions comprises the following steps: face key point detection, face correction and face cropping.
Face key point detection applies a detection tool to the facial expression image to obtain 68 facial key points.
Face correction straightens the face with an affine transformation: based on the key point coordinates obtained by face key point detection, the 37th and 46th points (the two outer eye corners) are joined by a line segment, and an affine transformation is applied along this segment to correct a tilted face.
Face cropping uses the coordinates of the leftmost, rightmost, topmost and bottommost of the 68 key points to frame a square at a certain ratio and crop the face; the cropped face is finally resized to 256 × 256.
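The correction and cropping steps could be realized roughly as in the following Python sketch, which assumes a 68-point landmark array has already been obtained from a key-point detector; the margin value and the choice of rotating about the landmark centroid are assumptions of the example. Indices 36 and 45 (0-based) correspond to the 37th and 46th points, i.e. the two outer eye corners.

```python
import cv2
import numpy as np

def align_and_crop(image: np.ndarray, landmarks: np.ndarray, out_size: int = 256,
                   margin: float = 0.2) -> np.ndarray:
    """Rotate the face so the eye corners are horizontal, then crop a square region."""
    left_eye, right_eye = landmarks[36], landmarks[45]
    angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                                  right_eye[0] - left_eye[0]))
    c = landmarks.mean(axis=0)
    center = (float(c[0]), float(c[1]))                   # rotate about the landmark centroid
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))

    # apply the same affine transform to the landmarks
    pts = cv2.transform(landmarks.reshape(-1, 1, 2).astype(np.float32), rot).reshape(-1, 2)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    side = max(x1 - x0, y1 - y0) * (1 + margin)           # square frame with a margin
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    x, y = int(cx - side / 2), int(cy - side / 2)
    crop = rotated[max(y, 0):int(y + side), max(x, 0):int(x + side)]
    return cv2.resize(crop, (out_size, out_size))          # final 256 x 256 face
```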
The active shape model method is implemented in two stages, training and searching. During training, n face image samples are manually annotated; for each face image, 68 feature points are used to fit the face shape model, and the positions of these feature points form the shape vector of that image. The training set of n shape vectors is normalized and aligned by solving for a transformation matrix, which removes the influence of external factors in the face images such as pose changes, differing angles and distance. Principal component analysis (PCA) is then applied to the aligned shape vectors of the training set, so that any training shape vector can be expressed by the mean shape vector together with the parameters obtained from PCA; local features are built for each feature point, and a new position is searched for each feature point in every iteration.
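A minimal sketch of the ASM training stage is shown below; the centring and scale normalisation stand in for a full Procrustes alignment, and the number of retained modes is an assumed example value.

```python
# n manually annotated shapes of 68 points each are aligned, then PCA yields
# a mean shape and principal modes; any training shape ~ mean + P @ b.
import numpy as np

def align_shapes(shapes: np.ndarray) -> np.ndarray:
    """shapes: (n, 68, 2) landmark sets -> aligned, flattened shape vectors (n, 136)."""
    aligned = shapes - shapes.mean(axis=1, keepdims=True)            # remove translation
    aligned = aligned / np.linalg.norm(aligned, axis=(1, 2), keepdims=True)  # remove scale
    return aligned.reshape(len(shapes), -1)

def train_shape_model(shapes: np.ndarray, n_modes: int = 10):
    X = align_shapes(shapes)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)          # PCA via SVD
    P = Vt[:n_modes].T                                               # (136, n_modes) eigen-shapes
    var = (S[:n_modes] ** 2) / (len(X) - 1)                          # variance of each mode
    return mean, P, var

def project_shape(shape_vec: np.ndarray, mean: np.ndarray, P: np.ndarray):
    """Shape parameters b for one aligned shape vector; reconstruction = mean + P @ b."""
    b = P.T @ (shape_vec - mean)
    return b, mean + P @ b
```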
The searching stage comprises a local texture model and a global statistical model, which realize local search and global constraint respectively; when certain feature points fall into a local extremum or deviate significantly during the local search, the global statistical model corrects them.
The active appearance model method jointly analyzes the shape information and texture information of the face and builds a hybrid model; it is divided into modeling and feature matching. Modeling means building a hybrid model that carries both shape information and texture information. Feature matching means expressing an energy function as the mean square error between the hybrid model and the input image, updating the model parameters by computation to generate new feature point positions, and iterating this process until the final feature point positions are obtained.
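The feature-matching loop could be sketched as follows; the finite-difference gradient step is a simplified stand-in for the parameter-update computation (practical AAM fitting usually relies on a precomputed Jacobian or inverse-compositional update), and the `render` callable that synthesises an appearance from parameters is an assumption of the example.

```python
# Energy = mean square error between the model-synthesised appearance and the
# input image; parameters are updated iteratively until convergence.
import numpy as np

def fit_aam(render, params: np.ndarray, image: np.ndarray,
            lr: float = 0.1, iters: int = 50, eps: float = 1e-3) -> np.ndarray:
    def energy(p):
        return np.mean((render(p) - image) ** 2)      # MSE between model and input

    for _ in range(iters):
        grad = np.zeros_like(params)
        for i in range(len(params)):                  # finite-difference gradient
            step = np.zeros_like(params)
            step[i] = eps
            grad[i] = (energy(params + step) - energy(params - step)) / (2 * eps)
        params = params - lr * grad                   # update parameters; new feature positions follow
    return params
```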
The face three-dimensional reconstruction module comprises a three-dimensional face reconstruction unit based on multi-view information, a three-dimensional face reconstruction unit based on a deformable statistical model, and a three-dimensional face reconstruction unit based on shape from shading.
In the three-dimensional face reconstruction unit based on multi-view information, the camera-viewpoint recovery stage first uses computer vision techniques to estimate the camera parameters (position, orientation and focal length) of each captured face image and recovers the three-dimensional coordinates of the facial feature points of the input face; the scattered-point interpolation stage then computes the three-dimensional coordinates of the remaining points from the estimated feature-point coordinates with an interpolation algorithm; finally, in the shape repositioning stage, with the camera viewpoint held fixed, additional correspondences between facial feature points and image coordinates are defined to improve the accuracy of the shape fit.
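A minimal sketch of the triangulation and scattered-point interpolation steps is shown below, assuming two calibrated views whose 3×4 projection matrices and matched feature points come from the camera-parameter estimation step; OpenCV and SciPy are used for illustration, and the dense depth interpolation is one simple realization of the scattered-point interpolation stage.

```python
import numpy as np
import cv2
from scipy.interpolate import griddata

def triangulate_features(P1: np.ndarray, P2: np.ndarray,
                         pts1: np.ndarray, pts2: np.ndarray) -> np.ndarray:
    """P1, P2: 3x4 projection matrices; pts1, pts2: (N, 2) matched feature points."""
    X_h = cv2.triangulatePoints(P1, P2,
                                pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))
    return (X_h[:3] / X_h[3]).T                        # (N, 3) feature points in 3D

def densify_depth(pts_2d: np.ndarray, depths: np.ndarray,
                  width: int, height: int) -> np.ndarray:
    """Scattered-point interpolation of feature depths onto a full image grid."""
    grid_x, grid_y = np.meshgrid(np.arange(width), np.arange(height))
    return griddata(pts_2d, depths, (grid_x, grid_y), method="cubic")
```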
In the three-dimensional face reconstruction unit based on the deformable model, once a new face image is given, the deformable model is matched and combined with the image; the corresponding model parameters are modified so that the model deforms until the difference between the model and the face image is minimized, while the texture is optimized and adjusted, thereby completing the face modeling.
The deformable statistical model is a parameterized face deformation model built from the data in a face database; by controlling the parameters of the model, any desired face shape can be generated.
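A minimal sketch of fitting such a parameterized deformation model to observed landmarks is given below; the linear, ridge-regularised least-squares solve is a simplified stand-in for the iterative shape-and-texture optimisation described above, and the mean/basis arrays are assumed to come from a face database (e.g. a morphable-model style basis).

```python
# Face shape is expressed as mean + basis @ params; the parameters are chosen
# to minimise the difference between model landmarks and observed landmarks.
import numpy as np

def fit_morphable_model(mean: np.ndarray, basis: np.ndarray,
                        observed: np.ndarray, reg: float = 1e-3) -> np.ndarray:
    """
    mean:     (3m,)   mean landmark coordinates of the model
    basis:    (3m, k) principal deformation directions
    observed: (3m,)   observed landmark coordinates to fit
    Returns the k deformation parameters (ridge-regularised least squares).
    """
    A = basis.T @ basis + reg * np.eye(basis.shape[1])
    b = basis.T @ (observed - mean)
    return np.linalg.solve(A, b)

def synthesise_shape(mean: np.ndarray, basis: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Generate a face shape from the parameterized deformation model."""
    return mean + basis @ params
```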
In the three-dimensional face reconstruction unit based on shape from shading, the shading variations of the object surface in a single image or in multiple images are used to recover parameter values such as the relative height, surface normal direction, surface gradient and tilt of every point on the object surface, thereby reconstructing the object model.
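One common multi-image realization of shading-based shape recovery is a photometric-stereo-style formulation; the sketch below assumes a Lambertian surface and known light directions, which are assumptions of the example rather than requirements stated above, and the gradient integration shown is a deliberately naive one.

```python
# Per-pixel surface normals are recovered from the shading in several images,
# converted to surface gradients, and integrated into a rough relative-height map.
import numpy as np

def normals_from_shading(images: np.ndarray, lights: np.ndarray) -> np.ndarray:
    """images: (k, H, W) intensities; lights: (k, 3) unit light directions."""
    k, H, W = images.shape
    I = images.reshape(k, -1)                              # (k, H*W)
    G = np.linalg.lstsq(lights, I, rcond=None)[0]          # (3, H*W) = albedo * normal
    albedo = np.linalg.norm(G, axis=0) + 1e-8
    return (G / albedo).reshape(3, H, W)                   # unit surface normals

def integrate_height(normals: np.ndarray) -> np.ndarray:
    """Convert normals to gradients p, q and integrate them into relative heights."""
    nx, ny, nz = normals
    p = -nx / (nz + 1e-8)                                  # dz/dx
    q = -ny / (nz + 1e-8)                                  # dz/dy
    # naive integration: accumulate q down the first column, then p along each row
    return np.cumsum(q[:, :1], axis=0) + np.cumsum(p, axis=1)
```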
According to the invention, speech synthesis parameters are controlled by manually formulated rules to generate speech with a specific expression, and a model is trained on a large amount of emotion-labeled speech data to learn the relation between speech and emotion and to generate expressive speech for new text. Driven by such expressive speech and supported by the facial expression database, the expression feature extraction module and the face three-dimensional reconstruction module, the digital human can be driven more smoothly, the amount of computation is reduced, and the digital human's expressions can be rendered in finer detail, which makes the system convenient to use.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution of the present invention and its inventive concept shall be covered by the scope of protection of the present invention.
Claims (10)
1. A voice-driven automatic expression generation system for AI digital humans, characterized by comprising a voice expression generation module, a facial expression database, an expression feature extraction module and a face three-dimensional reconstruction module; in the voice expression generation module, speech synthesis technology is used to convey emotion or attitude through acoustic features such as pitch, speaking rate and rhythm; the facial expression database collects a large number of facial expressions and classifies them according to the semantics they convey, so that a set of images is obtained for each expression; the expression feature extraction module extracts facial features with the active shape model method and the active appearance model method; the face three-dimensional reconstruction module reconstructs the three-dimensional information of the model from images of a single viewpoint or of multiple viewpoints.
2. The voice-driven automatic expression generation system for AI digital humans according to claim 1, wherein the specific method for generating the voice expression in the voice expression generation module is to control the speech synthesis parameters by manually formulated rules, the rules being designed with emotional and semantic factors in mind; and a model is trained on a large amount of emotion-labeled speech data to learn the relation between speech and emotion and to generate expressive speech for new text.
3. The voice-driven automatic expression generation system for AI digital humans according to claim 2, wherein the voice expression generation module comprises an emotion feature extraction unit and an emotion perception model unit; the emotion feature extraction unit uses speech signal analysis techniques to extract prosody, intonation, acoustic parameters and other emotion-related features from the speech, models and classifies the extracted features with statistical models and machine learning algorithms to recognize different emotion categories, and combines lexical and semantic analysis of the text to improve the accuracy of emotion recognition; the emotion perception model unit explores a multimodal emotion perception model that fuses audio, text and visual information to perceive the speaker's emotional state comprehensively, builds the emotion perception model with deep neural networks and variational autoencoders to improve recognition accuracy and generalization capability, and improves the model's adaptability to different speakers, emotional contexts and noise backgrounds by training and optimizing the model parameters.
4. The voice-driven automatic expression generation system for AI digital humans according to claim 1, wherein the active shape model method is implemented in two stages, training and searching; during training, n face image samples are manually annotated, 68 feature points are used for each face image to fit the face shape model, and the positions of these feature points form the shape vector of the image; the training set of n shape vectors is normalized and aligned by solving for a transformation matrix, removing the influence of external factors in the face images such as pose changes, differing angles and distance; principal component analysis (PCA) is then applied to the aligned shape vectors of the training set, any training shape vector is expressed by the mean shape vector together with the parameters obtained from PCA, local features are built for each feature point, and a new position is searched for each feature point in every iteration.
5. The voice-driven automatic expression generation system for AI digital humans according to claim 4, wherein the searching comprises a local texture model and a global statistical model, which realize local search and global constraint respectively; when certain feature points fall into a local extremum or deviate significantly during the local search, the global statistical model corrects them.
6. The voice-driven automatic expression generation system for AI digital humans according to claim 1, wherein the active appearance model method jointly analyzes the shape information and texture information of the face, builds a hybrid model, and is divided into modeling and feature matching; modeling means building a hybrid model that carries both shape information and texture information; feature matching means expressing an energy function as the mean square error between the hybrid model and the input image, updating the model parameters by computation to generate new feature point positions, and iterating this process until the final feature point positions are obtained.
7. The voice-driven automatic expression generation system for AI digital humans according to claim 1, wherein the face three-dimensional reconstruction module comprises a three-dimensional face reconstruction unit based on multi-view information, a three-dimensional face reconstruction unit based on a deformable statistical model, and a three-dimensional face reconstruction unit based on shape from shading.
8. The voice-driven automatic expression generation system for AI digital humans according to claim 7, wherein, in the three-dimensional face reconstruction unit based on multi-view information, the camera parameters of each captured face image, including position, orientation and focal length, are first estimated by computer vision techniques, and the three-dimensional coordinates of the facial feature points of the input face are recovered; the three-dimensional coordinates of the remaining points are then computed from the estimated feature-point coordinates with an interpolation algorithm in the scattered-point interpolation stage; finally, in the shape repositioning stage, with the camera viewpoint held fixed, additional correspondences between facial feature points and image coordinates are defined to improve the accuracy of the shape fit.
9. The voice-driven automatic expression generation system for AI digital humans according to claim 7, wherein, in the three-dimensional face reconstruction unit based on the deformable model, once a new face image is given, the deformable model is matched and combined with the image, the corresponding model parameters are modified so that the model deforms until the difference between the model and the face image is minimized, and the texture is optimized and adjusted, thereby completing the face modeling.
10. The voice-driven automatic expression generation system for AI digital humans according to claim 7, wherein, in the three-dimensional face reconstruction unit based on shape from shading, the shading variations of the object surface in a single image or in multiple images are used to recover parameter values such as the relative height, surface normal direction, surface gradient and tilt of every point on the object surface, thereby reconstructing the object model.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410973676.2A | 2024-07-19 | 2024-07-19 | Automatic AI digital human expression generating system based on voice drive |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN118800274A | 2024-10-18 |
Family
- ID: 93034809

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410973676.2A | Automatic AI digital human expression generating system based on voice drive | 2024-07-19 | 2024-07-19 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN118800274A |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |