CN116614665A - Video interactive play system for interacting with personas in video - Google Patents
- Publication number: CN116614665A
- Application number: CN202310639004.3A
- Authority
- CN
- China
- Prior art keywords
- video
- instruction information
- user
- module
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8146—Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Graphics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a video interactive playing system for interacting with a character in a video. The system plays video interactively through a user interface module: during normal playback, it generates dialogue data matching instruction information initiated by the user and searches a knowledge base for matching video data and/or response text data. After processing the video data, it plays the video generated in real time to the user through the user interface module. Based on the character in the video, it constructs a virtual character that interacts with the user and, combining the response text data, dynamically converts the virtual character's speech and actions to generate a virtual-character interaction video, which is played through the user interface module to provide the user with a scene interface for interacting with the virtual character. The system thus receives instruction information from the user in real time, accurately grasps the user's viewing needs, guarantees reliable video playback, creates a virtual character matched to the user for real-time video interaction, and improves the user's viewing experience.
Description
Technical Field
The invention relates to the technical field of intelligent human-machine interaction, and in particular to a video interactive playing system for interacting with a character in a video.
Background
At present, video players all play video one-way: the video picture is simply rendered to the user, and during playback the user cannot interact with the people in the picture, so the viewing experience is limited. In online or multimedia teaching, students may need to ask questions at any time while watching a teaching video, but teaching videos are prerecorded and cannot interact with students in real time. In commercial scenarios such as online live streaming, interaction with the audience is achieved through the live format, but faced with a large volume of interaction messages from different viewers, the host cannot respond to all of them comprehensively and promptly, which degrades the interactive experience and consumes considerable human resources. Receiving and correctly recognizing the user's interaction request in time during video playback, and accurately returning an interactive response in real time, are therefore the deciding factors in achieving interactive video playback.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a video interactive playing system for interacting with a character in a video. The system plays video interactively through a user interface module: during normal playback, it generates dialogue data matching instruction information initiated by the user and searches a knowledge base for matching video data and/or response text data. After processing the video data, it plays the video generated in real time to the user through the user interface module. Based on the character in the video, it constructs a virtual character that interacts with the user and, combining the response text data, dynamically converts the virtual character's speech and actions to generate a virtual-character interaction video, which is played through the user interface module to provide the user with a scene interface for interacting with the virtual character. The system thus receives instruction information from the user in real time, accurately grasps the user's viewing needs, guarantees reliable video playback, creates a virtual character matched to the user for real-time video interaction, and improves the user's viewing experience.
The present invention provides a video interactive playing system for interacting with a character in a video, comprising:
The user interface module is used for receiving instruction information initiated by a user, providing a scene interface for the user to interact with the virtual character and/or playing video;
the control module triggers the language identification module and/or the video processing module to work according to the instruction information; and according to the virtual character interaction video from the digital human module, instructing the user interface module to form the scene interface interacted with the virtual character, and/or according to the playable video from the video processing module, instructing the user interface module to play the video;
the language identification module analyzes the instruction information and generates dialogue data matched with the instruction information;
the intelligent question-answering module searches matched video data and/or answer text data from a knowledge base according to the dialogue data;
the video processing module is used for processing the video data to obtain a playable video and sending the playable video to the control module;
and the digital person module generates the virtual character interaction video according to the response text data and sends the virtual character interaction video to the control module.
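As a minimal sketch, the module wiring above can be expressed as plain classes, with the control module routing an instruction through the language identification module and then to the intelligent question-answering module. All class and method names here are illustrative assumptions, not taken from the patent:

```python
# Illustrative sketch of the module wiring described above. All class,
# method, and field names are assumptions for illustration only.

class LanguageRecognitionModule:
    def parse(self, instruction: str) -> str:
        # Analyze the instruction information and produce matching dialogue data (stub).
        return f"dialogue:{instruction}"

class IntelligentQAModule:
    def search(self, dialogue: str) -> dict:
        # Search the knowledge base for matching video data and/or response text (stub).
        return {"video_data": None, "response_text": f"answer for {dialogue}"}

class ControlModule:
    def __init__(self, language_module, qa_module):
        self.language_module = language_module
        self.qa_module = qa_module

    def handle(self, instruction: str) -> dict:
        # Trigger the language identification module, then route its dialogue
        # data to the intelligent question-answering module.
        dialogue = self.language_module.parse(instruction)
        return self.qa_module.search(dialogue)

control = ControlModule(LanguageRecognitionModule(), IntelligentQAModule())
result = control.handle("who is the teacher")
```

In a full system the returned video data would flow on to the video processing module and the response text to the digital person module, as the claims describe.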
Further, before receiving instruction information initiated by a user, the user interface module is configured to:
shooting a face image of the user currently interacting with the user interface module, and extracting the user's facial feature information from the face image; judging, according to the facial feature information, whether the user is a legitimate user; if the user is a legitimate user, receiving the instruction information initiated by the user; if the user is not a legitimate user, not receiving the instruction information initiated by the user;

or detecting position information of the user currently interacting with the user interface module, and judging, according to the position information, whether the user is within a preset activity range; if the user is within the preset activity range, receiving the instruction information initiated by the user; if the user is not within the preset activity range, not receiving the instruction information initiated by the user.
Further, after receiving the instruction information initiated by the user, the user interface module further includes:
sending the instruction information to the control module, which judges whether the instruction information is sound instruction information or text instruction information;

if the instruction information is sound instruction information, analyzing its useful sound signal and background sound signal and judging whether it is valid sound instruction information; if it is valid, directly performing content identification on it; if it is not valid, instructing the user interface module to prompt the user to regenerate and resend the instruction information;

if the instruction information is text instruction information, performing text defect analysis on it and judging whether it contains text errors; if no text error exists, directly performing content identification on it; if a text error exists, instructing the user interface module to prompt the user to regenerate and resend the instruction information.
Further, the control module triggers the language identification module and/or the video processing module to work according to the instruction information, and the method comprises the following steps:
performing content identification on the instruction information to obtain an instruction code contained in the instruction information;
and comparing the instruction codes with a preset code catalog, and triggering the language identification module and/or the video processing module to work according to the comparison result.
Further, the language identification module analyzes the instruction information to generate dialogue data matched with the instruction information, and the language identification module comprises:
when the instruction information belongs to sound instruction information, extracting voice information components of the user from the sound instruction information; performing voice recognition on the voice information component to generate text dialogue data matched with the instruction information;
and when the instruction information belongs to the text instruction information, directly generating text dialogue data matched with the instruction information.
Further, the intelligent question-answering module searches matched video data and/or answer text data from a knowledge base according to the dialogue data, and comprises the following steps:
extracting all dialogue text words contained in the dialogue data, and generating feature vectors corresponding to the dialogue data according to all dialogue text words;
inputting the feature vector into a knowledge base learning model, and searching matched video data and/or response text data from the knowledge base;
and sending the video data to the video processing module and/or sending the response text data to the digital person module.
The intelligent question-answering module may also call public AI analysis systems such as ChatGPT, Microsoft's new Bing, Baidu ERNIE Bot, and iFLYTEK Spark.
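A minimal sketch of the retrieval step above: dialogue text is turned into a term-count feature vector and matched against knowledge-base entries by cosine similarity. The patent's "knowledge base learning model" is unspecified, so this substitutes a plain bag-of-words match for illustration; the knowledge-base entries are invented examples.

```python
import math
from collections import Counter

# Bag-of-words feature vector + cosine-similarity lookup as a stand-in for
# the unspecified knowledge-base learning model. Entries are invented.

def feature_vector(text):
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

KNOWLEDGE_BASE = {
    "lesson two derivative rules": {"video_data": "lesson2.mp4",
                                    "response_text": "See lesson two."},
    "course schedule overview": {"video_data": None,
                                 "response_text": "Classes run weekly."},
}

def search(dialogue):
    # Return the best-matching entry's video data and/or response text.
    query = feature_vector(dialogue)
    best = max(KNOWLEDGE_BASE, key=lambda k: cosine(query, feature_vector(k)))
    return KNOWLEDGE_BASE[best]

match = search("please explain the derivative rules in lesson two")
```

The matched video data would then go to the video processing module and the response text to the digital person module.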
Further, after the video processing module processes the video data, a playable video is obtained, and the playable video is sent to the control module, which includes:
carrying out framing processing on the video data to obtain a plurality of video image frames; performing picture content identification on the video image frames, and performing picture repair processing on them;

then, according to the video playing parameters of the user interface module, performing video format conversion on the video data to obtain the playable video; and packaging, compressing and transmitting the playable video to the control module;
the control module converts the playable video into a video stream playing signal and sends the video stream playing signal to the user interface module.
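The final packaging step above can be sketched with stdlib compression; the playable video is represented here by a stand-in payload whose field names are illustrative, not taken from the patent:

```python
import json
import zlib

# Sketch of package-compression before transmission to the control module,
# which would then convert the payload into a video-stream play signal.
# The payload fields are invented stand-ins for real video data.

playable_video = {"format": "mp4", "frame_count": 1500, "payload": "..."}

packet = zlib.compress(json.dumps(playable_video).encode("utf-8"))
restored = json.loads(zlib.decompress(packet).decode("utf-8"))
```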
Further, the video processing module performs picture content identification on the video image frame, performs picture repair processing on the video image frame, and includes:
step S1, before the picture restoration processing, judging whether the picture content is restored in other video images or not by using the following formula (1) on the picture content of the video image to be restored currently,
in the above formula (1), X (b) represents a determination value as to whether or not the picture content of the b-th frame video image currently to be subjected to restoration processing is subjected to restoration processing of other video images; ya (i, j) represents the pixel value of the ith row and jth column pixel points in the pixel matrix of the unrepaired a-th frame video image; y is b (i, j) representing pixel values of a j-th row pixel point in a pixel matrix of a b-th frame video image to be repaired currently; the absolute value is calculated by the expression; m represents the total number of pixel points in any row in a pixel matrix of the video image; n represents the total number of any column of pixel points in the pixel matrix of the video image;
if X (b) =1, the picture content of the b-th frame video image to be repaired is repaired in other video images;
if X (b) =0, the picture content of the b-th frame video image to be repaired is not repaired in other video images;
step S2, if the picture content is repaired in other video images, using the following formula (2), locating the frame number value of the picture content repaired in other video images,
in the above formula (2), a represents an array of frame values in which the image content of the b-th frame video image to be repaired is repaired in other video images;&&representation logicRelational and; e represents belonging to;is represented by a belonging to the interval [1, b-1 ]]Will meet +.>The a values of (a) are all screened out and arranged into an array according to the sequence from small to large;
step S3, verifying whether the repaired picture content corresponding to the previous frame number is correct or not according to the frame number value of the picture content repaired in other video images by using the following formula (3),
in the above formula (3), Z represents a determination value of whether the repair processing of the picture content of the b-th frame video image to be repaired is correct or not in the repaired picture content corresponding to the number of frames of the other video images to be repaired;
if z=0, the repaired picture content corresponding to the frame number of the previous repair process needs to be repaired again;
if z=1, directly controlling to repair the picture content of the b-th frame video image to be repaired into repaired picture content corresponding to any element in the array a.
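The duplicate-frame test behind this repair step — frame b's picture content counts as already repaired when some earlier frame has an identical pixel matrix, and the matching earlier frame numbers are collected in ascending order — can be sketched directly. Frame numbering is 1-based as in the formulas; the tiny 2×2 pixel matrices are invented test data:

```python
# Pure-Python sketch of the duplicate-frame detection used by the picture
# repair step. Pixel matrices and frame data are invented examples.

def frames_identical(Ya, Yb):
    # The sum of absolute pixel differences over the whole m x n matrix is
    # zero exactly when every pixel matches.
    return all(Ya[i][j] == Yb[i][j]
               for i in range(len(Ya))
               for j in range(len(Ya[0])))

def X(frames, b):
    # Determination value: 1 if frame b's picture content already occurs in
    # an earlier frame, 0 otherwise.
    return int(any(frames_identical(frames[a - 1], frames[b - 1])
                   for a in range(1, b)))

def A(frames, b):
    # Ascending array of earlier frame numbers whose content matches frame b.
    return [a for a in range(1, b)
            if frames_identical(frames[a - 1], frames[b - 1])]

frames = [
    [[0, 0], [1, 1]],  # frame 1
    [[5, 5], [5, 5]],  # frame 2
    [[0, 0], [1, 1]],  # frame 3: duplicates frame 1
]
```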
Further, the digital person module generates the virtual character interaction video according to the response text data, and sends the virtual character interaction video to the control module, including:
constructing corresponding virtual roles according to the video interaction history logs of the users, and setting characteristic elements of the virtual roles for performing voice behaviors and action behaviors;
generating virtual character voice information according to the response text data; generating the virtual character interactive video according to the virtual character and the virtual character voice information; the virtual character interaction video is packed, compressed and sent to the control module;
and the control module takes the virtual character interactive video as a video stream playing signal and sends the video stream playing signal to the user interface module.
Compared with the prior art, the video interactive playing system for interacting with a character in a video plays video interactively through the user interface module: during normal playback, it generates dialogue data matching the instruction information initiated by the user and searches a knowledge base for matching video data and/or response text data. After processing the video data, it plays the video generated in real time to the user through the user interface module. Based on the character in the video, it constructs a virtual character that interacts with the user and, combining the response text data, dynamically converts the virtual character's speech and actions to generate a virtual-character interaction video, which is played through the user interface module to provide the user with a scene interface for interacting with the virtual character. The system thus receives instruction information from the user in real time, accurately grasps the user's viewing needs, guarantees reliable video playback for the user, creates a virtual character matched to the user for real-time video interaction, and improves the user's viewing experience.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a video interactive playing system for interacting with a character in a video according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a schematic structural diagram of a video interactive playing system for interacting with a character in a video according to an embodiment of the present invention is shown. The video interactive play system for interacting with a character in a video includes:
the user interface module is used for receiving instruction information initiated by a user, providing a scene interface for the user to interact with the virtual character and/or playing video;
the control module triggers the language identification module and/or the video processing module to work according to the instruction information; and according to the virtual character interaction video from the digital human module, instructing the user interface module to form a scene interface for interaction with the virtual character, and/or according to the playable video from the video processing module, instructing the user interface module to play the video;
the language identification module analyzes the instruction information and generates dialogue data matched with the instruction information;
the intelligent question-answering module searches matched video data and/or answer text data from the knowledge base according to the dialogue data;
the video processing module is used for processing the video data to obtain a playable video and sending the playable video to the control module;
and the digital person module generates the virtual character interaction video according to the response text data and sends the virtual character interaction video to the control module.
The beneficial effects of the technical scheme are as follows: the video interactive playing system for interacting with a character in a video plays video interactively through the user interface module: during normal playback, it generates dialogue data matching the instruction information initiated by the user and searches a knowledge base for matching video data and/or response text data. After processing the video data, it plays the video generated in real time to the user through the user interface module. Based on the character in the video, it constructs a virtual character that interacts with the user and, combining the response text data, dynamically converts the virtual character's speech and actions to generate a virtual-character interaction video, which is played through the user interface module to provide the user with a scene interface for interacting with the virtual character. The system thus receives instruction information from the user in real time, accurately grasps the user's viewing needs, guarantees reliable video playback for the user, creates a virtual character matched to the user for real-time video interaction, and improves the user's viewing experience.
Preferably, before receiving instruction information initiated by a user, the user interface module is configured to:

shoot a face image of the user currently interacting with the user interface module, and extract the user's facial feature information from the face image; judge, according to the facial feature information, whether the user is a legitimate user; if the user is a legitimate user, receive the instruction information initiated by the user; if the user is not a legitimate user, not receive the instruction information initiated by the user;

or detect position information of the user currently interacting with the user interface module, and judge, according to the position information, whether the user is within a preset activity range; if the user is within the preset activity range, receive the instruction information initiated by the user; if the user is not within the preset activity range, not receive the instruction information initiated by the user.
The beneficial effects of the technical scheme are as follows: the user interface module may include, but is not limited to, a display with sound playing and sound pickup functions and a camera arranged on the display. The display provides the scene interface for interacting with the virtual character and/or plays video; it may be, but is not limited to, a touch display screen with a speaker and a microphone. The camera photographs the user currently interacting with the user interface module and may be, but is not limited to, a binocular camera. The binocular camera photographs the user's face region to obtain a binocular image of the user's face. A three-dimensional face image is generated from the binocular disparity of this image, and facial contour feature recognition is then performed on it to obtain the facial contour feature information of the user's face. This information is compared with a preset facial feature library: if it exists in the library, the user is determined to be a legitimate user, and instruction information input through the microphone and/or touch display screen is received; if it does not, the user is determined not to be a legitimate user, and the microphone and touch display screen are instructed to stop working, so that instruction information input by the user is refused. This ensures the operational security of the video interactive playing system.
In addition, the binocular camera can photograph the user to obtain a binocular image of the user's current environment; disparity analysis of this image determines the relative distance and position between the user and the user interface module, from which it is judged whether the user is currently within the preset activity range, i.e. an area of predetermined size close to the user interface module. When the user is within this range, the user interface module can accurately receive the instruction information sent by the user, so instruction information input through the microphone and/or touch display screen is received; when the user is not within this range, the user interface module cannot reliably receive the instruction information, so the microphone and touch display screen are instructed to stop working and instruction information input by the user is refused. This ensures the operational reliability of the video interactive playing system.
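The activity-range gate described above can be sketched as a simple radius check on the user's position relative to the display, as estimated from the binocular disparity analysis. The 3-metre radius is an assumed example value, not specified in the patent:

```python
import math

# Illustrative activity-range gate: instructions are accepted only when the
# user's relative position lies inside a preset radius of the display.
# The radius and coordinates are assumptions for illustration.

ACTIVITY_RANGE_M = 3.0  # preset activity range (assumed value)

def within_activity_range(rel_x, rel_y):
    # Relative position of the user w.r.t. the user interface module, in metres.
    return math.hypot(rel_x, rel_y) <= ACTIVITY_RANGE_M

def accept_instruction(rel_x, rel_y, instruction):
    if within_activity_range(rel_x, rel_y):
        return instruction  # forwarded to the control module
    return None             # microphone/touch input refused
```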
Preferably, after the user interface module receives the instruction information initiated by the user, the method further comprises:
transmitting the instruction information to the control module, which judges whether the instruction information is sound instruction information or text instruction information;

if the instruction information is sound instruction information, analyzing its useful sound signal and background sound signal and judging whether it is valid sound instruction information; if it is valid, directly performing content identification on it; if it is not valid, instructing the user interface module to prompt the user to regenerate and resend the instruction information;

if the instruction information is text instruction information, performing text defect analysis on it and judging whether it contains text errors; if no text error exists, directly performing content identification on it; if a text error exists, instructing the user interface module to prompt the user to regenerate and resend the instruction information.
The beneficial effects of the technical scheme are as follows: the user may input instruction information to the user interface module by voice or by text. When the control module receives instruction information sent by the user, it first identifies whether it is sound instruction information or text instruction information, so that the two kinds can be recognized accurately and separately. When the instruction information is sound instruction information, the useful voice signal uttered by the user and the background sound signal produced by the user's environment are extracted according to the user's voiceprint characteristics. The signal-to-noise ratio of the sound instruction information is then determined from the useful and background signals; when this ratio is greater than or equal to a preset threshold, the sound instruction information is judged to be valid, and content identification is performed on it directly to obtain the instruction content it contains, improving the recognition accuracy of the instruction content. When the instruction information is text instruction information, text identification is performed on it to judge whether it contains text errors such as wrongly written characters. When no text error exists, content identification is performed on it directly to obtain the instruction content it contains, again improving the recognition accuracy of the instruction content.
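The validity test for sound instructions above can be sketched as a signal-to-noise-ratio check: compare the power of the useful (voiceprint-matched) signal with that of the background signal and accept the instruction only if the SNR clears a preset threshold. The 10 dB threshold and the sample values are assumed for illustration:

```python
import math

# SNR-based validity check for sound instruction information. The threshold
# and signals are invented example values.

SNR_THRESHOLD_DB = 10.0  # preset signal-to-noise threshold (assumed)

def power(samples):
    # Mean squared amplitude of the signal.
    return sum(s * s for s in samples) / len(samples)

def snr_db(useful, background):
    # Signal-to-noise ratio of the useful signal over the background, in dB.
    return 10.0 * math.log10(power(useful) / power(background))

def is_valid_sound_instruction(useful, background):
    return snr_db(useful, background) >= SNR_THRESHOLD_DB

loud_voice = [0.5, -0.5, 0.5, -0.5]
faint_noise = [0.01, -0.01, 0.01, -0.01]
```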
Preferably, the control module triggers the language identification module and/or the video processing module to work according to the instruction information, and the method comprises the following steps:
performing content identification on the instruction information to obtain an instruction code contained in the instruction information;
and comparing the instruction codes with a preset code catalog, and triggering the language identification module and/or the video processing module to work according to the comparison result.
The beneficial effects of the technical scheme are as follows: after content identification is performed on the instruction information, the instruction semantic content contained in it can be obtained, which in turn contains the corresponding instruction code. The instruction code is compared with a preset code catalog, which contains a number of preset instruction codes corresponding respectively to the language identification module and the video processing module; if the instruction code matches a preset instruction code corresponding to the language identification module and/or the video processing module, the matched module is triggered to work. This ensures that the language identification module and the video processing module work only under appropriate conditions, avoiding needless computational load on them.
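The comparison against a preset code catalog can be sketched as below; the catalog contents and the instruction codes themselves are illustrative assumptions, as the scheme does not enumerate them:

```python
# Hypothetical preset code catalog: each instruction code is mapped to the
# module(s) it triggers (codes and module names are illustrative assumptions).
PRESET_CODE_CATALOG = {
    "ASK_QUESTION": ["language_identification"],
    "PLAY_CLIP": ["video_processing"],
    "ASK_AND_PLAY": ["language_identification", "video_processing"],
}

def modules_to_trigger(instruction_code):
    """Compare the recognized instruction code with the preset code catalog;
    return the modules to trigger, or an empty list when nothing matches,
    so that no needless computation is started."""
    return PRESET_CODE_CATALOG.get(instruction_code, [])

print(modules_to_trigger("ASK_AND_PLAY"))
# ['language_identification', 'video_processing']
print(modules_to_trigger("UNKNOWN"))  # []
```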
Preferably, the language identification module analyzes the instruction information to generate dialogue data matched with the instruction information, including:
when the instruction information belongs to sound instruction information, extracting voice information components of the user from the sound instruction information; performing voice recognition on the voice information component to generate text dialogue data matched with the instruction information;
when the instruction information belongs to the text instruction information, directly generating text dialogue data matched with the instruction information.
The beneficial effects of the technical scheme are as follows: for the different situations in which the instruction information belongs to sound instruction information or text instruction information, the language recognition module applies different working modes, namely speech recognition and text recognition; that is, the language recognition module possesses both speech-recognition and text-recognition functions, so that the user's requirement can be identified accurately whatever form of instruction information the user inputs, realizing multi-scene matching of user instruction recognition.
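The dual working modes can be sketched as follows; the `transcribe` helper is a hypothetical stand-in for a real speech-recognition engine, which the scheme does not name:

```python
def generate_dialogue_data(instruction):
    """Dual-mode language recognition sketch: sound instruction information
    goes through speech recognition, while text instruction information
    directly yields the text dialogue data."""
    def transcribe(audio):
        # Hypothetical speech-to-text stand-in; a real engine would decode audio.
        return audio.get("transcript", "")

    if instruction["type"] == "voice":
        return transcribe(instruction["payload"])
    return instruction["payload"]  # text passes through directly

print(generate_dialogue_data({"type": "text", "payload": "play episode 2"}))
# play episode 2
```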
Preferably, the intelligent question-answering module searches matching video data and/or answer text data from a knowledge base according to the dialogue data, and comprises:
extracting all dialogue text words contained in the dialogue data, and generating feature vectors corresponding to the dialogue data according to all dialogue text words;
inputting the feature vector into a knowledge base learning model, and searching matched video data and/or response text data from the knowledge base;
the video data is sent to the video processing module and/or the response text data is sent to the digital person module.
The beneficial effects of the technical scheme are as follows: when the corresponding dialogue data is identified from the instruction information input by the user, all dialogue text words contained in the dialogue data are processed in models such as ChatGPT, Microsoft New Bing, Baidu ERNIE Bot (Wenxin Yiyan), and iFLYTEK Spark (Xunfei Xinghuo), so as to generate the feature vector corresponding to the dialogue data. The feature vector may be, but is not limited to, a vector obtained by performing a mathematical transformation on the dialogue text vocabulary, which is a vocabulary-transformation method commonly used in neural network models and is not described in detail here. The feature vector is input into a knowledge-base learning model, and matched video data and/or response text data are searched from the knowledge base, which contains knowledge data of different types in text form and/or video form. The video data is transmitted to the video processing module and/or the response text data to the digital person module, so that each can perform the corresponding data processing.
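The feature-vector search of the knowledge base can be sketched with cosine similarity; the toy knowledge-base entries, the three-dimensional vectors, and the choice of cosine similarity are all assumptions, since the scheme only says the vectors come from a learning model:

```python
import numpy as np

# Toy knowledge base: each entry pairs a feature vector with its data
# (entries are illustrative assumptions; in practice the vectors would
# come from a large-model text encoder over much higher dimensions).
KNOWLEDGE_BASE = [
    (np.array([1.0, 0.0, 0.0]), {"video": "intro.mp4", "answer": None}),
    (np.array([0.0, 1.0, 0.0]), {"video": None, "answer": "The hero is Li Lei."}),
]

def search_knowledge_base(query_vec, top_k=1):
    """Rank knowledge-base entries by cosine similarity to the dialogue
    feature vector and return the best-matching data records."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    scored = sorted(KNOWLEDGE_BASE, key=lambda e: cos(query_vec, e[0]), reverse=True)
    return [data for _, data in scored[:top_k]]

print(search_knowledge_base(np.array([0.1, 0.9, 0.0])))
# [{'video': None, 'answer': 'The hero is Li Lei.'}]
```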
Preferably, after the video processing module processes the video data, a playable video is obtained, and the playable video is sent to the control module, which includes:
carrying out frame-splitting processing on the video data to obtain a plurality of video image frames; performing picture content identification on the video image frames, and performing picture restoration processing on them;
then, according to the video playing parameters of the user interface module, performing video format conversion on the video data to obtain a playable video; and packaging and compressing the playable video and transmitting it to the control module;
the control module converts the playable video into a video stream playing signal and sends the video stream playing signal to the user interface module.
The beneficial effects of the technical scheme are as follows: the video processing module performs frame-splitting processing and picture repair processing on the video data, which can improve the picture quality of the video data; video format conversion is then performed on the video data according to the video playing parameters of the user interface module to obtain a playable video, which ensures that the user interface module plays video normally and smoothly, avoids video stuttering or freezing, and improves the user's video-watching experience.
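The frame-splitting and format-conversion pipeline can be sketched as below; the frame shapes, the raw-stream representation, and the player parameter names are illustrative assumptions:

```python
import numpy as np

def split_into_frames(video_data, frame_shape=(4, 4)):
    """Frame-splitting processing: cut a raw sample stream into fixed-size
    frames (the 4x4 frame shape is an illustrative assumption)."""
    frame_len = frame_shape[0] * frame_shape[1]
    n = len(video_data) // frame_len
    return [np.array(video_data[k * frame_len:(k + 1) * frame_len]).reshape(frame_shape)
            for k in range(n)]

def convert_for_playback(frames, target_fps=25):
    """Format conversion: package the frames with the playing parameters
    reported by the user interface module (parameter names are assumptions)."""
    return {"fps": target_fps, "frame_count": len(frames), "frames": frames}

frames = split_into_frames(list(range(32)))
playable = convert_for_playback(frames)
print(playable["frame_count"])  # 2
```

A real implementation would decode an encoded container (e.g. with a video library) rather than slicing a sample list, but the pipeline shape is the same.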
Preferably, the video processing module performs picture content identification on the video image frame, performs picture repair processing on the video image frame, and includes:
Step S1, before the picture restoration processing, judging, for the picture content of the video image currently to be restored, whether that picture content has already been restored in other video images, by using the following formula (1):

$$X(b)=\begin{cases}1, & \exists\,a\in[1,b-1]:\ \sum_{i=1}^{n}\sum_{j=1}^{m}\left|Y_{a}(i,j)-Y_{b}(i,j)\right|=0\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

In the above formula (1), X(b) represents the determination value of whether the picture content of the b-th frame video image currently to be repaired has been repaired in other video images; Y_a(i,j) represents the pixel value of the pixel point in the i-th row and j-th column of the pixel matrix of the unrepaired a-th frame video image; Y_b(i,j) represents the pixel value of the pixel point in the i-th row and j-th column of the pixel matrix of the b-th frame video image currently to be repaired; |·| represents taking the absolute value; m represents the total number of pixel points in any row of the pixel matrix of the video image; n represents the total number of pixel points in any column of the pixel matrix of the video image;
if X(b) = 1, the picture content of the b-th frame video image to be repaired has been repaired in other video images;
if X(b) = 0, the picture content of the b-th frame video image to be repaired has not been repaired in other video images;
Step S2, if the picture content has been repaired in other video images, locating the frame-number values at which the picture content was repaired in other video images by using the following formula (2):

$$A=\operatorname{sort}\left\{a\ \middle|\ a\in[1,b-1]\ \&\&\ \sum_{i=1}^{n}\sum_{j=1}^{m}\left|Y_{a}(i,j)-Y_{b}(i,j)\right|=0\right\}\tag{2}$$

In the above formula (2), A represents the array of frame-number values at which the picture content of the b-th frame video image to be repaired was repaired in other video images; && represents the logical AND relationship; ∈ represents membership; all values of a belonging to the interval [1, b-1] that satisfy the condition are screened out and arranged into the array in ascending order;
Step S3, verifying, according to the frame-number values at which the picture content was repaired in other video images, whether the previously repaired picture content corresponding to those frame numbers is correct, by using the following formula (3):

$$Z=\begin{cases}1, & \forall\,a_{1},a_{2}\in A:\ \sum_{i=1}^{n}\sum_{j=1}^{m}\left|\widetilde{Y}_{a_{1}}(i,j)-\widetilde{Y}_{a_{2}}(i,j)\right|=0\\ 0, & \text{otherwise}\end{cases}\tag{3}$$

In the above formula (3), Z represents the determination value of whether the repair processing of the picture content of the b-th frame video image to be repaired is correct, judged over the repaired picture contents corresponding to the frame numbers of the other video images; $\widetilde{Y}_{a}(i,j)$ represents the pixel value of the pixel point in the i-th row and j-th column of the pixel matrix of the repaired a-th frame video image;
if Z = 0, the repaired picture content corresponding to the frame numbers of the previous repair processing needs to be repaired again;
if Z = 1, the picture content of the b-th frame video image to be repaired is directly repaired into the repaired picture content corresponding to any element in the array A.
The beneficial effects of the technical scheme are as follows: formula (1) is used to judge, for the picture content of the video image currently to be repaired, whether that picture content has already been repaired in other video images, so that repaired video images can be checked quickly, embodying the system's capability of rapid inspection; formula (2) is then used to locate the frame-number values at which the picture content was repaired in other video images, so that the results of the previous repair processing can be retrieved and their accuracy verified, ensuring the accuracy of the system; finally, formula (3) is used to verify, according to those frame-number values, whether the previously repaired picture content is correct, and this repeated verification ensures the overall reliability of the system.
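The three checks can be sketched in code as follows. This follows the pairwise-consistency reading of formula (3), which is an assumption, since the original formula image is not reproduced here:

```python
import numpy as np

def find_repaired_duplicates(frames, b):
    """Formulas (1) and (2): X(b)=1 when some earlier frame a (1-indexed,
    a in [1, b-1]) has a pixel matrix identical to frame b; the array A
    collects all such a in ascending order."""
    A = [a for a in range(1, b)
         if np.sum(np.abs(frames[a - 1] - frames[b - 1])) == 0]
    X = 1 if A else 0
    return X, A

def verify_previous_repairs(repaired, A):
    """Formula (3) (reconstructed reading, an assumption): Z=1 when the
    repaired contents for every frame number in A agree with one another."""
    if len(A) <= 1:
        return 1
    first = repaired[A[0] - 1]
    same = all(np.sum(np.abs(repaired[a - 1] - first)) == 0 for a in A[1:])
    return 1 if same else 0

f = [np.ones((2, 2)), np.zeros((2, 2)), np.ones((2, 2))]
print(find_repaired_duplicates(f, 3))  # (1, [1]) -- frame 3 duplicates frame 1
```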
Preferably, the digital person module generates the virtual character interaction video according to the response text data, and sends the virtual character interaction video to the control module, including:
constructing a corresponding virtual role according to the video interaction history log of the user, and setting characteristic elements of the virtual role for performing voice behaviors and action behaviors;
generating virtual character voice information according to the response text data; generating the virtual character interactive video according to the virtual character and the virtual character voice information; the virtual character interaction video is packed, compressed and sent to the control module;
the control module takes the virtual character interactive video as a video stream playing signal and sends the video stream playing signal to the user interface module.
The beneficial effects of the technical scheme are as follows: the digital person module may be, but is not limited to, a virtual-image construction module with a virtual-animation generation function, used to generate a video animation containing the virtual character. The virtual character image the user has become accustomed to during historical video interactions is extracted from the user's video interaction history log, the corresponding virtual character is constructed, and the feature elements with which the virtual character performs voice behaviors and action behaviors are set. The feature elements of the voice behavior may be, but are not limited to, the pitch, volume, speaking rate, and the like of the virtual character's speech; the feature elements of the action behavior may be, but are not limited to, the types and amplitudes of the virtual character's body movements while speaking. In this way, the realism of the interaction can be improved during video interaction between the user and the virtual character. In addition, the virtual character interaction video and the playable video can be integrated and superimposed, so that the user can interact with the virtual character in real time while watching the video.
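The construction of a virtual character from the video interaction history log can be sketched as below; all field names and default values are illustrative assumptions, since the scheme does not define the log format:

```python
def build_virtual_character(history_log):
    """Build a virtual character whose speech-behavior and action-behavior
    feature elements come from the user's most recent video interaction
    (hypothetical log fields; defaults are illustrative assumptions)."""
    recent = history_log[-1] if history_log else {}
    return {
        "avatar": recent.get("avatar", "default_host"),
        "speech": {  # speech-behavior feature elements
            "pitch": recent.get("pitch", 1.0),
            "volume": recent.get("volume", 0.8),
            "rate": recent.get("rate", 1.0),
        },
        "action": {  # action-behavior feature elements
            "gesture_style": recent.get("gesture_style", "calm"),
            "amplitude": recent.get("amplitude", 0.5),
        },
    }

log = [{"avatar": "detective", "pitch": 0.9, "gesture_style": "lively"}]
print(build_virtual_character(log)["avatar"])  # detective
```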
As can be seen from the above embodiments, the video interactive playing system for interacting with the persona in the video plays video interactively through the user interface module; during normal video playing, it generates matched dialogue data according to instruction information initiated by the user and searches matched video data and/or response text data from the knowledge base; after processing the video data, it plays the video generated in real time to the user through the user interface module; according to the persona in the video, it constructs a virtual character that interacts with the user and, combining the response text data, dynamically converts the voice and actions of the virtual character to generate a virtual character interaction video, which is played through the user interface module to provide the user with a scene interface for interacting with the virtual character. The system thus receives instruction information sent by the user in real time, accurately grasps the user's video-watching requirements, ensures reliable video playing for the user, and creates a virtual character matched with the user for real-time video interaction, improving the user's video-watching experience.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. A video interactive play system for interacting with a persona in a video, comprising:
the user interface module is used for receiving instruction information initiated by a user, providing a scene interface for the user to interact with the virtual character and/or playing video;
the control module triggers the language identification module and/or the video processing module to work according to the instruction information; and according to the virtual character interaction video from the digital human module, instructing the user interface module to form the scene interface interacted with the virtual character, and/or according to the playable video from the video processing module, instructing the user interface module to play the video;
the language identification module analyzes the instruction information and generates dialogue data matched with the instruction information;
the intelligent question-answering module searches matched video data and/or answer text data from a knowledge base according to the dialogue data;
the video processing module is used for processing the video data to obtain a playable video and sending the playable video to the control module;
and the digital person module generates the virtual character interaction video according to the response text data and sends the virtual character interaction video to the control module.
2. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
the user interface module, before receiving instruction information initiated by a user, comprises:
shooting a face image of the user currently interacting with the user interface module, and extracting facial feature information of the user from the face image; judging whether the user belongs to a legal user according to the facial feature information; if the user belongs to a legal user, receiving instruction information initiated by the user; if the user does not belong to a legal user, not receiving instruction information initiated by the user;
or detecting the position information of the user currently interacting with the user interface module, and judging whether the user is positioned in a preset activity range or not according to the position information; if the user is within the preset activity range, receiving instruction information initiated by the user; and if the user is not in the preset activity range, not receiving the instruction information initiated by the user.
3. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
after receiving instruction information initiated by a user, the user interface module further comprises:
the instruction information is sent to the control module, so that whether the instruction information belongs to sound instruction information or text instruction information is judged;
if the instruction information belongs to the sound instruction information, analyzing a useful sound signal and a background sound signal of the sound instruction information, and judging whether the sound instruction information belongs to effective sound instruction information; if it belongs to the effective sound instruction information, directly performing content identification on it; if it does not belong to the effective sound instruction information, instructing the user interface module to remind the user to regenerate and resend the instruction information;
if the instruction information belongs to the text instruction information, performing text defect analysis processing on the text instruction information, and judging whether a text error exists in the text instruction information; if no text error exists, directly performing content identification on the text instruction information; and if a text error exists, instructing the user interface module to remind the user to regenerate and resend the instruction information.
4. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
the control module triggers the language identification module and/or the video processing module to work according to the instruction information, and comprises the following steps:
performing content identification on the instruction information to obtain an instruction code contained in the instruction information; and comparing the instruction codes with a preset code catalog, and triggering the language identification module and/or the video processing module to work according to the comparison result.
5. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
the language identification module analyzes the instruction information to generate dialogue data matched with the instruction information, and the language identification module comprises the following steps:
when the instruction information belongs to sound instruction information, extracting voice information components of the user from the sound instruction information; performing voice recognition on the voice information component to generate text dialogue data matched with the instruction information;
and when the instruction information belongs to the text instruction information, generating text dialogue data matched with the instruction information.
6. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
the intelligent question-answering module searches matched video data and/or response text data from a knowledge base according to the dialogue data, and can call the APIs of platforms such as ChatGPT, Microsoft New Bing, Baidu ERNIE Bot (Wenxin Yiyan), and iFLYTEK Spark (Xunfei Xinghuo);
and sending the video data to the video processing module and/or sending the response text data to the digital person module.
7. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
and the intelligent question-answering module searches matched video data and/or answer text data from a knowledge base according to the dialogue data, and comprises the following steps:
extracting all dialogue text words contained in the dialogue data, and generating feature vectors corresponding to the dialogue data according to all dialogue text words;
inputting the feature vector into a knowledge base learning model, and searching matched video data and/or response text data from the knowledge base;
and sending the video data to the video processing module and/or sending the response text data to the digital person module.
8. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
the video processing module processes the video data to obtain a playable video, and sends the playable video to the control module, and the video processing module comprises:
carrying out framing treatment on the video data to obtain a plurality of video image frames; performing picture content identification on the video image frame, and performing picture restoration processing on the video image frame;
then, according to the video playing parameters of the user interface module, performing video format conversion on the video data to obtain a playable video; and packaging and compressing the playable video and transmitting it to the control module;
the control module converts the playable video into a video stream playing signal and sends the video stream playing signal to the user interface module.
9. The video interactive play system for interacting with a persona in a video of claim 8, wherein:
the video processing module performs picture content identification on the video image frame, performs picture repair processing on the video image frame, and comprises:
step S1, before the picture restoration processing, judging, for the picture content of the video image currently to be restored, whether that picture content has already been restored in other video images, by using the following formula (1):

$$X(b)=\begin{cases}1, & \exists\,a\in[1,b-1]:\ \sum_{i=1}^{n}\sum_{j=1}^{m}\left|Y_{a}(i,j)-Y_{b}(i,j)\right|=0\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

in the above formula (1), X(b) represents the determination value of whether the picture content of the b-th frame video image currently to be repaired has been repaired in other video images; Y_a(i,j) represents the pixel value of the pixel point in the i-th row and j-th column of the pixel matrix of the unrepaired a-th frame video image; Y_b(i,j) represents the pixel value of the pixel point in the i-th row and j-th column of the pixel matrix of the b-th frame video image currently to be repaired; |·| represents taking the absolute value; m represents the total number of pixel points in any row of the pixel matrix of the video image; n represents the total number of pixel points in any column of the pixel matrix of the video image;
if X(b) = 1, the picture content of the b-th frame video image to be repaired has been repaired in other video images;
if X(b) = 0, the picture content of the b-th frame video image to be repaired has not been repaired in other video images;
step S2, if the picture content has been repaired in other video images, locating the frame-number values at which the picture content was repaired in other video images by using the following formula (2):

$$A=\operatorname{sort}\left\{a\ \middle|\ a\in[1,b-1]\ \&\&\ \sum_{i=1}^{n}\sum_{j=1}^{m}\left|Y_{a}(i,j)-Y_{b}(i,j)\right|=0\right\}\tag{2}$$

in the above formula (2), A represents the array of frame-number values at which the picture content of the b-th frame video image to be repaired was repaired in other video images; && represents the logical AND relationship; ∈ represents membership; all values of a belonging to the interval [1, b-1] that satisfy the condition are screened out and arranged into the array in ascending order;
step S3, verifying, according to the frame-number values at which the picture content was repaired in other video images, whether the previously repaired picture content corresponding to those frame numbers is correct, by using the following formula (3):

$$Z=\begin{cases}1, & \forall\,a_{1},a_{2}\in A:\ \sum_{i=1}^{n}\sum_{j=1}^{m}\left|\widetilde{Y}_{a_{1}}(i,j)-\widetilde{Y}_{a_{2}}(i,j)\right|=0\\ 0, & \text{otherwise}\end{cases}\tag{3}$$

in the above formula (3), Z represents the determination value of whether the repair processing of the picture content of the b-th frame video image to be repaired is correct, judged over the repaired picture contents corresponding to the frame numbers of the other video images; $\widetilde{Y}_{a}(i,j)$ represents the pixel value of the pixel point in the i-th row and j-th column of the pixel matrix of the repaired a-th frame video image;
if Z = 0, the repaired picture content corresponding to the frame numbers of the previous repair processing needs to be repaired again;
if Z = 1, the picture content of the b-th frame video image to be repaired is directly repaired into the repaired picture content corresponding to any element in the array A.
10. The video interactive play system for interacting with a persona in a video of claim 1, wherein:
the digital person module generates the virtual character interaction video according to the response text data and sends the virtual character interaction video to the control module, and the digital person module comprises:
constructing corresponding virtual roles according to the video interaction history logs of the users, and setting characteristic elements of the virtual roles for performing voice behaviors and action behaviors;
generating virtual character voice information according to the response text data; generating the virtual character interactive video according to the virtual character and the virtual character voice information; the virtual character interaction video is packed, compressed and sent to the control module;
and the control module takes the virtual character interactive video as a video stream playing signal and sends the video stream playing signal to the user interface module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310639004.3A CN116614665A (en) | 2023-06-01 | 2023-06-01 | Video interactive play system for interacting with personas in video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310639004.3A CN116614665A (en) | 2023-06-01 | 2023-06-01 | Video interactive play system for interacting with personas in video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116614665A true CN116614665A (en) | 2023-08-18 |
Family
ID=87674488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310639004.3A Pending CN116614665A (en) | 2023-06-01 | 2023-06-01 | Video interactive play system for interacting with personas in video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116614665A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117273054A (en) * | 2023-09-28 | 2023-12-22 | 南京八点八数字科技有限公司 | Virtual human interaction method and system applying different scenes |
CN118247700A (en) * | 2024-03-19 | 2024-06-25 | 西安隆腾科技文化有限公司 | Multimedia information interaction method and system |
CN118250488A (en) * | 2024-04-11 | 2024-06-25 | 天翼爱音乐文化科技有限公司 | Video face changing method and system based on voice interaction, electronic equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116614665A (en) | Video interactive play system for interacting with personas in video | |
CN112215927B (en) | Face video synthesis method, device, equipment and medium | |
WO2021114881A1 (en) | Intelligent commentary generation method, apparatus and device, intelligent commentary playback method, apparatus and device, and computer storage medium | |
CN107203953B (en) | Teaching system based on internet, expression recognition and voice recognition and implementation method thereof | |
US6384829B1 (en) | Streamlined architecture for embodied conversational characters with reduced message traffic | |
CN100345085C (en) | Method for controlling electronic game scene and role based on poses and voices of player | |
CN111263227B (en) | Multimedia playing method and device, storage medium and terminal | |
CN111885414B (en) | Data processing method, device and equipment and readable storage medium | |
CN112135160A (en) | Virtual object control method and device in live broadcast, storage medium and electronic equipment | |
US20230047858A1 (en) | Method, apparatus, electronic device, computer-readable storage medium, and computer program product for video communication | |
CN113870395A (en) | Animation video generation method, device, equipment and storage medium | |
CN112668407A (en) | Face key point generation method and device, storage medium and electronic equipment | |
CN113052085A (en) | Video clipping method, video clipping device, electronic equipment and storage medium | |
CN111711834A (en) | Recorded broadcast interactive course generation method and device, storage medium and terminal | |
CN111063024A (en) | Three-dimensional virtual human driving method and device, electronic equipment and storage medium | |
CN114697685B (en) | Method, device, server and storage medium for generating comment video | |
CN111265851B (en) | Data processing method, device, electronic equipment and storage medium | |
CN113762056A (en) | Singing video recognition method, device, equipment and storage medium | |
CN117292022A (en) | Video generation method and device based on virtual object and electronic equipment | |
CN117809679A (en) | Server, display equipment and digital human interaction method | |
CN110221694A (en) | Control method and device of interactive platform, storage medium and interactive platform | |
CN111768729A (en) | VR scene automatic explanation method, system and storage medium | |
KR102659886B1 (en) | VR and AI Recognition English Studying System | |
TWI771858B (en) | Smart language learning method and system thereof combining image recognition and speech recognition | |
CN111968621B (en) | Audio testing method and device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||