
CN110442867A - Image processing method, device, terminal and computer storage medium

Info

Publication number: CN110442867A (application CN201910693744.9A; granted as CN110442867B)
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: emotion, image, data, target, processed
Legal status: Granted; Active
Inventor: 王伟航
Original and current assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910693744.9A
Publication of CN110442867A
Publication of CN110442867B (grant)

Classifications

    • G06F18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06Q50/01: ICT specially adapted for specific business sectors; social networking
    • G06V40/168: Recognition of human faces; feature extraction; face representation
    • G06V40/172: Recognition of human faces; classification, e.g. identification
    • G10L25/30: Speech or voice analysis characterised by the analysis technique using neural networks
    • G10L25/63: Speech or voice analysis specially adapted for estimating an emotional state
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

An embodiment of the invention provides an image processing method, an image processing apparatus, a terminal, and a computer storage medium. The method includes: obtaining emotion data and an image to be processed; identifying a target emotion reflected by the emotion data; matching a corresponding target filter mode for the image to be processed according to the target emotion; and performing filter processing on the image to be processed using the target filter mode to obtain a target image. Embodiments of the invention address problems in the traditional technology such as poor image enhancement effects and the inability to accurately express the user's true intention.

Description

Image processing method, device, terminal and computer storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a terminal, and a computer storage medium.
Background
Social interaction refers to interpersonal communication in society, that is, people using certain means (tools) to transmit information and exchange ideas in order to carry out purposeful social activities. With the development of science and technology and the use of internet resources in daily life, people now interact with one another over the internet, and even strangers can socialize online to further expand and develop their social circles.
At present, intelligent terminals are widely used for socializing with strangers. A user presents himself or herself in a stranger-oriented social application on the intelligent terminal through dynamic posts such as text, voice, and images, hoping to attract more like-minded users to interact with. Images are the most common choice for publishing personal posts. In practice, however, it has been found that because the image filter modes provided by the intelligent terminal are limited, the image enhancement effect is poor, the visual effect of the images published by the user is constrained, and the user's true intention cannot be accurately expressed. This dampens the enthusiasm for interacting with strangers, reduces the utilization rate of stranger-oriented social applications, and hinders their development.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing apparatus, a terminal and a computer storage medium, which can improve the image effect, thereby increasing users' enthusiasm for interaction and the utilization rate of social applications.
In one aspect, an embodiment of the present invention discloses an image processing method, including:
acquiring emotion data and an image to be processed, wherein the emotion data comprises emotion voice data, emotion image data or emotion text data;
identifying a target emotion reflected by the emotion data;
matching a corresponding target filter mode for the image to be processed according to the target emotion;
and performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
On the other hand, the embodiment of the invention also discloses an image processing device, which comprises:
an acquisition unit, which is used for acquiring emotion data and an image to be processed, wherein the emotion data comprises emotion text data, emotion voice data or emotion image data;
the identification unit is used for identifying the target emotion reflected by the emotion data;
the matching unit is used for matching a corresponding target filter mode for the image to be processed according to the target emotion;
and the processing unit is used for carrying out filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
In another aspect, an embodiment of the present invention further provides a terminal, where the terminal includes an input device and an output device, and the terminal further includes:
a processor adapted to implement one or more instructions; and
a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the following steps:
acquiring emotion data and an image to be processed, wherein the emotion data comprises emotion text data, emotion voice data or emotion image data;
identifying a target emotion reflected by the emotion data;
matching a corresponding target filter mode for the image to be processed according to the target emotion;
and performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
In yet another aspect, an embodiment of the present invention provides a computer storage medium, where one or more instructions are stored, and the one or more instructions are adapted to be loaded by a processor and execute the following steps:
acquiring emotion data and an image to be processed, wherein the emotion data comprises emotion text data, emotion voice data or emotion image data;
identifying a target emotion reflected by the emotion data;
matching a corresponding target filter mode for the image to be processed according to the target emotion;
and performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
According to the embodiment of the invention, emotion data and an image to be processed can be obtained, the target emotion reflected by the emotion data is identified, a corresponding target filter mode is matched for the image to be processed according to the target emotion, and finally the target filter mode is adopted to perform filter processing on the image to be processed to obtain the target image. Because the image is filtered based on emotion, the problems in the traditional technology, such as poor image enhancement effects, failure to accurately express the user's true intention, and reduced enthusiasm for interaction, can be solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention.
Fig. 2(a) and 2(b) are schematic waveforms of two emotional voice data provided by the embodiment of the invention.
Fig. 3 is a schematic diagram of emotion classification provided by an embodiment of the present invention.
Fig. 4-5 are schematic flow charts of two other image processing methods according to embodiments of the present invention.
Fig. 6(a) -6 (h) are a series of scene diagrams provided by the embodiment of the present invention.
Fig. 7 is a flowchart illustrating another image processing method according to an embodiment of the present invention.
Fig. 8 is a flowchart illustrating another image processing method according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and "third" (if any) in the description and claims of the invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present invention. The image processing method may be performed by a terminal. The method as shown in fig. 1 comprises the following steps S101-S104.
S101, obtaining emotion data and an image to be processed.
When detecting a dynamic publishing instruction in the social application, the terminal can respond to the instruction to acquire emotion data and an image to be processed. The dynamic publishing instruction may be received from another device (e.g., a server), or may be generated by the terminal upon detecting a dynamic publishing operation of the user. The dynamic publishing operation refers to the operation a user performs to publish a post in the social application, for example, a sliding operation along a preset track in the social application, or a series of click operations on a specified button in the social application.
The emotion data refers to data for describing the emotion of a user, wherein the emotion refers to a general term of a series of subjective cognitive experiences and is a psychological and physiological state generated by comprehensively combining various feelings, ideas and behaviors. For example, emotions may include, but are not limited to, anger, joy, excitement, hope, or other words that describe the user's psychological and physiological state.
In practical applications, the specific form of the emotion data is not limited and may include, but is not limited to, at least one of the following: emotion voice data, emotion image data, emotion video data, and emotion text data. A video generally consists of frames of images, so emotion video data can be regarded as consisting of frames of emotion image data; analyzing emotion video data on the terminal essentially amounts to analyzing its individual frames of emotion image data. The following description therefore uses emotion image data in place of emotion video data.
And S102, identifying the target emotion reflected by the emotion data.
In one embodiment, if the emotion data includes emotion voice data, step S102 may specifically include the following steps S11 to S13:
and S11, converting the emotion voice data into emotion text data, and extracting text features in the emotion text data.
The terminal converts the emotion voice data into corresponding emotion text data through a speech recognition program. The speech recognition program may be a system program deployed on the terminal or a third-party application, and is used to convert speech into text. The terminal then applies a text feature extraction algorithm to extract the text features contained in the emotion text data. The text features reflect the emotion that the emotion voice data exhibits at the textual level. The text feature extraction algorithm can be customized by the system, for example according to actual requirements, and may include, but is not limited to, a text feature vector algorithm, principal component analysis, or other algorithms for extracting text features.
And S12, extracting acoustic features in the emotional voice data.
The terminal uses an acoustic feature extraction algorithm to extract the acoustic features in the emotion voice data. The acoustic feature extraction algorithm can be customized by the system, for example a convolutional neural network algorithm or a recurrent neural network algorithm.
Optionally, the acoustic features comprise time domain acoustic features and/or frequency domain acoustic features. The time domain acoustic features refer to features which are displayed on the time domain by the emotional voice data and are used for reflecting the emotion of the user. The frequency domain acoustic features refer to features of emotional voice data shown in a frequency domain to reflect the emotion of a user.
In practical applications, the emotion voice data collected by the terminal is essentially a speech signal, which carries characteristics in both the time domain and the frequency domain. Fig. 2(a) shows a waveform diagram of the speech signal (a time-domain signal), in which the abscissa represents time and the ordinate represents vibration amplitude. The terminal can extract characteristics such as time, amplitude, and frequency from the speech signal to obtain the time-domain acoustic features of the speech. Further, the terminal may apply a Fourier transform to convert the speech signal into a speech spectrum; fig. 2(b) shows a schematic diagram of a speech spectrogram. The speech spectrogram is a representation of the speech signal in the frequency domain, obtained by converting the time-domain signal of the speech into a frequency-domain signal; its abscissa represents time and its ordinate represents frequency. The terminal can analyze how the frequency content in the speech spectrogram changes over time to obtain the frequency-domain acoustic features exhibited by the speech signal.
Specifically, the terminal may analyze the speech spectrogram with a frequency-domain feature extraction algorithm to obtain the frequency-domain acoustic features. The frequency-domain feature extraction algorithm can be customized by the system and may include, but is not limited to, a convolutional neural network algorithm or a recurrent neural network algorithm. For example, the terminal performs local feature extraction on the speech spectrogram with a convolutional neural network, whose invariance to shifts, scaling, and other forms of distortion makes it well suited to processing the spectrogram as an image, thereby obtaining the frequency-domain acoustic features.
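As a concrete illustration of the spectrogram step described above, the following Python sketch converts a waveform into a log-power spectrogram using SciPy (our choice of library, not one named by the text); the frame length, hop size, and the synthetic test signal are illustrative assumptions.

```python
# A minimal sketch (not the patent's implementation) of turning a speech
# signal into a spectrogram so that frequency-domain features can be
# extracted. A mono 16 kHz waveform is assumed.
import numpy as np
from scipy.signal import spectrogram

def speech_spectrogram(waveform: np.ndarray, sample_rate: int = 16000):
    """Return (times, freqs, log_power) for a 1-D speech waveform."""
    freqs, times, power = spectrogram(
        waveform, fs=sample_rate, nperseg=400, noverlap=240)  # 25 ms frames, 10 ms hop
    log_power = np.log(power + 1e-10)  # compress dynamic range
    return times, freqs, log_power

if __name__ == "__main__":
    # 1 s of synthetic "speech" (a chirp) just to exercise the code.
    t = np.linspace(0, 1, 16000, endpoint=False)
    wave = np.sin(2 * np.pi * (200 + 300 * t) * t)
    times, freqs, log_power = speech_spectrogram(wave)
    print(log_power.shape)  # (n_freq_bins, n_frames), e.g. input to a CNN feature extractor
```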
And S13, calling the first emotion model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
The terminal calls the first emotion model to perform unified or fusion recognition on the text features and the acoustic features to obtain the target emotion reflected by the emotion voice data. The first emotion model can be customized by the system, for example according to user preferences or actual needs. It is a pre-trained model for recognizing the user's emotion, which may include, but is not limited to, a feed-forward (FF) neural network, a deep feed-forward (DFF) neural network, a recurrent neural network (RNN), a long short-term memory (LSTM) network, or another model for emotion recognition.
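The text does not specify how the first emotion model fuses the two feature types; the sketch below shows one minimal feature-level fusion, assuming the text and acoustic features are already fixed-length vectors. The emotion label set and the random weights are placeholders standing in for a trained model.

```python
# Minimal sketch of feature-level fusion for emotion recognition (step S13).
# The weights below are random placeholders, not a trained first emotion model.
import numpy as np

EMOTIONS = ["happy", "angry", "sad", "calm"]  # hypothetical label set

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(text_feat, acoustic_feat, weights, bias):
    fused = np.concatenate([text_feat, acoustic_feat])  # early fusion of the two feature vectors
    probs = softmax(weights @ fused + bias)             # linear stand-in for the emotion model
    return EMOTIONS[int(np.argmax(probs))], probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_feat, acoustic_feat = rng.normal(size=64), rng.normal(size=32)
    W, b = rng.normal(size=(len(EMOTIONS), 96)), np.zeros(len(EMOTIONS))
    emotion, probs = fuse_and_classify(text_feat, acoustic_feat, W, b)
    print(emotion, probs.round(3))
```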
It should be noted that, in the embodiment of the present invention, if the highest emotion recognition accuracy is not required, the terminal may consider only the text features or only the acoustic features and use the emotion recognized from them as the target emotion reflected by the emotion data. This avoids jointly evaluating the text features and the acoustic features, saving terminal computing resources and improving processing efficiency.
In yet another embodiment, if the emotion data includes emotional voice data, the step S102 includes the following steps S21-S25.
And S21, converting the emotion voice data into emotion text data, and calling a second emotion model to perform semantic analysis on the emotion text data to obtain a first emotion.
And after the terminal converts the emotion voice data into emotion text data, semantic analysis can be performed on the emotion text data by calling the second emotion model to obtain the first emotion. The second emotion model may also be a pre-trained emotion recognition model, and reference may be made to the above description related to the first emotion model, which is not described herein again.
In a specific implementation, the terminal performs semantic analysis on the emotion text data through the second emotion model to obtain one or more candidate emotion words contained in the emotion text data, where the candidate emotion words reflect the user's emotion, such as anger, irritability, happiness, or joy. Specifically, the terminal can perform semantic analysis on the emotion text data against an emotion lexicon built into the model, for example by applying syntactic rules and segmenting words, so as to obtain at least one candidate emotion word. The emotion lexicon is set by the system and may be, for example, a Linguistic Inquiry and Word Count (LIWC) lexicon or an EmoCD emotion lexicon; it contains at least one pre-configured reference emotion word. Optionally, each reference emotion word is configured with a corresponding weight. The weight indicates the intensity of the emotion reflected by the reference emotion word, i.e., the emotion intensity: the stronger the emotion reflected by a reference emotion word, the larger its weight; conversely, the weaker the emotion, the smaller its weight.
Further, the terminal can match the candidate emotion words against the reference emotion words in the model, calculate the similarity between candidate and reference emotion words, and determine the emotion reflected by the target emotion word as the first emotion. The target emotion word is a candidate emotion word that satisfies the following conditions: its similarity to a reference emotion word is greater than or equal to a preset threshold (specifically, a third threshold), and the weight of that reference emotion word is greater than or equal to a fourth threshold. The third and fourth thresholds can be set by the system, for example according to the user's preferences or actual requirements, or derived from experimental statistics; they may or may not be equal, and the invention is not limited in this respect.
The specific implementation of similarity matching is not limited in the present invention. For example, the terminal may calculate the similarity between words using any one or a combination of the following similarity matching algorithms (also called similarity algorithms): a term frequency (TF) method, a term frequency-inverse document frequency (TF-IDF) method, a word-to-vector (word2vec) method, or another algorithm for obtaining word similarity.
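A minimal sketch of the lexicon-matching logic described above: each candidate emotion word is compared with the reference words of an emotion lexicon, and a match requires both similarity at or above the third threshold and a reference weight at or above the fourth threshold. The embeddings, weights, and threshold values are illustrative assumptions, not values from the text.

```python
# Sketch of the lexicon-matching step (part of S21): candidate emotion words
# are matched against a weighted reference lexicon by cosine similarity.
import numpy as np

REFERENCE_LEXICON = {            # word -> (embedding, weight / emotion intensity)
    "joy":   (np.array([0.9, 0.1, 0.0]), 0.8),
    "anger": (np.array([0.0, 0.9, 0.2]), 0.9),
    "calm":  (np.array([0.1, 0.0, 0.8]), 0.3),
}
THIRD_THRESHOLD, FOURTH_THRESHOLD = 0.85, 0.5  # illustrative values

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def first_emotion(candidate_words, embed):
    """Return the reference emotion best matched by the candidate words, or None."""
    best = None
    for word in candidate_words:
        for ref, (ref_vec, weight) in REFERENCE_LEXICON.items():
            sim = cosine(embed(word), ref_vec)
            if sim >= THIRD_THRESHOLD and weight >= FOURTH_THRESHOLD:
                if best is None or sim > best[1]:
                    best = (ref, sim)
    return best[0] if best else None

if __name__ == "__main__":
    toy_embeddings = {"happy": np.array([0.85, 0.15, 0.05])}
    print(first_emotion(["happy"], lambda w: toy_embeddings[w]))  # -> "joy"
```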
And S22, calling a third emotion model to perform acoustic analysis on the emotion voice data to obtain a second emotion.
The terminal can perform acoustic feature analysis on the emotion voice data through the third emotion model to obtain acoustic features contained in the emotion voice data. The acoustic features are divided into time-domain acoustic features and frequency-domain acoustic features according to frequency domain and time domain. Further, the third emotion model can analyze the second emotion reflected by the emotion voice data according to the time domain acoustic features and/or the frequency domain acoustic features contained in the emotion voice data. The following description of the present invention will describe a specific implementation of obtaining the second emotion in detail by taking comprehensive analysis of time-domain acoustic features and frequency-domain acoustic features as an example.
Specifically, the terminal can perform feature extraction on the emotion voice data in the time domain through the third emotion model to obtain time domain acoustic features contained in the emotion voice data. The time-domain acoustic features refer to time-domain features contained in the emotional speech data in a time-domain direction, which may include, but are not limited to, speech speed, duration, mel-frequency cepstral coefficients (MFCCs), Perceptual Linear Prediction (PLP), formants, or other time-domain feature parameters. Correspondingly, the terminal can also perform feature extraction on the emotion voice data on the frequency domain through the third emotion model to obtain frequency domain acoustic features contained in the emotion voice data. The frequency domain acoustic feature refers to a frequency domain feature included in the emotional voice data in a frequency domain direction, which may include, but is not limited to, a short-time energy, a short-time average amplitude, a zero-crossing rate, or other frequency domain feature parameters.
Further, the third emotion model can comprehensively analyze the time domain acoustic features and the frequency domain acoustic features to obtain a second emotion. For example, the third emotion model may analyze a threshold interval range in which the time-domain acoustic feature and the frequency-domain acoustic feature are located, and further obtain an emotion corresponding to the threshold interval range. The third emotion model may be a pre-trained emotion recognition model, and reference may be made to the related description about the first emotion model, which is not repeated herein.
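For illustration, the sketch below extracts several of the acoustic feature types named above (MFCCs, zero-crossing rate, short-time energy) and averages them into one fixed-length vector; librosa is our choice of library here, not one named by the text, and a mono waveform is assumed.

```python
# Sketch of extracting some of the acoustic features mentioned above.
# The averaging into a single vector is an illustrative simplification.
import numpy as np
import librosa

def acoustic_features(y: np.ndarray, sr: int) -> np.ndarray:
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    zcr = librosa.feature.zero_crossing_rate(y)           # (1, frames)
    energy = librosa.feature.rms(y=y)                     # (1, frames) short-time energy
    frames = np.vstack([mfcc, zcr, energy])               # (15, frames)
    return frames.mean(axis=1)                            # one fixed-length feature vector

if __name__ == "__main__":
    sr = 16000
    y = np.sin(2 * np.pi * 220 * np.linspace(0, 1, sr, endpoint=False)).astype(np.float32)
    print(acoustic_features(y, sr).shape)  # (15,)
```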
And S23, calculating the similarity between the first emotion and the second emotion.
The terminal calculates the similarity between the first emotion and the second emotion by adopting a preset similarity algorithm, so that the target emotion reflected by the emotion voice data can be conveniently determined based on the similarity. For the similarity algorithm, reference may be made to the foregoing related explanations regarding the similarity matching algorithm, and details are not repeated here.
And S24, when the similarity is larger than or equal to the first threshold value, determining the first emotion or the second emotion as the target emotion.
And S25, when the similarity is smaller than the first threshold value, determining the first emotion as the target emotion.
If the terminal determines that the similarity between the first emotion and the second emotion is greater than or equal to the first threshold, it considers the first emotion and the second emotion to be relatively similar, for example, the first emotion is happiness and the second emotion is joy. The terminal may then determine either the first emotion or the second emotion as the target emotion.
Conversely, if the similarity between the first emotion and the second emotion is less than the first threshold, the two are considered significantly different or even contradictory, e.g., the first emotion is pleasure and the second emotion is irritation. To ensure recognition accuracy, the terminal selects the target emotion from the first emotion and the second emotion. In emotion recognition, text semantic analysis is generally more accurate than acoustic feature analysis, so the terminal can determine the first emotion, obtained by semantic analysis, as the target emotion.
It should be noted that, if the highest emotion recognition accuracy is not required, the terminal may rely on text semantic analysis alone or on speech acoustic analysis alone, and use the emotion recognized in that way as the target emotion reflected by the emotion data, without jointly considering text semantics (text features) and speech acoustics (acoustic features). This saves terminal computing resources and improves computing efficiency.
The granularity with which emotions (including the target emotion) are divided is not limited in the embodiment of the present invention. When the division granularity is large, the emotion description is fuzzier; for example, emotions may simply be divided into positive and negative. Conversely, when the division granularity is small, the emotion description is more precise. A coarse-grained emotion may contain several fine-grained emotions.
For example, fig. 3 shows a schematic diagram of emotion classification. The emotions in fig. 3 are divided into three levels of granularity. The first level is divided by emotional polarity into positive and negative emotions. The second level subdivides each polarity into several emotions of different intensity; for example, the positive emotions include interest and optimism, and the negative emotions include irritation and aversion. The third level further refines each second-level emotion into discrete categories; for example, interest includes joy, pleasure, and amusement; optimism includes hope and desire; irritation includes, for example, discontent and depression; and aversion includes, for example, disdain and annoyance.
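The hierarchy of fig. 3 can be held in a plain nested data structure; the sketch below is illustrative, and the leaf words follow the (machine-translated) wording above rather than the original figure.

```python
# Illustrative three-level emotion hierarchy, mirroring the description of Fig. 3.
EMOTION_HIERARCHY = {
    "positive": {
        "interest": ["joy", "pleasure", "amusement"],
        "optimism": ["hope", "desire"],
    },
    "negative": {
        "irritation": ["discontent", "depression"],
        "aversion": ["disdain", "annoyance"],
    },
}

def coarsen(fine_emotion):
    """Map a third-level emotion to its (first-level, second-level) parents."""
    for polarity, groups in EMOTION_HIERARCHY.items():
        for group, leaves in groups.items():
            if fine_emotion in leaves:
                return polarity, group
    return None

print(coarsen("hope"))  # -> ('positive', 'optimism')
```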
In actual processing, because the granularity of the emotion obtained by text semantic analysis and the granularity of the emotion obtained by speech acoustic analysis may differ, the terminal may use the finer-grained recognition to verify or refine the result of the coarser-grained recognition. For example, if the granularity of the emotion obtained through text semantic analysis is larger than that obtained through speech acoustic analysis, i.e., the emotion granularity of the second emotion model is larger than that of the third emotion model, then after calling the second emotion model to recognize the first emotion reflected by the emotion data from the text semantics, the terminal can further call the third emotion model to recognize the second emotion reflected by the emotion data from the acoustic analysis, so as to verify or refine the first emotion. This allows the target emotion reflected by the emotion data to be determined more accurately, improving emotion recognition accuracy.
As described above, if the similarity between the first emotion and the second emotion is greater than or equal to the first threshold, the terminal may consider the first emotion and the second emotion to belong to the same emotion type. Since the second emotion has a smaller granularity than the first emotion and therefore describes the emotion more finely, the terminal may determine the second emotion as the target emotion reflected by the emotion data. Otherwise, if the similarity between the first emotion and the second emotion is smaller than the first threshold, the terminal considers that they do not belong to the same emotion type, and can then issue a prompt asking the user whether to determine the first emotion as the target emotion reflected by the emotion data. This improves emotion recognition accuracy while letting the user participate in the recognition, improving the user experience.
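One possible realization of steps S24-S25 combined with the refinement note above, under the assumption (not mandated by the text) that the finer-grained second emotion is kept when the two results agree and the user is consulted when they conflict:

```python
# Illustrative decision logic; the threshold and the user-confirmation hook are placeholders.
FIRST_THRESHOLD = 0.7

def pick_target_emotion(first_emotion, second_emotion, similarity,
                        confirm_with_user=lambda e: True):
    if similarity >= FIRST_THRESHOLD:
        return second_emotion             # same emotion type: keep the finer-grained result
    if confirm_with_user(first_emotion):  # results conflict: fall back to semantics, ask the user
        return first_emotion
    return None

print(pick_target_emotion("optimism", "hope", similarity=0.9))        # -> hope
print(pick_target_emotion("optimism", "irritation", similarity=0.1))  # -> optimism
```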
In another embodiment, if the emotion data includes emotion text data, the terminal may perform semantic analysis on the emotion text data to obtain a target emotion reflected by the emotion text data, which may specifically refer to the specific embodiment of step S11 or S21, and details are not repeated here.
In still another embodiment, if the emotion data includes emotion image data, the step S102 includes the following steps S31-S35.
And S31, extracting the target facial expression in the emotion image data, and obtaining a third emotion reflected by the target facial expression.
The terminal can use a face recognition algorithm to perform facial recognition on the emotion image data to obtain the target facial expression contained in the emotion image data and the third emotion reflected by that expression. The face recognition algorithm can be customized by the system and may include, but is not limited to, a geometric-feature-based facial emotion recognition algorithm, a local feature analysis algorithm, an eigenface algorithm, or a neural network algorithm.
Taking a face emotion recognition algorithm based on geometric features as an example, the terminal can perform face recognition on emotion image data by using the geometric features, for example, important feature organs such as human eyes, mouths, noses, and dimples are usually extracted as classification features to obtain a face image contained in the emotion image data. Furthermore, expression recognition can be carried out on the face image to obtain a target facial expression contained in the face image, and then a third emotion reflected by the target facial expression is obtained. For example, if the target facial expression is smiling, the third emotion reflected by the target facial expression is happy, and the like.
And S32, extracting the target limb behavior in the emotion image data, and obtaining a fourth emotion reflected by the target limb behavior.
The terminal can perform behavior recognition on the emotion image data by adopting a behavior recognition algorithm to obtain a target limb behavior contained in the emotion image data and a fourth emotion reflected by the target limb behavior. The behavior recognition algorithm may be pre-trained, which may include, but is not limited to, a deep learning based human behavior algorithm, a convolutional neural network based human behavior algorithm, and so on.
Alternatively, the output result of the behavior recognition algorithm may be the target body behavior included in the emotion image data, or may be a fourth emotion reflected by the target body behavior. When the output result of the behavior recognition algorithm is the target limb behavior contained in the emotion image data, the terminal needs to obtain a fourth emotion reflected by the target limb behavior from the limb emotion mapping relation table because different limb behaviors can correspond to different emotions. The body emotion mapping table comprises one or more groups of mapping relations between body behaviors and emotions, each body behavior corresponds to one emotion, and one emotion can correspond to one or more body behaviors. For example, table 1 below shows a schematic diagram of a limb emotion mapping table.
TABLE 1

Serial number    Limb behavior       User emotion
1                Limb behavior 1     Happy
2                Limb behavior 2     Angry
...              ...                 ...
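A minimal sketch of the table lookup in step S32, with placeholder behaviors and emotions standing in for the contents of Table 1:

```python
# Illustrative limb-emotion mapping table; behaviors and emotions are placeholders.
LIMB_EMOTION_TABLE = {
    "limb_behavior_1": "happy",
    "limb_behavior_2": "angry",
}

def fourth_emotion(target_limb_behavior, default=None):
    """Return the emotion mapped to the recognized limb behavior (step S32)."""
    return LIMB_EMOTION_TABLE.get(target_limb_behavior, default)

print(fourth_emotion("limb_behavior_1"))              # -> happy
print(fourth_emotion("unknown_behavior", "neutral"))  # -> neutral
```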
And S33, calculating the similarity between the third emotion and the fourth emotion.
And S34, when the similarity between the third emotion and the fourth emotion is greater than or equal to a second threshold value, determining the third emotion or the fourth emotion as the target emotion.
And S35, when the similarity between the third emotion and the fourth emotion is smaller than a second threshold value, determining the third emotion as the target emotion.
The terminal calculates the similarity between the third emotion and the fourth emotion using a similarity algorithm. When the similarity between the third emotion and the fourth emotion is greater than or equal to the second threshold, the two are relatively similar, and the terminal may use either the third emotion or the fourth emotion as the target emotion reflected by the emotion data.
Conversely, when the similarity between the third emotion and the fourth emotion is smaller than the second threshold, the two differ considerably, and the terminal can determine the target emotion from the third emotion and the fourth emotion according to a preset decision rule. The preset decision rule is set by the system, for example, directly determining the third emotion reflected by the facial expression as the target emotion.
Alternatively, if the highest emotion recognition accuracy is not required, the terminal may directly determine the third emotion reflected by the target facial expression, or the fourth emotion reflected by the target limb behavior, as the target emotion. Analyzing the target emotion without jointly considering facial expression and limb behavior saves terminal computing resources and improves processing efficiency.
Optionally, in actual processing, because the granularity of the emotion reflected by the facial expression and the granularity of the emotion reflected by the limb behavior may differ, the terminal may use the finer-grained recognition to verify or refine the coarser-grained result, as described in the foregoing embodiment, which is not repeated here.
It should be noted that several specific embodiments related to the embodiments of the present invention may be used alone or in combination. For example, if the emotion data includes emotion voice data and emotion image data, the terminal may simultaneously combine respective emotion recognition modes of the emotion voice data and the emotion image data, comprehensively analyze and obtain a target emotion reflected by the emotion data, and similarly, reference may be made to the above specific implementation manner of obtaining the target emotion with respect to the emotion voice data and the emotion image data, which is not described herein again.
And S103, matching a corresponding target filter mode for the image to be processed according to the target emotion.
And after the terminal obtains the target emotion, the emotion filter mapping table can be obtained, and a target filter mode corresponding to the target emotion is further obtained from the emotion filter mapping table. The emotion filter mapping table may be pre-configured in a local database of the terminal, or may be configured in a remote server. The emotion filter mapping table comprises one or more groups of mapping relations between emotions and filter modes, each emotion corresponds to one filter mode, and one filter mode can correspond to one or more emotions. Illustratively, table 2 below shows a schematic diagram of an emotion filter map.
TABLE 2
And S104, performing filter processing on the image to be processed by adopting a target filter mode to obtain a target image.
In the embodiment of the invention, because the image to be processed is usually an encoded image, for example in a format such as JPG or PNG, the terminal first needs to decode the image to be processed to obtain a decoded image. The terminal then uses its central processing unit (CPU) to perform filter rendering on the decoded image in the target filter mode, obtaining the target image. Applying an emotion-based filter to the image in this way helps improve the image enhancement effect and addresses problems such as words failing to convey the user's meaning and the user's true intention not being accurately expressed.
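For illustration, the sketch below combines steps S103 and S104: it looks up a filter for the recognized target emotion and renders it over the decoded image with Pillow. The emotion-filter mapping and the filter functions are purely hypothetical, since Table 2 is not reproduced in this text.

```python
# Hypothetical emotion-filter mapping and rendering; file names are placeholders.
from PIL import Image, ImageEnhance

def cheerful_filter(img):
    # Slightly brighter and more saturated rendering for a positive emotion.
    return ImageEnhance.Color(ImageEnhance.Brightness(img).enhance(1.1)).enhance(1.3)

def muted_filter(img):
    # Desaturated rendering for a negative emotion.
    return ImageEnhance.Color(img).enhance(0.7)

EMOTION_FILTER_MAP = {"happy": cheerful_filter, "sad": muted_filter}

def apply_emotion_filter(image_path, target_emotion, out_path):
    img = Image.open(image_path).convert("RGB")            # decode JPG/PNG etc. (S104, decoding)
    filter_fn = EMOTION_FILTER_MAP.get(target_emotion, lambda x: x)  # S103, filter matching
    filter_fn(img).save(out_path)                          # S104, filter rendering
    return out_path

# apply_emotion_filter("to_process.jpg", "happy", "target.jpg")
```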
Fig. 4 is a schematic flow chart of another image processing method according to an embodiment of the present invention. The method as shown in fig. 4 comprises steps S401-S405.
S401, responding to a dynamic issuing instruction in the social application, and collecting emotion data.
S402, if the emotion data comprise emotion image data, determining the emotion image data as an image to be processed.
If the terminal detects a dynamic publishing instruction for the social application, it can respond to the instruction and collect emotion data. The emotion data may be emotion data of a specified user, for example collected within a specified time period. As described above, the emotion data may specifically be at least one of: emotion voice data, emotion image data, emotion video data, and emotion text data. The specified time period may be set by the user or by system default, such as 60 seconds (s). The specified user can be any user; the terminal can record the specified user's audio to obtain emotion voice data, or track and photograph the specified user to obtain emotion image data, and so on.
If the emotion data comprises emotion image data, the terminal can directly use the emotion image data as an image to be processed, so that the situation that a user re-inputs the image to be processed is avoided, user operation is reduced, and the image processing efficiency is improved.
S403, identifying the target emotion reflected by the emotion data.
S404, matching a corresponding target filter mode for the image to be processed according to the target emotion.
S405, performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image. Optionally, the terminal may also publish the target image in the social application for users to view.
Fig. 5 is a schematic flow chart of another image processing method according to an embodiment of the present invention. The method shown in fig. 5 comprises steps S501-S505.
S501, responding to a dynamic issuing instruction in the social application, and collecting emotion data.
And if the terminal detects a dynamic issuing instruction aiming at the social application, the terminal can respond to the dynamic issuing instruction and collect emotion data. The dynamic issuing command and the emotion data may be referred to above, and are not described herein again.
And S502, acquiring the image to be processed according to the dynamic issuing instruction.
In the embodiment of the invention, if the dynamic issuing instruction carries the image to be processed, the terminal can directly obtain the image to be processed by analyzing the dynamic issuing instruction. Or, if the dynamic issuing instruction does not carry the image to be processed but is used for instructing to acquire the image to be processed, the terminal may acquire the image to be processed according to the instruction of the dynamic issuing instruction, where the image to be processed may be input by the user or transmitted by another device (e.g., a server).
Optionally, after the terminal collects the emotion data, a prompt message may be sent to prompt the user whether to input the image to be processed. The implementation of the prompt message is not limited, and for example, the user is prompted by a pop-up window (floating window), a short message, a subtitle, a picture, or the like to select whether to input the image to be processed.
And S503, identifying the target emotion reflected by the emotion data.
And S504, matching a corresponding target filter mode for the image to be processed according to the target emotion.
And S505, performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
For example, take an echo application as the social application. Fig. 6(a)-6(e) show schematic diagrams of a scene in which emotion data and an image to be processed are collected. As shown in fig. 6(a), the user launches the echo application and enters its main interface, shown in fig. 6(b). The user selects publishing a post in the user interface and enters the audio recording interface shown in fig. 6(c). The user long-presses the audio recording button to record emotion data, i.e., emotion voice data, for a specified time period; as shown in fig. 6(d), 38 seconds of emotion data are recorded. The user then taps the next-step button (shown as a circular icon containing a greater-than symbol) and enters the interface shown in fig. 6(e), where the user actively selects the image to be processed for the post. Optionally, a prompt interface (not shown) may be displayed between fig. 6(d) and fig. 6(e) to ask the user whether to select an image to be processed; if the terminal detects that the user wants to select one, it jumps to fig. 6(e) so the user can select and input the image to be processed.
It should be noted that, regarding the content not described in fig. 4 and fig. 5, the description in the embodiment of the method described in fig. 1 may be referred to correspondingly, and is not repeated here.
Optionally, the terminal may further perform steps S701-S705 in fig. 7 after obtaining the target image.
And S701, if the emotion data comprise emotion voice data or emotion text data, synthesizing the emotion voice data or emotion text data into the target image to obtain a synthesized image.
In one embodiment, if emotion voice data is included in the emotion data, the terminal may convert the emotion voice data into corresponding emotion text data, and further add the emotion text data to the target image in the form of image subtitles, thereby obtaining a synthesized image. The specific position of the emotion text data added to the target image is not limited, and may be added to the upper left corner, the upper right corner, or the middle position of the target image. For the conversion of the emotional speech data into the emotional text data, reference may be made to the related description in the foregoing embodiments, and details are not repeated here.
In another embodiment, if the emotion data includes emotion voice data, the terminal may embed the emotion voice data in the target image to obtain a synthesized image.
In another embodiment, if the emotion data directly contains emotion text data, the terminal may add the emotion text data to the target image in the form of image subtitles, thereby obtaining a composite image.
In another embodiment, if the emotion data includes emotion text data, the terminal may convert the emotion text data into corresponding emotion voice data and embed that voice data into the target image, thereby obtaining a composite image. The conversion from emotion text data to emotion voice data is not limited; for example, the terminal may read the emotion text data aloud in a preset voice style (such as a child's voice or a high-pitched female voice) to form the corresponding emotion voice data.
It should be noted that the above-described embodiments of obtaining a composite image may be used alone or in combination. For example, the terminal may embed the emotion voice data in the target image, or add emotion text data corresponding to the emotion voice data to the target image to obtain a synthesized image including voice and text data.
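A minimal sketch of the caption variant of step S701, overlaying the emotion text on the target image as an image subtitle; Pillow is our choice of library, and the file names and caption position are placeholders.

```python
# Illustrative caption overlay for the composite image; paths and placement are assumptions.
from PIL import Image, ImageDraw, ImageFont

def add_caption(target_image_path, emotion_text, out_path):
    img = Image.open(target_image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    # Place the caption near the bottom-left corner; the text above notes that
    # top-left, top-right or centre placement would be equally valid.
    draw.text((10, img.height - 20), emotion_text, fill=(255, 255, 255), font=font)
    img.save(out_path)
    return out_path

# add_caption("target.jpg", "Feeling great today!", "composite.jpg")
```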
S702, publishing the composite image in the social application.
The terminal can further respond to the image publishing instruction and publish the synthetic image in the social application. The image issue instruction and the above dynamic issue instruction may refer to the same instruction or different instructions, and the present invention is not limited thereto. When they are different instructions, the image issuing instruction refers to an instruction generated when the terminal detects that the image issuing operation is performed, and the dynamic issuing instruction refers to an instruction generated when the terminal detects that the data acquisition operation is performed, and is used for acquiring emotion data and/or an image to be processed and the like. The image publishing operation is an operation set by a system in a self-defined way, such as clicking a publishing button and the like; accordingly, the data collection operation can also be a custom set operation of the system, such as clicking a voice recording button and the like.
For example, continuing with the emotion data and the image to be processed collected in the example of fig. 6(a)-6(e), assume that the terminal recognizes the target emotion reflected by the emotion data as happy. Fig. 6(f)-6(h) show schematic diagrams of publishing a composite image in the social application. Specifically, after recognizing that the target emotion reflected by the emotion data is happy, the terminal performs filter processing on the image to be processed with the target filter mode corresponding to happiness; as shown in fig. 6(f), a happy smiley-face expression may be rendered on the image to be processed, yielding the target image. The terminal may then embed the recorded emotion data (emotion voice data) into the target image to obtain a composite image, as shown in fig. 6(g). The user can then click the publish button in the echo application to publish the composite image, as shown in fig. 6(h).
And S703, responding to the first viewing operation aiming at the synthetic image, and displaying the target image in the synthetic image.
And if the terminal detects a first viewing operation aiming at the synthetic image, responding to the first viewing operation and displaying a target image contained in the synthetic image in a display screen. The first viewing operation is system-customized, for example, according to product requirements or user preferences. For example, when a user browses a composite image in a social application and a terminal detects a browsing operation for the composite image, the terminal may display a target image included in the composite image, and may not play emotion voice data or display emotion text data or the like included in the composite image.
And S704, responding to a second viewing operation for the synthesized image, displaying a target image in the synthesized image and playing target voice data, wherein the target voice data can be emotion voice data contained in the emotion data or voice data obtained by converting emotion text data contained in the emotion data.
Whether the emotion data comprises emotion voice data and/or emotion text data, if the terminal detects a second viewing operation for the composite image, it responds by displaying the target image in the composite image on the display screen and playing the target voice data. If the emotion data comprises emotion voice data, the target voice data can simply be that emotion voice data. If the emotion data comprises emotion text data, the target voice data may be the voice data converted from that emotion text data. If the emotion data comprises both emotion text data and emotion voice data, then in order to accurately convey the user's true intention, the target voice data may be the emotion voice data; alternatively, by system default, it may be the voice data converted from the emotion text data.
The second viewing operation can also be set by the system in a self-defining way, and is different from the first viewing operation. For example, if the terminal detects a double-click operation on the synthetic image in the social application, the terminal may enter full-screen display of the target image in the synthetic image and play emotional voice data in the synthetic image.
Optionally, after the terminal responds to the second viewing operation, target text data may be synchronously displayed, where the target text data may be emotion text data included in the emotion data, or text data correspondingly converted from emotion voice data included in the emotion data. Therefore, the watching experience of the user is guaranteed, and the utilization rate of the social application is improved.
S705, in response to a third viewing operation for the composite image, displaying the target image and target text data in the composite image, where the target text data may be the emotion text data contained in the emotion data, or text data converted from the emotion voice data contained in the emotion data.
If the terminal detects a third viewing operation for the composite image, the terminal displays, in response to the third viewing operation, the target image and the target text data in the composite image on the display screen. If the emotion data includes emotion text data, the target text data may be the emotion text data itself. If the emotion data includes emotion voice data, the target text data may be text data converted from the emotion voice data. If the emotion data includes both emotion voice data and emotion text data, the target text data may be the emotion text data, so as to save terminal resources. Optionally, the text data converted from the emotion voice data may also be used, and the present invention is not limited thereto.
The third viewing operation may also be customized by the system and is different from both the first viewing operation and the second viewing operation. For example, when the terminal detects a click operation on the composite image in the social application, the terminal may display the target image in the composite image and synchronously display the emotion text data, and the like.
In practical applications, the terminal may perform any one or more of steps S703-S705. When the terminal performs more than one of these steps, the execution order is not limited; for example, the terminal may perform step S705 before step S703.
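As a minimal, non-limiting sketch of the dispatch in steps S703-S705, the following Python code assumes the composite image is carried as a dictionary with an optional emotion voice entry and an optional emotion text entry, and that optional text-to-speech/speech-to-text converters are supplied; these names and the data layout are assumptions made only for illustration.

def resolve_view(operation, composite, tts=None, stt=None):
    """Return the content the terminal would render for a given viewing operation.

    composite: dict with 'target_image' and optional 'emotion_voice'/'emotion_text'.
    tts/stt:   optional converters, used only when the needed modality is absent.
    """
    voice = composite.get("emotion_voice")
    text = composite.get("emotion_text")
    rendered = {"image": composite["target_image"]}      # S703: target image only

    if operation == "second":                            # S704: image + target voice data
        rendered["voice"] = voice if voice is not None else (
            tts(text) if (tts and text is not None) else None)
    elif operation == "third":                           # S705: image + target text data
        rendered["text"] = text if text is not None else (
            stt(voice) if (stt and voice is not None) else None)
    return rendered

For instance, resolve_view("second", composite) would fall back to converting the emotion text data into voice only when no emotion voice data was embedded, mirroring the preference described above.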
The terminal in the embodiment of the present invention may include an Internet device such as a smart phone (e.g., an Android phone or an iOS phone), a personal computer, a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a wearable smart device, which is not limited by the embodiment of the present invention.
By implementing the embodiment of the present invention, images carrying sound or text can be presented in a multi-sensory manner, for example through a combination of audio and visual content, so that the user can present published content in the social application more accurately and richly, which improves the interest, interactivity, and utilization rate of the social application. Moreover, the published content (image) is enhanced based on emotion recognition, which solves the problems in the conventional technology that the image enhancement effect is poor and the true intention of the user cannot be expressed.
Fig. 8 is a schematic flowchart of an image processing method based on a scene application according to an embodiment of the present invention. The method shown in fig. 8 includes steps S801-S804.
S801, in response to a dynamic publishing instruction in a social application, acquiring emotion data and an image to be processed.
According to the embodiment of the present invention, if the terminal detects a dynamic publishing instruction in the social application, the terminal may acquire the emotion data and the image to be processed in response to the dynamic publishing instruction. Specifically, the terminal may obtain the emotion data and the image to be processed, and process the image to be processed based on the emotion data to obtain the target image; for obtaining the target image, reference may be made to the description in any one of the method embodiments of fig. 1, fig. 4, and fig. 5, which is not repeated here.
The dynamic publishing instruction may be an instruction generated when the terminal detects that the user performs a dynamic publishing operation in the social application, and the dynamic publishing operation may be a click operation, a sliding operation, or the like on a designated dynamic publishing key in the social application. The social application refers to software for enabling users to communicate through a network, and may include but is not limited to blog applications, microblog applications, forum applications, social network applications (e.g., Facebook), and instant messaging applications (e.g., WeChat, QQ, etc.).
S802, identifying the target emotion reflected by the emotion data, and matching a target filter mode corresponding to the target emotion for the image to be processed.
S803, performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
S804, publishing the target image in the social application.
Optionally, considering the interest and completeness of dynamic (image) publishing and enhancing the user's social engagement, the terminal may synthesize the emotion data and the target image to obtain a composite image, so as to publish the composite image in the social application. Specifically:
In one embodiment, if the emotion data includes only emotion image data, the terminal may publish the target image in the social application in response to the dynamic publishing instruction.
In another embodiment, if the emotion data includes emotion voice data or emotion text data, the terminal may synthesize the emotion voice data or the emotion text data into the target image to obtain a composite image, and then publish the composite image in the social application to complete the corresponding dynamic publishing. For the description of the composite image, reference may be made to the related description of the method embodiment of fig. 7, which is not repeated here. For example, reference may be made to the related descriptions of fig. 6(a) -6 (h), in which the user completes publishing of the composite image in the social application by performing the operations in sequence, which is not repeated here.
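As a non-limiting sketch of the flow of steps S801-S804 together with the optional compositing branch, the following code passes the recognition, filter-matching, filter-applying, and publishing modules in as callables; the dictionary layout of the emotion data and of the published post is an assumption made only for this example.

def publish_dynamic(emotion_data, image_to_process, recognize, match_filter,
                    apply_filter, publish):
    """emotion_data: dict that may hold 'voice', 'text' and/or 'image' entries."""
    target_emotion = recognize(emotion_data)                       # S802: target emotion
    target_image = apply_filter(image_to_process,
                                match_filter(target_emotion))      # S802-S803
    if emotion_data.get("voice") is not None or emotion_data.get("text") is not None:
        post = {"image": target_image,                             # optional composite image
                "voice": emotion_data.get("voice"),
                "text": emotion_data.get("text")}
    else:
        post = {"image": target_image}                             # emotion image data only
    publish(post)                                                  # S804: publish in the app
    return post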
By implementing the embodiment of the present invention, images carrying sound or text can be presented in a multi-sensory manner, for example through a combination of audio and visual content, so that the user can present published content in the social application more accurately and richly, which improves the interest, interactivity, and utilization rate of the social application. Moreover, the published content (image) is enhanced based on emotion recognition, which solves the problems in the conventional technology that the image enhancement effect is poor and the true intention of the user cannot be expressed.
Based on the description of the above embodiment of the image processing method, the embodiment of the present invention also discloses an image processing apparatus, which may be a computer program (including a program code) running in a terminal. The apparatus may perform the method as described above with respect to any of the method embodiments of fig. 1-8. Referring to fig. 9, the image processing apparatus 800 may operate the following units:
an obtaining unit 801, configured to obtain emotion data and an image to be processed, where the emotion data includes emotion voice data, emotion image data, or emotion text data;
an identifying unit 802 for identifying a target emotion reflected by the emotion data;
a matching unit 803, configured to match a corresponding target filter pattern for the to-be-processed image according to the target emotion;
and the processing unit 804 is configured to perform filter processing on the image to be processed by using the target filter mode to obtain a target image.
In one embodiment, the obtaining unit 801 is specifically configured to collect emotion data in response to a dynamic issuing instruction in a social application; and if the emotion data comprises emotion image data, determining the emotion image data as the image to be processed.
In another embodiment, the obtaining unit 801 is specifically configured to collect emotion data in response to a dynamic issuing instruction in a social application; and acquiring the image to be processed according to the dynamic issuing instruction.
In another embodiment, the processing unit 804 is further configured to, if the emotion data includes emotion voice data or emotion text data, synthesize the emotion voice data or emotion text data into the target image to obtain a synthesized image; publishing the composite image in the social application.
In yet another embodiment, the processing unit 804 is further configured to display a target image in the composite image in response to a first viewing operation for the composite image; or responding to a second viewing operation aiming at the synthetic image, displaying a target image in the synthetic image and playing target voice data, wherein the target voice data is the emotion voice data or voice data corresponding to the emotion text data; or responding to a third viewing operation aiming at the synthetic image, and displaying a target image and target text data in the synthetic image, wherein the target text data is the emotion text data or text data corresponding to the emotion voice data.
In another embodiment, the matching unit 803 is specifically configured to obtain an emotion filter mapping relationship table, where the emotion filter mapping relationship table records a mapping relationship between an emotion and a filter pattern, and the mapping relationship is that one filter pattern corresponds to at least one emotion; and acquiring the target filter mode corresponding to the target emotion from the emotion filter mapping relation table.
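As an illustrative assumption of how the emotion filter mapping relation table could be held in memory (the concrete emotion names and filter names below are not fixed by the disclosure), a plain dictionary is sufficient, with one filter mode shared by one or more emotions:

EMOTION_FILTER_TABLE = {
    "happy":   "bright_warm",
    "excited": "bright_warm",     # one filter mode corresponding to more than one emotion
    "sad":     "soft_grey",
    "angry":   "high_contrast",
}

def match_target_filter(target_emotion, table=EMOTION_FILTER_TABLE, default="neutral"):
    # Look up the target filter mode for the recognized target emotion.
    return table.get(target_emotion, default)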
In another embodiment, if the emotion data includes emotion voice data, the recognition unit 802 is specifically configured to convert the emotion voice data into corresponding emotion text data, and extract text features in the corresponding emotion text data; extracting acoustic features in the emotional voice data; and calling a first emotion model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
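A minimal sketch of the fusion idea, assuming the text features and acoustic features are plain numeric vectors and the first emotion model is approximated here by a simple linear scorer; the label set and scoring scheme are assumptions for illustration only.

import numpy as np

EMOTIONS = ["happy", "sad", "angry", "calm"]      # illustrative label set

def fuse_and_recognize(text_features, acoustic_features, weights, bias):
    """weights: (len(EMOTIONS), n_text + n_acoustic) array; bias: (len(EMOTIONS),)."""
    fused = np.concatenate([np.asarray(text_features, float),
                            np.asarray(acoustic_features, float)])
    scores = np.asarray(weights, float) @ fused + np.asarray(bias, float)
    return EMOTIONS[int(np.argmax(scores))]        # target emotion with the highest score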
In another embodiment, if the emotion data includes emotion voice data, the recognition unit 802 is specifically configured to convert the emotion voice data into corresponding emotion text data, and invoke a second emotion model to perform semantic analysis on the corresponding emotion text data to obtain a first emotion; calling a third emotion model to perform acoustic feature analysis on the emotion voice data to obtain a second emotion; determining the first emotion or the second emotion as the target emotion when a similarity between the first emotion and the second emotion is greater than or equal to a first threshold; determining the first emotion as a target emotion when the similarity between the first emotion and the second emotion is less than a first threshold.
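A sketch of the consistency check between the first emotion (from semantic analysis) and the second emotion (from acoustic analysis); representing each result as a score vector and comparing with cosine similarity is an assumption, since the disclosure does not fix a particular similarity measure. The same thresholded comparison applies equally to the facial-expression/limb-behavior embodiment described next.

import numpy as np

def pick_target_emotion(first_emotion, second_emotion, first_vec, second_vec,
                        first_threshold=0.8):
    a = np.asarray(first_vec, float)
    b = np.asarray(second_vec, float)
    similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    if similarity >= first_threshold:
        return second_emotion   # the two results agree; either may serve as the target emotion
    return first_emotion        # they diverge; fall back to the first emotion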
In another embodiment, if the emotion data includes emotion image data, the identifying unit 802 is specifically configured to extract a target facial expression in the emotion image data, and obtain a third emotion reflected by the target facial expression; extracting target limb behaviors in the emotion image data, and obtaining a fourth emotion reflected by the target limb behaviors; determining the third emotion or the fourth emotion as the target emotion when a similarity between the third emotion and the fourth emotion is greater than or equal to a second threshold; determining the third emotion as the target emotion when the similarity between the third emotion and the fourth emotion is less than a second threshold.
In another embodiment, the identifying unit 802 is specifically configured to perform semantic analysis on the emotion text data to obtain at least one candidate emotion vocabulary; carrying out similarity matching on the candidate emotion vocabulary and a reference emotion vocabulary contained in the first emotion model to obtain the similarity between the candidate emotion vocabulary and the reference emotion vocabulary; determining the emotion reflected by the target emotion vocabulary as the first emotion; the target emotion vocabulary is a vocabulary corresponding to the at least one candidate emotion vocabulary in which the similarity is greater than or equal to a third threshold, and the weight of the reference emotion vocabulary is greater than or equal to a fourth threshold, and the weight of the reference emotion vocabulary is used for indicating the intensity of the emotion reflected by the reference emotion vocabulary.
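A sketch of the vocabulary matching step, assuming a small weighted reference vocabulary and a generic string-similarity measure; the reference entries, the similarity function, and both thresholds are placeholders, not values from the disclosure.

import difflib

REFERENCE_VOCAB = {
    # reference emotion vocabulary: (emotion reflected, weight = intensity of that emotion)
    "delighted": ("happy", 0.9),
    "pleased":   ("happy", 0.6),
    "furious":   ("angry", 0.95),
}

def first_emotion_from_text(candidate_words, third_threshold=0.8, fourth_threshold=0.7):
    for word in candidate_words:                        # candidate emotion vocabulary
        for ref, (emotion, weight) in REFERENCE_VOCAB.items():
            similarity = difflib.SequenceMatcher(None, word, ref).ratio()
            if similarity >= third_threshold and weight >= fourth_threshold:
                return emotion                          # emotion reflected by the target vocabulary
    return None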
In another embodiment, the identifying unit 802 is specifically configured to perform feature extraction on the emotion voice data in a time domain to obtain a time domain acoustic feature; extracting features of the emotion voice data on a frequency domain to obtain frequency domain acoustic features; and analyzing the time domain acoustic features and the frequency domain acoustic features to obtain the second emotion.
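A sketch of simple time-domain and frequency-domain acoustic features computed with NumPy only; short-term energy, zero-crossing rate, and spectral centroid are common examples, though they are assumptions here and a real system would typically use richer features.

import numpy as np

def acoustic_features(samples, sample_rate):
    x = np.asarray(samples, dtype=np.float64)
    # time domain: short-term energy and zero-crossing rate
    energy = float(np.mean(x ** 2))
    zero_crossing_rate = float(np.mean(np.abs(np.diff(np.sign(x)))) / 2.0)
    # frequency domain: spectral centroid of the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    spectral_centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-9))
    return {"energy": energy,
            "zero_crossing_rate": zero_crossing_rate,
            "spectral_centroid": spectral_centroid}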
According to another embodiment of the present invention, the units in the image processing apparatus shown in fig. 9 may be respectively or entirely combined into one or several other units to form the image processing apparatus, or some unit(s) thereof may be further split into multiple units with smaller functions to form the image processing apparatus, which may achieve the same operation without affecting the achievement of the technical effects of the embodiments of the present invention. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present invention, the image processing apparatus may also include other units, and in practical applications, these functions may also be implemented by being assisted by other units, and may be implemented by cooperation of a plurality of units.
According to another embodiment of the present invention, the image processing apparatus shown in fig. 9 may be constructed, and the image processing method according to the embodiment of the present invention may be implemented, by running a computer program (including program code) capable of executing the steps involved in any of fig. 1 to 8 on a general-purpose computing device, such as a computer, that includes a processing element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and other storage elements. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above-described computing device via the computer-readable recording medium.
According to the embodiment of the invention, emotion data and an image to be processed can be obtained, the target emotion reflected by the emotion data is identified, a corresponding target filter mode is matched for the image to be processed according to the target emotion, and finally the target filter mode is adopted to carry out filter processing on the image to be processed to obtain the target image. Therefore, the image is subjected to filter processing based on emotion, and the problems that the image enhancement effect is poor, the real intention of a user cannot be accurately expressed, the interaction enthusiasm is influenced and the like in the traditional technology can be solved.
Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides a terminal. Referring to fig. 10, the terminal includes at least a processor 901, an input device 902, an output device 903, and a computer storage medium 904. The processor 901, input device 902, output device 903, and computer storage medium 904 in the terminal may be connected by a bus or other means.
A computer storage medium 904 may be stored in the memory of the terminal, said computer storage medium 904 being adapted to store a computer program comprising program instructions, said processor 901 being adapted to execute the program instructions stored by said computer storage medium 904. The processor 901 (or CPU) is a computing core and a control core of the terminal, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 901 according to the embodiment of the present invention may be configured to perform a series of image processing, including: acquiring emotion data and an image to be processed; identifying a target emotion reflected by the emotion data; matching a corresponding target filter mode for the image to be processed according to the target emotion; and performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image, and the like.
The embodiment of the invention also provides a computer storage medium (Memory), which is a Memory device in the terminal and is used for storing programs and data. It is understood that the computer storage medium herein may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores an operating system of the terminal. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 901. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by processor 901 to perform the corresponding steps described above with respect to the method in the image processing embodiments; in particular implementations, one or more instructions in the computer storage medium are loaded by the processor 901 and perform the following steps:
acquiring emotion data and an image to be processed, wherein the emotion data comprises emotion voice data, emotion image data or emotion text data;
identifying a target emotion reflected by the emotion data;
matching a corresponding target filter mode for the image to be processed according to the target emotion;
and performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: collecting emotion data in response to a dynamic issuing instruction in a social application; and if the emotion data comprises emotion image data, determining the emotion image data as the image to be processed.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: collecting emotion data in response to a dynamic issuing instruction in a social application; and acquiring the image to be processed according to the dynamic issuing instruction.
In yet another embodiment, the one or more instructions are also loadable and executable by the processor 901 to: if the emotion data comprise emotion voice data or emotion text data, synthesizing the emotion voice data or emotion text data into the target image to obtain a synthesized image; publishing the composite image in the social application.
In yet another embodiment, the one or more instructions are also loadable and executable by the processor 901 to: in response to a first viewing operation for the composite image, a target image in the composite image is displayed. Or responding to a second viewing operation aiming at the synthetic image, displaying a target image in the synthetic image and playing target voice data, wherein the target voice data is the emotion voice data or voice data corresponding to the emotion text data. Or responding to a third viewing operation aiming at the synthetic image, and displaying a target image and target text data in the synthetic image, wherein the target text data is the emotion text data or text data corresponding to the emotion voice data.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: acquiring an emotion filter mapping relation table, wherein the emotion filter mapping relation table records the mapping relation between emotions and filter modes, and the mapping relation is that one filter mode corresponds to at least one emotion; and acquiring the target filter mode corresponding to the target emotion from the emotion filter mapping relation table.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: converting the emotion voice data into corresponding emotion text data, and extracting text features in the corresponding emotion text data; extracting acoustic features in the emotional voice data; and calling a first emotion model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: converting the emotion voice data into corresponding emotion text data, and calling a second emotion model to perform semantic analysis on the corresponding emotion text data to obtain a first emotion; calling a third emotion model to perform acoustic feature analysis on the emotion voice data to obtain a second emotion; determining the first emotion or the second emotion as the target emotion when a similarity between the first emotion and the second emotion is greater than or equal to a first threshold; determining the first emotion as a target emotion when the similarity between the first emotion and the second emotion is less than a first threshold.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: extracting a target facial expression in the emotion image data, and obtaining a third emotion reflected by the target facial expression; extracting target limb behaviors in the emotion image data, and obtaining a fourth emotion reflected by the target limb behaviors; determining the third emotion or the fourth emotion as the target emotion when a similarity between the third emotion and the fourth emotion is greater than or equal to a second threshold; determining the third emotion as the target emotion when the similarity between the third emotion and the fourth emotion is less than a second threshold.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: performing semantic analysis on the emotion text data to obtain at least one candidate emotion vocabulary; carrying out similarity matching on the candidate emotion vocabulary and a reference emotion vocabulary contained in the first emotion model to obtain the similarity between the candidate emotion vocabulary and the reference emotion vocabulary; determining the emotion reflected by the target emotion vocabulary as the first emotion; the target emotion vocabulary is a vocabulary corresponding to the at least one candidate emotion vocabulary in which the similarity is greater than or equal to a third threshold, and the weight of the reference emotion vocabulary is greater than or equal to a fourth threshold, and the weight of the reference emotion vocabulary is used for indicating the intensity of the emotion reflected by the reference emotion vocabulary.
In yet another embodiment, the one or more instructions are loaded and specifically executed by processor 901: performing feature extraction on the emotion voice data in a time domain to obtain time domain acoustic features; extracting features of the emotion voice data on a frequency domain to obtain frequency domain acoustic features; and analyzing the time domain acoustic features and the frequency domain acoustic features to obtain the second emotion.
According to the embodiment of the invention, emotion data and an image to be processed can be obtained, the target emotion reflected by the emotion data is identified, a corresponding target filter mode is matched for the image to be processed according to the target emotion, and finally the target filter mode is adopted to carry out filter processing on the image to be processed to obtain the target image. Therefore, the image is subjected to filter processing based on emotion, and the problems that the image enhancement effect is poor, the real intention of a user cannot be accurately expressed, the interaction enthusiasm is influenced and the like in the traditional technology can be solved.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. An image processing method, characterized in that the method comprises:
acquiring emotion data and an image to be processed, wherein the emotion data comprises emotion voice data, emotion image data or emotion text data;
identifying a target emotion reflected by the emotion data;
matching a corresponding target filter mode for the image to be processed according to the target emotion;
and performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
2. The method of claim 1, wherein the obtaining mood data and the image to be processed comprises:
collecting emotion data in response to a dynamic issuing instruction in a social application;
and if the emotion data comprises emotion image data, determining the emotion image data as the image to be processed.
3. The method of claim 1, wherein the obtaining mood data and the image to be processed comprises:
collecting emotion data in response to a dynamic issuing instruction in a social application;
and acquiring the image to be processed according to the dynamic issuing instruction.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
if the emotion data comprises emotion voice data or emotion text data, synthesizing the emotion voice data or emotion text data into the target image to obtain a composite image;
publishing the composite image in the social application.
5. The method of claim 4, further comprising:
displaying a target image in the composite image in response to a first viewing operation for the composite image; or,
responding to a second viewing operation aiming at the composite image, displaying a target image in the composite image and playing target voice data, wherein the target voice data is the emotion voice data or voice data correspondingly converted from the emotion text data; or,
and responding to a third viewing operation aiming at the composite image, and displaying a target image and target text data in the composite image, wherein the target text data is the emotion text data or text data correspondingly converted from the emotion voice data.
6. The method according to any one of claims 1-5, wherein the matching of the corresponding target filter pattern for the image to be processed according to the target emotion comprises:
acquiring an emotion filter mapping relation table, wherein the emotion filter mapping relation table records the mapping relation between emotions and filter modes, and the mapping relation is that one filter mode corresponds to at least one emotion;
and acquiring the target filter mode corresponding to the target emotion from the emotion filter mapping relation table.
7. The method of any one of claims 1-6, wherein the emotion data comprises emotional speech data, and wherein the identifying the target emotion reflected by the emotion data comprises:
converting the emotion voice data into corresponding emotion text data, and extracting text features in the corresponding emotion text data;
extracting acoustic features in the emotional voice data;
and calling a first emotion model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
8. The method of any one of claims 1-6, wherein the emotion data comprises emotional speech data, and wherein the identifying the target emotion reflected by the emotion data comprises:
converting the emotion voice data into corresponding emotion text data, and calling a second emotion model to perform semantic analysis on the corresponding emotion text data to obtain a first emotion;
calling a third emotion model to perform acoustic feature analysis on the emotion voice data to obtain a second emotion;
determining the first emotion or the second emotion as the target emotion when a similarity between the first emotion and the second emotion is greater than or equal to a first threshold;
determining the first emotion as a target emotion when the similarity between the first emotion and the second emotion is less than a first threshold.
9. The method of any one of claims 1-6, wherein the mood data comprises mood image data, and wherein identifying the target mood reflected by the mood data comprises:
extracting a target facial expression in the emotion image data, and obtaining a third emotion reflected by the target facial expression;
extracting target limb behaviors in the emotion image data, and obtaining a fourth emotion reflected by the target limb behaviors;
determining the third emotion or the fourth emotion as the target emotion when a similarity between the third emotion and the fourth emotion is greater than or equal to a second threshold;
determining the third emotion as the target emotion when the similarity between the third emotion and the fourth emotion is less than a second threshold.
10. The method of claim 8, wherein the invoking a second emotion model to perform semantic analysis on the corresponding emotion text data to obtain a first emotion comprises:
performing semantic analysis on the emotion text data to obtain at least one candidate emotion vocabulary;
carrying out similarity matching on the candidate emotion vocabulary and a reference emotion vocabulary contained in the first emotion model to obtain the similarity between the candidate emotion vocabulary and the reference emotion vocabulary;
determining the emotion reflected by the target emotion vocabulary as the first emotion;
the target emotion vocabulary is a vocabulary corresponding to the at least one candidate emotion vocabulary in which the similarity is greater than or equal to a third threshold, and the weight of the reference emotion vocabulary is greater than or equal to a fourth threshold, and the weight of the reference emotion vocabulary is used for indicating the intensity of the emotion reflected by the reference emotion vocabulary.
11. The method of claim 8, wherein the invoking a third emotion model to perform acoustic feature analysis on the emotion voice data to obtain a second emotion comprises:
performing feature extraction on the emotion voice data in a time domain to obtain time domain acoustic features;
extracting features of the emotion voice data on a frequency domain to obtain frequency domain acoustic features;
and analyzing the time domain acoustic features and the frequency domain acoustic features to obtain the second emotion.
12. An image processing method, characterized in that the method comprises:
responding to a dynamic issuing instruction in the social application, and acquiring emotion data and an image to be processed; the emotion data comprises emotion voice data, emotion image data or emotion text data;
identifying a target emotion reflected by the emotion data, and matching a target filter mode corresponding to the target emotion to the image to be processed;
performing filter processing on the image to be processed by adopting the target filter mode to obtain a target image;
publishing the target image in the social application.
13. An image processing apparatus characterized by comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring emotion data and an image to be processed, and the emotion data comprises emotion voice data, emotion image data or emotion text data;
the identification unit is used for identifying the target emotion reflected by the emotion data;
the matching unit is used for matching a corresponding target filter mode for the image to be processed according to the target emotion;
and the processing unit is used for carrying out filter processing on the image to be processed by adopting the target filter mode to obtain a target image.
14. A terminal comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; and the number of the first and second groups,
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to perform the image processing method according to any of claims 1-11.
15. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the image processing method according to any of claims 1-11.
CN201910693744.9A 2019-07-30 2019-07-30 Image processing method, device, terminal and computer storage medium Active CN110442867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910693744.9A CN110442867B (en) 2019-07-30 2019-07-30 Image processing method, device, terminal and computer storage medium

Publications (2)

Publication Number Publication Date
CN110442867A (en) 2019-11-12
CN110442867B CN110442867B (en) 2024-07-26

Family

ID=68432176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910693744.9A Active CN110442867B (en) 2019-07-30 2019-07-30 Image processing method, device, terminal and computer storage medium

Country Status (1)

Country Link
CN (1) CN110442867B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203344A (en) * 2016-07-12 2016-12-07 北京光年无限科技有限公司 A kind of Emotion identification method and system for intelligent robot
CN109254669A (en) * 2017-07-12 2019-01-22 腾讯科技(深圳)有限公司 A kind of expression picture input method, device, electronic equipment and system
CN107992824A (en) * 2017-11-30 2018-05-04 努比亚技术有限公司 Take pictures processing method, mobile terminal and computer-readable recording medium
CN108537749A (en) * 2018-03-29 2018-09-14 广东欧珀移动通信有限公司 Image processing method, device, mobile terminal and computer readable storage medium
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN109325904A (en) * 2018-08-28 2019-02-12 百度在线网络技术(北京)有限公司 Image filters treating method and apparatus
CN109410986A (en) * 2018-11-21 2019-03-01 咪咕数字传媒有限公司 Emotion recognition method and device and storage medium
CN109766759A (en) * 2018-12-12 2019-05-17 成都云天励飞技术有限公司 Emotion identification method and Related product
CN109660728A (en) * 2018-12-29 2019-04-19 维沃移动通信有限公司 A kind of photographic method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110879840A (en) * 2019-11-19 2020-03-13 珠海格力电器股份有限公司 Information feedback method, device and storage medium
CN110991427A (en) * 2019-12-25 2020-04-10 北京百度网讯科技有限公司 Emotion recognition method and device for video and computer equipment
EP4174849A1 (en) * 2021-11-02 2023-05-03 Capital One Services, LLC Automatic generation of a contextual meeting summary
US11967314B2 (en) 2021-11-02 2024-04-23 Capital One Services, Llc Automatic generation of a contextual meeting summary
WO2024168698A1 (en) * 2023-02-16 2024-08-22 华为技术有限公司 Control method and apparatus and vehicle
CN118709146A (en) * 2024-08-28 2024-09-27 宏景科技股份有限公司 Emotion intelligent recognition method based on multi-mode data fusion

Also Published As

Publication number Publication date
CN110442867B (en) 2024-07-26

Similar Documents

Publication Publication Date Title
CN110442867B (en) Image processing method, device, terminal and computer storage medium
CN110519636B (en) Voice information playing method and device, computer equipment and storage medium
CN110598576B (en) Sign language interaction method, device and computer medium
CN109176535B (en) Interaction method and system based on intelligent robot
CN111106995B (en) Message display method, device, terminal and computer readable storage medium
AU2016277548A1 (en) A smart home control method based on emotion recognition and the system thereof
CN116484318B (en) Lecture training feedback method, lecture training feedback device and storage medium
Kabani et al. Emotion based music player
CN107480766B (en) Method and system for content generation for multi-modal virtual robots
KR20180109499A (en) Method and apparatus for providng response to user's voice input
CN109710799B (en) Voice interaction method, medium, device and computing equipment
CN110825164A (en) Interaction method and system based on wearable intelligent equipment special for children
CN111413877A (en) Method and device for controlling household appliance
CN113703585A (en) Interaction method, interaction device, electronic equipment and storage medium
CN114708869A (en) Voice interaction method and device and electric appliance
CN112306238A (en) Method and device for determining interaction mode, electronic equipment and storage medium
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
CN110910898B (en) Voice information processing method and device
CN113205569B (en) Image drawing method and device, computer readable medium and electronic equipment
CN112860213B (en) Audio processing method and device, storage medium and electronic equipment
CN113762056A (en) Singing video recognition method, device, equipment and storage medium
CN113301352B (en) Automatic chat during video playback
CN110795581B (en) Image searching method and device, terminal equipment and storage medium
CN115376517A (en) Method and device for displaying speaking content in conference scene
CN111971670B (en) Generating a response in a dialog

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant