WO2018033156A1 - Video image processing method, apparatus and electronic device - Google Patents
Video image processing method, apparatus and electronic device
- Publication number
- WO2018033156A1 (PCT/CN2017/098216)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video image
- business object
- feature vector
- network model
- neural network
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/40—Filling a planar surface by adding surface attributes, e.g. colour or texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
- H04N21/4666—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/812—Monomedia components thereof involving advertisement data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the embodiments of the present application relate to artificial intelligence technologies, and in particular, to a video image processing method, apparatus, and electronic device.
- Internet video has become an important business traffic portal and is considered a premium resource for ad placement.
- Existing video advertisements are mainly embedded (implanted) in the video.
- A common embedding method is to insert a fixed-duration advertisement before the video is played or at a certain point during playback, or to place the advertisement at a fixed position within the video playback area or its surrounding area.
- the embodiment of the present application provides a technical solution for processing a video image.
- a method for processing a video image, including: acquiring a video image to be processed and a business object to be displayed; determining a background area of the video image; and drawing the business object in the background area of the video image by computer graphics.
- a processing apparatus for a video image including: an acquiring module, configured to acquire a video image to be processed and a business object to be displayed; and a background area determining module, configured to determine a background area of the video image; a rendering module for drawing the business object in a background area of the video image by computer graphics.
- an electronic device including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface are completed by using the communication bus Communication with each other; the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the processing method of the video image of any of the embodiments of the present application.
- another electronic device, including: a processor and the video image processing apparatus according to any of the embodiments of the present application; when the processor runs the video image processing apparatus, the units in the video image processing apparatus according to any of the embodiments of the present application are executed.
- a computer program comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions implementing each step of the video image processing method according to any of the embodiments of the present application.
- a computer readable storage medium for storing computer readable instructions which, when executed, implement the operations of each step of the video image processing method according to any of the embodiments of the present application.
- the video image to be processed and the business object to be displayed are obtained, the background area of the video image is determined, and the business object is drawn in that background area by computer graphics, so that the business object is displayed in the background area of the video image. This prevents the business object from obscuring the foreground area, does not affect the viewer's normal viewing experience, is less likely to cause viewer aversion, and helps achieve the expected display effect of the business object.
- FIG. 1 is a flowchart of an embodiment of a method for processing a video image according to an embodiment of the present application
- FIG. 2 is a flow chart of another embodiment of a method for processing a video image of the present application
- FIG. 3 is a flow chart of still another embodiment of a method for processing a video image of the present application
- FIG. 4 is a flowchart of still another embodiment of a method for processing a video image of the present application
- FIG. 5 is a flowchart of still another embodiment of a method for processing a video image of the present application
- FIG. 6 is a structural block diagram of an embodiment of a processing apparatus for video images of the present application.
- FIG. 7 is a structural block diagram of another embodiment of a processing apparatus for a video image of the present application.
- FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application.
- FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
- Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
- Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
- program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
- program modules may be located on a local or remote computing system storage medium including storage devices.
- FIG. 1 is a flowchart of an embodiment of a method for processing a video image according to an embodiment of the present application.
- the processing method of the video image of each embodiment of the present application may be exemplarily performed by any device having data acquisition, processing, and transmission functions, including but not limited to an electronic device such as a terminal device, a personal computer (PC), a server, or the like.
- the embodiment of the present application does not limit the device that implements the processing method of the video image in the embodiment of the present application.
- a method for processing a video image in this embodiment includes:
- Step S100 Acquire a video image to be processed and a business object to be displayed.
- For example, in a live streaming scenario, the video image currently being displayed, i.e., the live video image, is acquired; in a video recording scenario, the video image currently being recorded is acquired; for a recorded video, the video image currently being played is acquired.
- the video image in various embodiments of the present invention may also be a still image. The embodiment does not limit the video image type, the acquisition mode, and the specific acquisition means.
- the business object to be displayed is an object that can be displayed in a video image according to a certain business requirement.
- the business object may include, but is not limited to, any one or more of the following: an effect containing semantic information (which may be referred to as a target effect), e.g., special effects for advertisements, entertainment, weather forecasts, traffic forecasts, pets, and the like; a video (which may be referred to as a target video); an image (which may be referred to as a target image); etc. The target effect may be, for example, a three-dimensional (3D) special effect, such as an advertisement displayed in the form of a 3D special effect.
- the target video may be a scene video, which may be a recorded scene video or a live scene video; the target video may also be a sports video, which may be a recorded sports video or a live sports video.
- the invention is not limited thereto, and other forms of business objects are also applicable to the technical solutions of the embodiments of the present application, such as an application (APP) or an application description or introduction, or an object in a certain form that interacts with the video audience (such as an electronic pet). This embodiment does not limit the specific form and content of the business object.
- the process of acquiring the business object to be displayed may be performed concurrently with the process of acquiring the video image to be processed, or may be executed before or after it.
- the order in which the business object to be displayed is acquired is not limited in this embodiment.
- step S100 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 600 executed by the processor.
- Step S102 Determine a background area of the video image.
- the foreground area and the background area of the video image may be detected from the video image; or the background area of the video image may be directly detected from the video image; or the foreground area of the video image may be detected from the video image, and the area outside the foreground area is then determined as the background area of the video image.
- This embodiment does not limit the technical means for determining the foreground area of the video image.
- the foreground area of the live video image includes the area where the character is located, and the background area of the live video image includes at least a partial area other than the area where the character is located.
- step S102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a background area determination module 602 that is executed by the processor.
- Step S104 drawing a business object in a background area of the video image by using a computer drawing method.
- the business object is drawn in the background area of the video image by computer graphics.
- the business object may be drawn over the entire background area or over part of the background area of the video image.
- the drawing of the business object in the determined background area by computer graphics can be implemented by appropriate computer graphics rendering, for example, but not limited to, drawing based on an Open Graphics Library (OpenGL) graphics rendering engine.
- OpenGL defines a professional, cross-language and cross-platform graphics programming interface specification. It is hardware-independent and can conveniently draw 2D or 3D graphics.
- Based on it, not only 3D stickers but also 3D special effects and particle effects can be drawn.
- the present application is not limited to the drawing method based on the OpenGL graphics rendering engine, and other methods may be adopted.
- the drawing manner based on the graphics rendering engine such as Unity or OpenCL is also applicable to the embodiments of the present application.
- step S104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 604 being executed by the processor.
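- As a minimal illustration of step S104 (not the OpenGL-based renderer mentioned above), the following NumPy sketch composites an RGBA business object onto a frame while leaving foreground pixels untouched; it assumes a binary background mask of the frame's size and that the object fits entirely inside the frame.

```python
import numpy as np

def draw_in_background(frame, background_mask, obj_rgba, top_left):
    """Composite an RGBA business object onto a video frame, but only over pixels
    that belong to the background area (background_mask == 1); foreground pixels
    keep their original values."""
    y, x = top_left
    h, w = obj_rgba.shape[:2]
    out = frame.copy()
    region = out[y:y + h, x:x + w].astype(np.float32)
    mask = background_mask[y:y + h, x:x + w, None]          # 1 = background, 0 = foreground
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0   # object transparency
    blended = alpha * obj_rgba[..., :3] + (1.0 - alpha) * region
    out[y:y + h, x:x + w] = np.where(mask > 0, blended, region).astype(frame.dtype)
    return out
```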
- the operation of acquiring the business object to be displayed and the operation of determining the background area of the video image may be executed in any order, that is, either one of the operations may be executed first and the other afterwards.
- the two operations can also be performed at the same time, which does not affect the implementation of the embodiment of the present application.
- the video image to be processed and the business object to be displayed are acquired, the background area of the video image to be processed is determined, and the business object is drawn in the background area of the video image by using computer graphics.
- the business object is displayed in the background area of the video image, which prevents the business object from obscuring the foreground area, does not affect the viewer's normal viewing experience, is less likely to cause viewer aversion, and helps achieve the expected display effect of the business object.
- FIG. 2 is a flowchart of another embodiment of a method for processing a video image according to an embodiment of the present application.
- the processing method of the video image in this embodiment includes:
- Step S200 Acquire a video image to be processed and a business object to be displayed.
- step S200 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 600 executed by the processor.
- Step S202 determining a foreground area and a background area of the video image, and a display position of the business object in the video image.
- a foreground area and a background area of the video image may be detected from the video image; or, a foreground area of the video image may be directly detected from the video image; Alternatively, the background area of the video image may be detected from the video image, and the area outside the background area in the video image may be determined as the foreground area of the video image.
- This embodiment does not limit the technical means for determining the foreground area of the video image. For the same reason, the determination of the background area of the video image can be determined by referring to the implementation manner of determining the foreground area of the video image, and details are not described herein again.
- the display position may be a central position of a designated area in the video image, or may be a coordinate of a plurality of edge positions in the specified area or the like.
- step S202 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a foreground region determination module 6042, a background region determination module 602, and a placement determination module 6040 that are executed by the processor.
- Step S204 Determine whether there is an overlapping portion between the business object and the foreground area according to the above display position. If yes, step S206 is performed; if not, step S208 is performed.
- the purpose of determining whether the business object and the foreground area overlap is to determine whether the business object occludes the foreground area of the video image. If there is an overlapping portion, the business object occludes the foreground area of the video image; if there is no overlapping portion, the business object does not occlude the foreground area of the video image.
- step S204 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a decision module 6044 executed by the processor.
- Step S206 The computer drawing mode is used to draw a portion of the business object other than the overlapping portion in the background area.
- For example, the business object Y has an overlapping portion c with the foreground region q of the video image S, and the portion w of the business object Y other than the overlapping portion c is drawn in the background region b of the video image S, wherein the overlapping portion c and the portion w together form the business object Y, and the foreground region q and the background region b together constitute the video image S.
- the foreground area may be displayed on top of the business object by adjusting the display layer of the foreground area, or by adjusting the display layer of the business object so that it is placed under the foreground area.
- the embodiments of the present application do not specifically limit the technical means for placing the foreground area on the business object.
- step S206 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a presentation module 6046 executed by the processor.
- Step S208 The entire business object is drawn in the background area by using a computer drawing method.
- step S208 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 604 being executed by the processor.
- the video image processing method provided in this embodiment acquires the video image to be processed and the business object to be displayed, determines the foreground area and the background area of the video image and the display position of the business object in the video image, and then determines, according to that display position, whether there is an overlapping portion between the business object and the foreground area of the video image. If there is an overlapping portion, indicating that part or all of the business object is located in the foreground area, the foreground area is displayed on top of the business object at the overlapping portion, which prevents the business object from obscuring the foreground area, does not affect the viewer's normal viewing experience, is less likely to cause viewer aversion, and can effectively achieve the desired display effect.
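- The overlap check of step S204 and the drawing choice of steps S206/S208 can be illustrated with the following sketch; treating the problem as layer ordering (the foreground layer drawn last) is one possible realization, and the mask and array conventions are assumptions for illustration.

```python
import numpy as np

def has_overlap(foreground_mask, top_left, obj_shape):
    """Step S204: does the business object's region cover any foreground pixel?"""
    y, x = top_left
    h, w = obj_shape[:2]
    return bool(foreground_mask[y:y + h, x:x + w].any())

def overlay_with_foreground_on_top(frame, foreground_mask, obj_rgb, top_left):
    """Steps S206/S208 viewed as a layer-ordering problem: draw the whole object,
    then paste the original foreground pixels back on top, so any overlapping part
    of the object ends up hidden behind the foreground (step S206); if there is no
    overlap, the result is simply the entire object drawn in the background (step S208)."""
    y, x = top_left
    h, w = obj_rgb.shape[:2]
    out = frame.copy()
    out[y:y + h, x:x + w] = obj_rgb                 # draw the whole business object first
    fg = foreground_mask[..., None] > 0             # (H, W, 1) boolean foreground layer
    return np.where(fg, frame, out)                 # foreground layer always wins
```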
- FIG. 3 is a flowchart of still another embodiment of a method for processing a video image according to an embodiment of the present application.
- the processing method of the video image in this embodiment includes:
- Step S300 Acquire a video image to be processed and a business object to be displayed.
- step S300 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 600 executed by the processor.
- Step S302 determining a background area of the video image by the pre-trained first convolutional neural network model.
- the video image is detected using a pre-trained first convolutional neural network model for segmenting the foreground and background regions in the video image to determine the background region of the video image.
- the first convolutional neural network model for segmenting the foreground region and the background region in the video image is trained on sample images in which the foreground region and the background region have been labeled. The first convolutional neural network model can then be used to detect the video image and obtain prediction information for its foreground area and background area; the background area in the video image can be determined based on the background-area prediction information, so that the foreground area and the background area in the video image can be determined accurately and efficiently.
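- As a minimal illustration (not the trained model described above), the following sketch shows how a small fully convolutional network could produce a per-pixel foreground/background prediction and how a background mask could be read out of it; the architecture and layer sizes are assumptions for demonstration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the "first convolutional neural network model": a tiny fully
# convolutional net that scores each pixel as foreground (class 0) or background (class 1).
seg_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 1),                      # 2-channel score map: foreground / background
)

def predict_background_mask(frame_tensor):
    """frame_tensor: (1, 3, H, W) float tensor. Returns an (H, W) bool mask, True = background."""
    scores = seg_net(frame_tensor)                                    # low-resolution score map
    scores = F.interpolate(scores, size=frame_tensor.shape[-2:],
                           mode="bilinear", align_corners=False)      # back to the input size
    return scores.argmax(dim=1)[0] == 1
```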
- an optional training method includes the following processes:
- the first sample image is a sample image including the foreground region and the background region marker information, that is, the first sample image is a sample image in which the foreground region and the background region have been marked.
- the foreground area may be an area where the image body is located, for example, the area where the character is located; the background area may be other areas than the area where the subject is located, and may be all or part of other areas.
- the first sample image may comprise a multi-frame sample image of at least one video stream.
- the video stream including the multi-frame sample image may be input to the first convolutional neural network model before the first feature vector of the first sample image is acquired.
- an optional method includes: first determining the images at a plurality of key frames of the video stream as sample images, and annotating the foreground region and the background region on these sample images; on this basis, combining the labeled sample images and inputting the combined video stream containing the multiple frames of sample images into the first convolutional neural network model.
- the key frame is extracted from the video stream and the extracted key frame is labeled, and can be implemented by any suitable method by a person skilled in the art, for example, by extracting key frames by means of uniform sampling.
- the extracted key frame can be combined with the video context to distinguish the foreground and the background, and a more accurate label boundary is obtained.
- the labeled sample image is taken as the first sample image, and the first feature vector is extracted.
- the extraction of the first feature vector may be implemented in an appropriate manner in the related art, and details are not described herein again.
- Convolution processing is performed on the first feature vector by using the first convolutional neural network model to obtain a first feature vector convolution result.
- the acquired first feature vector convolution result includes information for distinguishing the foreground area and the background area of the video image.
- the number of convolution operations performed on the first feature vector can be set according to actual needs, that is, the number of convolutional layers in the first convolutional neural network model can be set according to actual needs.
- the first feature vector convolution result is a result of feature extraction of the first feature vector, and the result can effectively characterize the feature and classification of the foreground region and the background region in the video image.
- the amplification of the first feature vector convolution result may adopt a linear interpolation manner, such as, but not limited to, linear interpolation, bilinear interpolation, trilinear interpolation, and the like.
- the linear interpolation formula used therein can be appropriately modified by a person skilled in the art according to actual needs, which is not limited in this embodiment.
- the first feature vector convolution result may be amplified by bilinear interpolation of the first feature vector convolution result.
- the first convolutional neural network model can learn a more accurate amplification coefficient, and based on the amplification factor and the amplified first feature vector, it is beneficial to reduce The parameter adjustment and calculation amount of the first convolutional neural network model reduces the training cost of the first convolutional neural network model, improves the training efficiency, and shortens the training time.
- the first feature vector convolution result is bilinearly interpolated by a linear interpolation layer to amplify the convolution-processed image features, obtaining an image of the same size (the same length and width) as the original image.
- the specific implementation means for bilinear interpolation is not limited.
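- For illustration, a minimal NumPy sketch of bilinear amplification of a feature/score map back to the original length and width is given below; it is a generic bilinear interpolation, not the specific linear interpolation layer of the model.

```python
import numpy as np

def bilinear_upsample(feat, out_h, out_w):
    """Enlarge an (H, W, C) feature/score map to (out_h, out_w, C) by bilinear
    interpolation, i.e. the amplification step applied to the convolution result
    so it regains the length and width of the original image."""
    in_h, in_w, _ = feat.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]                  # fractional row offsets
    wx = (xs - x0)[None, :, None]                  # fractional column offsets
    top    = feat[y0][:, x0] * (1 - wx) + feat[y0][:, x1] * wx
    bottom = feat[y1][:, x0] * (1 - wx) + feat[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```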
- the convolution convergence condition can be appropriately set by a person skilled in the art according to actual needs.
- if the amplified first feature vector convolution result satisfies the convolution convergence condition, the network parameters in the first convolutional neural network model may be considered appropriate; if it does not satisfy the convolution convergence condition, the network parameters in the first convolutional neural network model may be considered improperly set and in need of adjustment.
- the adjustment may be an iterative process, continuing until the result of convolving the first feature vector with the adjusted network parameters satisfies the convolution convergence condition.
- whether the convolution convergence condition is satisfied may be determined by evaluating the amplified first feature vector convolution result with the loss function in a loss layer and then judging according to the calculation result. That is, the loss value of the amplified first feature vector convolution result is calculated using the set loss function, and whether the amplified first feature vector convolution result satisfies the convolution convergence condition is determined according to that loss value.
- the loss layer and the loss function may be appropriately set by a person skilled in the art according to actual conditions, such as by a softmax function or a logistic function.
- the loss value is obtained by calculating the first feature vector convolution result and the predetermined standard output feature vector by the loss function.
- After obtaining the loss value, in one optional manner, whether the amplified first feature vector convolution result satisfies the convolution convergence condition is judged according to the loss value, for example, whether the loss value is less than or equal to a set threshold, thereby determining whether the current training result satisfies the convolution convergence condition. In another optional manner, it can be judged whether the number of loss value calculations has reached a set number, that is, whether the number of iterative training rounds of the first convolutional neural network model in this training has reached the set number; if it has, the convolution convergence condition is considered satisfied.
- the loss function of the loss layer may calculate the loss values of the multiple frames in the video stream separately and output multi-frame results, which makes the first convolutional neural network model produce more stable results on video and improves computational efficiency through parallel computation over multiple frames of images.
- if the convolution convergence condition is satisfied, the training of the first convolutional neural network model is completed; if the convolution convergence condition is not satisfied, the network parameters of the first convolutional neural network model are adjusted according to the amplified first feature vector convolution result, and the first convolutional neural network model is iteratively trained with the adjusted network parameters until the first feature vector convolution result after iterative training satisfies the convolution convergence condition.
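- A compact sketch of this iterative training loop is given below, assuming the toy seg_net from the earlier sketch, integer per-pixel labels, a softmax cross-entropy loss as the loss layer, and placeholder values for the loss threshold and iteration cap; it is illustrative only, not the model's actual training configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_first_cnn(seg_net, frames, labels, loss_threshold=0.05, max_iters=1000):
    """frames: (N, 3, H, W) float tensor; labels: (N, H, W) long tensor, 0 = foreground, 1 = background."""
    optimizer = torch.optim.SGD(seg_net.parameters(), lr=0.01)        # "adjusting the network parameters"
    criterion = nn.CrossEntropyLoss()                                  # softmax-based loss, as in the text
    for step in range(max_iters):                                      # iteration cap = second convergence option
        scores = seg_net(frames)
        scores = F.interpolate(scores, size=frames.shape[-2:],
                               mode="bilinear", align_corners=False)   # amplified convolution result
        loss = criterion(scores, labels)                               # loss value of the amplified result
        if loss.item() <= loss_threshold:                              # first convergence option: loss below threshold
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return seg_net
```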
- the first convolutional neural network model can extract and classify the image features of the video image, thereby having the function of determining the foreground region and the background region in the video image.
- the first convolutional neural network model can be used to identify the background area in the video image.
- the trained first convolutional neural network model can also be tested for accuracy using test sample images, and it is then determined according to the test result whether the first convolutional neural network model needs to be retrained.
- in this manner, after the initial training of the first convolutional neural network model is completed, test sample images can be acquired, and the foreground/background regions of the test sample images are predicted using the trained first convolutional neural network model. The test sample images may be sample images without any labeling. It is then checked whether the predicted foreground/background regions are correct; if not, the first convolutional neural network model is retrained; if correct, the first convolutional neural network model can be used to perform foreground/background determination on video images. Alternatively, to make the first convolutional neural network model more accurate, further test sample images may be acquired for testing, or training may be carried out again using sample images different from the original training sample images.
- when it is verified through the test sample images that the foreground/background regions predicted by the first convolutional neural network model are incorrect, the first convolutional neural network model needs to be retrained.
- in one retraining mode, only the test sample images whose foreground/background prediction was incorrect are used as sample images for retraining; the first convolutional neural network model is then retrained using these incorrectly predicted sample images. The foreground/background of these retraining test sample images can be labeled before training. Through this retraining method, not only is the training more targeted, but the training cost is also saved.
- the embodiment of the present application is not limited thereto; in actual use, other sample images whose foreground/background has been labeled may also be used for training.
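- One possible way to collect the incorrectly predicted test samples is sketched below; the IoU-against-label criterion and its threshold are assumptions standing in for the text's unspecified correctness check, and predict_background_mask refers to the earlier illustrative function.

```python
import torch

def select_retraining_samples(predict_background_mask, test_frames, test_masks, iou_threshold=0.7):
    """Keep only the test samples whose predicted background is judged incorrect
    (here: IoU against the labeled background mask below a threshold); these are the
    samples the text suggests labeling and reusing to retrain the first CNN."""
    keep = []
    with torch.no_grad():
        for frame, gt_mask in zip(test_frames, test_masks):
            pred = predict_background_mask(frame.unsqueeze(0)).float()   # (H, W) predicted background
            gt = gt_mask.float()
            inter = (pred * gt).sum()
            union = ((pred + gt) > 0).float().sum()
            if union == 0 or inter / union < iou_threshold:
                keep.append((frame, gt_mask))                            # judged incorrect: reuse for retraining
    return keep
```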
- step S302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a background area determination module 602 that is executed by the processor.
- Step S304 Determine the display position of the business object in the video image.
- An optional implementation manner includes:
- Method 1: the display image to be displayed is determined from the multi-frame display images corresponding to the business object, and the display position of the display image to be displayed in the video image is determined.
- Method 2 Determine the display position of the business object in the video image according to the setting rule.
- For Method 1, the first display image that has not yet been played is selected from the multi-frame display images having a play order corresponding to the business object, and the selected display image is determined as the display image to be displayed. Specifically, the display images whose play status is "not displayed" are determined from the multi-frame display images in the play order corresponding to the business object, and the first display image among them is selected. After the selected display image has been played, the play status of the multi-frame display images having the play order is updated.
- the business object may be composed of multiple frames of images. Each frame of the video image is time-aligned with each frame of the business object. When the video image plays the next frame image, the next frame image of the corresponding business object can also be displayed synchronously.
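- A minimal sketch of this per-frame synchronization and play-status bookkeeping is shown below; the class structure and the looping behavior are assumptions for illustration.

```python
class BusinessObjectFrames:
    """Tracks the play order and play status of a business object's display images."""

    def __init__(self, frames):
        self.frames = list(frames)            # display images in play order
        self.next_index = 0                   # everything before this index has been played

    def next_display_image(self):
        if self.next_index >= len(self.frames):
            self.next_index = 0               # loop the effect; the text leaves this behavior open
        image = self.frames[self.next_index]
        self.next_index += 1                  # update play status after the image is shown
        return image

# For each incoming video frame, show the business object's next frame in step with it, e.g.:
# for video_frame in video_stream:
#     obj_frame = business_object.next_display_image()
#     output = overlay_with_foreground_on_top(video_frame, foreground_mask, obj_frame, top_left)
```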
- the setting rules may include, for example: setting rule 1, using a preset display position of the business object in the video image; or setting rule 2, determining the display position of the business object in the video image according to the motion detection data of the target object in the foreground area.
- For setting rule 1, a preset business object has a corresponding relationship with a display position in the video image; if the business object to be displayed is determined to be a preset business object, the display position of the business object to be displayed is correspondingly determined as the set display position of that preset business object in the video image.
- For setting rule 2, the display position of the business object in the video image may be determined using a pre-trained second convolutional neural network model together with the motion detection data of the target object in the foreground area; or according to the type of the business object and the motion detection data of the target object in the foreground area; or according to preset motion data and the motion detection data of the target object in the foreground area.
- these three ways of determining the display position of a business object in a video image are exemplified below.
- a convolutional neural network model (i.e., a second convolutional neural network model) may be pre-trained so that the trained second convolutional neural network model has the function of determining the display position of the business object in the video image; alternatively, a convolutional neural network model that has already been trained by a third party to determine the display position of a business object in a video image may be used directly.
- an optional training method includes the following process:
- the second feature vector includes position information and/or confidence information of the business object in the second sample image, and a target object feature vector of the target object in the second sample image.
- the confidence information of the business object indicates the probability that the business object will achieve the desired effect (such as being noticed, clicked, or viewed) when displayed at the current position. The probability may be set according to the results of statistical analysis of historical data, according to the results of simulation experiments, or according to human experience.
- according to actual needs, only the location information of the business object may be trained, or only the confidence information of the business object, or both. Training both enables the trained second convolutional neural network model to determine the location information and confidence information of the business object more effectively and accurately, so as to provide a basis for processing the video image.
- the second convolutional neural network model is trained on a large number of second sample images.
- the second convolutional neural network model can be trained using second sample images that include the business object. Those skilled in the art should understand that, in addition to the business object, the second sample images used for training may include information on target object attributes, that is, facial expressions and/or actions, gestures and/or actions of the human hand, and the like.
- the business object in the second sample image in this embodiment may be marked with pre-labeled location information, or confidence information, or both location information and confidence information. Of course, in practical applications, this information may also be obtained through other means. Marking the corresponding information on the business object in advance can effectively reduce the amount of data processing and the number of interactions, improving data processing efficiency.
- A second sample image containing the position information and/or confidence information of the business object and a certain target object attribute is used as a training sample, and feature vector extraction is performed to obtain a business object feature vector containing the location information and/or confidence information of the business object, and a target object feature vector corresponding to the target object attribute.
- the target object and the business object may be simultaneously trained using the second convolutional neural network model.
- the second feature vector of the second sample image also includes the feature of the target object.
- the extraction of the second feature vector may be implemented in an appropriate manner in the related art, and details are not described herein again.
- the acquired second feature vector convolution result includes the location information and/or the confidence information of the service object, and the feature vector convolution result corresponding to the target object feature vector corresponding to the target object attribute.
- the feature vector convolution result also includes the target object information.
- the number of times of convolution processing on the second feature vector can be set according to actual needs, that is, in the second convolutional neural network model, the number of layers of the convolution layer is set according to actual needs, and details are not described herein again.
- the second feature vector convolution result is a result of feature extraction of the second feature vector, and the result can effectively represent the business object corresponding to the feature of the target object in the video image.
- the second feature vector convolution result is shared by the subsequent separate convergence-condition judgments, so no repeated processing or calculation is needed, which reduces the resource consumption caused by data processing and improves data processing speed and efficiency.
- the service object convergence condition and the target object convergence condition can be appropriately set by a person skilled in the art according to actual needs.
- if the location information and/or the confidence information satisfies the business object convergence condition and the target object feature vector satisfies the target object convergence condition, the network parameters in the second convolutional neural network model may be considered appropriate; if the location information and/or the confidence information does not satisfy the business object convergence condition, and/or the target object feature vector does not satisfy the target object convergence condition, the network parameter settings in the second convolutional neural network model may be considered inappropriate, and the network parameters need to be adjusted.
- the adjustment process can be an iterative process until the result of convolution of the second feature vector using the adjusted network parameters satisfies the convergence condition.
- the business object convergence condition may be set according to a preset standard position and/or a preset standard confidence. For example, the convergence condition of the location information of the business object may be that the distance between the position indicated by the location information of the business object in the second feature vector convolution result and the preset standard position satisfies a certain threshold; the convergence condition of the confidence information of the business object may be that the difference between the confidence indicated by the confidence information of the business object in the second feature vector convolution result and the preset standard confidence satisfies a certain threshold.
- the preset standard position may be an average position obtained by averaging the positions of the business objects in the second sample images to be trained; the preset standard confidence may be an average confidence obtained by averaging the confidences of the business objects in the second sample images to be trained.
- an optional manner includes:
- calculating a first distance between the position indicated by the location information and the preset standard position, and determining, according to the first distance, whether the location information of the corresponding business object satisfies the business object convergence condition;
- calculating a second distance between the confidence indicated by the confidence information and the preset standard confidence, and determining, according to the second distance, whether the confidence information of the corresponding business object satisfies the business object convergence condition.
- the Euclidean distance may be adopted for these calculations; it is simple to implement and can effectively indicate whether the convergence condition is satisfied.
- the embodiment of the present application is not limited thereto; other measures such as the Mahalanobis distance or the Bhattacharyya distance may also be adopted.
- optionally, the preset standard position is an average position obtained by averaging the positions of the business objects in the second sample images to be trained; and/or the preset standard confidence is an average confidence obtained by averaging the confidences of the business objects in the second sample images to be trained.
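- A minimal sketch of these convergence checks is given below, using the Euclidean distance for the position and an absolute difference for the confidence; the thresholds are placeholders.

```python
import numpy as np

def standard_position(sample_positions):
    """Preset standard position: the average of the labeled business-object positions."""
    return np.mean(np.asarray(sample_positions, dtype=float), axis=0)

def position_converged(predicted_xy, std_xy, threshold):
    """First distance: Euclidean distance between the predicted position and the
    preset standard position; converged when it is within the threshold."""
    return np.linalg.norm(np.asarray(predicted_xy, dtype=float) - std_xy) <= threshold

def confidence_converged(predicted_conf, std_conf, threshold):
    """Second distance: absolute difference between the predicted confidence and the
    preset standard confidence (the averaged labeled confidences)."""
    return abs(predicted_conf - std_conf) <= threshold
```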
- when the second feature vector convolution result further includes information of the target object, the judgment of whether the information of the target object converges can be made with reference to the convergence-condition judgments described above for the second convolutional neural network model, and is not repeated here. If the information of the target object satisfies the convergence condition, that is, the target object feature vector in the second feature vector convolution result satisfies the target object convergence condition, the target object may be classified and its category clarified, so as to provide a reference and basis for the subsequent determination of the business object display position.
- if all of the convergence conditions are satisfied, the training of the second convolutional neural network model is completed. If any or all of the convergence conditions are not satisfied, that is, the location information and/or the confidence information does not satisfy the business object convergence condition, and/or the target object feature vector does not satisfy the target object convergence condition, the network parameters of the second convolutional neural network model are adjusted according to the position information and/or confidence information of the corresponding business object and the target object feature vector in the second feature vector convolution result, and the second convolutional neural network model is iteratively trained with the adjusted network parameters until the position information and/or confidence information of the business object after iterative training satisfies the business object convergence condition and the target object feature vector satisfies the target object convergence condition.
- the second convolutional neural network model can perform feature extraction and classification on the display position based on the business object, thereby having the function of determining the display position of the business object in the video image.
- the second convolutional neural network model can also determine the order of the display effects in the plurality of display positions, thereby determining the final display position. In subsequent applications, when a business object needs to be displayed, a valid placement can be determined from the video image.
- before training, the second sample images may be pre-processed, including: acquiring a plurality of second sample images, where each second sample image contains labeling information for the business object; determining the position of the business object according to the labeling information; determining whether the distance between the determined position of the business object and a preset position is less than or equal to a set threshold; and determining the second sample images whose business objects fall within the set threshold as the second sample images to be trained.
- the preset position and the set threshold may be appropriately set by any suitable means by a person skilled in the art, for example, according to the statistical analysis result of the data or the calculation formula of the related distance or the artificial experience, etc., which is not limited in this embodiment.
- the location of the business object determined according to the annotation information may be the central location of the business object.
- for example, the central location of the business object may be determined according to the annotation information, and it is then judged whether the variance (distance) between the central location and the preset position is less than or equal to the set threshold.
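- As a minimal illustration of the pre-processing described above, the following Python sketch filters training samples by the distance between the annotated center of the business object and a preset standard position; the bounding-box annotation format and the Euclidean distance measure are assumptions made for the example.

```python
# Illustrative sketch: keep only sample images whose annotated business-object
# center lies close enough to the preset standard position.
import math

def center_of(box):
    # box = (x_min, y_min, x_max, y_max)
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def filter_training_samples(samples, preset_position, threshold):
    """samples: list of (image, annotated_box) pairs.
    Returns the samples whose center-to-preset distance is <= threshold."""
    kept = []
    for image, box in samples:
        cx, cy = center_of(box)
        dist = math.hypot(cx - preset_position[0], cy - preset_position[1])
        if dist <= threshold:
            kept.append((image, box))
    return kept
```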
- the sample image that does not meet the condition can be filtered out to ensure the accuracy of the training result.
- the training of the second convolutional neural network model is thereby realized, and the trained second convolutional neural network model can be used to determine the display position of the business object in the video image. For example, during a live broadcast, if the anchor clicks on the business object to trigger its display, then after the second convolutional neural network model obtains the facial feature points of the anchor in the live video image, it may indicate that the final display position of the business object is a position in the background area at a preset distance from the anchor's head, so that the live application displays the business object at that position; alternatively, during the live video broadcast, if the anchor clicks on the business object to trigger its display, the second convolutional neural network model may directly determine the display position of the business object based on the live video image.
- the type of the target object includes, for example but not limited to, a face type, a hand type, and an action type.
- the face type is used to indicate that the face occupies a main part in the video image
- the hand type is used to indicate that the hand occupies a main part in the video image
- the action type is used to indicate that the character has performed some action.
- the motion detection data of the target object may include, but is not limited to, any one or more of the following: blink data, open-mouth data, nodding data, head-shaking data, kiss data, smile data, wave data, scissor-hand data, fist data, hand data, thumb data, pistol posture data, OK-hand data, and the like. The actions of the target object corresponding to the motion detection data may include, but are not limited to, any one or more of the following: blinking, opening the mouth, nodding, shaking the head, kissing, smiling, waving, a scissor hand, a fist, a hand, a thumbs-up, a pistol posture, an OK hand, and the like.
- it is determined whether the motion detection data of the target object in the foreground area matches the preset motion data, that is, whether the action of the target object in the foreground area matches the preset action; if it matches, the target display position corresponding to the preset motion data is acquired from the pre-stored correspondence as the display position of the business object in the video image.
- a plurality of different motion data may be preset, and different motion data may be correspondingly marked, such as an action marked as a face or an action of a hand.
- Different action data corresponds to different placements.
- the motion detection data of the target object in the foreground area is matched against the preset motion data; if the motion detection data is the same as the preset motion data, that is, the action of the target object in the foreground area is the same as the preset action, it can be determined that the motion detection data of the target object in the foreground area matches the preset motion data.
- the matching result may be determined by calculation.
- the matching algorithm may be set to calculate the matching degree between the motion detection data and the preset motion data.
- the motion detection data may be matched against each piece of preset motion data, and a matching degree value between the two is obtained by calculation.
- the matching degree value between the motion detection data and each piece of preset motion data is calculated in this way, and the maximum matching degree value is selected from the obtained matching degree values. If the maximum matching degree value exceeds a predetermined matching threshold, it may be determined that the preset motion data corresponding to the maximum matching degree value matches the motion detection data; if the maximum matching degree value does not exceed the predetermined matching threshold, the match fails.
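- A minimal sketch of such a matching-degree calculation is given below; representing motion data as fixed-length feature vectors and using cosine similarity as the matching degree are assumptions chosen purely for illustration, and any suitable matching calculation could be substituted.

```python
# Illustrative sketch of the matching-degree comparison described above.
import numpy as np

def matching_degree(detected, preset):
    detected, preset = np.asarray(detected, float), np.asarray(preset, float)
    denom = np.linalg.norm(detected) * np.linalg.norm(preset)
    return float(detected @ preset / denom) if denom else 0.0

def match_action(detected, preset_actions, threshold=0.8):
    """preset_actions: dict mapping action name -> preset motion data vector.
    Returns the best-matching action name, or None if the maximum matching
    degree does not exceed the threshold (match fails)."""
    scores = {name: matching_degree(detected, data)
              for name, data in preset_actions.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```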
- steps S302 and S304 may be performed at the same time, or may be performed in any order. For example, step S302 is performed first, then step S304 is performed, or step S304 is performed first, and then step S302 is performed.
- the order of execution of S302 and step S304 is not specifically limited.
- step S304 may be performed by a processor invoking a corresponding instruction stored in memory, or may be performed by a placement determination module 6040 that is executed by the processor.
- Step S306 Determine whether there is an overlapping portion between the business object and the foreground area according to the display position. If yes, step S308 is performed; if not, step S310 is performed.
- the purpose of determining whether the business object and the foreground area overlap is to determine whether the business object occludes the foreground area of the video image. If there is an overlapping part, it indicates that the business object occludes the foreground area of the video image; if there is no overlapping part, it indicates that the business object does not occlude the foreground area of the video image.
- step S306 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by a decision module 6044 executed by the processor.
- Step S308 The computer drawing mode is used to draw a portion of the business object other than the overlapping portion in the background area.
- for example, the business object Y and the foreground region q of the video image S have an overlapping portion c, and the portion w of the business object Y other than the overlapping portion c is drawn in the background region b of the video image S, where the overlapping portion c and the portion w together constitute the business object Y, and the foreground region q and the background region b together constitute the video image S.
- alternatively, the foreground area may be displayed above the business object by adjusting the display layer of the foreground area, or the business object may be displayed below the foreground area by adjusting the display layer of the business object. This embodiment does not specifically limit the technical means for placing the foreground area on top of the business object.
- the foreground area corresponding to the overlapping part may be displayed on the business object corresponding to the overlapping part.
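- The following Python sketch illustrates steps S306-S308 under the assumption that a binary foreground mask and an RGBA image of the business object (already aligned to its display position) are available; the array conventions and the simple alpha blend are illustrative choices, not part of the embodiment.

```python
# Illustrative sketch: blend the business object only over background pixels,
# so any part overlapping the foreground stays covered by the foreground.
import numpy as np

def overlaps_foreground(fg_mask, obj_rgba):
    """True if the business object and the foreground area share any pixel."""
    return bool(np.any(fg_mask & (obj_rgba[..., 3] > 0)))

def render_business_object(frame, fg_mask, obj_rgba):
    """frame: HxWx3 video image; fg_mask: HxW bool, True on the foreground;
    obj_rgba: HxWx4 business object aligned to the frame."""
    alpha = obj_rgba[..., 3:4] / 255.0
    draw_mask = (~fg_mask)[..., None] & (alpha > 0)        # background pixels only
    blended = obj_rgba[..., :3] * alpha + frame * (1 - alpha)
    out = frame.copy()
    out[draw_mask[..., 0]] = blended[draw_mask[..., 0]].astype(frame.dtype)
    return out
```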
- step S308 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a presentation module 6046 executed by the processor.
- Step S310 The entire business object is drawn in the background area by using a computer drawing method.
- step S310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 604 being executed by the processor.
- the video image processing method provided by this embodiment obtains the video image to be processed and the business object to be displayed, determines the foreground area of the video image to be processed and the display position of the business object in the video image, and then judges, according to the display position of the business object in the video image, whether the business object and the foreground area of the video image overlap. If there is an overlapping portion, indicating that part or all of the business object is located in the foreground area, the foreground area is displayed above the business object for that overlapping portion.
- displaying in this way avoids the business object occluding the foreground area, does not affect the viewer's normal video viewing experience, is unlikely to cause viewer resentment, and is conducive to achieving the intended display effect of the business object.
- in an optional implementation, the trained first convolutional neural network model may be used to detect an image and segment the foreground and background of the image.
- An alternative implementation includes:
- the image may be a still image or an image in a video
- the video image is detected by the trained first convolutional neural network model, and the prediction information of the foreground region of the video image and the prediction information of the background region are obtained.
- the video image is an image in a live video.
- the video image described above may include multiple frames of images in a video stream. Since the multi-frame images in the video stream have more contextual associations, the trained first convolutional neural network model can quickly and efficiently detect the foreground and background of each frame of the video stream.
- the business object may be a target special effect containing semantic information, where the target special effect may include at least one of the following special effects containing advertisement information: a two-dimensional sticker effect, a three-dimensional effect, and a particle effect.
- a video image processing scheme of the embodiment of the present application is described by taking a business object as a two-dimensional advertising sticker effect and a video image as a live video image. Refer to Figure 4.
- Step S400 Acquire a live video image to be processed and an advertisement sticker effect to be displayed.
- the live video image to be processed can be obtained from the live video application.
- the advertising sticker effect to be displayed may be an advertising sticker effect customized by the advertiser, or an advertising sticker effect selected for the anchor.
- step S400 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 600 executed by the processor.
- Step S402 Determine the anchor area of the live video image and the display position of the advertisement sticker effect in the live video image.
- the live video image can be divided into two parts, namely an anchor area and a background area.
- the anchor area is the area where the anchor himself or herself is located, and the background area is the area other than the anchor area in the live video image.
- the execution process of the anchor area and the display location may be performed according to the related content in the foregoing embodiment, and details are not described herein again.
- step S402 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a placement determination module 6040 that is executed by the processor.
- Step S404 judging whether there is an overlapping portion between the advertisement sticker effect and the anchor area according to the display position. If yes, step S406 is performed; if not, step S408 is performed.
- in an optional example, step S404 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a decision module 6044 executed by the processor.
- Step S406 Display the anchor area on top of the advertisement sticker effect.
- step S406 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a presentation module 6046 executed by the processor.
- Step S408 The entire business object is drawn in the background area by using a computer drawing method.
- in an optional example, step S408 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 604 executed by the processor.
- in a live scene, the viewer's main focus is on the anchor's face and body movements. In order to let the viewer notice the content of the advertisement sticker effect without affecting the viewing of the anchor, a semantically meaningful virtual item such as an advertisement sticker effect is added to the relevant area of the video image.
- the main image and movements of the anchor are preserved, and the augmented-reality special effect adds interest to the video image, which reduces the viewer's possible aversion to the advertisement and can attract the attention of the audience.
- for example, the advertising sticker effect is a bird sticker effect, and the bird sticker effect is displayed in the video image.
- when the display position of the bird sticker effect overlaps with the anchor area, that is, when the bird sticker effect blocks the anchor area, the anchor area is displayed on top of the bird sticker effect. Specifically, if the overlapping part of the bird sticker effect and the anchor area is the head of the bird sticker effect, and the rest of the bird sticker effect does not overlap with the anchor area, the anchor area is displayed on top of the head of the bird sticker effect; the display effect is that the part of the bird sticker effect other than the head is visible, together with the anchor area. If the overlapping part of the bird sticker effect and the anchor area is the whole bird sticker effect, the anchor area is displayed on top of the whole bird sticker effect; the display effect is that the entire bird sticker effect is invisible and the anchor area can be seen.
- the above advertising sticker effect, taking the bird sticker effect as an example, may be a static advertising sticker effect or a dynamic advertising sticker effect.
- if the advertisement sticker effect is a dynamic advertisement sticker effect, each frame of the dynamic advertisement sticker effect can be processed as a static advertisement sticker effect; the process may refer to the related content in the above embodiments and is not repeated here.
- the advertisement is delivered by the business object, and the advertisement is effectively integrated with the live video content.
- the method is flexible and the effect is vivid, which not only does not affect the user's live viewing experience, but also helps to improve the delivery effect of the advertisement. This is especially true for scenarios in which smaller display screens are used to display business objects and deliver advertisements.
- the video image processing method of this embodiment can be implemented in any suitable electronic device having data acquisition, processing, and transmission functions, and the embodiments of the present application do not limit the device that implements the method.
- the business object to be displayed is drawn in the background area of the video image in the subsequent drawing step, so that the business object covers the original content of the background area of the video image.
- the business object to be displayed may take various forms, and the embodiments of the present application do not limit the specific form of the business object.
- for example, the business object to be displayed may be a picture A containing a beach scene, while the original content in the background area of the video image is a picture B containing an indoor scene.
- when the business object is drawn in the background area of the video image by computer drawing, the business object may be drawn in the background area such that it covers the original content of the background area.
- optionally, the business object can be drawn in the entire background area of the video image by computer drawing, so that the background area after the business object is drawn replaces the original content of the background area.
- for example, the original background area of the video image is an indoor scene picture and the business object is a waterfall picture; after the business object is drawn, a waterfall picture is displayed in the background area instead of the original indoor scene picture, while the foreground area of the video image can remain unchanged.
- the business object is drawn in the determined background area by using the computer drawing method, so that the business object covers the original content of the background area in the video image, and the expected business object delivery effect can be effectively realized.
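- A minimal sketch of this background replacement, assuming a binary foreground mask of the same size as the frame, might look as follows; the mask and array conventions are illustrative assumptions only.

```python
# Illustrative sketch: draw the business object over the whole background area
# so it covers the background's original content, keeping the foreground as-is.
import numpy as np

def replace_background(frame, fg_mask, business_object):
    """frame, business_object: HxWx3 images of equal size;
    fg_mask: HxW bool, True where the foreground (e.g. a person) is."""
    return np.where(fg_mask[..., None], frame, business_object)
```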
- after the business object is drawn in the background area of the video image by computer drawing, in addition to displaying the resulting video image locally, the video image with the business object drawn in the background area may also be sent to other devices, for example a mobile terminal, a PC, a server, a tablet, and the like, so that the video image with the business object drawn in the background area can be shared.
- FIG. 5 is a flowchart of still another embodiment of a method for processing a video image of the present application. This embodiment focuses on the differences from the above-mentioned embodiments. For the same, reference may be made to the description and description of the foregoing embodiments, and details are not described herein. Referring to FIG. 5, the processing method of the video image in this embodiment includes:
- Step S500 Obtain a business object to be displayed.
- the business object to be displayed may be obtained by, for example, but not limited to, using the following methods: acquiring a business object from a predetermined storage location, or receiving a business object from a video source.
- the business object is stored locally in the mobile terminal, and the business object can be directly read from the mobile terminal.
- the locally stored business object can be any one or several of the target video, the target image, and the target special effect introduced above;
- the target video can be received from the video source. If the target video is a live video, the live video can be received from the live camera.
- step S500 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 600 executed by the processor.
- Step S502 Detecting a background area of the video image by using a pre-trained first convolutional neural network model.
- the video image is detected and the background area is determined using the trained first convolutional neural network model for segmenting the foreground and background regions in the video image.
- the first convolutional neural network model can accurately and efficiently determine the foreground and background regions in the video image by training the sample images with the foreground and background regions.
- optionally, the first convolutional neural network model is a fully convolutional network model; compared with a convolutional network model containing fully connected layers, a fully convolutional network model requires fewer convolutional-layer parameters and trains faster.
- the feature vector of the sample image to be trained may be input, and the feature vector includes information of the background region of the sample image, or the feature vector includes information of the foreground region of the sample image and information of the background region.
- optionally, a linear interpolation layer performs bilinear interpolation on the feature vector output by the first 31 layers, so as to enlarge the intermediate-layer features and obtain an output image of the same size (the same image length and width) as the training sample image.
- optionally, the loss layer at layer 33 is processed by a Softmax function.
- An optional Softmax function is as follows:
- P(y = j | x) = exp(x^T W_j) / Σ_{k=1}^{K} exp(x^T W_k)
- where x represents the input feature, j represents the j-th category, y represents the output category, K represents the total number of categories, k represents the k-th category, W_j represents the classification parameter of the j-th category, x^T represents the transpose of the vector x, and P(y = j | x) represents the probability that the given input x is predicted to belong to the j-th category.
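- For reference, a small numerical sketch of the above Softmax calculation is shown below; the toy feature vector and parameter values are made up purely to demonstrate the computation.

```python
# Illustrative Softmax calculation: probability of each category given input x.
import numpy as np

def softmax_probabilities(x, W):
    """x: input feature vector (D,); W: classification parameters (K, D).
    Returns P(y = j | x) for every category j."""
    logits = W @ x                      # one score per category, x^T W_j
    logits -= logits.max()              # standard numerical-stability trick
    exp = np.exp(logits)
    return exp / exp.sum()

x = np.array([0.5, -1.2, 2.0])
W = np.array([[0.1, 0.4, -0.3],
              [1.0, -0.2, 0.7],
              [-0.5, 0.3, 0.2]])
print(softmax_probabilities(x, W))      # three probabilities summing to 1
```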
- the processing of the feature vector by the convolutional layer may be performed iteratively multiple times.
- if the convergence condition is not satisfied, the parameters of the first convolutional neural network model (such as the values of the convolution kernels and the weights of the linear transformations between layers) are adjusted according to the result calculated by the loss layer, and the processing is repeated based on the parameter-adjusted first convolutional neural network model, iterating a plurality of times until the convergence condition is satisfied.
- the convergence condition may be that the number of iterative trainings on the first convolutional neural network model reaches a maximum number of iterations, such as 10,000 to 50,000 times.
- for video images, the first convolutional neural network model can take a single-frame video image as input, or take multi-frame video images as input and output results for the multiple frames. That is, the first input layer can accept one frame of video image, or a video stream containing multiple frames of video images.
- the last loss layer can calculate the loss function for one frame of video image, and can also calculate the loss function for multi-frame video images of the video stream.
- the first convolutional neural network model can obtain relatively stable results on video, and at the same time, the parallel computing of multi-frame video images can improve the computational efficiency.
- simultaneous input and output of multi-frame video images can be realized by modifying the size of the feature maps of the input layer and the output layer.
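- The following sketch illustrates, purely as an assumption-laden example, how several frames of a video stream could be stacked into one batch so that the model processes them in a single forward pass; `segmentation_model` stands in for the trained first convolutional neural network model and is not defined in the original text.

```python
# Illustrative sketch: batch several frames so one forward pass segments them.
import numpy as np

def segment_video_clip(frames, segmentation_model):
    """frames: list of HxWx3 uint8 frames from the same video stream.
    Returns one foreground/background prediction map per frame."""
    batch = np.stack(frames).astype(np.float32) / 255.0   # (N, H, W, 3)
    predictions = segmentation_model(batch)               # (N, H, W) expected
    return [predictions[i] for i in range(len(frames))]
```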
- each convolutional layer has a nonlinear response unit, and the nonlinear response unit adopts a Rectified Linear Units (ReLU), which is added after the convolutional layer.
- the above rectified linear unit makes the mapping result of the convolutional layer as sparse as possible so as to be closer to the human visual response, thereby producing better image processing results.
- optionally, the pooling layer (with its stride) has the property of enhancing spatial invariance, that is, the same input is allowed to appear at different image positions while the output response is the same.
- the linear interpolation layer can magnify the preceding features to the original image size to obtain a predicted value for each pixel.
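- A minimal NumPy sketch of such bilinear enlargement of a feature map to the original image size is shown below; a real model would normally rely on its framework's built-in interpolation layer instead of this standalone form.

```python
# Illustrative bilinear upsampling of an (h, w, c) feature map to (out_h, out_w, c).
import numpy as np

def bilinear_upsample(feature_map, out_h, out_w):
    h, w, _ = feature_map.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]          # vertical interpolation weights
    wx = (xs - x0)[None, :, None]          # horizontal interpolation weights
    top = feature_map[y0][:, x0] * (1 - wx) + feature_map[y0][:, x1] * wx
    bottom = feature_map[y1][:, x0] * (1 - wx) + feature_map[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```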
- the convolutional layers of the fully convolutional neural network model can be used for information summarization and fusion, and the maximum pooling layer (max pooling) is used for summarizing high-level information; the structure of the first convolutional neural network model can be fine-tuned to accommodate trade-offs between different performance and efficiency requirements.
- the size of the convolution kernels, the number of channels, the size of the pooling kernels, the strides, and the number of convolutional layers are all illustrative. In practical applications, those skilled in the art may adjust them adaptively according to actual needs, and the embodiments of the present application do not limit this. In addition, the combinations and parameters of all layers in the first convolutional neural network model in this embodiment are optional and may be combined arbitrarily.
- step S502 may be performed by a processor invoking a corresponding instruction stored in memory, or may be performed by background area determination module 602 being executed by the processor.
- Step S504 The business object is drawn in the determined background area by using a computer drawing manner, so that the business object covers the original content of the background area in the video image.
- step S504 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 604 executed by the processor.
- Step S506 Send a video image of the business object drawn in the background area.
- in an optional example, step S506 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a communication module on the electronic device.
- the above steps S500-S504 may be performed on a mobile terminal A, and the video image in which the business object is drawn in the background area may be displayed on the mobile terminal A, and/or the video image in which the business object is drawn in the background area may be transmitted to a mobile terminal B, a mobile terminal C, a server D, and the like.
- in addition to sending a video image in which the business object is drawn in the background area to other devices, the business object may also be drawn directly on the other devices.
- in that case, the business object itself is obtained from a repository of business objects according to the attribute information of the business object, and the business object is then drawn.
- the attribute information of the business object may come from a transport stream that includes the video image and the attribute information of the business object.
- the attribute information of the business object may be used to identify the business object itself, and the attribute information of a business object corresponds to a unique business object itself or a unique type of business object itself.
- the attribute information of the business object may be a number with a unique identifier or the like.
- the repository of the business object may include a repository of business objects of the current device and a repository of business objects of other devices that perform the method of the embodiment of the present application.
- the business objects and their attribute information in the business object repository of the current device and in the business object repositories of the other devices may be kept synchronized.
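- As an illustrative sketch only, a receiving device might resolve the attribute information carried in the transport stream to a locally stored business object as follows; the repository contents and identifiers here are hypothetical, not taken from the embodiment.

```python
# Illustrative lookup of a business object by its attribute information.
BUSINESS_OBJECT_REPOSITORY = {
    "ad_sticker_001": "path/to/bird_sticker.png",   # made-up entries
    "ad_video_007": "path/to/intro_clip.mp4",
}

def resolve_business_object(attribute_id):
    """Map the attribute information received with the video image to the
    business object stored locally, so it can be drawn on this device."""
    try:
        return BUSINESS_OBJECT_REPOSITORY[attribute_id]
    except KeyError:
        raise LookupError(f"business object {attribute_id!r} not in local repository")
```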
- the anchor terminal acquires a video image when the anchor broadcasts live, where the video image is a live video image. Detecting the background area from the video image, acquiring the business object locally in the anchor terminal or receiving the business object from the third-party camera in real time, drawing the business object in the background area of the video image by computer drawing, and displaying the video image after drawing the business object on the anchor terminal And the video image after the business object is drawn is sent to the server or the fan terminal, and is transferred or stored in the server, and displayed at the fan terminal.
- the background area of the live video image includes an area other than the anchor image in the live video image.
- the execution order of the foregoing step S500 and step S502 may be: step S500 first and then step S502; step S502 first and then step S500; or step S500 and step S502 performed simultaneously.
- this embodiment does not limit the execution order of step S500 and step S502; performing step S500 first and then step S502 is merely taken as an example here.
- through this embodiment, the business object can be drawn in the determined background area by computer drawing so that the business object covers the original content of the background area in the video image, combining the business object with video playback. This is beneficial for saving network resources and/or system resources of the client, enriches the types of background areas, and can effectively achieve the intended delivery effect of the business object; in addition, if a received target video is drawn in the background area of the video image, the background area of the video image shows the effect of playing that video.
- when the content of the business object is related to the content of the video image, for example, the business object is a target video that is an introduction video of the Statue of Liberty and the anchor is introducing the Statue of Liberty in the video image, the background area in which the business object is drawn can serve as an auxiliary display of the video image, improving the display effect of the video image.
- FIG. 6 is a structural block diagram of an embodiment of a processing apparatus for video images of the present application.
- the processing device for video images of the embodiments of the present application can be used to implement the processing method embodiments of the video images described above.
- the processing apparatus of the video image of this embodiment includes: an obtaining module 600, configured to acquire a video image to be processed and a service object to be displayed; a background area determining module 602, configured to determine a background area of the video image; and a drawing module 604. For drawing a business object in a background area of the video image by using a computer drawing method.
- the video image processing apparatus acquires a video image to be processed and a business object to be displayed, determines a background area of the video image to be processed, and draws a business object in a background area of the video image by using a computer drawing manner.
- the business object is displayed in the background area of the video image, which can prevent the business object from obscuring the foreground area, does not affect the normal video viewing experience of the viewer, and is not easy to cause the audience to resent, and is beneficial to realize the expected business object display effect.
- the drawing module 604 is configured to draw a business object to be displayed in a background area of the video image by using a computer drawing manner, so that the business object covers the original content of the background area.
- FIG. 7 is a structural block diagram of another embodiment of a processing apparatus for a video image of the present application.
- optionally, the rendering module 604 includes: a placement determining module 6040, configured to determine a display position of the business object in the video image; a foreground area determining module 6042, configured to determine a foreground area of the video image; a determining module 6044, configured to determine, according to the display position, whether the business object has an overlapping portion with the foreground area; and a display module 6046, configured to draw, by computer drawing, the portion of the business object other than the overlapping portion in the background area if the business object overlaps with the foreground area.
- the background area determining module 602 is configured to determine a background area of the video image by using the pre-trained first convolutional neural network model.
- the processing apparatus for the video image of the embodiment may further include: a first training module 606, configured to train the first convolutional neural network model.
- the first training module 606 can include: a first feature vector obtaining module 6060, configured to acquire a first feature vector of the first sample image by using the first convolutional neural network model, where the first sample image is a sample image including foreground annotation information and background annotation information.
- the first convolution processing module 6061 is configured to perform convolution processing on the first feature vector by using the first convolutional neural network model to obtain a first feature vector convolution result.
- the first execution module 6064 is configured to: if the amplified first feature vector convolution result satisfies a convolution convergence condition, complete the training of the first convolutional neural network model; if the amplified first feature vector convolution result does not satisfy the convolution convergence condition, adjust the network parameters of the first convolutional neural network model according to the amplified first feature vector convolution result, and iteratively train the first convolutional neural network model according to the adjusted network parameters until the first feature vector convolution result after the iterative training satisfies the convolution convergence condition.
- the amplification processing module 6064 is configured to amplify the first feature vector convolution result by performing bilinear interpolation on the first feature vector convolution result.
- optionally, the amplification processing module 6064 is configured to enlarge the first feature vector convolution result so that the size of the image corresponding to the enlarged first feature vector convolution result is consistent with the original image size.
- optionally, the first condition determining module 6066 is configured to calculate, by using a set loss function, a loss value between the amplified first feature vector convolution result and a predetermined standard output feature vector, and determine, according to the loss value, whether the amplified first feature vector convolution result satisfies the convergence condition.
- optionally, the first training module 606 may further include: a prediction module 6065, configured to acquire a test sample image and predict the foreground and background regions of the test sample image using the trained first convolutional neural network model;
- a verification module 6066, configured to check whether the predicted foreground and background regions are correct; and a retraining module 6067, configured to retrain the first convolutional neural network model using the test sample image if the verification result of the verification module is incorrect.
- optionally, the retraining module 6067 is configured to: if the verification result of the verification module is incorrect, obtain, from the test sample images, sample images whose foreground and background regions are incorrectly predicted, and train the first convolutional neural network model again using those incorrectly predicted sample images, where the incorrectly predicted sample images contain foreground information and background information.
- optionally, the first training module 606 may further include: a video stream module 6068, configured to input a video stream including multi-frame sample images into the first convolutional neural network model before the first feature vector obtaining module acquires the first feature vector of the first sample image.
- optionally, the first training module 606 may further include: an annotation module 6069, configured to, before the video stream module inputs the video stream including the multi-frame sample images into the first convolutional neural network model, determine the images of a plurality of key frames of the video stream as sample images and annotate the foreground and background regions of these sample images.
- the first convolutional neural network model in various embodiments of the present application may include a full convolutional neural network model.
- the placement determining module 6040 is configured to determine, according to the setting rule, a display position of the business object in the video image; wherein the setting rule comprises: setting a display position of the preset business object in the video image, Alternatively, the placement determination module 6040 determines the placement of the business object in the video image based on the motion detection data of the target object in the foreground region.
- the placement determining module 6040 is configured to determine, according to the motion detection data of the target object in the foreground region, the display position of the business object in the video image by using the pre-trained second convolutional neural network model.
- the processing apparatus for the video image of the embodiment may further include: a second training module 608, configured to train the second convolutional neural network model.
- the second training module 608 can include: a second feature vector obtaining module 6080, configured to acquire a second feature vector of the second sample image by using the second convolutional neural network model, where the second feature vector includes position information and/or confidence information of the business object in the second sample image, and the target object feature vector of the target object in the second sample image; and a second convolution processing module 6082, configured to perform convolution processing on the second feature vector by using the second convolutional neural network model to obtain a second feature vector convolution result.
- the second condition determination module 6084 is configured to determine whether the position information and/or confidence information of the corresponding business object in the second feature vector convolution result satisfies the convergence condition of the business object, and to determine whether the target object feature vector in the second feature vector convolution result satisfies the target object convergence condition; the second execution module 6086 is configured to: if the position information and/or confidence information of the corresponding business object in the second feature vector convolution result satisfies the convergence condition of the business object and the target object feature vector in the second feature vector convolution result satisfies the target object convergence condition, complete the training of the second convolutional neural network model; otherwise, adjust the network parameters of the second convolutional neural network model and iteratively train the second convolutional neural network model according to the adjusted network parameters until the position information and/or confidence information and the target object feature vector of the business object after the iterative training satisfy the corresponding convergence conditions.
- the placement determining module 6040 is configured to determine a location of the business object in the video image according to the motion detection data of the target object in the foreground area and the type of the business object.
- the placement determining module 6040 includes: a location obtaining module 60400, configured to obtain a plurality of display positions of the business object in the video image according to the action detection data and the type of the business object of the target object in the foreground area;
- the location selection module 60402 is configured to select at least one placement location from the plurality of placement locations as a final placement location of the business object in the video image.
- optionally, the placement determining module 6040 includes: a data determining module 60404, configured to determine whether the motion detection data of the target object in the foreground area matches the preset motion data; and a position obtaining module 60406, configured to, if the motion detection data of the target object in the foreground area matches the preset motion data, acquire, from the pre-stored correspondence between motion data and display positions, the target display position corresponding to the preset motion data as the display position of the business object in the video image.
- the obtaining module 600 is configured to obtain a business object to be displayed from a predetermined storage location, or receive a business object from a video source.
- the business object may include, for example but not limited to, any one or more of the following: a target video, a target image, and an effect including semantic information;
- the video image may include: a still image or a live video image.
- the target effect may include any one or more of the following special effects of the advertisement information: two-dimensional sticker effects, three-dimensional effects, particle effects, and the like.
- the foreground area of the live video image is the area where the character is located.
- the background area of the live video image includes at least a partial area other than the area in which the character is located.
- the motion detection data of the target object includes any one or more of the following: blink data, open mouth data, nod data, moving head data, kiss data, smile data, wave data, scissors data, fist data, and hand data , thumb data, pistol posture data, OK hand data.
- the processing apparatus of the video image of the embodiment is used to implement the processing method of the corresponding video image in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
- the processing device of the video image of this embodiment may be disposed in an appropriate electronic device, including but not limited to a mobile terminal, a PC, a server, or the like.
- FIG. 8 is a structural block diagram of still another embodiment of a processing apparatus for a video image of the present application.
- the embodiments of the present application do not limit the specific implementation of the electronic device.
- the electronic device can include a processor 802, a communication interface 804, a memory 806, and a communication bus 808, wherein:
- Processor 802, communication interface 804, and memory 806 complete communication with one another via communication bus 808.
- the communication interface 804 is configured to communicate with network elements of other devices, such as other clients or servers.
- the processor 802 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present application, or a graphics processing unit (GPU).
- the one or more processors included in the terminal device may be the same type of processor, such as one or more CPUs, or one or more GPUs; or may be different types of processors, such as one or more CPUs and One or more GPUs.
- the memory 806 is configured to store at least one executable instruction that causes the processor 802 to perform operations corresponding to the video image processing method in any of the above-described embodiments of the present application.
- the memory 806 may include a high speed random access memory (RAM), and may also include a non-volatile memory such as at least one disk memory.
- FIG. 9 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
- the electronic device includes one or more processors, a communication unit, and the like, for example, one or more central processing units (CPUs) 901 and/or one or more graphics processors (GPUs) 913; the processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 902 or executable instructions loaded from a storage portion 908 into a random access memory (RAM) 903.
- the communication portion 912 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the read-only memory 902 and/or the random access memory 903 over the bus 904 to execute executable instructions, connect to the communication portion 912, and communicate with other target devices via the communication portion 912, thereby completing the operations of the video image processing method provided by the embodiments of the present application, for example: acquiring a video image to be processed and a business object to be displayed; determining a background area of the video image; and drawing the business object in the background area of the video image by using a computer drawing method.
- in the RAM 903, various programs and data required for the operation of the device can be stored.
- the CPU 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904.
- ROM 902 is an optional module.
- the RAM 903 stores executable instructions, or executable instructions are written into the ROM 902 at runtime, and the executable instructions cause the processor 901 to perform operations corresponding to the video image processing methods described above.
- An input/output (I/O) interface 905 is also coupled to bus 904.
- the communication unit 912 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) and linked on the bus.
- the following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, etc.; an output portion 907 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a storage portion 908 including a hard disk or the like. And a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet.
- the drive 911 is also connected to the I/O interface 905 as needed.
- a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 911 as needed so that a computer program read therefrom is installed into the storage portion 908 as needed.
- FIG. 9 is only an optional implementation manner.
- the number and type of components in FIG. 9 may be selected, deleted, added, or replaced according to actual needs;
- different functional components may also be implemented separately or in an integrated manner; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication portion may be arranged separately or integrated on the CPU or the GPU, and so on.
- an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, where the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example, an instruction for acquiring a video image to be processed and a business object to be displayed, an instruction for determining a background area of the video image, and an instruction for drawing the business object in the background area of the video image by using a computer drawing manner.
- the embodiment of the present application further provides a computer program, the computer program comprising computer readable code, the program code includes computer operating instructions, when the computer readable code is run on the device, the processor in the device executes An instruction for implementing each step in the processing method of the video image of any of the embodiments of the present application.
- an embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, the program code comprising Executing instructions corresponding to the method steps provided by the embodiments of the present application, for example, acquiring a video image to be processed and a business object to be displayed; determining a background area of the video image; drawing a background area of the video image by using a computer drawing manner The business object.
- the methods and apparatus of the present application may be implemented in a number of ways.
- the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
- the various components/steps described in the embodiments of the present application may be split into more components/steps according to the needs of the implementation, or two or more components/steps or partial operations of the components/steps may be combined into new components. /Steps to achieve the objectives of the embodiments of the present application.
- the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
- the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
- the present application also covers a recording medium storing a program for executing the method according to the present application.
- the above method according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or an FPGA.
- a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing methods described herein are implemented. Moreover, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown herein.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Controls And Circuits For Display Device (AREA)
Abstract
The embodiments of the present application provide a video image processing method, apparatus, and electronic device, relating to the field of artificial intelligence technology. The method includes: acquiring a video image to be processed and a business object to be displayed; determining a background area of the video image; and drawing the business object in the background area of the video image by means of computer drawing. The embodiments of the present application enable the business object to be displayed in the background area of the video image, which helps prevent the business object from occluding the foreground area, does not affect the viewer's normal video viewing experience, is unlikely to cause viewer resentment, and is conducive to achieving the intended display effect of the business object.
Description
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 19, 2016 with application number CN201610694814.9 and entitled "Image foreground-background segmentation and network model training, image processing method and apparatus", the Chinese patent application filed with the Chinese Patent Office on August 19, 2016 with application number CN201610694601.6 and entitled "Video image processing method, apparatus and terminal device", and the Chinese patent application filed with the Chinese Patent Office on August 19, 2016 with application number CN201610697316.X and entitled "Video image processing method, apparatus and terminal device", the entire contents of which are incorporated herein by reference.
The embodiments of the present application relate to artificial intelligence technology, and in particular, to a video image processing method, apparatus, and electronic device.
With the development of Internet technology, people increasingly use the Internet to watch videos, and Internet video provides business opportunities for many new services. Internet video has become an important entry point for business traffic and is regarded as a high-quality resource for advertisement placement.
Existing video advertising mainly works by implantation. One common implantation approach is to insert an advertisement of fixed duration before the video is played or at a certain time during playback, or to place an advertisement at a fixed position in the video playback area and its surrounding area.
Summary of the Invention
The embodiments of the present application provide technical solutions for processing video images.
According to one aspect of the embodiments of the present application, a video image processing method is provided, including: acquiring a video image to be processed and a business object to be displayed; determining a background area of the video image; and drawing the business object in the background area of the video image by means of computer drawing.
According to another aspect of the embodiments of the present application, a video image processing apparatus is further provided, including: an acquiring module, configured to acquire a video image to be processed and a business object to be displayed; a background area determining module, configured to determine a background area of the video image; and a drawing module, configured to draw the business object in the background area of the video image by means of computer drawing.
According to yet another aspect of the embodiments of the present application, an electronic device is further provided, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another via the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the video image processing method described in any embodiment of the present application.
According to yet another aspect of the embodiments of the present application, another electronic device is further provided, including: a processor and the video image processing apparatus described in any embodiment of the present application;
when the processor runs the video image processing apparatus, the units in the video image processing apparatus described in any embodiment of the present application are run.
According to still another aspect of the embodiments of the present application, a computer program is further provided, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the video image processing method described in any embodiment of the present application.
According to a further aspect of the embodiments of the present application, a computer-readable storage medium is further provided for storing computer-readable instructions, where the instructions, when executed, implement the operations of the steps of the video image processing method described in any embodiment of the present application.
According to the technical solutions provided by the embodiments of the present application, a video image to be processed and a business object to be displayed are acquired, the background area of the video image to be processed is determined, and the business object is drawn in the background area of the video image by means of computer drawing, so that the business object is displayed in the background area of the video image. This can prevent the business object from occluding the foreground area, does not affect the viewer's normal video viewing experience, is unlikely to cause viewer resentment, and is conducive to achieving the intended display effect of the business object.
The technical solutions of the present application are described in further detail below with reference to the accompanying drawings and embodiments.
The accompanying drawings, which constitute a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of one embodiment of a video image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of another embodiment of the video image processing method of the present application;
FIG. 3 is a flowchart of yet another embodiment of the video image processing method of the present application;
FIG. 4 is a flowchart of still another embodiment of the video image processing method of the present application;
FIG. 5 is a flowchart of a further embodiment of the video image processing method of the present application;
FIG. 6 is a structural block diagram of one embodiment of a video image processing apparatus of the present application;
FIG. 7 is a structural block diagram of another embodiment of the video image processing apparatus of the present application;
FIG. 8 is a schematic structural diagram of one embodiment of an electronic device of the present application;
FIG. 9 is a schematic structural diagram of another embodiment of an electronic device of the present application.
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application.
Meanwhile, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended as a limitation of the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present application are only used to distinguish different steps, devices, or modules, and represent neither any specific technical meaning nor a necessary logical order between them.
The embodiments of the present application may be applied to electronic devices such as terminal devices, computer systems, and servers, which may operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a flowchart of one embodiment of a video image processing method according to an embodiment of the present application. The video image processing method of each embodiment of the present application may, for example, be executed by any device having data acquisition, processing, and transmission functions, including but not limited to electronic devices such as terminal devices, personal computers (PCs), and servers. The embodiments of the present application do not limit the device that implements the video image processing method of the embodiments of the present application. Referring to FIG. 1, the video image processing method of this embodiment includes:
Step S100: Acquire a video image to be processed and a business object to be displayed.
For example, in a live streaming scene, the video image currently being displayed, i.e., a live video image, is acquired; in a video recording scene, the video image currently being recorded is acquired; and in a recorded video, the video image currently being played is acquired, and so on. In addition, the video image in each embodiment of the present invention may also be a static image. This embodiment does not limit the type of the video image, the acquisition manner, or the specific acquisition means.
In addition, each embodiment of the present application takes the processing of one video image as an example, but those skilled in the art should understand that video image processing may be performed with reference to the embodiments of the present application for multiple video images or a sequence of video images in a video stream.
In each embodiment of the present application, the business object to be displayed is an object that is created according to a certain business requirement and can be displayed in a video image. In an optional example, the business object may include, for example but not limited to, any one or more of the following: a special effect containing semantic information (for example, information such as an advertisement, entertainment, a weather forecast, a traffic forecast, or a pet), which may be called a target special effect; a video, which may be called a target video; an image, which may be called a target image; and the like. The target special effect may be, for example, a special effect in three-dimensional (3D) form, such as a 3D advertisement effect displayed in the form of a 3D special effect; it may also be a sticker in two-dimensional (2D) form, such as a 2D advertisement sticker effect displayed in the form of a sticker; or it may be a particle effect, etc. For example, the target video may be a scene video, which may be either a recorded scene video or a scene video being broadcast live; the target video may also be a sports video, which may be either a recorded sports video or a sports video being broadcast live. However, it is not limited thereto, and other forms of business objects are also applicable to the technical solutions of the embodiments of the present application, such as an application (APP) or a textual description or introduction of an application, or a certain form of object that interacts with the video audience (such as an electronic pet). This embodiment does not limit the specific form and content of the target video.
In each embodiment of the present application, the process of acquiring the business object to be displayed may be performed simultaneously with the process of acquiring the video image to be processed, or may be performed after or before the process of acquiring the video image to be processed. This embodiment does not limit the order in which the process of acquiring the business object to be displayed is performed in this embodiment.
In an optional example, step S100 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquiring module 600 run by the processor.
Step S102: Determine a background area of the video image.
In an optional example of each embodiment of the present application, when determining the background area of the video image, the foreground area and the background area of the video image may be detected from the video image; or the background area of the video image may be detected directly from the video image; or the foreground area of the video image may be detected from the video image, and the area other than the foreground area in the video image is then determined as the background area of the video image. This embodiment does not limit the technical means for determining the foreground area of the video image.
In each embodiment of the present application, in a live video scene, the foreground area of a live video image includes the area where a person is located, and the background area of the live video image includes at least a partial area other than the area where the person is located.
In an optional example, step S102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a background area determining module 602 run by the processor.
Step S104: Draw the business object in the background area of the video image by means of computer drawing.
After the background area of the video image is determined, the business object is drawn in the background area of the video image by means of computer drawing. For example, the business object may be drawn in the entire background area or a partial background area of the video image by computer drawing.
Drawing the business object in the determined background area by means of computer drawing may be implemented through appropriate computer graphics or image drawing or rendering, including but not limited to drawing based on an Open Graphics Library (OpenGL) graphics drawing engine, etc. OpenGL defines a professional cross-programming-language, cross-platform programming-interface specification for graphics; it is hardware-independent and can conveniently be used to draw 2D or 3D graphics and images. With an OpenGL graphics drawing engine, not only can 2D effects such as 2D stickers be drawn, but 3D special effects and particle effects can also be drawn, and so on. However, the present application is not limited to drawing based on the OpenGL graphics drawing engine; other approaches, such as drawing based on graphics drawing engines such as Unity or OpenCL, are equally applicable to the embodiments of the present application.
In an optional example, step S104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a drawing module 604 run by the processor.
It should be noted that, in each embodiment of the present application, the operations of acquiring the business object to be displayed and determining the background area of the video image may be performed in any order, that is, either one of the operations may be performed first and the other performed afterwards, or the two operations may be performed at the same time; neither affects the implementation of the embodiments of the present application.
Through the video image processing method provided by this embodiment, a video image to be processed and a business object to be displayed are acquired, the background area of the video image to be processed is determined, and the business object is drawn in the background area of the video image by means of computer drawing, so that the business object is displayed in the background area of the video image. This can prevent the business object from occluding the foreground area, does not affect the viewer's normal video viewing experience, is unlikely to cause viewer resentment, and is conducive to achieving the intended display effect of the business object.
图2是本申请实施例视频图像的处理方法另一实施例的流程图。参照图2,本实施例视频图像的处理方法包括:
步骤S200:获取待处理的视频图像和待展示的业务对象。
在一个可选示例中,步骤S200可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取模块600执行。
步骤S202:确定视频图像的前景区域和背景区域,以及业务对象在视频图像中的展示位置。
在本申请各实施例的一个可选示例,确定视频图像的前景区域时,可以从视频图像中检测视频图像的前景区域和背景区域;或者,可以从视频图像中直接检测视频图像的前景区域;或者可以从视频图像中检测视频图像的背景区域,将视频图像中背景区域以外的区域确定为视频图像的前景区域。本实施例对确定视频图像的前景区域的技术手段不做限制。同理,确定视频图像的背景区域的确定可以参照上述确定视频图像的前景区域的实现方式,在此不再赘述。
本申请各实施例中,展示位置可以是视频图像中指定区域的中心位置,或者可以是上述指定区域中多个边缘位置的坐标等。
在一个可选示例中,步骤S200可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的前景区域确定模块6042、背景区域确定模块602和展示位置确定模块6040执行。
步骤S204:根据上述展示位置判断业务对象与前景区域是否存在重叠部分。若存在,则执行步骤S206;若不存在,则执行步骤S208。
本步骤中,判断业务对象与前景区域是否存在重叠部分的目的是判断业务对象是否对视频图像的前景区域造成遮挡。若存在重叠部分,则表示业务对象对视频图像的前景区域造成遮挡;若不存在重
叠部分,则表示业务对象对视频图像的前景区域未造成遮挡。
在一个可选示例中,步骤S204可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的判断模块6044执行。
步骤S206:采用计算机绘图方式,在背景区域绘制业务对象中除重叠部分之外的部分。
一种可选的方式中，例如，业务对象Y与视频图像S的前景区域q存在重叠部分c，在视频图像S的背景区域b绘制业务对象Y中除重叠部分c之外的部分w，其中，重叠部分c和部分w共同组成业务对象Y，前景区域q和背景区域b共同组成视频图像S。
另一种可选的方式中,若业务对象对视频图像的前景区域造成遮挡,则可以采用调整前景区域的显示图层的方式将前景区域置于业务对象之上展示,或者还可以采用调整业务对象的显示图层的方式将业务对象置于前景区域之下展示。本申请各实施例对将前景区域置于业务对象之上展示的技术手段不做具体限制。
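针对“存在重叠部分时仅绘制业务对象中除重叠部分之外的部分（即将前景置于业务对象之上展示）”的处理，下面给出一个基于前景掩码的示意性草图；fg_mask假设为前述分割得到的前景掩码（前景为1、背景为0），rendered假设为已在整幅图像上绘制了业务对象的中间结果，均为示例性假设。

```python
import numpy as np

def composite_with_foreground_on_top(frame, rendered, fg_mask):
    """frame:    原始视频帧 (H, W, 3)
    rendered: 已在整幅图像上绘制了业务对象后的图像 (H, W, 3)
    fg_mask:  前景掩码 (H, W)，前景为1、背景为0
    返回: 重叠部分显示前景、其余部分显示业务对象的合成结果。"""
    mask = fg_mask[..., None].astype(bool)
    # 前景像素取原始帧(相当于前景置于业务对象之上)，背景像素取绘制了业务对象的结果
    return np.where(mask, frame, rendered)
```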
在一个可选示例中,步骤S206可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的展示模块6046执行。
之后,不执行本实施例的后续流程。
步骤S208:采用计算机绘图方式,在背景区域绘制整个业务对象。
在一个可选示例中,步骤S208可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的绘制模块604执行。
通过本实施例提供的视频图像的处理方法,获取待处理的视频图像和待展示的业务对象,确定待处理的视频图像的前景区域和背景区域,以及业务对象在视频图像中的展示位置,然后根据业务对象在视频图像中的展示位置判断业务对象与视频图像的前景区域是否存在重叠部分,若存在重叠部分,表示部分或全部业务对象位于前景区域内,则对于该重叠部分,将前景区域置于业务对象之上展示,避免业务对象遮挡前景区域,不影响观众的正常视频观看体验,不易引起观众反感,可以有效实现预想的效果。
图3是本申请实施例视频图像的处理方法又一实施例的流程图。参照图3,本实施例视频图像的处理方法包括:
步骤S300:获取待处理的视频图像和待展示的业务对象。
在一个可选示例中,步骤S300可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取模块600执行。
步骤S302:通过预先训练好的第一卷积神经网络模型确定视频图像的背景区域。
本实施例中,使用预先训练好的、用于分割视频图像中的前景区域和背景区域的第一卷积神经网络模型,对视频图像进行检测,确定视频图像的背景区域。用于分割视频图像中的前景区域和背景区域的第一卷积神经网络模型通过标注有前景区域和背景区域的样本图像进行训练后,可以采用该第一卷积神经网络检测视频图像,获得视频图像的前景区域的预测信息和背景区域的预测信息,基于背景区域的预测信息便可确定视频图像中的背景区域,可以准确、高效地确定视频图像中的前景区域和背景区域。
当需要预先训练第一卷积神经网络模型时,一种可选的训练方式包括以下过程:
(1)利用第一卷积神经网络模型获取第一样本图像的第一特征向量。
其中,第一样本图像为包含有前景区域和背景区域标记信息的样本图像,也即,第一样本图像为已标记了前景区域和背景区域的样本图像。本实施例中,前景区域可以为图像主体所在区域,例如人物所在区域;背景区域可以为除了主体所在区域外的其它区域,可以是其它区域中的全部或者部分。
在一种可选的实施方式中，第一样本图像可以包括至少一个视频流的多帧样本图像。在此方式中，在获取第一样本图像的第一特征向量之前，可以将包括多帧样本图像的视频流输入第一卷积神经网络模型。在实现时，一种可选方式包括：先确定视频流的多个关键帧的图像为样本图像，对这些样本图像进行前景区域和背景区域的标注；在此基础上，将进行了标注的样本图像进行组合，再将组合后的包括多帧进行了标注的样本图像的视频流输入第一卷积神经网络模型。其中，对视频流抽取关键帧并对抽取的关键帧进行标注，均可以由本领域技术人员采用任意适当的方式实现，例如通过均匀采样的方式抽取关键帧等。在抽取了关键帧后，可以结合视频上下文对抽取的关键帧标注区分前景和背景，得到较精确的标注边界。将进行了标注后的样本图像作为第一样本图像，提取其第一特征向量。此外，本步骤中，对第一特征向量的提取可以采用相关技术中的适当方式实现，本实施例在此不再赘述。
(2)利用第一卷积神经网络模型对第一特征向量进行卷积处理,获取第一特征向量卷积结果。获取的第一特征向量卷积结果中包含有用于分辨视频图像的前景区域和背景区域的信息。对第一特征向量的卷积处理次数可以根据实际需要进行设定,也即,第一卷积神经网络模型中,
卷积层的层数可以根据实际需要进行设置。
第一特征向量卷积结果是对第一特征向量进行了特征提取后的结果,该结果能够有效表征视频图像中前景区域和背景区域的特征和分类。
(3)对第一特征向量卷积结果进行放大处理。
一种可选方式中,对第一特征向量卷积结果的放大可以采用线性插值的方式,例如包括但不限于:线性插值、双线性插值、三线性插值等方式。其中使用的线性插值公式可以由本领域技术人员根据实际需要采用适当的公式,本实施例对此不作限制。可选地,可以通过对第一特征向量卷积结果进行双线性插值来放大第一特征向量卷积结果。通过对第一特征向量卷积结果进行放大处理,可以得到与用于训练的原始图像同样大小的输出图像,获得每一个像素点的特征信息,有利于精确地确定图像的前景区域和背景区域。另外,通过对卷积处理后的第一特征向量的放大处理,第一卷积神经网络模型可以学习到一个较为准确的放大系数,基于该放大系数和放大后的第一特征向量,有利于减少第一卷积神经网络模型的参数调整和计算量,降低第一卷积神经网络模型训练成本,提高训练效率,缩短训练时间。
在其中一个示例中，在获得第一特征向量卷积结果后，通过线性插值层对第一特征向量卷积结果进行双线性插值，以放大卷积处理后的图像特征，并得到与原始图像同样大小（图像长宽相同）的输出。需要说明的是，本实施例中对双线性插值的具体实现手段不作限制。
(4)判断放大后的第一特征向量卷积结果是否满足卷积收敛条件。
其中,卷积收敛条件可以由本领域技术人员根据实际需求适当设定。当放大后的第一特征向量卷积结果满足卷积收敛条件时,可以认为第一卷积神经网络模型中的网络参数设置适当;当放大后的第一特征向量卷积结果不能满足卷积收敛条件时,可以认为第一卷积神经网络模型中的网络参数设置不适当,需要对其进行调整,该调整可以是一个迭代的过程,直至使用调整后的网络参数对第一特征向量进行卷积处理的结果满足卷积收敛条件。
本实施例中，在通过线性插值层对第一特征向量卷积结果进行放大后，可以在损失层使用损失函数对放大后的第一特征向量卷积结果进行计算，进而根据计算结果确定是否满足卷积收敛条件。也即，使用设定的损失函数计算放大后的第一特征向量卷积结果的损失值；根据损失值判断放大后的第一特征向量卷积结果是否满足卷积收敛条件。其中，损失层和损失函数可以根据实际情况由本领域技术人员适当设定，如基于Softmax函数或者罗吉特（Logistic）函数等构造损失函数。通过损失函数对第一特征向量卷积结果和预定的标准输出特征向量进行计算，获得损失值。在获得损失值后，一种可选方式中，可以根据该损失值判断放大后的第一特征向量卷积结果是否满足卷积收敛条件，如该损失值是否小于或等于设定阈值，从而确定本次训练结果是否满足卷积收敛条件；另一种可选方式中，可判断对该损失值的计算是否已达到设定次数，也即，在本次训练中对第一卷积神经网络模型的迭代训练次数是否已达到设定次数，如达到则满足卷积收敛条件。
需要说明的是，当输入的是视频流中的多帧图像时，损失层的损失函数也可以分别对该视频流中的多帧图像进行损失值计算，输出多帧的结果，使第一卷积神经网络模型在视频上得到更加稳定的结果，并通过多帧图像的并行计算，提升计算效率。
（5）若满足卷积收敛条件，则完成对第一卷积神经网络模型的训练；若不满足卷积收敛条件，则根据放大后的第一特征向量卷积结果调整第一卷积神经网络模型的网络参数，并根据调整后的第一卷积神经网络模型的网络参数对该第一卷积神经网络模型进行迭代训练，直至迭代训练后的第一特征向量卷积结果满足卷积收敛条件。
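下面用PyTorch给出上述训练流程（卷积提取特征、双线性插值放大到原图大小、损失层计算损失值、不满足卷积收敛条件则调整网络参数并迭代）的一个极简示意；其中model、张量形状、学习率与阈值均为示例性假设，并以“损失值小于等于设定阈值或达到最大迭代次数”代表卷积收敛条件，仅为思路草图而非本申请限定的实现。

```python
import torch
import torch.nn.functional as F

def train_segmentation_model(model, frames, labels, max_iters=10000, loss_threshold=0.05, lr=1e-3):
    """frames: (N, 3, H, W) 的已标注样本图像张量(可来自视频流的多帧);
    labels: (N, H, W) 的long型标注, 取值0(背景)/1(前景)。仅为训练流程示意。"""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for step in range(max_iters):
        feats = model(frames)                                   # 卷积处理得到的特征图 (N, 2, h, w)
        logits = F.interpolate(feats, size=labels.shape[-2:],   # 双线性插值放大到与原始图像同样大小
                               mode="bilinear", align_corners=False)
        loss = F.cross_entropy(logits, labels)                  # 损失层: 逐像素softmax交叉熵
        if loss.item() <= loss_threshold:                       # 满足卷积收敛条件则完成训练
            break
        optimizer.zero_grad()
        loss.backward()                                         # 根据损失调整网络参数, 迭代训练
        optimizer.step()
    return model
```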
通过对第一卷积神经网络模型进行上述训练，得到的第一卷积神经网络模型可以对视频图像的图像特征进行特征提取和分类，从而具有确定视频图像中的前景区域和背景区域的功能。在实际应用中，可以使用该第一卷积神经网络模型识别出视频图像中的背景区域。
为了使训练的结果更为精准,在一种可选的实施方式中,可以通过测试样本图像测试该训练出的第一卷积神经网络模型是否准确,进而根据测试结果决定使用该第一卷积神经网络模型还是对该第一卷积神经网络模型进行再训练。在此方式中,在完成了对第一卷积神经网络模型的初步训练后,还可以获取测试样本图像,使用训练后的第一卷积神经网络模型对测试样本图像进行前背景区域的预测,其中,测试样本图像可为未进行任何标注的样本图像;检验预测的前背景区域是否正确;若不正确,则对第一卷积神经网络模型进行再次训练;若正确,则该第一卷积神经网络模型可用于进行视频图像的前背景确定,或者,为了使第一卷积神经网络模型更为精准,再获取其它测试样本图像进行测试;或者,使用与原训练样本图像不同的样本图像进行再次训练。
当通过测试样本图像检验到使用第一卷积神经网络模型预测的前背景区域不正确时,需要对该第一卷积神经网络模型进行再次训练。在一种再次训练方式中,可以仅使用从测试样本图像中获取的前
背景区域预测不正确的样本图像作为再次训练使用的样本图像;然后,使用这些预测不正确的样本图像对第一卷积神经网络模型进行再次训练。这些进行再次训练的测试样本图像在用于训练前,可进行前背景标注。通过这种再训练方式,不仅使得训练更有针对性,也有利于节约训练成本。本申请实施例不限于此,在实际使用中,也可以使用其它进行了前背景标注的样本图像进行训练。
在一个可选示例中,步骤S302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的背景区域确定模块602执行。
步骤S304:确定业务对象在视频图像中的展示位置。
本步骤中,可以确定业务对象在视频图像中的一个或者多个展示位置,一种可选的实现方式包括:
方式一、从业务对象对应的多帧展示图像中,确定待展示的展示图像;确定该待展示的展示图像在视频图像中的展示位置。方式二、按照设定规则确定业务对象在视频图像中的展示位置。
以下,分别对上述两种方式进行示例性说明。
方式一
从业务对象对应的具有播放顺序的多帧展示图像中,选择首帧未被播放过的展示图像,将选择的展示图像确定为待展示的展示图像。
其中,从业务对象对应的具有播放顺序的多帧展示图像中,选择首帧未被播放过的展示图像,具体可以先从业务对象对应的具有播放顺序的多帧展示图像中,确定播放状态为未播放的展示图像,再选择播放状态为未播放的展示图像中的首帧展示图像。在此情况下,在选择的首帧未被播放过的展示图像播放完毕之后,对具有播放顺序的多帧展示图像的播放状态进行更新。
需要说明的是,无论业务对象为动态业务对象还是静态业务对象,业务对象均可由多帧图像组成。视频图像中的每帧图像与业务对象的每帧图像保持时间对齐,在视频图像播放下一帧图像时,也可以同步展示对应的业务对象的下一帧图像。
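方式一中“选择首帧未被播放过的展示图像并在播放完毕后更新播放状态”的逻辑，可以用如下示意代码表示；展示图像的数据结构为示例性假设。

```python
def pick_next_display_image(display_images):
    """display_images: 按播放顺序排列的列表, 每项形如 {"image": ..., "played": bool}。
    返回首帧未被播放过的展示图像, 并更新其播放状态; 若全部已播放则返回 None。"""
    for item in display_images:
        if not item["played"]:          # 先筛选播放状态为未播放的展示图像
            item["played"] = True       # 选中并播放完毕后更新播放状态(此处简化为选中即标记)
            return item["image"]
    return None
```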
方式二
按照设定规则确定业务对象在视频图像中的展示位置;其中,设定规则例如可以包括:设定规则1、预先设定的业务对象在视频图像的设定展示位置,或者,设定规则2、根据前景区域中的目标对象的动作检测数据,确定业务对象在视频图像中的展示位置。
基于设定规则1:预先设定的业务对象与其在视频图像中的展示位置存在对应关系,若待展示的业务对象确定为预先设定的业务对象,则待展示的业务对象的展示位置相应地确定为预先设定的业务对象在视频图像中的设定展示位置。
基于设定规则2:可以分别使用预先训练好的第二卷积神经网络模型和前景区域中的目标对象的动作检测数据,确定业务对象在视频图像中的展示位置;还可以根据业务对象的类型和前景区域中的目标对象的动作检测数据,确定业务对象在视频图像中的展示位置;或者可以根据预先设定的动作数据和前景区域中的目标对象的动作检测数据确定业务对象在视频图像中的展示位置。以下对上述三种确定业务对象在视频图像中的展示位置的方式进行示例性介绍。
1)使用预先训练好的第二卷积神经网络模型和前景区域中的目标对象的动作检测数据,确定业务对象在视频图像中的展示位置。
其中,预先训练一个卷积神经网络模型(即第二卷积神经网络模型),训练完成的第二卷积神经网络模型具有确定业务对象在视频图像中的展示位置的功能;或者,也可以直接使用第三方已训练完成的、具有确定业务对象在视频图像中的展示位置的功能的卷积神经网络模型。
预先训练第二卷积神经网络模型时,一种可选的训练方式包括以下过程:
(1)获取第二样本图像的第二特征向量。
其中,第二特征向量中包含有第二样本图像中的业务对象的位置信息和/或置信度信息,以及第二样本图像中目标对象的目标对象特征向量。业务对象的置信度信息指示了业务对象展示在当前位置时,能够达到的效果(如被关注或被点击或被观看)的概率,该概率可以根据对历史数据的统计分析结果设定,也可以根据仿真实验的结果设定,还可以根据人工经验进行设定。在实际应用中,可以根据实际需要,仅对业务对象的位置信息进行训练,也可以仅对业务对象的置信度信息进行训练,还可以对二者均进行训练。对二者均进行训练,能够使得训练后的第二卷积神经网络模型更为有效和精准地确定业务对象的位置信息和置信度信息,以便为视频图像的处理提供依据。
第二卷积神经网络模型对大量的第二样本图像进行训练,本实施例中,可使用包含有业务对象的第二样本图像对第二卷积神经网络模型进行训练,本领域技术人员应当明了的是,用来训练的第二样本图像中,除了包含业务对象外,也可以包含目标对象属性的信息,即人脸的面部表情和/或动作、人手的手势和/或动作等等信息。此外,本实施例中的第二样本图像中的业务对象可以被预先标注位置信息、或者置信度信息,或者位置信息和置信度二种信息都标注。当然,在实际应用中,这些信息也可
以通过其它途径获取。而通过预先对业务对象进行相应信息的标注，可以有效节约数据处理的数据量和交互次数，提高数据处理效率。
将具有业务对象的位置信息和/或置信度信息,以及某种目标对象属性的第二样本图像作为训练样本,对其进行特征向量提取,获得包含有业务对象的位置信息和/或置信度信息的业务对象特征向量,以及目标对象属性对应的目标对象特征向量。
可选地,可以使用第二卷积神经网络模型对目标对象和业务对象同时进行训练,在此情况下,第二样本图像的第二特征向量中,也包含目标对象的特征。
对第二特征向量的提取可以采用相关技术中的适当方式实现,本实施例在此不再赘述。
(2)对第二特征向量进行卷积处理,获取第二特征向量卷积结果。
在本实施例中，获取的第二特征向量卷积结果中包含有业务对象的位置信息和/或置信度信息，以及目标对象属性对应的目标对象特征向量的卷积结果。在对目标对象和业务对象进行联合训练的情况下，特征向量卷积结果中还包含目标对象信息。
对第二特征向量的卷积处理次数可以根据实际需要进行设定,也即,第二卷积神经网络模型中,卷积层的层数根据实际需要进行设置,在此不再赘述。
第二特征向量卷积结果是对第二特征向量进行了特征提取后的结果,该结果可以有效表征视频图像中目标对象的特征对应的业务对象。
本实施例中,当第二特征向量中既包含业务对象的位置信息、又包含业务对象的置信度信息时,也即,对业务对象的位置信息和置信度信息均进行了训练的情况下,该第二特征向量卷积结果在后续分别进行收敛条件判断时共享,无须进行重复处理和计算,可以减少由数据处理引起的资源损耗,提高数据处理速度和效率。
(3)判断第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息是否满足业务对象收敛条件,并判断第二特征向量卷积结果中对应的目标对象特征向量是否满足目标对象收敛条件。
其中,业务对象收敛条件和目标对象收敛条件可以由本领域技术人员根据实际需求适当设定。当位置信息和/或置信度信息满足业务对象收敛条件,且目标对象特征向量满足目标对象收敛条件时,可以认为第二卷积神经网络模型中的网络参数设置适当;当位置信息和/或置信度信息不满足业务对象收敛条件,和/或目标对象特征向量不满足目标对象收敛条件时,可以认为第二卷积神经网络模型中的网络参数设置不适当,需要对其网络参数进行调整,该调整过程可以是一个迭代的过程,直至使用调整后的网络参数对第二特征向量进行卷积处理的结果满足收敛条件。
一种可选方式中,业务对象收敛条件可以根据预设的标准位置和/或预设的标准置信度进行设定,例如,将第二特征向量卷积结果中业务对象的位置信息指示的位置与预设的标准位置之间的距离满足一定阈值作为业务对象的位置信息的收敛条件;将第二特征向量卷积结果中业务对象的置信度信息指示的置信度与预设的标准置信度之间的差别满足一定阈值作为业务对象的置信度信息的收敛条件等。
其中,可选地,预设的标准位置可以是对待训练的第二样本图像中的业务对象的位置进行平均处理后获得的平均位置;预设的标准置信度可以是对待训练的第二样本图像中的业务对象的置信度进行平均处理后获取的平均置信度。因第二样本图像为待训练样本且数据量庞大,可依据待训练的第二样本图像中的业务对象的位置和/或置信度设定标准位置和/或标准置信度,这样设定的标准位置和标准置信度也更为客观和精确。
在具体进行第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息是否满足业务对象收敛条件的判断时,一种可选的方式包括:
获取第二特征向量卷积结果中对应的业务对象的位置信息,通过计算对应的业务对象的位置信息指示的位置与预设的标准位置之间的欧式距离,得到对应的业务对象的位置信息指示的位置与预设的标准位置之间的第一距离,根据第一距离判断对应的业务对象的位置信息是否满足业务对象收敛条件;
和/或,
获取第二特征向量卷积结果中对应的业务对象的置信度信息，计算对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的欧式距离，得到对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的第二距离，根据第二距离判断对应的业务对象的置信度信息是否满足业务对象收敛条件。其中，采用欧式距离的方式，实现简单且能够有效指示收敛条件是否被满足。但本申请实施例并不限于此，还可以采用马氏距离、巴氏距离等其它方式。
可选地,如前所述,预设的标准位置为对待训练的第二样本图像中的业务对象的位置进行平均处理后获得的平均位置;和/或,预设的标准置信度为对待训练的第二样本图像中的业务对象的置信度进行平均处理后获取的平均置信度。
当第二特征向量卷积结果中还包含目标对象的信息时，对目标对象的信息是否收敛的判断，可以参照相关技术中使用卷积神经网络模型进行收敛判断的方式实现，在此不再赘述。若目标对象的信息满足收敛条件，即第二特征向量卷积结果中的目标对象特征向量满足目标对象收敛条件，则可对目标对象进行分类，明确目标对象的所属类别，以为后续业务对象的展示位置确定提供参考和依据。
(4)若上述收敛条件都满足,即:位置信息和/或置信度信息满足业务对象收敛条件,且目标对象特征向量满足目标对象收敛条件,则完成对第二卷积神经网络模型的训练;若不满足任一或全部收敛条件,即:位置信息和/或置信度信息不满足业务对象收敛条件,和/或目标对象特征向量不满足目标对象收敛条件,则根据第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息,和第二特征向量卷积结果中的目标对象特征向量,调整该第二卷积神经网络模型的参数,并根据调整后的第二卷积神经网络模型的网络参数对该第二卷积神经网络模型进行迭代训练,直至迭代训练后的业务对象的位置信息和/或置信度信息满足收敛条件,目标对象特征向量满足目标对象收敛条件。
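上述以“待训练样本的平均位置/平均置信度”作为标准位置/标准置信度、并通过欧式距离判断业务对象收敛条件的做法，可参考如下示意代码；其中阈值、输入形状均为示例性假设，马氏距离、巴氏距离等其它度量可按同样思路替换。

```python
import numpy as np

def business_object_converged(pred_positions, pred_confidences,
                              sample_positions, sample_confidences,
                              pos_threshold=1.0, conf_threshold=0.1):
    """pred_positions: 第二特征向量卷积结果中业务对象的位置信息 (N, 2);
    pred_confidences: 对应的置信度信息 (N,);
    sample_*: 待训练样本中标注的业务对象位置/置信度, 取平均得到标准位置/标准置信度。"""
    std_position = sample_positions.mean(axis=0)            # 预设的标准位置: 样本位置的平均
    std_confidence = sample_confidences.mean()               # 预设的标准置信度: 样本置信度的平均
    first_distance = np.linalg.norm(pred_positions - std_position, axis=1).mean()   # 第一距离(欧式)
    second_distance = np.abs(pred_confidences - std_confidence).mean()              # 第二距离
    return first_distance <= pos_threshold and second_distance <= conf_threshold
```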
通过对第二卷积神经网络模型进行上述训练,第二卷积神经网络模型可以对基于业务对象的展示位置进行特征提取和分类,从而具有确定业务对象在视频图像中的展示位置的功能。其中,当展示位置包括多个时,通过上述业务对象置信度的训练,第二卷积神经网络模型还可以确定出多个展示位置中的展示效果的优劣顺序,从而确定最终的展示位置。在后续应用中,当需要展示业务对象时,根据视频图像即可确定出有效的展示位置。
此外,在对第二卷积神经网络模型进行上述训练之前,还可以预先对第二样本图像进行预处理,包括:获取多个第二样本图像,其中,每个第二样本图像中包含有业务对象的标注信息;根据标注信息确定业务对象的位置,判断确定的业务对象的位置与预设位置的距离是否小于或等于设定阈值;将小于或等于设定阈值的业务对象对应的第二样本图像,确定为待训练的第二样本图像。其中,预设位置和设定阈值均可以由本领域技术人员采用任意适当方式进行适当设置,例如根据数据统计分析结果或者相关距离计算公式或者人工经验等,本实施例对此不作限制。
在一种可选方式中,根据标注信息确定的业务对象的位置可以是业务对象的中心位置。在根据标注信息确定业务对象的位置,判断确定的业务对象的位置与预设位置的距离是否小于或等于设定阈值时,可以根据标注信息确定业务对象的中心位置;进而判断该中心位置与预设位置的方差是否小于或等于设定阈值。
通过预先对第二样本图像进行预处理,可以过滤掉不符合条件的样本图像,以保证训练结果的准确性。
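上述对第二样本图像的预处理（根据标注信息确定业务对象的中心位置，仅保留与预设位置的距离小于或等于设定阈值的样本）可以写成如下示意代码；样本与标注的数据结构为示例性假设。

```python
import numpy as np

def filter_samples(samples, preset_position, threshold):
    """samples: 列表, 每项形如 {"image": ..., "bbox": (x1, y1, x2, y2)} 的带标注样本;
    preset_position: 预设位置 (x, y); threshold: 设定阈值。返回符合条件的待训练第二样本图像。"""
    kept = []
    for s in samples:
        x1, y1, x2, y2 = s["bbox"]
        center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])   # 根据标注信息确定业务对象的中心位置
        if np.linalg.norm(center - np.asarray(preset_position)) <= threshold:
            kept.append(s)                                       # 距离不超过设定阈值的样本保留用于训练
    return kept
```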
通过上述过程实现了第二卷积神经网络模型的训练,训练完成的第二卷积神经网络模型可以用来确定业务对象在视频图像中的展示位置。例如,在视频直播过程中,若主播点击业务对象指示进行业务对象展示时,在第二卷积神经网络模型获得了直播的视频图像中主播的面部特征点后,可以指示出展示业务对象的最终位置如背景区域中距离主播的头部预设距离的位置,进而控制直播应用在该位置展示业务对象;或者,在视频直播过程中,若主播点击业务对象指示进行业务对象展示时,第二卷积神经网络模型可以直接根据直播的视频图像确定业务对象的展示位置。
2)根据业务对象的类型和前景区域中的目标对象的动作检测数据,确定业务对象在视频图像中的展示位置。
首先根据前景区域中的目标对象的动作检测数据和业务对象的类型，获得业务对象在视频图像中的多个展示位置；然后从多个展示位置中选择至少一个展示位置作为业务对象在所述视频图像中的最终展示位置。其中，目标对象的类型例如包括但不限于：人脸类型、手部类型和动作类型。其中，人脸类型用于指示人脸在视频图像中占据主要部分，手部类型用于指示手部在视频图像中占据主要部分，而动作类型则用于指示人物进行了某种动作。目标对象的动作检测数据例如可以包括但不限于以下任意一项或多项：眨眼数据、张嘴数据、点头数据、摇头数据、亲吻数据、微笑数据、挥手数据、剪刀手数据、握拳数据、托手数据、大拇指数据、手枪姿势数据、OK手数据等，上述动作检测数据对应的目标对象的动作例如可以包括但不限于以下任意一项或多项：眨眼、张嘴、点头、摇头、亲吻、微笑、挥手、剪刀手、握拳、托手、大拇指、手枪姿势、OK手等。
3)根据预先设定的动作数据和前景区域中的目标对象的动作检测数据确定业务对象在视频图像中的展示位置。
判断前景区域中的目标对象的动作检测数据与预先设定的动作数据是否匹配,即:判断前景区域中的目标对象的动作与预先设定的动作是否匹配;若匹配,则从预先存储的动作数据与展示位置的对应关系中,获取预先设定的动作数据对应的目标展示位置作为业务对象在视频图像中的展示位置。
在实施中,可以预先设定多种不同的动作数据,并对不同的动作数据进行相应的标记,如标记为脸部的动作或者手部的动作等等。不同的动作数据对应着不同的展示位置。将前景区域中的目标对象的动作检测数据与预先设定的动作数据进行匹配,如果动作检测数据与预先设定的动作数据相同,即:
前景区域中的目标对象的动作与预先设定的动作相同，则可以确定前景区域中的目标对象的动作检测数据与预先设定的动作数据相匹配。
为了提高匹配的准确度，可以通过计算的方式确定上述匹配结果。例如，可以设置匹配算法，分别计算动作检测数据与预先设定的每一个动作数据之间的匹配度数值，从得到的匹配度数值中选取最大的匹配度数值；如果该最大的匹配度数值超过预定的匹配阈值，则可以确定最大的匹配度数值对应的预先设定的动作数据与动作检测数据相匹配；如果该最大的匹配度数值未超过预定的匹配阈值，则匹配失败。
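上述“分别计算匹配度、取最大值并与匹配阈值比较、匹配成功后按对应关系取目标展示位置”的流程，可参考如下示意代码；其中匹配度的具体计算方式、动作数据结构与阈值均为示例性假设。

```python
def locate_by_action(detected_action, preset_actions, action_to_position, match_threshold=0.8):
    """detected_action: 前景区域中目标对象的动作检测数据(此处假设为特征字典);
    preset_actions: {动作名: 预先设定的动作数据};
    action_to_position: 预先存储的动作数据与展示位置的对应关系。"""
    def match_score(a, b):
        # 示例性的匹配度计算: 相同键上取值一致的比例, 实际可替换为任意匹配度计算方式
        keys = set(a) | set(b)
        return sum(a.get(k) == b.get(k) for k in keys) / max(len(keys), 1)

    scores = {name: match_score(detected_action, data) for name, data in preset_actions.items()}
    best_name = max(scores, key=scores.get)          # 取最大的匹配度数值
    if scores[best_name] >= match_threshold:         # 超过预定的匹配阈值则匹配成功
        return action_to_position.get(best_name)     # 返回预先设定的动作数据对应的目标展示位置
    return None                                      # 匹配失败
```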
需要说明的是,上述步骤S302和步骤S304可以同时执行,也可以按照任意顺序执行,例如,先执行步骤S302,后执行步骤S304,或者先执行步骤S304,后执行步骤S302,本实施例对步骤S302和步骤S304的执行顺序不做具体限定。
在一个可选示例中,步骤S304可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的展示位置确定模块6040执行。
步骤S306:根据展示位置判断业务对象与前景区域是否存在重叠部分。若存在,则执行步骤S308;若不存在,则执行步骤S310。
本步骤中,判断业务对象与前景区域是否存在重叠部分的目的是判断业务对象是否对视频图像的前景区域造成遮挡。若存在重叠部分,则表示业务对象对视频图像的前景区域造成遮挡;若不存在重叠部分,则表示业务对象对视频图像的前景区域未造成遮挡。
在一个可选示例中,步骤S306可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的判断模块6044执行。
步骤S308:采用计算机绘图方式,在背景区域绘制业务对象中除重叠部分之外的部分。
一种可选的方式中，例如，业务对象Y与视频图像S的前景区域q存在重叠部分c，在视频图像S的背景区域b绘制业务对象Y中除重叠部分c之外的部分w，其中，重叠部分c和部分w共同组成业务对象Y，前景区域q和背景区域b共同组成视频图像S。
另一种可选的方式中,若业务对象对视频图像的前景区域造成遮挡,则可以采用调整前景区域的显示图层的方式将前景区域置于业务对象之上展示,或者还可以采用调整业务对象的显示图层的方式将业务对象置于前景区域之下展示。本实施例对将前景区域置于业务对象之上展示的技术手段不做具体限制。
本实施例中,将前景区域置于业务对象之上展示时,可以将重叠部分对应的前景区域置于重叠部分对应的业务对象之上展示。
在一个可选示例中,步骤S308可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的展示模块6046执行。
之后,不执行本实施例的后续流程。
步骤S310:采用计算机绘图方式,在背景区域绘制整个业务对象。
在一个可选示例中,步骤S310可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的绘制模块604执行。
通过本实施例提供的视频图像的处理方法,获取待处理的视频图像和待展示的业务对象,确定待处理的视频图像的前景区域,以及业务对象在视频图像中的展示位置,然后根据业务对象在视频图像中的展示位置判断业务对象与视频图像的前景区域是否存在重叠部分,若存在重叠部分,表示部分或全部业务对象位于前景区域内,则对于该重叠部分,将前景区域置于业务对象之上展示,以避免业务对象遮挡前景区域,不影响观众的正常视频观看体验,不易引起观众反感,有利于实现预想的业务对象展示效果。
在本申请各实施例的一个可选示例中,可以利用训练好的第一卷积神经网络模型,对图像进行检测,分割出图像的前背景。一种可选的实现方式包括:
获取待检测的视频图像。其中,该图像可以包括静态图像或视频中的图像;
采用训练好的第一卷积神经网络模型检测视频图像,获得该视频图像的前景区域的预测信息和背景区域的预测信息。
在一种可选方案中,上述视频图像为直播类视频中的图像。在另一种可选方案中,上述视频图像包括视频流中的多帧图像。由于视频流中的多帧图像存在较多的上下文关联,通过训练好的第一卷积神经网络模型,可以快速高效地对视频流中每帧图像的前背景进行检测。
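将训练好的第一卷积神经网络模型用于视频流多帧图像的前背景检测时，可参考如下用法示意；model假设为已训练完成、输出前景/背景两类逐像素得分的分割模型，其余名称均为示例性假设。

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_background_masks(model, frames):
    """frames: (N, 3, H, W) 的视频流多帧图像张量。
    返回: (N, H, W) 的背景掩码, 背景像素为True。"""
    model.eval()
    logits = model(frames)                                    # 前景/背景两类的逐像素得分 (N, 2, h, w)
    logits = F.interpolate(logits, size=frames.shape[-2:],    # 放大到原图大小
                           mode="bilinear", align_corners=False)
    pred = logits.argmax(dim=1)                               # 每个像素的预测类别: 0为背景, 1为前景
    return pred == 0
```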
图4是本申请视频图像的处理方法再一实施例的流程图。本申请各实施例中，业务对象可以是包含有语义信息的目标特效，其中，目标特效可以为包含广告信息的以下至少一种形式的特效：二维贴纸特效、三维特效、粒子特效。本实施例以业务对象为二维广告贴纸特效、视频图像为直播类视频图像为例，对本申请实施例的视频图像的处理方案进行说明。参照图4。
本实施例视频图像的处理方法包括:
步骤S400:获取待处理的直播类视频图像和待展示的广告贴纸特效。
可以从直播类视频应用中获取待处理的直播类视频图像。待展示的广告贴纸特效可以为广告商定制的广告贴纸特效,或者可以为主播选择的广告贴纸特效。
在一个可选示例中,步骤S400可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取模块600执行。
步骤S402:确定直播类视频图像的主播区域,以及广告贴纸特效在直播类视频图像中的展示位置。
其中,直播类视频图像可以划分为两部分区域,分别为主播区域和背景区域。其中,主播区域为主播本身的区域,背景区域为直播类视频图像中除主播区域以外的区域。
本步骤确定主播区域以及展示位置的执行过程可以按照上述实施例中的相关内容,在此不再赘述。
在一个可选示例中,步骤S402可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的展示位置确定模块6040执行。
步骤S404:根据展示位置判断广告贴纸特效与主播区域是否存在重叠部分。若存在,则执行步骤S406;若不存在,则执行步骤S408。
在一个可选示例中，步骤S404可以由处理器调用存储器存储的相应指令执行，也可以由被处理器运行的判断模块6044执行。
步骤S406:将主播区域置于广告贴纸特效之上展示。
在一个可选示例中,步骤S406可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的展示模块6046执行。
之后,不执行本实施例的后续流程。
步骤S408:采用计算机绘图方式,在背景区域绘制整个业务对象。
在一个可选示例中，步骤S408可以由处理器调用存储器存储的相应指令执行，也可以由被处理器运行的绘制模块604执行。
本实施例中,当视频图像的主体为主播时,观众主要关注的区域为主播的脸部区域和肢体动作,为了能够既让观众注意到广告贴纸特效的内容,又不会影响到主播,可以通过增强现实感效果,给视频图像相关区域加上有语义的虚拟物品如广告贴纸特效。并通过虚拟物品上的展示效果和信息达到商业价值。通过这种方式,既保留了主播的主要形象和动作,同时通过增强现实的特效为视频图像增加了趣味性,减少了观众对广告投放引起的可能的反感,并能够吸引到观众的注意力,可形成商业的价值。例如,若广告贴纸特效为一个飞鸟贴纸特效,飞鸟贴纸特效在视频图像中展示,当飞鸟贴纸特效的展示位置与主播区域存在重叠部分时,即飞鸟贴纸特效遮挡主播区域时,将主播区域置于飞鸟贴纸特效之上展示。具体地,若飞鸟贴纸特效与主播区域的重叠部分为飞鸟贴纸特效的头部,飞鸟贴纸特效的其余部分不与主播区域重叠,则将主播区域置于飞鸟贴纸特效的头部之上展示,展示效果为可见飞鸟贴纸特效除头部以外的部分和主播区域。若飞鸟贴纸特效与主播区域的重叠部分为整个飞鸟贴纸特效,则将主播区域置于整个飞鸟贴纸特效之上展示,展示效果为不可见整个飞鸟贴纸特效,可见主播区域。
需要说明的是,上述广告贴纸特效,以飞鸟贴纸特效为例,可以为静态的广告贴纸特效,也可以为动态的广告贴纸特效。当广告贴纸特效为动态的广告贴纸特效时,可以将动态的广告贴纸特效的每一帧作为一个静态的广告贴纸特效进行处理,该处理过程可以参照上述实施例中的相关内容,在此不再赘述。
随着互联网直播的兴起,越来越多的视频以直播的方式出现。这类视频具有场景简单、实时、因观众主要在手机等移动终端上观看而视频图像尺寸较小等特点。本实施例通过业务对象投放广告,将广告投放与视频直播内容有效融合,方式灵活,效果生动,不仅不影响用户的直播观看体验,且有利于提升广告的投放效果。对于使用较小的显示屏幕进行业务对象展示,广告投放等场景尤其适用。
此外,本实施例的视频图像处理方法可以在任意适当的具有数据采集、处理和传输功能的电子设备实现,本申请实施例对实现设备不作限制。
另外,在本申请视频图像的处理方法的又一实施例中,待展示的业务对象用于在后续绘制的步骤中绘制在视频图像的背景区域,可以使业务对象覆盖视频图像的背景区域中的原始内容。该待展示的业务对象可以包括多种形式,本申请各实施例对业务对象的具体形式不做限制。例如,待显示的业务对象可以是包含沙滩场景的图片A,视频图像的背景区域中的原始内容为包含室内场景的图片B。该实施例中,采用计算机绘图方式在视频图像的背景区域绘制所述业务对象,可以采用计算机绘图方式
在所述背景区域绘制所述业务对象,使所述业务对象覆盖所述背景区域的原始内容。
在确定了背景区域以后，可以采用计算机绘图方式将业务对象绘制在视频图像的背景区域，即采用计算机绘图方式将业务对象绘制在视频图像的整个背景区域，以绘制了业务对象的内容替换背景区域的原始内容。例如，视频图像的原始背景区域为一幅室内场景图片，业务对象为一幅瀑布图片，绘制业务对象后的背景区域中显示的是一幅瀑布图片，而不是原始的室内场景图片，视频图像的前景区域可以保持不变。
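上述“以业务对象覆盖整个背景区域的原始内容、前景区域保持不变”的效果，可以用如下示意代码表达（例如以一幅瀑布图片替换室内场景背景）；变量均为示例性假设。

```python
import numpy as np

def replace_background(frame, business_object_image, bg_mask):
    """frame: 视频帧 (H, W, 3); business_object_image: 与帧同尺寸的业务对象图像(如瀑布图片);
    bg_mask: 背景掩码 (H, W), 背景为True。背景像素被业务对象覆盖, 前景区域保持不变。"""
    return np.where(bg_mask[..., None], business_object_image, frame)
```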
通过本实施例视频图像的处理方法,采用计算机绘图方式在确定出的背景区域绘制业务对象,以使业务对象覆盖视频图像中的背景区域的原始内容,可以有效实现预想的业务对象投放效果。
本申请各实施例视频图像的处理方法中,采用计算机绘图方式在视频图像的背景区域绘制所述业务对象之后,除了可以在本地显示在背景区域绘制了业务对象的视频图像之外,还可以将在背景区域绘制了业务对象的视频图像发送至其他设备,例如,移动终端、PC、服务器或者平板电脑等等,可以实现在背景区域绘制了业务对象的视频图像的共享。
图5是本申请视频图像的处理方法还一实施例的流程图。本实施例重点强调与上述实施例的不同之处,相同之处可以参照上述实施例的介绍和说明,在此不再赘述。参见图5,本实施例视频图像的处理方法包括:
步骤S500、获取待展示的业务对象。
本申请各实施例中,例如可以通过但不限于使用以下方式获取待展示的业务对象:从预定存储位置获取业务对象,或者从视频源接收业务对象。例如,业务对象存储在移动终端本地,可以从移动终端本地直接读取业务对象,本地存储的业务对象可以为上述介绍的目标视频、目标图像、目标特效中的任意一种或者几种;当业务对象为目标视频时,可以从视频源接收目标视频,若目标视频为直播类视频,则可以从直播摄影机接收该直播类视频。
在一个可选示例中,步骤S500可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取模块600执行。
步骤S502、通过预先训练的第一卷积神经网络模型检测视频图像的背景区域。
本实施例中,使用已训练好的、用于分割视频图像中的前景区域和背景区域的第一卷积神经网络模型,对视频图像进行检测,确定其背景区域。该第一卷积神经网络模型通过标注有前景区域和背景区域的样本图像进行训练后,可以准确、高效地确定视频图像中的前景区域和背景区域。
在本申请各实施例的一种可选方式中,第一卷积网络模型为全卷积网络模型,与具有全连接层的卷积网络模型相比,采用全卷积网络模型所需的卷积层参数较少,训练速度较快。
在其中一个可选示例中,对第一卷积神经网络模型的一个示例性结构的简要说明如下:
(1)输入层
例如,可以输入待训练的样本图像的特征向量,该特征向量中包含有样本图像的背景区域的信息,或者,该特征向量中包含有样本图像的前景区域的信息和背景区域的信息。
(2)卷积层
//第一阶段,对待训练的样本图像的特征向量进行卷积处理,获得卷积结果。
2.<=1卷积层1_1(3x3x64)
3.<=2非线性响应ReLU层
4.<=3卷积层1_2(3x3x64)
5.<=4非线性响应ReLU层
6.<=5池化层(3x3/2)
7.<=6卷积层2_1(3x3x128)
8.<=7非线性响应ReLU层
9.<=8卷积层2_2(3x3x128)
10.<=9非线性响应ReLU层
11.<=10池化层(3x3/2)
12.<=11卷积层3_1(3x3x256)
13.<=12非线性响应ReLU层
14.<=13卷积层3_2(3x3x256)
15.<=14非线性响应ReLU层
16.<=15卷积层3_3(3x3x256)
17.<=16非线性响应ReLU层
18.<=17池化层(3x3/2)
19.<=18卷积层4_1(3x3x512)
20.<=19非线性响应ReLU层
21.<=20卷积层4_2(3x3x512)
22.<=21非线性响应ReLU层
23.<=22卷积层4_3(3x3x512)
24.<=23非线性响应ReLU层
25.<=24池化层(3x3/2)
26.<=25卷积层5_1(3x3x512)
27.<=26非线性响应ReLU层
28.<=27卷积层5_2(3x3x512)
29.<=28非线性响应ReLU层
30.<=29卷积层5_3(3x3x512)
31.<=30非线性响应ReLU层
//第二阶段,对第一阶段获得的卷积结果进行插值放大,并进行损失函数的计算。
32.<=31线性插值层
33.<=32损失层,进行损失函数的计算
(3)输出层
第一,在通过前31层的处理获得特征向量后,线性插值层通过双线性插值法对前31层处理后的特征向量进行插值,以放大中间层特征,得到和训练的样本图像同样大小(图像长宽一样)的输出图像。
第二，本实施例中，第33层的损失层采用Softmax函数进行处理。一种可选的Softmax函数如下：

P(y=j|x) = exp(x^T W_j) / Σ_{k=1}^{K} exp(x^T W_k)

其中，x表示输入的特征，j表示第j类别，y表示输出的类别，K表示总共类别数，k表示第k类别，W_j表示第j类别的分类参数，x^T表示x向量的转置，P(y=j|x)表示给定输入x时预测为第j类的概率。
但不限于此,在实际使用中,本领域技术人员也可以采用其它Softmax函数,本发明实施例对此不做限制。
第三,上述卷积层对特征向量进行的处理可以是迭代多次进行的,每完成一次,根据损失层计算出的结果调整第一卷积神经网络模型的参数(如卷积核的值、层间输出线性变化的权重等等),基于参数调整后的第一卷积神经网络模型再进行处理,迭代多次,直至满足收敛条件。
第四,本实施例中,收敛条件可以是对第一卷积神经网络模型进行迭代训练的次数达到最大迭代次数,如10000~50000次。
第五,上述第一卷积神经网络模型对于视频图像的学习,可以单帧视频图像输入,也可以通过多帧视频图像输入,输出多帧视频图像的结果。即第一层输入层可以输入一帧视频图像,也可以输入一个视频流,这个视频流包含多帧视频图像。
最后一层损失层,可以针对一帧视频图像计算损失函数,也可以对视频流的多帧视频图像计算损失函数。
通过视频流方式的训练和学习,可以使第一卷积神经网络模型得到视频上较为稳定的结果,同时通过多帧视频图像的并行计算,提升计算效率。
其中,可以通过修改输入层和输出层的特征图谱的大小实现多帧视频图像的同时输入和输出。
第六,上述卷积网络结构的说明中,2.<=1表明当前层为第二层,输入为第一层;卷积层后面括号为卷积层参数(3x3x64)表明卷积核大小为3x3,通道数为64;池化层后面括号(3x3/2)表明池化核大小为3x3,间隔为2。其它依此类推,不再赘述。
在上述第一卷积神经网络模型结构中,每个卷积层之后都有一个非线性响应单元,该非线性响应单元采用纠正线性单元(Rectified Linear Units,ReLU),通过在卷积层后增加上述纠正线性单元,将卷积层的映射结果尽量稀疏,以便更接近人的视觉反应,从而使图像处理效果更好。
将卷积层的卷积核设为3x3,可以较好地综合局部信息。
设定池化层（Max pooling）的步长stride，使上层特征在不增加计算量的前提下获得更大的视野；池化层的步长stride还具有增强空间不变性的作用，即允许同样的输入出现在不同的图像位置上，而输出结果响应相同。
线性插值层可以将之前的特征放大到原图大小，得到每个像素的预测值。
综上所述,该全卷积神经网络模型的卷积层可以用于信息归纳和融合,最大池化层(Max pooling)进行高层信息的归纳,该第一卷积神经网络模型结构可以进行微调来适应不同的性能和效率的权衡。
但本领域技术人员应当明了的是，上述卷积核的大小、通道数、池化核的大小、间隔以及卷积层的层数均为示例性说明，在实际应用中，本领域技术人员可以根据实际需要进行适应性调整，本发明实施例对此不做限制。此外，本实施例中的第一卷积神经网络模型中的所有层的组合及参数都是可选的，可以任意组合。
通过本实施例中的第一卷积神经网络模型,可以实现对视频图像中前背景区域的有效分割。
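以下按照上文示例性结构的思路（3x3卷积、每个卷积层后接ReLU非线性响应、穿插Max pooling、末端经双线性插值放大回原图大小并做逐像素二分类），用PyTorch给出一个精简的全卷积模型示意；通道数与层数相对上文结构做了缩减，仅用于说明结构组织方式，并非本申请限定的网络配置。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch, num_convs):
    """若干个3x3卷积(每个卷积层后接ReLU非线性响应单元)，再接一个步长为2的最大池化层。"""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    return nn.Sequential(*layers)

class TinyFCN(nn.Module):
    """全卷积分割模型的精简示意: 卷积+ReLU+池化提取特征, 1x1卷积得到前景/背景两类得分,
    再经双线性插值放大到输入大小, 训练时配合逐像素softmax交叉熵损失使用。"""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64, 2),
            conv_block(64, 128, 2),
            conv_block(128, 256, 3),
        )
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x):
        score = self.classifier(self.features(x))
        return F.interpolate(score, size=x.shape[-2:], mode="bilinear", align_corners=False)
```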
在一个可选示例中,步骤S502可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的背景区域确定模块602执行。
步骤S504、采用计算机绘图方式在确定出的背景区域绘制业务对象,以使业务对象覆盖视频图像中的背景区域的原始内容。
在一个可选示例中,步骤S504可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的绘制模块604执行。
步骤S506、发送在背景区域绘制了业务对象的视频图像。
在一个可选示例中，步骤S506可以由处理器调用存储器存储的相应指令执行，也可以由电子设备上的通信模块执行。
例如,上述步骤S500-S504可以在移动终端A上执行,可以在移动终端A上显示在背景区域绘制了业务对象的视频图像,以及/或者发送在背景区域绘制了业务对象的视频图像至移动终端B、移动终端C、服务器D等等。
除了上述执行本申请实施例方法的设备将在背景区域绘制了业务对象的视频图像发送至其他设备以外,还可以直接在其他设备上绘制业务对象。可以先获取到待展示的业务对象,可以获取业务对象本身,还可以获取业务对象的属性信息,根据业务对象的属性信息,从业务对象的存储库中获取到业务对象本身;再将业务对象绘制在视频图像的背景区域中。其中,业务对象的属性信息可以来源于包括视频图像和业务对象的属性信息的传输流。其中,业务对象的属性信息可以用于标识业务对象本身,一个业务对象的属性信息对应着唯一一个业务对象本身或者唯一一类业务对象本身。例如,业务对象的属性信息可以为具有唯一标识的编号等。
需要说明的是,业务对象的存储库可以包括执行本申请实施例方法的当前设备的业务对象的存储库和其他设备的业务对象的存储库。另外,当前设备的业务对象的存储库和其他设备的业务对象的存储库中的业务对象本身及业务对象的属性信息可以保持同步。
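上述“根据业务对象的属性信息从业务对象的存储库中取得业务对象本身”的查找过程，可以用一个简单的键值映射示意；属性信息与存储库的结构均为示例性假设。

```python
def fetch_business_object(attribute_id, local_repo, remote_repo=None):
    """attribute_id: 业务对象的属性信息(具有唯一标识的编号);
    local_repo/remote_repo: {属性编号: 业务对象本身} 形式的存储库, 两者内容可保持同步。"""
    obj = local_repo.get(attribute_id)
    if obj is None and remote_repo is not None:      # 当前设备未命中时回退到其他设备的存储库
        obj = remote_repo.get(attribute_id)
    return obj
```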
上述图5所述实施例可以应用在一种示例性的应用场景中,例如,主播终端获取主播直播时的视频图像,其中,视频图像为直播类视频图像。从视频图像检测背景区域,在主播终端本地获取业务对象或者从第三方摄像机实时接收业务对象,采用计算机绘图方式在视频图像的背景区域绘制业务对象,在主播终端上显示绘制业务对象后的视频图像,并将绘制业务对象后的视频图像发送至服务器或者粉丝终端,在服务器进行中转或者存储,在粉丝终端进行显示。其中,直播类视频图像的背景区域包括直播类视频图像中除了主播图像之外的区域。
需要说明的是,上述步骤S500和步骤S502之间的执行顺序可以为先执行步骤S500,后执行步骤S502;还可以为先执行步骤S502,后执行步骤S500;或者可以为同时执行步骤S500和步骤S502,本实施例对步骤S500和步骤S502的执行顺序不做限制,本实施例仅以先执行步骤S500,后执行步骤S502为例进行说明。
通过本实施例提供的视频图像的处理方法,可以采用计算机绘图方式在确定出的背景区域绘制业务对象,使业务对象覆盖视频图像中的背景区域的原始内容,使得业务对象与视频播放相结合,有利于节约网络资源和/或客户端的系统资源,并且丰富了背景区域的类型,可以有效实现预想的业务对象投放效果;另外,若在视频图像的背景区域绘制接收到的目标视频,实现了在视频图像的背景区域观看视频的效果。若业务对象的内容与视频图像的内容具有相关性,例如,业务对象为目标视频,目标视频为自由女神的介绍视频,主播在视频图像中正在介绍自由女神,则绘制业务对象后的背景区域可以作为视频图像的辅助显示,提升了视频图像的显示效果。
图6是本申请视频图像的处理装置一实施例的结构框图。本申请各实施例视频图像的处理装置可用于实现本申请上述各视频图像的处理方法实施例。参照图6,本实施例视频图像的处理装置包括:获取模块600,用于获取待处理的视频图像和待展示的业务对象;背景区域确定模块602,用于确定视频图像的背景区域;绘制模块604,用于采用计算机绘图方式在视频图像的背景区域绘制业务对象。
通过本实施例提供的视频图像的处理装置,获取待处理的视频图像和待展示的业务对象,确定待处理的视频图像的背景区域,采用计算机绘图方式在视频图像的背景区域绘制业务对象,实现业务对象在视频图像的背景区域展示,可以避免业务对象遮挡前景区域,不影响观众的正常视频观看体验,不易引起观众反感,有利于实现预想的业务对象展示效果。
在本申请各实施例的一个可选示例中,绘制模块604,用于采用计算机绘图方式在视频图像的背景区域绘制待展示的业务对象,使该业务对象覆盖背景区域的原始内容。
图7是本申请视频图像的处理装置另一实施例的结构框图。如图7所示,与图6所示实施例相比,视频图像的处理装置中,绘制模块604包括:展示位置确定模块6040,用于确定业务对象在视频图像中的展示位置;前景区域确定模块6042,用于确定视频图像的前景区域;判断模块6044,用于根据展示位置判断业务对象与前景区域是否存在重叠部分;展示模块6046,用于若业务对象与前景区域存在重叠部分,则采用计算机绘图方式,在背景区域绘制业务对象中除重叠部分之外的部分。
可选地,背景区域确定模块602,用于通过预先训练好的第一卷积神经网络模型确定视频图像的背景区域。
可选地,本实施例的视频图像的处理装置还可以包括:第一训练模块606,用于对第一卷积神经网络模型进行训练。
在其中一个可选示例中,第一训练模块606可以包括:第一特征向量获取模块6060,用于利用第一卷积神经网络模型获取第一样本图像的第一特征向量,其中,第一样本图像为包含有前景标注信息和背景标注信息的样本图像;第一卷积处理模块6061,用于利用第一卷积神经网络模型对第一特征向量进行卷积处理,获取第一特征向量卷积结果;放大处理模块6062,用于对第一特征向量卷积结果进行放大处理;第一条件判断模块6063,用于判断放大后的第一特征向量卷积结果是否满足卷积收敛条件;第一执行模块6064,用于若放大后的第一特征向量卷积结果满足卷积收敛条件,则完成对第一卷积神经网络模型的训练;若放大后的第一特征向量卷积结果不满足卷积收敛条件,则根据放大后的第一特征向量卷积结果调整第一卷积神经网络模型的网络参数并根据调整后的第一卷积神经网络模型的网络参数对第一卷积网络模型进行迭代训练,直至迭代训练后的第一特征向量卷积结果满足卷积收敛条件。
可选地，放大处理模块6062，用于通过对第一特征向量卷积结果进行双线性插值，放大第一特征向量卷积结果。
可选地，放大处理模块6062，用于将第一特征向量卷积结果放大到放大后的第一特征向量卷积结果对应的图像的大小与原始图像大小一致。
可选地，第一条件判断模块6063，用于使用设定的损失函数计算放大后的第一特征向量卷积结果和预定的标准输出特征向量的损失值；以及根据损失值判断放大后的第一特征向量卷积结果是否满足收敛条件。
可选地,再参见图7,第一训练模块606还可以包括:预测模块6065,用于获取测试样本图像,使用训练后的第一卷积神经网络模型对测试样本图像进行前背景区域的预测;检验模块6066,用于检验预测的前背景区域是否正确;再训练模块6067,用于若检验模块的检验结果不正确,使用测试样本图像对第一卷积神经网络模型进行再次训练。
可选地,再训练模块6067,用于若检验模块的检验结果不正确,从测试样本图像中获取前背景区域预测不正确的样本图像;以及使用预测不正确的样本图像对第一卷积神经网络模型进行再次训练,其中,预测不正确的样本图像包含有前景信息和背景信息。
可选地,再参见图7,第一训练模块606还可以包括:视频流模块6068,用于在第一向量获取模块获取第一样本图像的第一特征向量之前,将包括多帧样本图像的视频流输入第一卷积神经网络模型。
可选地,再参见图7,第一训练模块606还可以包括:标注模块6069,用于在视频流模块将包括多帧样本图像的视频流输入第一卷积神经网络模型之前,确定视频流的多个关键帧的图像为样本图像,对样本图像进行前景区域和背景区域的标注。
可选地,本申请各实施例中的第一卷积神经网络模型可以包括全卷积神经网络模型。
可选地,展示位置确定模块6040,用于按照设定规则确定业务对象在视频图像中的展示位置;其中,该设定规则包括:预先设定的业务对象在视频图像的设定展示位置,或者,展示位置确定模块6040根据前景区域中的目标对象的动作检测数据,确定业务对象在视频图像中的展示位置。
可选地,展示位置确定模块6040,用于根据前景区域中的目标对象的动作检测数据,通过预先训练好的第二卷积神经网络模型确定业务对象在视频图像中的展示位置。
可选地,本实施例的视频图像的处理装置还可以包括:第二训练模块608,用于对第二卷积神经网络模型进行训练。
在其中一个可选示例中，第二训练模块608可以包括：第二特征向量获取模块6080，用于利用第二卷积神经网络模型获取第二样本图像的第二特征向量，其中，第二特征向量中包含有第二样本图像中的业务对象的位置信息和/或置信度信息，以及第二样本图像中目标对象的目标对象特征向量；第二卷积处理模块6082，用于利用第二卷积神经网络模型对第二特征向量进行卷积处理，获取第二特征向量卷积结果；第二条件判断模块6084，用于判断第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息是否满足业务对象收敛条件，并判断第二特征向量卷积结果中的目标对象特征向量是否满足目标对象收敛条件；第二执行模块6086，用于若第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息满足业务对象收敛条件，且第二特征向量卷积结果中的目标对象特征向量满足目标对象收敛条件，则完成对第二卷积神经网络模型的训练；否则，调整第二卷积神经网络模型的网络参数并根据调整后的第二卷积神经网络模型的网络参数对第二卷积神经网络模型进行迭代训练，直至迭代训练后的业务对象的位置信息和/或置信度信息和目标对象特征向量均满足相应的收敛条件。
可选地,展示位置确定模块6040,用于根据前景区域中的目标对象的动作检测数据和业务对象的类型,确定业务对象在视频图像中的展示位置。
可选地,展示位置确定模块6040包括:展示位置获得模块60400,用于根据前景区域中的目标对象的动作检测数据和业务对象的类型,获得业务对象在视频图像中的多个展示位置;展示位置选择模块60402,用于从多个展示位置中选择至少一个展示位置作为所述业务对象在视频图像中的最终展示位置。
可选地,展示位置确定模块6040包括:数据判断模块60404,用于判断前景区域中的目标对象的动作检测数据与预先设定的动作数据是否匹配;位置获取模块60406,用于若前景区域中的目标对象的动作检测数据与预先设定的动作数据匹配,则从预先存储的动作数据与展示位置的对应关系中,获取预先设定的动作数据对应的目标展示位置作为业务对象在视频图像中的展示位置。
可选地,获取模块600,用于从预定存储位置获取待展示的业务对象,或者从视频源接收业务对象。
可选地,业务对象例如可以但不限于包括以下任意一项或多项:目标视频、目标图像、包含有语义信息的特效;视频图像可以包括:静态图像或直播类视频图像。
可选地,目标特效可以包含广告信息的以下任意一种或多种形式的特效:二维贴纸特效、三维特效、粒子特效等。
可选地,直播类视频图像的前景区域为人物所在的区域。
可选地,直播类视频图像的背景区域包括除了人物所在的区域之外的至少局部区域。
可选地,目标对象的动作检测数据包括以下任意一项或多项:眨眼数据、张嘴数据、点头数据、摇头数据、亲吻数据、微笑数据、挥手数据、剪刀手数据、握拳数据、托手数据、大拇指数据、手枪姿势数据、OK手数据。
本实施例视频图像的处理装置用于实现前述多个方法实施例中相应的视频图像的处理方法,并具有相应的方法实施例的有益效果,在此不再赘述。
此外,本实施例的视频图像的处理装置可以设置于适当的电子设备中,包括但不限于移动终端、PC、服务器等。
图8是本申请电子设备一实施例的结构示意图。本申请实施例并不对电子设备的具体实现做限定。如图8所示，该电子设备可以包括：处理器（processor）802、通信接口（Communications Interface）804、存储器（memory）806、以及通信总线808。其中：
处理器802、通信接口804、以及存储器806通过通信总线808完成相互间的通信。
通信接口804,用于与其它设备比如其它客户端或服务器等的网元通信。
处理器802可能是中央处理器（CPU），或者是特定集成电路（Application Specific Integrated Circuit，ASIC），或者是被配置成实施本申请实施例的一个或多个集成电路，或者是图形处理器（Graphics Processing Unit，GPU）。终端设备包括的一个或多个处理器，可以是同一类型的处理器，如一个或多个CPU，或者，一个或多个GPU；也可以是不同类型的处理器，如一个或多个CPU以及一个或多个GPU。
存储器806，用于存放至少一可执行指令，该可执行指令使处理器802执行如本申请上述任一实施例的视频图像的处理方法对应的操作。存储器806可能包含高速随机存取存储器（random access memory，RAM），也可能还包括非易失性存储器（non-volatile memory），例如至少一个磁盘存储器。
图9为本申请电子设备另一实施例的结构示意图。下面参考图9，其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备的结构示意图。如图9所示，该电子设备包括一个或多个处理器、通信部等，所述一个或多个处理器例如：一个或多个中央处理单元（CPU）901，和/或一个或多个图像处理器（GPU）913等，处理器可以根据存储在只读存储器（ROM）902中的可执行指令或者从存储
部分908加载到随机访问存储器(RAM)903中的可执行指令而执行各种适当的动作和处理。通信部912可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器902和/或随机访问存储器903中通信以执行可执行指令,通过总线904与通信部912相连、并经通信部912与其他目标设备通信,从而完成本申请实施例提供的任一视频图像的处理方法对应的操作,例如,获取待处理的视频图像和待展示的业务对象;确定所述视频图像的背景区域;采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象。
此外,在RAM 903中,还可存储有装置操作所需的各种程序和数据。CPU901、ROM902以及RAM903通过总线904彼此相连。在有RAM903的情况下,ROM902为可选模块。RAM903存储可执行指令,或在运行时向ROM902中写入可执行指令,可执行指令使处理器901执行上述视频图像的处理方法对应的操作。输入/输出(I/O)接口905也连接至总线904。通信部912可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口905：包括键盘、鼠标等的输入部分906；包括诸如阴极射线管（CRT）、液晶显示器（LCD）等以及扬声器等的输出部分907；包括硬盘等的存储部分908；以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分909。通信部分909经由诸如因特网的网络执行通信处理。驱动器910也根据需要连接至I/O接口905。可拆卸介质911，诸如磁盘、光盘、磁光盘、半导体存储器等等，根据需要安装在驱动器910上，以便于从其上读出的计算机程序根据需要被安装入存储部分908。
需要说明的,如图9所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图9的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请公开的保护范围。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,获取待处理的视频图像和待展示的业务对象的指令;确定所述视频图像的背景区域的指令;采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象的指令。
另外,本申请实施例还提供了一种计算机程序,该计算机程序包括计算机可读代码,该程序代码包括计算机操作指令,当计算机可读代码在设备上运行时,设备中的处理器执行用于实现本申请任一实施例视频图像的处理方法中各步骤的指令。
根据本申请的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,获取待处理的视频图像和待展示的业务对象;确定所述视频图像的背景区域;采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于装置、设备、程序、存储介质等实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本申请的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请的方法和装置。根据实施的需要,可将本申请实施例中描述的各个部件/步骤拆分为更多部件/步骤,也可将两个或多个部件/步骤或者部件/步骤的部分操作组合成新的部件/步骤,以实现本申请实施例的目的。用于所述方法的步骤的上述顺序仅是为了进行说明,本申请的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请的方法的程序的记录介质。
上述根据本申请实施例的方法可在硬件、固件中实现,或者被实现为可存储在记录介质(诸如CDROM、RAM、软盘、硬盘或磁光盘)中的软件或计算机代码,或者被实现通过网络下载的原始存储在远程记录介质或非暂时机器可读介质中并将被存储在本地记录介质中的计算机代码,从而在此描述的方法可被存储在使用通用计算机、专用处理器或者可编程或专用硬件(诸如ASIC或FPGA)的记录介质上的这样的软件处理。可以理解,计算机、处理器、微处理器控制器或可编程硬件包括可存储或接收软件或计算机代码的存储组件(例如,RAM、ROM、闪存等),当所述软件或计算机代码被计算机、
处理器或硬件访问且执行时,实现在此描述的处理方法。此外,当通用计算机访问用于实现在此示出的处理的代码时,代码的执行将通用计算机转换为用于执行在此示出的处理的专用计算机。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及方法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。
以上实施方式仅用于说明本申请实施例,而并非对本申请实施例的限制,有关技术领域的普通技术人员,在不脱离本申请实施例的精神和范围的情况下,还可以做出各种变化和变型,因此所有等同的技术方案也属于本申请实施例的范畴,本申请实施例的专利保护范围应由权利要求限定。
Claims (55)
- 一种视频图像的处理方法,其特征在于,包括:获取待处理的视频图像和待展示的业务对象;确定所述视频图像的背景区域;采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象。
- 根据权利要求1所述的方法,其特征在于,所述采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象,包括:确定所述业务对象在所述视频图像中的展示位置和所述视频图像的前景区域;根据所述展示位置判断所述业务对象与所述视频图像的前景区域是否存在重叠部分;若存在,则采用计算机绘图方式,在所述背景区域绘制所述业务对象中除所述重叠部分之外的部分。
- 根据权利要求1所述的方法,其特征在于,所述采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象,包括:采用计算机绘图方式在所述背景区域绘制所述业务对象,使所述业务对象覆盖所述背景区域的原始内容。
- 根据权利要求1-3任一所述的方法,其特征在于,所述确定所述视频图像的背景区域包括:通过预先训练好的第一卷积神经网络模型确定所述视频图像的背景区域。
- 根据权利要求4所述的方法,其特征在于,所述通过预先训练好的第一卷积神经网络模型确定所述视频图像的背景区域,包括:采用所述第一卷积神经网络检测所述视频图像,获得所述视频图像的前景区域的预测信息和背景区域的预测信息。
- 根据权利要求5所述的方法,其特征在于,所述对所述第一卷积神经网络模型的预先训练,包括:利用所述第一卷积神经网络模型获取第一样本图像的第一特征向量,其中,所述第一样本图像为包含有前景标注信息和背景标注信息的样本图像;利用所述第一卷积神经网络模型对所述第一特征向量进行卷积处理,获取第一特征向量卷积结果;对所述第一特征向量卷积结果进行放大处理;判断放大后的所述第一特征向量卷积结果是否满足卷积收敛条件;若满足,则完成对所述第一卷积神经网络模型的训练;若不满足,则根据放大后的所述第一特征向量卷积结果调整所述第一卷积神经网络模型的网络参数,并根据调整后的所述第一卷积神经网络模型的网络参数对所述第一卷积神经网络模型进行迭代训练,直至迭代训练后的所述第一特征向量卷积结果满足所述卷积收敛条件。
- 根据权利要求6所述的方法,其特征在于,所述对所述第一特征向量卷积结果进行放大处理,包括:通过对所述第一特征向量卷积结果进行双线性插值,放大所述第一特征向量卷积结果。
- 根据权利要求6或7所述的方法,其特征在于,所述对所述第一特征向量卷积结果进行放大处理,包括:将所述第一特征向量卷积结果放大到放大后的第一特征向量卷积结果对应的图像的大小与原始图像大小一致。
- 根据权利要求6-8任一所述的方法,其特征在于,所述判断放大后的所述第一特征向量卷积结果是否满足卷积收敛条件包括:使用设定的损失函数计算放大后的所述第一特征向量卷积结果和预定的标准输出特征向量的损失值;根据所述损失值判断放大后的所述第一特征向量卷积结果是否满足收敛条件。
- 根据权利要求6-9任一所述的方法,其特征在于,还包括:获取测试样本图像,使用训练后的所述第一卷积神经网络模型对所述测试样本图像进行前背景区域的预测;检验预测的前背景区域是否正确;若不正确,则使用所述测试样本图像对所述第一卷积神经网络模型进行再次训练。
- 根据权利要求10所述的方法,其特征在于,所述使用所述测试样本图像对所述卷积神经网络模型进行再次训练,包括:从所述测试样本图像中获取前背景区域预测不正确的样本图像;使用预测不正确的样本图像对所述第一卷积神经网络模型进行再次训练,其中,所述预测不正确的样本图像包含有前景信息和背景信息。
- 根据权利要求6-11任一所述的方法,其特征在于,所述获取第一样本图像的第一特征向量之前,还包括:将包括多帧样本图像的视频流输入所述第一卷积神经网络模型。
- 根据权利要求12所述的方法,其特征在于,所述将包括多帧样本图像的视频流输入所述第一卷积神经网络模型之前,还包括:确定所述视频流的多个关键帧的图像为样本图像,对所述样本图像进行前景区域和背景区域的标注。
- 根据权利要求6-13任一所述的方法,其特征在于,所述第一卷积神经网络模型包括全卷积神经网络模型。
- 根据权利要求2-14任一所述的方法,其特征在于,所述确定所述业务对象在所述视频图像中的展示位置,包括:按照设定规则确定所述业务对象在所述视频图像中的展示位置;其中,所述设定规则包括:预先设定的所述业务对象在所述视频图像的设定展示位置,或者,根据所述前景区域中的目标对象的动作检测数据,确定所述业务对象在所述视频图像中的展示位置。
- 根据权利要求15所述的方法,其特征在于,所述根据所述前景区域中的目标对象的动作检测数据,确定所述业务对象在所述视频图像中的展示位置,包括:根据所述前景区域中的目标对象的动作检测数据,通过预先训练好的、第二卷积神经网络模型确定所述业务对象在所述视频图像中的展示位置。
- 根据权利要求16所述的方法,其特征在于,所述对所述第二卷积神经网络模型的预先训练,包括:利用所述第二卷积神经网络模型获取第二样本图像的第一特征向量,其中,所述第二特征向量中包含有所述第二样本图像中的业务对象的位置信息和/或置信度信息,以及所述第二样本图像中目标对象的目标对象特征向量;利用所述第二卷积神经网络模型对所述第二特征向量进行卷积处理,获取第二特征向量卷积结果;判断所述第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息是否满足业务对象收敛条件,并判断所述第二特征向量卷积结果中的目标对象特征向量是否满足目标对象收敛条件;若均满足,则完成对所述第二卷积神经网络模型的训练;否则,调整所述第二卷积神经网络模型的网络参数并根据调整后的所述第二卷积神经网络模型的网络参数对所述第二卷积神经网络模型进行迭代训练,直至迭代训练后的业务对象的位置信息和/或置信度信息和目标对象特征向量均满足相应的收敛条件。
- 根据权利要求15所述的方法,其特征在于,所述根据所述前景区域中的目标对象的动作检测数据,确定所述业务对象在所述视频图像中的展示位置,包括:根据所述前景区域中的目标对象的动作检测数据和所述业务对象的类型,确定所述业务对象在所述视频图像中的展示位置。
- 根据权利要求18所述的方法,其特征在于,所述根据所述前景区域中的目标对象的动作检测数据和所述业务对象的类型,确定所述业务对象在所述视频图像中的展示位置,包括:根据所述前景区域中的目标对象的动作检测数据和所述业务对象的类型,获得所述业务对象在所述视频图像中的多个展示位置;从所述多个展示位置中选择至少一个展示位置作为所述业务对象在所述视频图像中的展示位置。
- 根据权利要求15所述的方法,其特征在于,所述根据所述前景区域中的目标对象的动作检测数据,确定所述业务对象在所述视频图像中的展示位置,包括:判断所述前景区域中的目标对象的动作检测数据与预先设定的动作数据是否匹配;若匹配,从预先存储的动作数据与展示位置的对应关系中,获取所述预先设定的动作数据对应的目标展示位置作为所述业务对象在所述视频图像中的展示位置。
- 根据权利要求1-20任一所述的方法,其特征在于,所述获取待展示的所述业务对象包括:从预定存储位置获取所述业务对象,或者从视频源接收所述业务对象。
- 根据权利要求1-21任一所述的方法,其特征在于,所述业务对象包括以下任意一项或多项:目标视频、目标图像、包含有语义信息的目标特效;所述视频图像包括:静态图像或直播类视频图像。
- 根据权利要求22所述的方法,其特征在于,所述目标特效包括包含广告信息的以下任意一种或多种形式的特效:二维贴纸特效、三维特效、粒子特效。
- 根据权利要求23所述的方法,其特征在于,所述直播类视频图像的前景区域包括人物所在的区域。
- 根据权利要求22-24任一所述的方法,其特征在于,所述直播类视频图像的背景区域包括除了人物所在的区域之外的至少局部区域。
- 根据权利要求15-25任一所述的方法,其特征在于,所述目标对象的动作检测数据包括以下任意一项或多项:眨眼数据、张嘴数据、点头数据、摇头数据、亲吻数据、微笑数据、挥手数据、剪刀手数据、握拳数据、托手数据、大拇指数据、手枪姿势数据、OK手数据。
- 一种视频图像的处理装置,其特征在于,包括:获取模块,用于获取待处理的视频图像和待展示的业务对象;背景区域确定模块,用于确定所述视频图像的背景区域;绘制模块,用于采用计算机绘图方式在所述视频图像的背景区域绘制所述业务对象。
- 根据权利要求27所述的装置,其特征在于,所述绘制模块包括:展示位置确定模块,用于确定所述业务对象在所述视频图像中的展示位置;前景区域确定模块,用于确定所述视频图像的前景区域;判断模块,用于根据所述展示位置判断所述业务对象与所述前景区域是否存在重叠部分;展示模块,用于若所述业务对象与所述前景区域存在重叠部分,则采用计算机绘图方式,在所述背景区域绘制所述业务对象中除所述重叠部分之外的部分。
- 根据权利要求27所述的装置,其特征在于,所述绘制模块,用于采用计算机绘图方式在所述背景区域绘制所述业务对象,使所述业务对象覆盖所述背景区域的原始内容。
- 根据权利要求27-29任一所述的装置,其特征在于,所述背景区域确定模块,用于通过预先训练好的第一卷积神经网络模型确定所述视频图像的背景区域。
- 根据权利要求30所述的装置,其特征在于,还包括:第一训练模块,用于对所述第一卷积神经网络模型进行训练;所述第一训练模块包括:第一特征向量获取模块,用于利用所述第一卷积神经网络模型获取第一样本图像的第一特征向量,其中,所述第一样本图像为包含有前景标注信息和背景标注信息的样本图像;第一卷积处理模块,用于利用所述第一卷积神经网络模型对所述第一特征向量进行卷积处理,获取第一特征向量卷积结果;放大处理模块,用于对所述第一特征向量卷积结果进行放大处理;第一条件判断模块,用于判断放大后的所述第一特征向量卷积结果是否满足卷积收敛条件;第一执行模块,用于若放大后的所述第一特征向量卷积结果满足卷积收敛条件,则完成对所述第一卷积神经网络模型的训练;若放大后的所述第一特征向量卷积结果不满足卷积收敛条件,则根据放大后的所述第一特征向量卷积结果调整所述第一卷积神经网络模型的网络参数并根据调整后的所述第一卷积神经网络模型的网络参数对所述第一卷积网络模型进行迭代训练,直至迭代训练后的所述第一特征向量卷积结果满足所述卷积收敛条件。
- 根据权利要求31所述的装置,其特征在于,所述放大处理模块,用于通过对所述第一特征向量卷积结果进行双线性插值,放大所述第一特征向量卷积结果。
- 根据权利要求31或32所述的装置,其特征在于,所述放大处理模块,用于将所述第一特征向量卷积结果放大到放大后的第一特征向量卷积结果对应的图像的大小与原始图像大小一致。
- 根据权利要求31-33任一所述的装置,其特征在于,第一条件判断模块,用于使用设定的损失函数计算放大后的所述第一特征向量卷积结果和预定的标准输出特征向量的损失值;以及根据所述损失值判断放大后的所述第一特征向量卷积结果是否满足收敛条件。
- 根据权利要求31-34任一所述的装置,其特征在于,所述第一训练模块还包括:预测模块,用于获取测试样本图像,使用训练后的所述第一卷积神经网络模型对所述测试样本图像进行前背景区域的预测;检验模块,用于检验预测的前背景区域是否正确;再训练模块,用于若所述检验模块的检验结果不正确,使用所述测试样本图像对所述第一卷积神经网络模型进行再次训练。
- 根据权利要求35所述的装置,其特征在于,所述再训练模块,用于若所述检验模块的检验结果不正确,从所述测试样本图像中获取前背景区域预测不正确的样本图像;以及使用预测不正确的样本图像对所述第一卷积神经网络模型进行再次训练,其中,所述预测不正确的样本图像包含有前景信息和背景信息。
- 根据权利要求31-36任一所述的装置,其特征在于,所述第一训练模块还包括:视频流模块,用于在所述第一向量获取模块获取第一样本图像的第一特征向量之前,将包括多帧样本图像的视频流输入所述第一卷积神经网络模型。
- 根据权利要求37所述的装置,其特征在于,所述第一训练模块还包括:标注模块,用于在所述视频流模块将包括多帧样本图像的视频流输入所述第一卷积神经网络模型之前,确定所述视频流的多个关键帧的图像为样本图像,对所述样本图像进行前景区域和背景区域的标注。
- 根据权利要求30-38任一所述的装置,其特征在于,所述第一卷积神经网络模型包括全卷积神经网络模型。
- 根据权利要求28-39任一所述的装置,其特征在于,所述展示位置确定模块,用于按照设定规则确定所述业务对象在所述视频图像中的展示位置;其中,所述设定规则包括:预先设定的所述业务对象在所述视频图像的设定展示位置,或者,所述展示位置确定模块根据所述前景区域中的目标对象的动作检测数据,确定所述业务对象在所述视频图像中的展示位置。
- 根据权利要求40所述的装置,其特征在于,所述展示位置确定模块,用于根据所述前景区域中的目标对象的动作检测数据,通过预先训练好的第二卷积神经网络模型确定所述业务对象在所述视频图像中的展示位置。
- 根据权利要求41所述的装置,其特征在于,还包括:第二训练模块,用于对所述第二卷积神经网络模型进行训练;所述第二训练模块包括:第二特征向量获取模块,用于利用所述第二卷积神经网络模型获取第二样本图像的第一特征向量,其中,所述第二特征向量中包含有所述第二样本图像中的业务对象的位置信息和/或置信度信息,以及所述第二样本图像中目标对象的目标对象特征向量;第二卷积处理模块,用于利用所述第二卷积神经网络模型对所述第二特征向量进行卷积处理,获取第二特征向量卷积结果;第二条件判断模块,用于判断所述第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息是否满足业务对象收敛条件,并判断所述第二特征向量卷积结果中的目标对象特征向量是否满足目标对象收敛条件;第二执行模块,用于若所述第二特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息满足业务对象收敛条件,且所述第二特征向量卷积结果中的目标对象特征向量满足目标对象收敛条件,则完成对所述第二卷积神经网络模型的训练;否则,调整所述第二卷积神经网络模型的网络参数并根据调整后的所述第二卷积神经网络模型的网络参数对所述第二卷积神经网络模型进行迭代训练,直至迭代训练后的业务对象的位置信息和/或置信度信息和目标对象特征向量均满足相应的收敛条件。
- 根据权利要求40所述的装置,其特征在于,所述展示位置确定模块,用于根据所述前景区域中的目标对象的动作检测数据和所述业务对象的类型,确定所述业务对象在所述视频图像中的展示位置。
- 根据权利要求43所述的装置,其特征在于,所述展示位置确定模块包括:展示位置获得模块,用于根据所述前景区域中的目标对象的动作检测数据和所述业务对象的类型,获得所述业务对象在所述视频图像中的多个展示位置;展示位置选择模块,用于从所述多个展示位置中选择至少一个展示位置作为所述业务对象在所述视频图像中的展示位置。
- 根据权利要求40所述的装置,其特征在于,所述展示位置确定模块包括:数据判断模块,用于判断所述前景区域中的目标对象的动作检测数据与预先设定的动作数据是否匹配;位置获取模块,用于若所述前景区域中的目标对象的动作检测数据与预先设定的动作数据匹配,则从预先存储的动作数据与展示位置的对应关系中,获取所述预先设定的动作数据对应的目标展示位置作为所述业务对象在所述视频图像中的展示位置。
- 根据权利要求27-45任一所述的装置,其特征在于,所述获取模块,用于从预定存储位置获取所述业务对象,或者从视频源接收所述业务对象。
- 根据权利要求27-46任一所述的装置,其特征在于,所述业务对象包括以下任意一项或多项:目标视频、目标图像、包含有语义信息的目标特效;所述视频图像包括:静态图像或直播类视频图像。
- 根据权利要求47所述的装置,其特征在于,所述目标特效包括包含广告信息的以下任意一种 或多种形式的特效:二维贴纸特效、三维特效、粒子特效。
- 根据权利要求48所述的装置,其特征在于,所述直播类视频图像的前景区域包括人物所在的区域。
- 根据权利要求47-49任一所述的装置,其特征在于,所述直播类视频图像的背景区域包括除了人物所在的区域之外的至少局部区域。
- 根据权利要求40-50任一所述的装置,其特征在于,所述目标对象的动作检测数据包括以下任意一项或多项:眨眼数据、张嘴数据、点头数据、摇头数据、亲吻数据、微笑数据、挥手数据、剪刀手数据、握拳数据、托手数据、大拇指数据、手枪姿势数据、OK手数据。
- 一种电子设备,包括:处理器、存储器、通信接口和通信总线,所述处理器、所述存储器和所述通信接口通过所述通信总线完成相互间的通信;所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1-26任一所述的视频图像的处理方法对应的操作。
- 一种电子设备，其特征在于，包括：处理器和权利要求27-52任一所述的视频图像的处理装置；在处理器运行所述视频图像的处理装置时，权利要求27-52任一所述的视频图像的处理装置中的单元被运行。
- 一种计算机程序,包括计算机可读代码,其特征在于,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1-26任一项所述的视频图像的处理方法中各步骤的指令。
- 一种计算机可读存储介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时实现权利要求1-26任一项所述的视频图像的处理方法中各步骤的操作。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/845,802 US10580179B2 (en) | 2016-08-19 | 2017-12-18 | Method and apparatus for processing video image and electronic device |
US16/709,551 US10776970B2 (en) | 2016-08-19 | 2019-12-10 | Method and apparatus for processing video image and computer readable medium |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610694814.9 | 2016-08-19 | ||
CN201610694601.6 | 2016-08-19 | ||
CN201610694601.6A CN107347125B (zh) | 2016-08-19 | 2016-08-19 | 视频图像的处理方法、装置和终端设备 |
CN201610697316.XA CN107347166B (zh) | 2016-08-19 | 2016-08-19 | 视频图像的处理方法、装置和终端设备 |
CN201610694814.9A CN107341805B (zh) | 2016-08-19 | 2016-08-19 | 图像前背景分割及网络模型训练、图像处理方法和装置 |
CN201610697316.X | 2016-08-19 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/845,802 Continuation US10580179B2 (en) | 2016-08-19 | 2017-12-18 | Method and apparatus for processing video image and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018033156A1 true WO2018033156A1 (zh) | 2018-02-22 |
Family
ID=61197301
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/098216 WO2018033156A1 (zh) | 2016-08-19 | 2017-08-21 | 视频图像的处理方法、装置和电子设备 |
Country Status (2)
Country | Link |
---|---|
US (2) | US10580179B2 (zh) |
WO (1) | WO2018033156A1 (zh) |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10217001B2 (en) * | 2016-04-14 | 2019-02-26 | KickView Corporation | Video object data storage and processing system |
WO2018033137A1 (zh) * | 2016-08-19 | 2018-02-22 | 北京市商汤科技开发有限公司 | 在视频图像中展示业务对象的方法、装置和电子设备 |
WO2018128741A1 (en) * | 2017-01-06 | 2018-07-12 | Board Of Regents, The University Of Texas System | Segmenting generic foreground objects in images and videos |
US10586111B2 (en) * | 2017-01-13 | 2020-03-10 | Google Llc | Using machine learning to detect which part of the screen includes embedded frames of an uploaded video |
EP3576026A4 (en) * | 2017-11-16 | 2020-12-16 | Sony Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING PROCESS AND PROGRAM |
CN108154103A (zh) * | 2017-12-21 | 2018-06-12 | 百度在线网络技术(北京)有限公司 | 检测推广信息显著性的方法、装置、设备和计算机存储介质 |
CN110163640B (zh) * | 2018-02-12 | 2023-12-08 | 华为技术有限公司 | 一种在视频中植入广告的方法及计算机设备 |
CN108898587A (zh) * | 2018-06-19 | 2018-11-27 | Oppo广东移动通信有限公司 | 图片处理方法、图片处理装置及终端设备 |
US11461698B2 (en) | 2018-07-09 | 2022-10-04 | Athene Noctua LLC | Integrated machine learning audiovisual application for a defined subject |
CN109086690B (zh) * | 2018-07-13 | 2021-06-22 | 北京旷视科技有限公司 | 图像特征提取方法、目标识别方法及对应装置 |
CN110855875A (zh) * | 2018-08-20 | 2020-02-28 | 珠海格力电器股份有限公司 | 获取图像的背景信息的方法及装置 |
CN110969054B (zh) * | 2018-09-29 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | 一种集装箱箱号识别方法及装置 |
CN109583443B (zh) * | 2018-11-15 | 2022-10-18 | 四川长虹电器股份有限公司 | 一种基于文字识别的视频内容判断方法 |
CN111260548B (zh) * | 2018-11-30 | 2023-07-21 | 浙江宇视科技有限公司 | 基于深度学习的贴图方法及装置 |
JP7167668B2 (ja) * | 2018-11-30 | 2022-11-09 | コニカミノルタ株式会社 | 学習方法、学習装置、プログラムおよび記録媒体 |
CN109816744B (zh) * | 2018-12-20 | 2023-06-27 | 武汉西山艺创文化有限公司 | 一种基于神经网络的二维特效图片生成方法和装置 |
US10977501B2 (en) * | 2018-12-21 | 2021-04-13 | Waymo Llc | Object classification using extra-regional context |
CN111462160A (zh) * | 2019-01-18 | 2020-07-28 | 北京京东尚科信息技术有限公司 | 图像处理方法、装置及存储介质 |
US10325371B1 (en) * | 2019-01-22 | 2019-06-18 | StradVision, Inc. | Method and device for segmenting image to be used for surveillance using weighted convolution filters for respective grid cells by converting modes according to classes of areas to satisfy level 4 of autonomous vehicle, and testing method and testing device using the same |
US10373317B1 (en) * | 2019-01-22 | 2019-08-06 | StradVision, Inc. | Learning method and learning device for attention-driven image segmentation by using at least one adaptive loss weight map to be used for updating HD maps required to satisfy level 4 of autonomous vehicles and testing method and testing device using the same |
US10339424B1 (en) * | 2019-01-22 | 2019-07-02 | StradVision, Inc. | Method and device of neural network operations using a grid generator for converting modes according to classes of areas to satisfy level 4 of autonomous vehicles |
US10410352B1 (en) * | 2019-01-25 | 2019-09-10 | StradVision, Inc. | Learning method and learning device for improving segmentation performance to be used for detecting events including pedestrian event, vehicle event, falling event and fallen event using edge loss and test method and test device using the same |
US10402977B1 (en) * | 2019-01-25 | 2019-09-03 | StradVision, Inc. | Learning method and learning device for improving segmentation performance in road obstacle detection required to satisfy level 4 and level 5 of autonomous vehicles using laplacian pyramid network and testing method and testing device using the same |
US10474543B1 (en) * | 2019-01-28 | 2019-11-12 | Stradivision, INC. | Method and device for economizing computing resources to be used during a process of verification of convolutional parameters using test pattern to enhance fault tolerance and fluctuation robustness in extreme situations |
CN111741327B (zh) * | 2019-03-25 | 2022-09-02 | 华为技术有限公司 | 一种媒体处理方法及媒体服务器 |
CN110276346B (zh) * | 2019-06-06 | 2023-10-10 | 北京字节跳动网络技术有限公司 | 目标区域识别模型训练方法、装置和计算机可读存储介质 |
US20210152751A1 (en) * | 2019-11-19 | 2021-05-20 | Tencent Technology (Shenzhen) Company Limited | Model training method, media information synthesis method, and related apparatuses |
US20230010199A1 (en) * | 2019-12-16 | 2023-01-12 | Nec Corporation | Foreground extraction apparatus, foreground extraction method, and recording medium |
US11687778B2 (en) | 2020-01-06 | 2023-06-27 | The Research Foundation For The State University Of New York | Fakecatcher: detection of synthetic portrait videos using biological signals |
US10817648B1 (en) * | 2020-01-30 | 2020-10-27 | Leap Tools Inc. | Systems and methods for product visualization using a single-page application |
US11194952B2 (en) | 2020-01-30 | 2021-12-07 | Leap Tools, Inc. | Systems and methods for product visualization using a single-page application |
CN113222995A (zh) * | 2020-02-06 | 2021-08-06 | 阿里巴巴集团控股有限公司 | 视频处理方法、视频处理装置及电子设备 |
CN113449538A (zh) * | 2020-03-24 | 2021-09-28 | 顺丰科技有限公司 | 视觉模型的训练方法、装置、设备及存储介质 |
CN111586319B (zh) | 2020-05-27 | 2024-04-09 | 北京百度网讯科技有限公司 | 视频的处理方法和装置 |
CN111833372B (zh) * | 2020-07-23 | 2024-07-02 | 浙江大华技术股份有限公司 | 一种前景目标提取方法及装置 |
US20230291926A1 (en) * | 2020-08-06 | 2023-09-14 | Hyundai Motor Company | Video encoding and decoding using deep learning based inter prediction |
CN111970527B (zh) * | 2020-08-18 | 2022-03-29 | 广州虎牙科技有限公司 | 直播数据处理方法及装置 |
WO2022040574A1 (en) | 2020-08-21 | 2022-02-24 | Beam, Inc. | Integrating overlaid digital content into displayed data via graphics processing circuitry |
CN112153400B (zh) * | 2020-09-22 | 2022-12-06 | 北京达佳互联信息技术有限公司 | 直播互动方法、装置、电子设备及存储介质 |
CN112188215B (zh) * | 2020-09-23 | 2022-02-22 | 腾讯科技(深圳)有限公司 | 视频解码方法、装置、设备及存储介质 |
CN112613891B (zh) * | 2020-12-24 | 2023-10-03 | 支付宝(杭州)信息技术有限公司 | 一种店铺注册信息验证方法、装置及设备 |
US11625880B2 (en) * | 2021-02-09 | 2023-04-11 | Electronic Arts Inc. | Machine-learning models for tagging video frames |
US11481933B1 (en) | 2021-04-08 | 2022-10-25 | Mobeus Industries, Inc. | Determining a change in position of displayed digital content in subsequent frames via graphics processing circuitry |
US11477020B1 (en) | 2021-04-30 | 2022-10-18 | Mobeus Industries, Inc. | Generating a secure random number by determining a change in parameters of digital content in subsequent frames via graphics processing circuitry |
US11682101B2 (en) | 2021-04-30 | 2023-06-20 | Mobeus Industries, Inc. | Overlaying displayed digital content transmitted over a communication network via graphics processing circuitry using a frame buffer |
US11601276B2 (en) | 2021-04-30 | 2023-03-07 | Mobeus Industries, Inc. | Integrating and detecting visual data security token in displayed data via graphics processing circuitry using a frame buffer |
US11483156B1 (en) | 2021-04-30 | 2022-10-25 | Mobeus Industries, Inc. | Integrating digital content into displayed data on an application layer via processing circuitry of a server |
US11475610B1 (en) | 2021-04-30 | 2022-10-18 | Mobeus Industries, Inc. | Controlling interactivity of digital content overlaid onto displayed data via graphics processing circuitry using a frame buffer |
US11586835B2 (en) | 2021-04-30 | 2023-02-21 | Mobeus Industries, Inc. | Integrating overlaid textual digital content into displayed data via graphics processing circuitry using a frame buffer |
CN114463730B (zh) * | 2021-07-15 | 2023-07-14 | 荣耀终端有限公司 | 一种页面识别方法及终端设备 |
US11562153B1 (en) * | 2021-07-16 | 2023-01-24 | Mobeus Industries, Inc. | Systems and methods for recognizability of objects in a multi-layer display |
CN114187392B (zh) * | 2021-10-29 | 2024-04-19 | 北京百度网讯科技有限公司 | 虚拟偶像的生成方法、装置和电子设备 |
CN114185429B (zh) * | 2021-11-11 | 2024-03-26 | 杭州易现先进科技有限公司 | 手势关键点定位或姿态估计的方法、电子装置和存储介质 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080033801A1 (en) * | 2006-07-14 | 2008-02-07 | Vulano Group, Inc. | System for dynamic personalized object placement in a multi-media program |
US9661239B2 (en) * | 2013-04-17 | 2017-05-23 | Digital Makeup Ltd. | System and method for online processing of video images in real time |
WO2015047246A1 (en) * | 2013-09-25 | 2015-04-02 | Intel Corporation | Dynamic product placement in media content |
US20150304698A1 (en) * | 2014-04-21 | 2015-10-22 | Eyesee, Lda | Dynamic Interactive Advertisement Insertion |
CN104766229A (zh) | 2015-04-22 | 2015-07-08 | 合一信息技术(北京)有限公司 | 植入式广告投放方法 |
US10242285B2 (en) * | 2015-07-20 | 2019-03-26 | Kofax, Inc. | Iterative recognition-guided thresholding and data extraction |
US10198635B2 (en) * | 2016-01-19 | 2019-02-05 | Fuji Xerox Co., Ltd. | Systems and methods for associating an image with a business venue by using visually-relevant and business-aware semantics |
DE112017001311T5 (de) * | 2016-03-17 | 2018-11-29 | Avigilon Corporation | System und Verfahren zum Trainieren eines Objektklassifikators durch maschinelles Lernen |
CN105872838A (zh) | 2016-04-28 | 2016-08-17 | 徐文波 | 即时视频的媒体特效发送方法和装置 |
EP3549063A4 (en) * | 2016-12-05 | 2020-06-24 | Avigilon Corporation | APPEARANCE SEARCH SYSTEM AND METHOD |
-
2017
- 2017-08-21 WO PCT/CN2017/098216 patent/WO2018033156A1/zh active Application Filing
- 2017-12-18 US US15/845,802 patent/US10580179B2/en active Active
-
2019
- 2019-12-10 US US16/709,551 patent/US10776970B2/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1728781A (zh) * | 2004-07-30 | 2006-02-01 | 新加坡科技研究局 | 将附加内容插入视频的方法和装置 |
CN103024479A (zh) * | 2012-12-17 | 2013-04-03 | 深圳先进技术研究院 | 视频内容中自适应投放广告的方法及系统 |
CN104837060A (zh) * | 2014-02-07 | 2015-08-12 | 索尼电脑娱乐美国公司 | 确定广告和媒体中的其它插入物的位置和时间安排的方案 |
US20150249847A1 (en) * | 2014-02-28 | 2015-09-03 | Surewaves Mediatech Private Limited | System and method for providing integrated advertising in a television distribution network |
CN104463865A (zh) * | 2014-12-05 | 2015-03-25 | 浙江大学 | 一种人像分割方法 |
CN104486680A (zh) * | 2014-12-19 | 2015-04-01 | 珠海全志科技股份有限公司 | 基于视频的广告推送方法及系统 |
CN105578255A (zh) * | 2015-12-31 | 2016-05-11 | 合一网络技术(北京)有限公司 | 一种广告播放方法及装置 |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145858A (zh) * | 2018-09-03 | 2019-01-04 | 哦加哦网络科技(上海)有限公司 | 无人货架视频审核系统及方法 |
CN110909506A (zh) * | 2018-09-10 | 2020-03-24 | 北京京东金融科技控股有限公司 | 平面设计物料生成的方法和装置 |
CN113128304A (zh) * | 2019-12-31 | 2021-07-16 | 深圳云天励飞技术有限公司 | 图像处理的方法及电子设备 |
CN113128304B (zh) * | 2019-12-31 | 2024-01-05 | 深圳云天励飞技术有限公司 | 图像处理的方法及电子设备 |
CN111479130A (zh) * | 2020-04-02 | 2020-07-31 | 腾讯科技(深圳)有限公司 | 一种视频定位方法、装置、电子设备和存储介质 |
CN111479130B (zh) * | 2020-04-02 | 2023-09-26 | 腾讯科技(深圳)有限公司 | 一种视频定位方法、装置、电子设备和存储介质 |
CN111598902A (zh) * | 2020-05-20 | 2020-08-28 | 北京字节跳动网络技术有限公司 | 图像分割方法、装置、电子设备及计算机可读介质 |
CN111598902B (zh) * | 2020-05-20 | 2023-05-30 | 抖音视界有限公司 | 图像分割方法、装置、电子设备及计算机可读介质 |
Also Published As
Publication number | Publication date |
---|---|
US20200111241A1 (en) | 2020-04-09 |
US10580179B2 (en) | 2020-03-03 |
US20180122114A1 (en) | 2018-05-03 |
US10776970B2 (en) | 2020-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018033156A1 (zh) | 视频图像的处理方法、装置和电子设备 | |
CN107347166B (zh) | 视频图像的处理方法、装置和终端设备 | |
US11595737B2 (en) | Method for embedding advertisement in video and computer device | |
US11182591B2 (en) | Methods and apparatuses for detecting face, and electronic devices | |
WO2018033137A1 (zh) | 在视频图像中展示业务对象的方法、装置和电子设备 | |
WO2018033154A1 (zh) | 手势控制方法、装置和电子设备 | |
WO2018033155A1 (zh) | 视频图像的处理方法、装置和电子设备 | |
WO2018033143A1 (zh) | 视频图像的处理方法、装置和电子设备 | |
US10902262B2 (en) | Vision intelligence management for electronic devices | |
US11044295B2 (en) | Data processing method, apparatus and electronic device | |
WO2019011249A1 (zh) | 一种图像中物体姿态的确定方法、装置、设备及存储介质 | |
US10575067B2 (en) | Context based augmented advertisement | |
JP2020524861A (ja) | セマンティックセグメンテーションモデルの訓練方法および装置、電子機器、ならびに記憶媒体 | |
CN107347125B (zh) | 视频图像的处理方法、装置和终端设备 | |
JP7030078B2 (ja) | 端末装置に適用される情報生成方法および装置 | |
WO2018214869A1 (zh) | 业务对象推荐方法、装置、电子设备和存储介质 | |
US11663463B2 (en) | Center-biased machine learning techniques to determine saliency in digital images | |
US11978216B2 (en) | Patch-based image matting using deep learning | |
CN107770602B (zh) | 视频图像处理方法、装置和终端设备 | |
CN107770603B (zh) | 视频图像处理方法、装置和终端设备 | |
CN107770580B (zh) | 视频图像处理方法、装置和终端设备 | |
WO2020200082A1 (zh) | 直播互动方法、装置、直播系统及电子设备 | |
US11631221B2 (en) | Augmenting a video flux of a real scene | |
US20240020941A1 (en) | Multi-camera domain adaptive object detection system and detection method thereof | |
WO2024104286A1 (zh) | 一种视频处理方法、装置、电子设备和存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17841128 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17841128 Country of ref document: EP Kind code of ref document: A1 |