CN114666516A - Display device and streaming media file synthesis method - Google Patents
- Publication number: CN114666516A
- Application number: CN202210144774.6A
- Authority: CN (China)
- Legal status: Pending (assumption, not a legal conclusion)
Classifications
- H04N5/265: Studio circuits, mixing (under H04N5/222 studio circuitry and H04N5/262 studio circuits for special effects)
- H04N21/44016: Processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip (under H04N21/43 client content processing)
- H04N21/8547: Content authoring involving timestamps for synchronizing content (under H04N21/85 assembly of content)
- H04N5/76: Television signal recording
- H04N5/765: Interface circuits between an apparatus for recording and another apparatus
Abstract
The application provides a display device and a streaming media file synthesis method, relating to the technical field of data processing. The display device includes: a user interface configured to receive a trigger operation input by a user; a player configured to play a first streaming media file and call back data corresponding to the first streaming media file; a detector configured to record a second streaming media file to acquire data corresponding to the second streaming media file; a processor configured to generate audio elementary stream data and video elementary stream data from the data corresponding to the first streaming media file and the data corresponding to the second streaming media file; and an encapsulator configured to encapsulate the audio elementary stream data and the video elementary stream data to synthesize a third streaming media file. The method and device can thus synthesize a new streaming media file from the played streaming media file and the recorded streaming media file while playback and recording are still in progress.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a display device and a method for synthesizing a streaming media file.
Background
With the development of audio and video technology, and especially the rise of short-video platforms in recent years, more and more users participate in video production; for example, users often need to compose videos and distribute them on short-video platforms.
In the related art, when a user wants to record a streaming media file and combine it with an existing streaming media file into a new video file, the user must first finish recording, and only then can the recorded file and the existing file be used as material to synthesize a new streaming media file; recording and synthesis cannot proceed synchronously. For example, when a user wants to record a video with a designated audio track as background music, the user must record the video first and then select the designated audio to synthesize a new video; the user cannot select the background music in advance and have it play while the video is recorded and the new video is synthesized. For another example, when a user wants to record a segment of audio to serve as the background music of a designated video, the user must record the audio first and then select the designated video for synthesis, and cannot select the video in advance and play it while the audio is recorded and the new video is synthesized.
Disclosure of Invention
In order to solve the above technical problem or at least partially solve the above technical problem, the present application provides a display device and a method for synthesizing a streaming media file, which can synthesize a new streaming media file according to a played streaming media file and a recorded streaming media file while playing the streaming media file and recording the streaming media file.
In a first aspect, the present application provides a display device comprising:
a user interface configured to: receiving a trigger operation input by a user, wherein the trigger operation is used for triggering playing of a first streaming media file, recording of a second streaming media file, and synthesis of a third streaming media file based on the first streaming media file and the second streaming media file;
a player configured to: playing the first streaming media file, and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file;
a detector configured to: recording the second streaming media file to acquire data corresponding to the second streaming media file;
a processor configured to: generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file;
and an encapsulator configured to: encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
In some embodiments, the player is configured to: acquiring the first streaming media file, decapsulating the first streaming media file to acquire elementary stream data corresponding to the first streaming media file, caching the elementary stream data corresponding to the first streaming media file into a cache queue, sequentially decoding data in the cache queue to acquire playing data corresponding to the first streaming media file, playing the first streaming media file according to the playing data corresponding to the first streaming media file, and calling back the playing data corresponding to the first streaming media file;
the processor configured to: encoding the playing data corresponding to the first streaming media file to generate the audio elementary stream data or the video elementary stream data.
In some embodiments, the player is configured to: acquiring the first streaming media file, decapsulating the first streaming media file to acquire elementary stream data corresponding to the first streaming media file, caching the elementary stream data corresponding to the first streaming media file into a cache queue, sequentially reading and decoding data in the cache queue through a decoding module to acquire playing data corresponding to the first streaming media file, playing the first streaming media file according to the playing data corresponding to the first streaming media file, and calling back the elementary stream data corresponding to the first streaming media file read by the decoding module;
the processor configured to: determining the elementary stream data corresponding to the first streaming media file as the audio elementary stream data or the video elementary stream data.
In some embodiments, the encapsulator is configured to:
periodically acquiring a player timestamp, with a preset duration as the period;
determining whether the difference between the timestamp of the called-back elementary stream data corresponding to the first streaming media file and the player timestamp is smaller than a preset value;
and if so, encapsulating the called-back video elementary stream data.
In some embodiments, the user interface is further configured to: receiving a starting position and an ending position, input by a user, of a target segment; the target segment is the segment of the first streaming media file used for synthesizing the third streaming media file;
the player is further configured to determine a target position, wherein the target position is the position of the key frame that is located before the starting position and is closest to the starting position;
the detector configured to: starting to record the second streaming media file when the playing position of the first streaming media file exceeds the target position; and stopping recording the second streaming media file when the playing position of the first streaming media file exceeds the ending position.
In some embodiments, the player is further configured to:
stopping playing the first streaming media file when the playing position of the first streaming media file exceeds the ending position.
In some embodiments, the first streaming media file is a video file; the second streaming media file is an audio file;
the detector configured to: starting to record Pulse Code Modulation (PCM) data corresponding to the audio file when rendering of the first video frame of the first streaming media file is finished.
In some embodiments, the first streaming media file comprises a video file and a first audio file, and the second streaming media file is a second audio file;
the player configured to: calling back elementary stream data corresponding to the video file and PCM data corresponding to the first audio file in the process of playing the first streaming media file;
the processor configured to: generating an audio data stream according to the PCM data corresponding to the first audio file and the data corresponding to the second streaming media file, and determining the elementary stream data corresponding to the video file as the video elementary stream data.
In some embodiments, the player is further configured to: caching the PCM data corresponding to the first audio file into a first data queue;
the detector further configured to: caching the PCM data corresponding to the second streaming media file into a second data queue;
the processor configured to: mixing the PCM data in the first data queue and the second data queue based on a preset rule to generate mixed audio data, and encoding the mixed audio data to generate the audio elementary stream data.
In a second aspect, the present application provides a method for synthesizing a streaming media file, including:
receiving a trigger operation input by a user, wherein the trigger operation is used for triggering playing of a first streaming media file, recording of a second streaming media file, and synthesis of a third streaming media file based on the first streaming media file and the second streaming media file;
playing the first streaming media file, and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file;
recording the second streaming media file to acquire data corresponding to the second streaming media file;
generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file;
and encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
In some embodiments, the playing the first streaming media file and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file includes:
acquiring the first streaming media file, decapsulating the first streaming media file to acquire elementary stream data corresponding to the first streaming media file, caching the elementary stream data corresponding to the first streaming media file into a cache queue, sequentially decoding data in the cache queue to acquire playing data corresponding to the first streaming media file, playing the first streaming media file according to the playing data corresponding to the first streaming media file, and calling back the playing data corresponding to the first streaming media file;
generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file, including:
encoding the playing data corresponding to the first streaming media file to generate the audio elementary stream data or the video elementary stream data.
In some embodiments, the playing the first streaming media file and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file includes:
acquiring the first streaming media file, decapsulating the first streaming media file to acquire elementary stream data corresponding to the first streaming media file, caching the elementary stream data corresponding to the first streaming media file into a cache queue, sequentially reading and decoding data in the cache queue through a decoding module to acquire playing data corresponding to the first streaming media file, playing the first streaming media file according to the playing data corresponding to the first streaming media file, and calling back the elementary stream data corresponding to the first streaming media file read by the decoding module;
generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file, including:
determining the elementary stream data corresponding to the first streaming media file as the audio elementary stream data or the video elementary stream data.
In some embodiments, said encapsulating said audio elementary stream data and said video elementary stream data comprises:
periodically acquiring a player timestamp, with a preset duration as the period;
determining whether the difference between the timestamp of the called-back elementary stream data corresponding to the first streaming media file and the player timestamp is smaller than a preset value;
and if so, encapsulating the called-back video elementary stream data.
In some embodiments, the method further comprises:
receiving a starting position and an ending position, input by a user, of a target segment; the target segment is the segment of the first streaming media file used for synthesizing the third streaming media file;
determining a target position, wherein the target position is the position of the key frame that is located before the starting position and is closest to the starting position;
starting to record the second streaming media file when the playing position of the first streaming media file exceeds the target position;
and stopping recording the second streaming media file when the playing position of the first streaming media file exceeds the ending position.
In some embodiments, the method further comprises:
stopping playing the first streaming media file when the playing position of the first streaming media file exceeds the ending position.
In some embodiments, the first streaming media file is a video file; the second streaming media file is an audio file;
the recording the second streaming media file to obtain data corresponding to the second streaming media file includes:
starting to record Pulse Code Modulation (PCM) data corresponding to the audio file when rendering of the first video frame of the video file is finished.
In some embodiments, the first streaming media file comprises a video file and a first audio file, and the second streaming media file is a second audio file;
the calling back the data corresponding to the first streaming media file in the process of playing the first streaming media file comprises:
calling back elementary stream data corresponding to the video file and PCM data corresponding to the first audio file in the process of playing the first streaming media file;
generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file, including:
generating an audio data stream according to the PCM data corresponding to the first audio file and the data corresponding to the second streaming media file, and determining the elementary stream data corresponding to the video file as the video elementary stream data.
In some embodiments, the generating an audio data stream according to PCM data corresponding to the first audio file and data corresponding to the second streaming media file comprises:
caching the PCM data corresponding to the first audio file into a first data queue;
caching the PCM data corresponding to the second audio file into a second data queue;
mixing the PCM data in the first data queue and the second data queue based on a preset rule to generate mixed audio data;
and encoding the mixed audio data to generate the audio elementary stream data.
In a third aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the streaming media file synthesis method according to the second aspect.
In a fourth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to implement the streaming media file synthesis method according to the second aspect.
When the display device provided in the embodiments of the application receives, through a user interface, a trigger operation for triggering playing of a first streaming media file, recording of a second streaming media file, and synthesis of a third streaming media file based on the first and second streaming media files, it plays the first streaming media file through a player and calls back data corresponding to the first streaming media file during playback; records the second streaming media file through a detector and acquires data corresponding to the second streaming media file; generates, through a processor, audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file; and encapsulates the audio elementary stream data and the video elementary stream data through an encapsulator to synthesize the third streaming media file. Because the audio and video elementary stream data are generated and encapsulated while the first streaming media file is being played and the second streaming media file is being recorded, the display device can synthesize the played streaming media file and the recorded streaming media file in real time, producing a new streaming media file while playback and recording are still in progress.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the related art, the drawings needed in the description of the embodiments or the related art will be briefly introduced below; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic view of an operation scenario between a display device and a control apparatus according to one or more embodiments of the present application;
fig. 2 is a block diagram of a hardware configuration of the control apparatus 100 according to one or more embodiments of the present application;
fig. 3 is a block diagram of a hardware configuration of a display device 200 according to one or more embodiments of the present application;
FIG. 4 is an architecture diagram of a player according to one or more embodiments of the present application;
FIG. 5 is a diagram illustrating a software configuration of a display device 200 according to one or more embodiments of the present application;
FIG. 6 is a first flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 7 is a second flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 8 is a third flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 9 is a fourth flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 10 is a fifth flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 11 is a sixth flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 12 is a seventh flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 13 is an eighth flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 14 is a ninth flowchart illustrating steps of a multimedia file synthesis method provided in the present application;
FIG. 15 is an architecture diagram of a multimedia file synthesis method provided in the present application.
Detailed Description
In order that the above-mentioned objects, features and advantages of the present application may be more clearly understood, the solution of the present application will be further described below. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein; it is to be understood that the embodiments described in this specification are only some embodiments of the present application and not all embodiments.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, in this application are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed but may include other elements not expressly listed or inherent to such product or apparatus.
In the related art, a multimedia file cannot be recorded and synthesized synchronously; the multimedia file must be recorded first, and the recorded multimedia file is then synthesized with an existing multimedia file. To solve this problem, embodiments of the present application provide a display device and a multimedia file synthesis method. When the display device receives, through a user interface, a trigger operation for triggering playing of a first streaming media file, recording of a second streaming media file, and synthesis of a third streaming media file based on the first and second streaming media files, it plays the first streaming media file through a player and calls back data corresponding to the first streaming media file during playback; records the second streaming media file through a detector and acquires data corresponding to the second streaming media file; generates, through a processor, audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file; and encapsulates, through an encapsulator, the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file. Because the audio and video elementary stream data are generated and encapsulated while the first streaming media file is being played and the second streaming media file is being recorded, the display device can synthesize the played streaming media file and the recorded streaming media file in real time, producing a new streaming media file while playback and recording are still in progress.
The display device provided by the embodiments of the present application may take various forms; for example, it may be a television, a smart television, a laser projection device, a monitor, an electronic whiteboard, an electronic table, a mobile phone, and the like. Fig. 2 shows a specific embodiment of the display device of the present application.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to one or more embodiments of the present application. As shown in fig. 1, a user may operate the display device 200 through the control apparatus 100 and/or a mobile terminal 300. The control apparatus 100 may be a remote controller, which communicates with the display device through infrared protocol communication, Bluetooth protocol communication, or other wireless or wired methods to control the display device 200. The user may input user commands through keys on the remote controller, voice input, control panel input, etc., to control the display device 200. In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200.
In some embodiments, the mobile terminal 300 may install a software application for the display device 200 and establish connection communication through a network communication protocol, achieving one-to-one control operation and data communication. The audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200, giving the display device 200 a synchronous display function. The display device 200 can also perform data communication with the server 400 through multiple communication modes, and may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display device 200. The display device 200 may be a liquid crystal display, an OLED display, or a projection display device. In addition to the broadcast-receiving television function, the display device 200 may also provide a smart network television function with computer support.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200. The communication interface 130 is used for communicating with the outside, and includes at least one of a WIFI chip, a bluetooth module, NFC, or an alternative module. The user input/output interface 140 includes at least one of a microphone, a touch pad, a sensor, a key, or an alternative module.
Fig. 3 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment. As shown in fig. 3, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280. The controller includes a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and first to nth interfaces for input/output. The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and projection screen. The tuner demodulator 210 receives broadcast television signals through wired or wireless reception and demodulates audio/video signals, as well as data signals such as EPG data, from a plurality of wireless or wired broadcast television signals. The detector 230 is used to collect signals from the external environment or signals of interaction with the outside. The controller 250 and the tuner demodulator 210 may be located in separate devices; that is, the tuner demodulator 210 may be located in a device external to the main device containing the controller 250, such as an external set-top box.
In some embodiments, the display device is a terminal device with a display function, such as a television, a mobile phone, a computer, a learning machine, and the like.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. A user may input a user command on a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
An output interface (display 260, and/or audio output interface 270) configured to output user interaction information;
a communicator 220 for communicating with the server 400 or other devices.
In some embodiments, the user interface 280 can receive a trigger operation for triggering the playing of a first streaming media file, the recording of a second streaming media file, and the composition of a third streaming media file based on the first streaming media file and the second streaming media file. The player (when the first multimedia file is a video file, the player is the display 260; when the first multimedia file is an audio file, the player is the audio output interface 270; when the first multimedia file includes both a video file and an audio file, the player includes the display 260 and the audio output interface 270) can play the first streaming media file, and call back data corresponding to the first streaming media file in the process of playing the first streaming media file; the detector 230 (specifically, an image collector when the second multimedia file is a video file; a sound collector when the second multimedia file is an audio file) can record the second streaming media file to obtain data corresponding to the second streaming media file (image data when the second multimedia file is a video file; pulse code modulation data when the second multimedia file is an audio file); the controller 250, which includes a video processor and an audio processor, can generate audio elementary stream data and video elementary stream data and encapsulate the audio elementary stream data and the video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file to synthesize the third streaming media file.
Referring to fig. 4, in some embodiments, the implementation of playing the first streaming media file by the player (the display 260 or the audio output interface 270) includes: the resource loading module 41 obtains the first streaming media file from a local storage space or a server, and inputs the first streaming media file into the decapsulation module 42; the decapsulation module 42 decapsulates the first streaming media file to obtain basic stream data corresponding to the first streaming media file, and buffers the basic stream data corresponding to the first streaming media file into the buffer queue 43, the decoding module 44 sequentially reads data in the buffer queue 43, and decodes the read data to obtain playing data corresponding to the first streaming media file (when the first multimedia file is a video file, the playing data is image data, and when the first multimedia file is an audio file, the playing data is pulse code modulation data), and finally the output module 45 plays the first streaming media file according to the playing data corresponding to the first streaming media file.
Based on the above implementation of playing the first streaming media file, the player can call back data corresponding to the first streaming media file during playback in two ways. The first: in the process of playing the first streaming media file, the playing data corresponding to each frame played is called back from the decoding module, and the called-back playing data is then encoded to generate the audio elementary stream or the video elementary stream corresponding to the first streaming media file. The second: in the process of playing the first streaming media file, the elementary stream data corresponding to the first streaming media file is called back. Because the second implementation directly calls back the elementary stream data rather than the playing data, it avoids the compute-intensive step of re-encoding the playing data and avoids degrading the image or sound quality of the first multimedia file. In addition, as shown in fig. 4, the elementary stream data corresponding to the first streaming media file is generated by the decapsulation module 42 and stored in the buffer queue 43, and the decoding module 44 reads that elementary stream data from the buffer queue 43; in the second implementation, the elementary stream data can therefore be called back from the output of the decapsulation module 42, the input of the buffer queue 43, or the decoding module 44. However, the timestamps of the elementary stream data at the output of the decapsulation module 42 and in the buffer queue 43 often deviate considerably from the player timestamp, while the timestamp of the elementary stream data read by the decoding module 44 deviates only slightly, so the elementary stream data read by the decoding module 44 is preferably called back, avoiding large timestamp deviations in the encapsulated audio and video elementary stream data.
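The second callback approach can be made concrete with a small sketch. The following Python code is illustrative only: demux, decode, render, and es_callback are hypothetical stand-ins for the decapsulation module 42, cache queue 43, decoding module 44, and output module 45 of fig. 4, not the patent's actual implementation.

```python
import queue

# Minimal sketch of the Fig. 4 pipeline with the ES callback hooked at
# the decoder's read point (the second approach described above).
class PlaybackPipeline:
    def __init__(self, es_callback):
        self.cache_queue = queue.Queue(maxsize=64)  # cache queue 43
        self.es_callback = es_callback              # recall hook toward the encapsulator

    def demux(self, es_packets):
        # stand-in for decapsulation module 42: buffer ES packets
        for pkt in es_packets:
            self.cache_queue.put(pkt)
        self.cache_queue.put(None)                  # end-of-stream marker

    def decode_and_play(self, decode, render):
        # stand-in for decoding module 44 feeding output module 45
        while True:
            pkt = self.cache_queue.get()
            if pkt is None:
                break
            self.es_callback(pkt)                   # call back ES data as it is read
            render(decode(pkt))                     # play the decoded frame
```

Hooking the callback at the decoder's read point is what keeps the called-back packet timestamps near the player clock: packets still sitting in the cache queue may be seconds ahead of what is actually rendered.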
In some embodiments, the controller 250 may further start a sub-thread that periodically acquires a player timestamp with a preset duration (e.g., 100 ms) as the period and determines whether the difference between the timestamp of the called-back elementary stream data corresponding to the first streaming media file and the player timestamp is smaller than a preset value (e.g., 2 seconds); if so, the called-back elementary stream data is sent to the encapsulation module for encapsulation of the audio and video elementary stream data, which prevents the timestamps of the encapsulated audio and video elementary stream data from drifting too far apart.
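A sketch of such a guard sub-thread follows; get_player_pts, the pending list of called-back ES packets, and mux_packet are hypothetical accessors, and the 100 ms period and 2-second threshold are the example values above.

```python
import time

PERIOD_S = 0.1    # preset duration (100 ms example)
MAX_GAP_S = 2.0   # preset value (2 seconds example)

def timestamp_guard(get_player_pts, pending, mux_packet, stop_event):
    """Forward called-back ES packets to the encapsulation module only
    while their timestamps stay close to the player timestamp."""
    while not stop_event.is_set():
        player_pts = get_player_pts()
        while pending and abs(pending[0].pts - player_pts) < MAX_GAP_S:
            mux_packet(pending.pop(0))   # gap small enough: encapsulate it
        time.sleep(PERIOD_S)             # poll again after the preset period
```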
In some embodiments, the application can also allow the user to specify, in real time, the starting and ending positions of the portion of the first multimedia file to be synthesized with the recorded file. In one implementation, the user interface 280 receives the starting position and the ending position, input by the user, of a target segment of the first streaming media file used for synthesizing the third streaming media file; the player determines the target position, i.e., the position of the key frame that precedes and is closest to the starting position; the detector starts recording the second streaming media file when the playing position of the first streaming media file exceeds the target position, and stops recording when the timestamp of the called-back elementary stream data corresponding to the first streaming media file exceeds the ending position. Specifically, when the playing position exceeds the target position, the player can send a start-recording instruction to the detector so that the detector starts recording the second streaming media file; when the timestamp of the called-back elementary stream data exceeds the ending position, the player sends an end-recording instruction so that the detector stops recording. In addition, the player may stop playing the first streaming media file when the timestamp of the called-back elementary stream data exceeds the ending position.
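A sketch of this recording window, assuming a hypothetical list of key-frame timestamps and a detector object with start()/stop() methods; snapping back to the key frame at or before the chosen start position means the composed segment begins on an independently decodable frame.

```python
def target_position(keyframes, start_pts):
    # closest key frame at or before the user's chosen start position
    earlier = [pts for pts in keyframes if pts <= start_pts]
    return max(earlier) if earlier else 0.0

def on_play_position(pts, target_pts, end_pts, detector):
    # driven by the player's playback position / called-back ES timestamps
    if not detector.recording and target_pts <= pts < end_pts:
        detector.start()   # playback passed the target position
    elif detector.recording and pts >= end_pts:
        detector.stop()    # playback passed the ending position
```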
Since image data must be rendered when a video file is played, a long time often passes between receiving a video playing instruction and finishing rendering the first video frame. When the first multimedia file is a video file and the second multimedia file is an audio file, starting to record the second streaming media file immediately upon receiving the user's trigger operation would therefore likely cause an excessive timestamp deviation between the encapsulated audio and video elementary stream data. In view of this, in some embodiments, the detector may start recording Pulse Code Modulation (PCM) data corresponding to the audio file only when the player finishes rendering the first video frame of the video file.
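This gating is naturally expressed as an event the render path signals once. A sketch, with hypothetical on_first_frame_rendered and capture_pcm_chunk hooks:

```python
import threading

first_frame_rendered = threading.Event()

def on_first_frame_rendered():
    first_frame_rendered.set()            # signaled once by the video renderer

def audio_record_loop(capture_pcm_chunk, stop_event, sink):
    first_frame_rendered.wait()           # skip the slow first-frame setup
    while not stop_event.is_set():
        sink.append(capture_pcm_chunk())  # PCM now starts near video time zero
```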
In some scenarios, the first multimedia file includes both a video file and an audio file, and the second multimedia file is an audio file. For example, when dubbing a video, the first multimedia file includes both the video file and background music, and the second multimedia file is the recorded voice. In such a scenario, for the video data path, the video elementary stream data may be extracted using the method of any of the embodiments above; for the audio data path, the background sound and the human voice need to be mixed, and audio elementary stream data corresponding to the mixed audio data is obtained. In some embodiments, the player can call back the elementary stream data corresponding to the video file and the PCM data corresponding to the first audio file while playing the first streaming media file, and the controller 250 can generate an audio data stream according to the PCM data corresponding to the first audio file and the data corresponding to the second streaming media file, determining the elementary stream data corresponding to the video file as the video elementary stream data. In some embodiments, the controller 250 generates the audio data stream as follows: the player caches the PCM data corresponding to the first audio file into a first data queue, and the detector caches the PCM data corresponding to the second audio file into a second data queue; the PCM data in the first and second data queues are mixed based on a preset rule to generate mixed audio data, and the mixed audio data is encoded to generate the audio elementary stream data.
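A sketch of the two-queue mixing step. Since the "preset rule" is not specified here, plain sample addition with clipping is used as a stand-in, assuming both queues hold 16-bit little-endian mono PCM buffers at the same sample rate and that encode_audio wraps some real audio encoder.

```python
import struct

def mix_pcm(buf_a: bytes, buf_b: bytes) -> bytes:
    # add 16-bit samples pairwise and clip to the valid range
    n = min(len(buf_a), len(buf_b)) // 2
    a = struct.unpack(f"<{n}h", buf_a[:2 * n])
    b = struct.unpack(f"<{n}h", buf_b[:2 * n])
    mixed = (max(-32768, min(32767, x + y)) for x, y in zip(a, b))
    return struct.pack(f"<{n}h", *mixed)

def mix_loop(first_queue, second_queue, encode_audio):
    while True:
        buf_a = first_queue.get()   # background-music PCM called back by the player
        buf_b = second_queue.get()  # recorded-voice PCM from the detector
        if buf_a is None or buf_b is None:
            break                   # either side signaled end of stream
        encode_audio(mix_pcm(buf_a, buf_b))
```

A real implementation would also need to align the two queues by timestamp and handle sample-rate or channel-count mismatches, which this sketch omits.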
Fig. 5 is a schematic software configuration diagram of the display device 200 shown in fig. 3. As shown in fig. 5, the system is divided, from top to bottom, into an application (Applications) layer (the "application layer"), a Java interface layer, a native (Native) implementation layer, an application framework (Application Framework) layer (the "framework layer"), and a hardware implementation layer. The application layer is responsible for flow control among the modules; the Java interface layer is mainly responsible for service management, with each service implemented mainly at the native layer.
Generally, the player service comprises a loading module, a decapsulation module, a video decoding module, an audio decoding module, a video rendering module, an audio rendering module, and the like. The encapsulation coding service comprises an audio coding module, a video coding module, an encapsulation module, and the like. The recording service comprises a video recording module, an audio recording module, an audio separation module, and the like.
The loading module is mainly responsible for loading the audio files and/or video files selected in advance by the user from the server or local storage. The decapsulation module is mainly responsible for decapsulating the loaded audio and/or video files to obtain the video elementary stream data of the video file and/or the audio elementary stream data of the audio file. The video decoding module is mainly responsible for decoding the video elementary stream data obtained by the decapsulation module into image data, and the video rendering module is mainly responsible for rendering that image data to display the video file. The audio decoding module is mainly responsible for decoding the audio elementary stream data obtained by the decapsulation module into PCM data, and the audio rendering module is mainly responsible for rendering that PCM data to play the audio file.
The video recording module is mainly responsible for collecting image data corresponding to the video file, and the audio recording module for collecting PCM data corresponding to the audio file. The audio separation module is mainly responsible for separating the PCM data of the audio file from the multimedia file that the user has selected to play when that file includes both a video file and an audio file. The video coding module is mainly responsible for video-encoding image data called back by the player service, or recorded by the video recording module, to generate video elementary stream data. The audio coding module is mainly responsible for audio-encoding one or more of the PCM data called back by the player service, the PCM data recorded by the audio recording module, and the PCM data separated by the audio separation module to generate audio elementary stream data. The encapsulation module is mainly responsible for encapsulating the video elementary stream data output by the video coding module and the audio elementary stream data output by the audio coding module to generate the synthesized multimedia file.
For example, in a scenario of real-time video playing and audio recording, when the user triggers the play/record behavior, the loading module in the player service loads the video file selected in advance by the user, the decapsulation module decapsulates the loaded video file to obtain and output its video elementary stream data, the video decoding module decodes that video elementary stream data to obtain and output image data, and the video rendering module renders the image data, thereby playing the video file. In the recording service, the audio recording module collects PCM data corresponding to the audio file. In the encapsulation coding service, when the player service calls back image data, the video coding module video-encodes the called-back image data to obtain video elementary stream data, the audio coding module audio-encodes the recorded PCM data to obtain audio elementary stream data, and the encapsulation module encapsulates the two to obtain a new multimedia file synthesized from the played video file and the recorded audio file; when the player service calls back video elementary stream data, the audio coding module audio-encodes the recorded PCM data to obtain audio elementary stream data, and the encapsulation module encapsulates the called-back video elementary stream data and the encoded audio elementary stream data to obtain the new multimedia file synthesized from the played video file and the recorded audio file. The framework layer packages functions implemented by the platform, in particular functions related to the hardware implementation; for example, the playback-related multimedia interfaces of the framework layer encapsulate the implementation of hardware decoding.
For a more detailed description of the present solution, examples are given below in conjunction with the accompanying drawings. It should be understood that the flows shown in the drawings may include more or fewer steps in actual implementation, and the order of the steps may differ, so long as the streaming media file synthesis method provided in the embodiments of the present application can be implemented. As shown in fig. 6, fig. 6 is a flowchart illustrating steps of a streaming media file synthesis method according to one or more embodiments of the present application; the method includes:
and S11, receiving the trigger operation input by the user.
The triggering operation is used for triggering playing of a first streaming media file, recording of a second streaming media file, and composition of a third streaming media file based on the first streaming media file and the second streaming media file.
For example, the triggering operation may specifically be an operation of a preset virtual control by a user.
In some embodiments, the first streaming media file is a video file and the second streaming media file is an audio file. For example, in a karaoke scene, the video file of a song (the first streaming media file) is played, the user's singing voice (the second streaming media file) is recorded, and a music short video (the third streaming media file) is synthesized based on the song's video and the user's singing voice.
In some embodiments, the first streaming media file is an audio file and the second streaming media file is a video file. For example: in the short video creation scene, selected background music (first streaming media file) is played, video content of the short video (second streaming media file) is recorded, and composition of the short video (third streaming media file) is performed based on the background music and the video content.
In some embodiments, the first streaming media file includes both a video file and an audio file, and the second streaming media file is an audio file. For example, in a scene of dubbing a video that has background music, the video and the background music (the first streaming media file) are played, the dubbing content (the second streaming media file) is recorded, and a short video (the third streaming media file) is composed based on the video, the background music, and the dubbing content.
S12, playing the first streaming media file, and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file.
In some embodiments, the playing the first streaming media file in step S12 includes the following steps a to d:
step a, decapsulating the first streaming media file to obtain elementary stream data corresponding to the first streaming media file.
Step b, caching the elementary stream data corresponding to the first streaming media file into a cache queue.
Step c, sequentially decoding the data in the cache queue to obtain the playing data corresponding to the first streaming media file.
Step d, playing the first streaming media file according to the playing data corresponding to the first streaming media file.
In some embodiments, the recalling the data corresponding to the first streaming media file in the playing process of the first streaming media file in step S12 includes:
calling back the playing data corresponding to the first streaming media file.
That is, after the playing data corresponding to the first streaming media file is obtained by decoding in step c, the playing data corresponding to the first streaming media file is called back.
In some embodiments, the recalling the data corresponding to the first streaming media file in the playing process of the first streaming media file in step S12 includes:
calling back the elementary stream data corresponding to the first streaming media file read by the decoding module.
That is, when the data in the cache queue is read and decoded by the decoding module in step c, the elementary stream data corresponding to the first streaming media file is called back.
S13, recording the second streaming media file and acquiring data corresponding to the second streaming media file.
In some embodiments, the second streaming media file is a video file, and the data corresponding to the second streaming media file may be image data. For example: data in YUV format or data in RGB format.
In some embodiments, the second streaming media file is an audio file, and the data corresponding to the second streaming media file is PCM data.
S14, generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file.
In some embodiments, the first streaming media file is a video file, the second streaming media file is an audio file, and generating audio elementary stream data and video elementary stream data according to data corresponding to the first streaming media file and data corresponding to the second streaming media file includes:
acquiring video elementary stream data according to the data corresponding to the first streaming media file;
and acquiring audio elementary stream data according to the data corresponding to the second streaming media file.
In some embodiments, the first streaming media file is an audio file, the second streaming media file is a video file, and generating audio elementary stream data and video elementary stream data according to data corresponding to the first streaming media file and data corresponding to the second streaming media file includes:
acquiring audio elementary stream data according to the data corresponding to the first streaming media file;
and acquiring video elementary stream data according to the data corresponding to the second streaming media file.
S15, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
In some embodiments, the encapsulation module may be initialized when the playback thread completes preparation for playback, and track information for the audio elementary stream data and the video elementary stream data may be created at that point, so that the encapsulation module can encapsulate both elementary streams. A hedged initialization sketch follows.
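For illustration only, one way such initialization might look is sketched below; the injected muxer backend, its add_track method, and the track-info fields are assumptions, not an API defined by this application.

```python
from dataclasses import dataclass

# Hedged sketch of encapsulation-module initialization; the muxer backend
# and add_track signature are hypothetical names.
@dataclass
class AudioTrackInfo:
    codec: str          # e.g. "aac"
    sample_rate: int    # e.g. 44100
    channels: int       # e.g. 2

@dataclass
class VideoTrackInfo:
    codec: str          # e.g. "h264"
    width: int
    height: int

class Encapsulator:
    def __init__(self, muxer):
        self.muxer = muxer
        self.audio_track = None
        self.video_track = None

    def on_playback_prepared(self, audio: AudioTrackInfo, video: VideoTrackInfo):
        # Create track info for both elementary streams up front, so that
        # interleaved writing can begin as soon as ES packets arrive.
        self.audio_track = self.muxer.add_track("audio", vars(audio))
        self.video_track = self.muxer.add_track("video", vars(video))
```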
When the user interface receives a trigger operation for triggering playing of a first streaming media file, recording of a second streaming media file, and composition of a third streaming media file based on the first and second streaming media files, the display device provided in the embodiment of the application plays the first streaming media file through a player and calls back the data corresponding to it during playback, records the second streaming media file through a detector and acquires the data corresponding to it, generates audio elementary stream data and video elementary stream data through a processor according to the two sets of data, and encapsulates the audio and video elementary stream data through a wrapper to compose the third streaming media file. Because the elementary stream data is generated and encapsulated while the first streaming media file is still playing and the second streaming media file is still being recorded, the display device can synthesize the played streaming media file and the recorded streaming media file in real time, producing a new streaming media file as playback and recording proceed.
Referring to fig. 7, when the first streaming media file is a video file, the second streaming media file is an audio file, and the play data corresponding to the first streaming media file is called back, the streaming media file synthesizing method provided in the embodiment of the present application includes:
and S71, receiving the trigger operation input by the user.
The triggering operation is used for triggering playing of a video file, recording of an audio file and composition of a streaming media file based on the video file and the audio file.
S72, playing the video file, and calling back image data corresponding to the video file in the process of playing the video file.
S73, encoding the image data corresponding to the video file to obtain the video elementary stream data corresponding to the video file.
In some embodiments, the implementation of encoding the image data corresponding to the video file to obtain the video elementary stream data corresponding to the video file includes:
and coding the image data corresponding to the video file in an H264 hard coding mode to obtain the video elementary stream data corresponding to the video file.
And S74, recording the audio file, and acquiring PCM data corresponding to the audio file.
In this embodiment of the present application, playing the video file and recording the audio file start at the same time. In some embodiments, this is implemented as follows: when preparation for playing the video file finishes, indication information is sent to the recording thread, so that playing of the video file and recording of the audio file start simultaneously. A minimal sketch of this handshake follows.
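A minimal sketch of the "start together" handshake, assuming the player and the recorder run on separate threads; all object names here are illustrative, not the application's API.

```python
import threading

# The event carries the "indication information" from the playback thread
# to the recording thread.
playback_ready = threading.Event()

def playback_thread(player):
    player.prepare()        # demuxer/decoder setup for the video file
    playback_ready.set()    # indication information to the recording thread
    player.start()

def recording_thread(recorder):
    playback_ready.wait()   # block until playback preparation finishes
    recorder.start()        # recording starts (almost) simultaneously
```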
S75, performing audio coding on the PCM data to generate audio elementary stream data.
S76, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the streaming media file.
When encapsulating, timestamp information must be provided for each elementary stream. For the audio elementary stream, the timestamp of each audio frame can be written starting from the initial time (0) based on the sample rate and the channel count. For example: when the audio sample rate is 44.1 kHz, the channel count is 2, the encoding mode is Advanced Audio Coding (AAC), and the recorded PCM data uses 16-bit samples, the PCM data size of each frame is 1024 × 2 × 2 = 4096 bytes, and the timestamp duration of each frame is 1024 / 44.1 ≈ 23.22 ms. For the video path, the player timestamp does not necessarily start from the initial time (0); for a live video source, for example, the playback start timestamp is generally not the initial time. In that case the timestamp of the first frame must be subtracted from the timestamp of each piece of video elementary stream data, that is, the timestamp relative to the first frame is used for encapsulation. The arithmetic can be checked with the short sketch below.
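A worked sketch of these numbers; the helper names are illustrative, not the application's API.

```python
# Worked numbers from the paragraph above: 44.1 kHz, 2 channels, 16-bit PCM,
# AAC frames of 1024 samples.
SAMPLE_RATE = 44100
CHANNELS = 2
BYTES_PER_SAMPLE = 2                # 16-bit samples
SAMPLES_PER_FRAME = 1024            # one AAC frame

frame_bytes = SAMPLES_PER_FRAME * CHANNELS * BYTES_PER_SAMPLE  # 4096 bytes
frame_ms = SAMPLES_PER_FRAME / SAMPLE_RATE * 1000              # ~23.22 ms

def audio_pts_ms(frame_index: int) -> float:
    """Audio timestamps are written starting from the initial time 0."""
    return frame_index * frame_ms

def rebase_video_pts_ms(pts_ms: float, first_frame_pts_ms: float) -> float:
    """For live-like sources the player timestamp does not start at 0, so
    the first frame's timestamp is subtracted before encapsulation."""
    return pts_ms - first_frame_pts_ms
```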
Referring to fig. 8, when the first streaming media file is an audio file, the second streaming media file is a video file, and the play data corresponding to the first streaming media file is called back, the streaming media file synthesizing method provided in the embodiment of the present application includes:
and S81, receiving the trigger operation input by the user.
The triggering operation is used for triggering playing of an audio file, recording of a video file and composition of a streaming media file based on the video file and the audio file.
S82, playing the audio file, and calling back PCM data corresponding to the audio file in the process of playing the audio file.
S83, encoding the PCM data corresponding to the audio file to obtain the audio elementary stream data corresponding to the audio file.
And S84, recording the video file, and acquiring image data corresponding to the video file.
Likewise, in this embodiment of the present application, playing the audio file and recording the video file start at the same time. In some embodiments, this is implemented as follows: when preparation for playing the audio file finishes, indication information is sent to the video recording thread, so that playing of the audio file and recording of the video file start simultaneously.
S85, video coding is performed on the image data to generate video elementary stream data.
S86, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the streaming media file.
Similarly, timestamp information must be provided for each elementary stream when encapsulating. For the video elementary stream, the timestamp of each video frame can be written starting from the initial time (0) based on the frame rate and the video track information. For the audio path, since the player timestamp does not necessarily start from the initial time (0), the timestamp of the first frame must be subtracted from the timestamp of each piece of audio elementary stream data, that is, the timestamp relative to the first frame is used for encapsulation.
Referring to fig. 9, when the first streaming media file is a video file, the second streaming media file is an audio file, and the elementary stream data corresponding to the first streaming media file is called back, the streaming media file synthesizing method provided in the embodiment of the present application includes:
and S91, receiving the trigger operation input by the user.
The triggering operation is used for triggering playing of a video file, recording of an audio file and composition of a streaming media file based on the video file and the audio file.
S92, playing the video file, and calling back the video elementary stream data read by the decoding module in the process of playing the video file.
Referring to fig. 4, during playback of a video file, video elementary stream data with a given timestamp is produced earlier than the corresponding image data. If the video elementary stream data were called back at the decapsulation module and sent to the encapsulation module for encapsulation, the audio/video offset at the same timestamp would be relatively large, and some players would deadlock during playback, causing anomalies such as failure to play or frequent stalling. The above embodiment therefore calls back the video elementary stream data read by the decoding module, which reduces the audio/video offset at the same timestamp and avoids such deadlock-induced anomalies.
And S93, recording the audio file, and acquiring PCM data corresponding to the audio file.
And S94, carrying out audio coding on the PCM data to generate audio elementary stream data.
S95, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the streaming media file.
Calling back the image data corresponding to the video file during playback would require an encoding operation to run while the video plays, and high-frame-rate, high-bit-rate video encoding consumes substantial system computation and memory resources, easily causing playback stuttering and loss of audio/video synchronization between the played video and the recorded audio. When calling back the data corresponding to the video file, the embodiment shown in fig. 9 therefore calls back the video elementary stream data rather than the image data, avoiding re-encoding of the video data and any change in picture quality.
Referring to fig. 10, when the first streaming media file is an audio file, the second streaming media file is a video file, and the elementary stream data corresponding to the first streaming media file is called back, the streaming media file synthesizing method provided in the embodiment of the present application includes:
and S101, receiving a trigger operation input by a user.
The triggering operation is used for triggering playing of an audio file, recording of a video file and composition of a streaming media file based on the video file and the audio file.
S102, playing the audio file, and calling back the audio elementary stream data read by the decoding module in the process of playing the audio file.
Similarly, calling back the audio elementary stream data read by the decoding module reduces the audio/video offset at the same timestamp, thereby avoiding anomalies such as failure to play or frequent stalling caused by deadlock triggered during playback.
S103, recording the video file, and acquiring image data corresponding to the video file.
And S104, carrying out video coding on the image data to generate video elementary stream data.
S105, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the streaming media file.
Calling back the PCM data corresponding to the audio file during playback would require an encoding operation that consumes system computation and memory resources while the audio plays, easily causing playback stuttering and loss of audio/video synchronization between the played audio and the recorded video. When calling back the data corresponding to the audio file, the embodiment shown in fig. 10 therefore calls back the audio elementary stream data rather than the PCM data, avoiding re-encoding of the audio data and any change in sound quality.
Referring to fig. 11, when the first streaming media file is a video file, the second streaming media file is an audio file, and the elementary stream data corresponding to the first streaming media file is called back, the streaming media file synthesizing method provided in the embodiment of the present application includes:
and S111, receiving a trigger operation input by a user.
The triggering operation is used for triggering playing of a video file, recording of an audio file and composition of a streaming media file based on the video file and the audio file.
And S112, playing the video file, and calling back the video elementary stream data read by the decoding module in the process of playing the video file.
And S113, recording the audio file, and acquiring PCM data corresponding to the audio file.
And S114, carrying out audio coding on the PCM data to generate audio elementary stream data.
And S115, periodically acquiring the player timestamp, with a preset duration as the period.
In some embodiments, a timestamp of a video frame currently played by the player may be obtained as the player timestamp.
In some embodiments, step S115 may include: periodically acquiring the player timestamp with a preset duration of 100 ms.
S116, determining whether the difference between the timestamp of the called-back video elementary stream data and the player timestamp is smaller than a preset value.
Illustratively, the preset value may be 2s, 3s, etc.
In step S116, if the difference between the timestamp of the called-back video elementary stream data and the player timestamp is greater than the preset value, step S115 continues to be executed; if the difference is smaller than the preset value, the following step S117 is executed.
And S117, sending the video elementary stream data to the wrapper.
And S118, encapsulating the audio elementary stream data and the video elementary stream data.
In this embodiment, the player timestamp is acquired once every preset duration, and the called-back video elementary stream data is sent to the encapsulation module only when its timestamp is within a certain threshold of the player timestamp. This avoids an excessive deviation between the audio and video timestamps. A hedged sketch of the gating loop follows.
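A hedged sketch of steps S115 to S118; player.current_pts_ms, wrapper.write, and the pending packet queue are illustrative stand-ins, not the application's API.

```python
import time
from collections import deque

PERIOD_S = 0.1       # preset duration: 100 ms
THRESHOLD_MS = 2000  # preset value: e.g. 2 s

def gate_and_mux(pending_es: deque, player, wrapper):
    """Forward called-back video ES packets to the wrapper only once their
    timestamp is close enough to the current playback position."""
    while pending_es:
        packet = pending_es[0]                   # peek at the oldest packet
        if packet.pts_ms - player.current_pts_ms() < THRESHOLD_MS:
            wrapper.write(pending_es.popleft())  # S117 + S118: send and mux
        else:
            time.sleep(PERIOD_S)                 # S115: poll again next period
```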
Referring to fig. 12, when the first streaming media file is an audio file, the second streaming media file is a video file, and the elementary stream data corresponding to the first streaming media file is called back, the streaming media file synthesizing method provided in the embodiment of the present application includes:
and S121, receiving a trigger operation input by a user.
The triggering operation is used for triggering playing of an audio file, recording of a video file, and composition of a streaming media file based on the video file and the audio file.
And S122, playing the audio file, and calling back the audio elementary stream data read by the decoding module in the process of playing the audio file.
And S123, recording the video file, and acquiring image data corresponding to the video file.
And S124, carrying out video coding on the image data to generate video elementary stream data.
And S125, periodically acquiring the player timestamp, with a preset duration as the period.
In some embodiments, a timestamp of the audio frame currently played by the player may be obtained as the player timestamp.
And S126, determining whether the difference between the timestamp of the called-back audio elementary stream data and the player timestamp is smaller than a preset value.
Illustratively, the preset value may be 2s, 3s, etc.
In step S126, if the difference between the timestamp of the called-back audio elementary stream data and the player timestamp is greater than the preset value, step S125 continues to be executed; if the difference is smaller than the preset value, the following step S127 is executed.
And S127, sending the audio elementary stream data to the wrapper.
S128, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the streaming media file.
Likewise, in this embodiment the player timestamp is acquired once every preset duration, and the called-back audio elementary stream data is sent to the encapsulation module only when its timestamp is within a certain threshold of the player timestamp, which avoids an excessive deviation between the audio and video timestamps.
In some embodiments, the streaming media file synthesizing method provided by the embodiment of the present application further supports the user in selecting a start position and an end position within the first streaming media file, so that the third streaming media file is synthesized from the selected segment in real time. As shown in fig. 13, the implementation includes the following steps:
S1301, receiving the start position and the end position of the target segment.
Wherein the target segment is a segment in the first streaming media file for synthesizing the third streaming media file.
And S1302, determining the target position.
Wherein the target position is a position of a key frame located before and closest to the start position.
And S1303, receiving a trigger operation input by a user.
The triggering operation is used for triggering playing of a first streaming media file, recording of a second streaming media file, and composition of a third streaming media file based on the first streaming media file and the second streaming media file.
And S1304, playing the first streaming media file.
S1305, determining whether the playing position of the first streaming media file exceeds the target position.
In step S1305, if the playing position of the first streaming media file does not exceed the target position, the process returns to step S1304, and if the playing position of the first streaming media file exceeds the target position, the following step S1306 is executed.
And S1306, recording the second streaming media file, acquiring data corresponding to the second streaming media file, and starting to call back the data corresponding to the first streaming media file.
S1307, generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file.
S1308, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
S1309, determining whether the playing position of the first streaming media file exceeds the end position.
In step S1309, if the playing position of the first streaming media file does not exceed the end position, the process returns to step S1306; if the playing position of the first streaming media file exceeds the end position, the following step S1310 is executed.
S1310, stopping recording of the second streaming media file, and stopping calling back the data corresponding to the first streaming media file.
In some embodiments, the streaming media file synthesizing method provided by the embodiments of the present application further includes: when the playing position of the first streaming media file exceeds the end position, stopping playing the first streaming media file. A short sketch of the segment-driven flow of fig. 13 follows.
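A hedged sketch of the whole fig. 13 flow; seek_prev_keyframe, player, and recorder are illustrative stand-ins rather than the application's API.

```python
def synthesize_segment(player, recorder, start_pos, end_pos):
    """Record and synthesize only over the user-selected segment."""
    target_pos = player.seek_prev_keyframe(start_pos)  # S1302: key frame before start
    player.play()                                      # S1304
    recording = False
    while player.is_playing():
        pos = player.position()
        if not recording and pos >= target_pos:
            recorder.start()          # S1306: start recording and callbacks
            recording = True
        elif recording and pos > end_pos:
            recorder.stop()           # S1310: stop recording and callbacks
            player.stop()             # optionally also stop playback (see above)
            break
```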
Referring to fig. 14, when the first streaming media file includes a video file and a first audio file, and the second streaming media file is a second audio file, the streaming media file synthesizing method provided by the embodiment of the application includes:
and S141, receiving a trigger operation input by a user.
The triggering operation is used for triggering playing of a first streaming media file, recording of a second streaming media file, and composition of a third streaming media file based on the first streaming media file and the second streaming media file.
S142, playing the first streaming media file, and calling back the elementary stream data corresponding to the video file and the PCM data corresponding to the first audio file in the process of playing the first streaming media file.
And S143, recording the second streaming media file, and acquiring PCM data corresponding to the second streaming media file.
S144, generating audio elementary stream data according to the PCM data corresponding to the first audio file and the data corresponding to the second streaming media file.
S145, encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
In some embodiments, generating the audio elementary stream data according to the PCM data corresponding to the first audio file and the data corresponding to the second streaming media file includes:
caching the PCM data corresponding to the first audio file to a first data queue; caching the PCM data corresponding to the second audio file to a second data queue; mixing the PCM data in the first data queue and the second data queue based on a preset rule to generate mixed audio data; and encoding the mixed audio data to generate the audio elementary stream data. A minimal mixing sketch follows.
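The sketch below assumes sample averaging as the preset rule, which the application itself leaves unspecified; the encoder and wrapper objects are illustrative stand-ins.

```python
import array

FRAME_BYTES = 4096  # one 44.1 kHz, 2-channel, 16-bit frame (see below)

def mix_frames(frame_a: bytes, frame_b: bytes) -> bytes:
    """One possible 'preset rule': average the samples and clamp them to
    the 16-bit range."""
    a = array.array('h', frame_a)   # 16-bit signed PCM samples
    b = array.array('h', frame_b)
    mixed = array.array('h', (max(-32768, min(32767, (x + y) // 2))
                              for x, y in zip(a, b)))
    return mixed.tobytes()

def mix_queues(first_queue, second_queue, encoder, wrapper):
    """Pop frame-sized PCM chunks from both queues, mix them, encode the
    result to AAC elementary stream data, and hand it to the wrapper."""
    while True:
        frame_a = first_queue.get()   # PCM of the played first audio file
        frame_b = second_queue.get()  # PCM recorded from the microphone
        if frame_a is None or frame_b is None:
            break                     # end-of-stream marker on either queue
        assert len(frame_a) == len(frame_b) == FRAME_BYTES  # fixed frame size
        wrapper.write(encoder.encode(mix_frames(frame_a, frame_b)))
```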
Referring to fig. 15, in some embodiments, an implementation of the above embodiments includes: when a trigger operation input by the user is received, the first streaming media file is obtained through the resource loading module and sent to the decapsulation module for decapsulation. The decapsulation result of the first streaming media file includes audio elementary stream data and video elementary stream data, which are cached in an audio elementary stream data queue and a video elementary stream data queue, respectively. The video elementary stream data is acquired as follows: the video elementary stream data read by the decoding module is called back. The audio elementary stream data is acquired as follows: the PCM data corresponding to the first audio file is called back and stored in the first data queue, the PCM data corresponding to the second audio file is acquired through a sound collector and stored in the second data queue, the PCM data in the first data queue and the second data queue is mixed based on a preset rule to generate mixed audio data, and the mixed audio data is encoded to obtain the audio elementary stream data. Finally, the audio elementary stream data and the video elementary stream data are encapsulated to synthesize the third streaming media file.
Since the size of each frame of audio data is fixed (for example, one frame of AAC audio with 2 channels and 16-bit samples is fixed at 4096 bytes), in some embodiments the data in the first data queue and the second data queue may be aligned frame by frame by data amount.
In some embodiments, the streaming media file synthesizing method provided by the embodiment of the present application may further adjust the delay of the second audio file as needed. For example, only the data in the first data queue is taken for the first 5 frames, and the audio data in the first and second data queues is mixed from the 6th frame onward; for 44.1 kHz audio this delays the microphone sound by 5 × 1024 / 44.1 ≈ 116 ms. A short sketch follows.
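A short sketch of this delay rule, reusing mix_frames from the sketch above; the queue handles and the 5-frame figure are illustrative.

```python
SAMPLES_PER_FRAME = 1024
SAMPLE_RATE_KHZ = 44.1
DELAY_FRAMES = 5

delay_ms = DELAY_FRAMES * SAMPLES_PER_FRAME / SAMPLE_RATE_KHZ  # ~116.1 ms

def mix_with_mic_delay(first_queue, second_queue, delay_frames=DELAY_FRAMES):
    """For the first delay_frames frames, only the first queue is used, so
    the microphone track is shifted later by delay_frames * 1024 samples."""
    for i, frame_a in enumerate(iter(first_queue.get, None)):
        if i < delay_frames:
            yield frame_a                                  # played audio only
        else:
            yield mix_frames(frame_a, second_queue.get())  # mixed from frame 6
```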
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements each process performed by the streaming media file synthesizing method and achieves the same technical effect; to avoid repetition, the details are not repeated here.
The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The present application further provides a computer program product: when the computer program product runs on a computer, the computer is enabled to implement the streaming media file synthesizing method described above.
The foregoing description has, for purposes of explanation, been presented in conjunction with specific embodiments. However, the foregoing discussion is not intended to be exhaustive or to limit the implementations to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, thereby enabling others skilled in the art to make the best use of the embodiments, with various modifications suited to the particular use contemplated.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A display device, comprising:
a user interface configured to: receiving a trigger operation input by a user, wherein the trigger operation is used for triggering playing of a first streaming media file, recording of a second streaming media file and composition of a third streaming media file based on the first streaming media file and the second streaming media file;
a player configured to: playing the first streaming media file, and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file;
a detector configured to: recording the second streaming media file to acquire data corresponding to the second streaming media file;
a processor configured to: generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file;
a wrapper configured to: and encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
2. The display device according to claim 1,
the player configured to: acquiring the first streaming media file, decapsulating the first streaming media file to acquire elementary stream data corresponding to the first streaming media file, caching the elementary stream data corresponding to the first streaming media file into a cache queue, sequentially decoding the data in the cache queue to acquire play data corresponding to the first streaming media file, playing the first streaming media file according to the play data corresponding to the first streaming media file, and calling back the play data corresponding to the first streaming media file;
the processor configured to: and encoding the playing data corresponding to the first streaming media file to generate the audio elementary stream data or the video elementary stream data.
3. The display device according to claim 1,
the player configured to: acquiring the first streaming media file, decapsulating the first streaming media file to acquire elementary stream data corresponding to the first streaming media file, caching the elementary stream data corresponding to the first streaming media file into a cache queue, sequentially reading and decoding the data in the cache queue through a decoding module to acquire playing data corresponding to the first streaming media file, playing the first streaming media file according to the playing data corresponding to the first streaming media file, and calling back the elementary stream data corresponding to the first streaming media file read by the decoding module;
the processor configured to: and determining the elementary stream data corresponding to the first streaming media file as the audio elementary stream data or the video elementary stream data.
4. The display device of claim 3, wherein the encapsulator is further configured to:
periodically acquiring a player timestamp by taking a preset time length as a period;
determining whether the difference between the timestamp of the elementary stream data corresponding to the first streaming media file and the player timestamp is smaller than a preset value;
and if so, encapsulating the called-back video elementary stream data.
5. The display device according to claim 1,
the user interface further configured to: receiving a starting position and an ending position of a target segment input by a user; the target segment is a segment used for synthesizing the third streaming media file in the first streaming media file;
the player, further configured to: determining a target position, wherein the target position is the position of a key frame which is positioned before the starting position and is closest to the starting position;
the detector configured to: when the playing position of the first streaming media file exceeds the target position, starting to record the second streaming media file; and stopping recording the second streaming media file when the playing position of the first streaming media file exceeds the termination position.
6. The display device of claim 5, wherein the player is further configured to:
and when the playing position of the first streaming media file exceeds the termination position, stopping playing the first streaming media file.
7. The display device of claim 1, wherein the first streaming media file is a video file; the second streaming media file is an audio file;
the detector configured to: and when the rendering of the first video frame of the video file is finished, beginning to record Pulse Code Modulation (PCM) data corresponding to the audio file.
8. The display device of claim 1, wherein the first streaming media file comprises a video file and a first audio file, and the second streaming media file is a second audio file;
the player configured to: calling back elementary stream data corresponding to the video file and PCM data corresponding to the first audio file in the process of playing the first streaming media file;
the processor configured to: and generating an audio data stream according to the PCM data corresponding to the first audio file and the data corresponding to the second streaming media file, and determining elementary stream data corresponding to the video file as the video elementary stream data.
9. The display device according to claim 8,
the player, further configured to: caching PCM data corresponding to the first audio file to a first data queue;
the detector further configured to: caching PCM data corresponding to the second audio file to a second data queue;
the processor configured to: mixing the PCM data in the first data queue and the second data queue based on a preset rule to generate mixed audio data, and encoding the mixed audio data to generate the audio elementary stream data.
10. A method for synthesizing a streaming media file is characterized by comprising the following steps:
receiving a trigger operation input by a user, wherein the trigger operation is used for triggering playing of a first streaming media file, recording of a second streaming media file and synthesis of a third streaming media file based on the first streaming media file and the second streaming media file;
playing the first streaming media file, and calling back data corresponding to the first streaming media file in the process of playing the first streaming media file;
recording the second streaming media file to acquire data corresponding to the second streaming media file;
generating audio elementary stream data and video elementary stream data according to the data corresponding to the first streaming media file and the data corresponding to the second streaming media file;
and encapsulating the audio elementary stream data and the video elementary stream data to synthesize the third streaming media file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210144774.6A CN114666516A (en) | 2022-02-17 | 2022-02-17 | Display device and streaming media file synthesis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114666516A true CN114666516A (en) | 2022-06-24 |
Family
ID=82028422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210144774.6A Pending CN114666516A (en) | 2022-02-17 | 2022-02-17 | Display device and streaming media file synthesis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114666516A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140049689A1 (en) * | 2011-12-05 | 2014-02-20 | Guangzhou Ucweb Computer Technology Co., Ltd | Method and apparatus for streaming media data processing, and streaming media playback equipment |
CN104793917A (en) * | 2015-02-10 | 2015-07-22 | 西南民族大学 | Method for obtaining Cocos2d-x game playing sound in real time |
CN105120324A (en) * | 2015-08-31 | 2015-12-02 | 北京暴风科技股份有限公司 | Distributed player implementation method and system |
US20160105725A1 (en) * | 2011-03-29 | 2016-04-14 | Capshore, Llc | User interface for method for creating a custom track |
CN105828220A (en) * | 2016-03-23 | 2016-08-03 | 乐视网信息技术(北京)股份有限公司 | Method and device of adding audio file in video file |
CN105959773A (en) * | 2016-04-29 | 2016-09-21 | 魔方天空科技(北京)有限公司 | Multimedia file processing method and device |
CN106911900A (en) * | 2017-04-06 | 2017-06-30 | 腾讯科技(深圳)有限公司 | Video dubbing method and device |
CN107959873A (en) * | 2017-11-02 | 2018-04-24 | 深圳天珑无线科技有限公司 | Method, apparatus, terminal and the storage medium of background music are implanted into video |
CN108877820A (en) * | 2017-11-30 | 2018-11-23 | 北京视联动力国际信息技术有限公司 | A kind of audio data mixed method and device |
CN109068081A (en) * | 2018-08-10 | 2018-12-21 | 北京微播视界科技有限公司 | Video generation method, device, electronic equipment and storage medium |
CN111741231A (en) * | 2020-07-23 | 2020-10-02 | 北京字节跳动网络技术有限公司 | Video dubbing method, device, equipment and storage medium |
CN113411661A (en) * | 2021-06-11 | 2021-09-17 | 北京百度网讯科技有限公司 | Method, apparatus, device, storage medium and program product for recording information |
CN113535116A (en) * | 2021-08-05 | 2021-10-22 | 广州酷狗计算机科技有限公司 | Audio file playing method and device, terminal and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116033096A (en) * | 2022-07-08 | 2023-04-28 | 荣耀终端有限公司 | Picture content dubbing method and device and terminal equipment |
CN116033096B (en) * | 2022-07-08 | 2023-10-20 | 荣耀终端有限公司 | Picture content dubbing method and device and terminal equipment |
CN115543649A (en) * | 2022-08-31 | 2022-12-30 | 荣耀终端有限公司 | Data acquisition method and electronic equipment |
CN115543649B (en) * | 2022-08-31 | 2023-11-03 | 荣耀终端有限公司 | Data acquisition method and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11336953B2 (en) | Video processing method, electronic device, and computer-readable medium | |
CN110740363B (en) | Screen projection method and system and electronic equipment | |
US20190124371A1 (en) | Systems, methods and computer software for live video/audio broadcasting | |
WO2019001347A1 (en) | Screen projection method for mobile device, storage medium, terminal and screen projection system | |
KR100579387B1 (en) | Efficient transmission and playback of digital information | |
CN114666516A (en) | Display device and streaming media file synthesis method | |
US9300754B2 (en) | Information processing system, information processing apparatus, information processing method, and program | |
CN107197139B (en) | Data processing method of panoramic camera | |
CN114040237A (en) | Audio and video synchronous playing method, terminal, multimedia playing system and medium | |
JP7290260B1 (en) | Servers, terminals and computer programs | |
US20210400334A1 (en) | Method and apparatus for loop-playing video content | |
CN114630101B (en) | Display device, VR device and display control method of virtual reality application content | |
KR20140117889A (en) | Client apparatus, server apparatus, multimedia redirection system and the method thereof | |
CN113596546B (en) | Multi-stream program playing method and display device | |
CN113542765B (en) | Media data jump continuous playing method and display device | |
CN115278323A (en) | Display device, intelligent device and data processing method | |
JP2017130957A (en) | Transmitter, transmission method, playback device, and playback method | |
CN115134644B (en) | Live broadcast data processing method and device | |
JP7059436B2 (en) | Editing system | |
WO2024125019A1 (en) | Audio processing method and electronic device | |
CN117651186A (en) | Display device, video seamless switching method, and storage medium | |
CN106572115B (en) | screen mirroring method for playing network video by intelligent terminal and transmitting and receiving device | |
CN116744051A (en) | Display device and subtitle generation method | |
CN118233694A (en) | Display device, mobile terminal and audio playing method | |
CN116723356A (en) | Terminal multimedia data processing method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||