
CN116347173A - Editable embedded audio and video server and operation method thereof - Google Patents


Info

Publication number
CN116347173A
CN116347173A (Application CN202310308793.2A)
Authority
CN
China
Prior art keywords
video
audio
unit
editable
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310308793.2A
Other languages
Chinese (zh)
Inventor
李光
罗雨涵
李鹏
谭俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Bbef Science and Technology Co Ltd
Original Assignee
Beijing Bbef Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bbef Science and Technology Co Ltd filed Critical Beijing Bbef Science and Technology Co Ltd
Priority to CN202310308793.2A priority Critical patent/CN116347173A/en
Publication of CN116347173A publication Critical patent/CN116347173A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47: End-user applications
    • H04N21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60: Network streaming of media packets
    • H04L65/75: Media network packet handling
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses an editable embedded audio and video server comprising an ARM main control chip and an FPGA processing chip connected to it. The ARM main control chip provides a front-end web page and back-end processing software: the front-end web page receives a user's video editing instructions and passes the video data to be edited, together with the operations to be executed, to the back-end processing software; the back-end processing software calls the FPGA processing chip through the ARM main control chip to edit and process the video data, obtains the processed video code stream, and returns it to the front-end web page for preview. With this scheme, video can be edited in real time without additional third-party equipment or software, reducing the running cost of the equipment.

Description

Editable embedded audio and video server and operation method thereof
Technical Field
The invention relates to the technical field of audio and video processing, in particular to an editable embedded audio and video server and an operation method thereof.
Background
In current audio and video servers, the video editing and encoding/decoding functions are separate: in rebroadcast and live-broadcast scenarios, combinations of different devices (video acquisition equipment, video editing equipment, video insertion equipment, video encoding and transmission equipment, and so on) are typically assembled to meet actual demands. This way of processing video often leads to incompatible interfaces, unreliable equipment operation, and similar problems.
Therefore, there is a need for an editable embedded audio and video server that can integrate video online editing functions and improve the reliability of device operation.
Disclosure of Invention
In view of the problems in the prior art, this scheme provides an editable embedded audio and video server and an operation method thereof. The scheme makes full use of the spare capacity of the ARM chip and the FPGA chip in the embedded audio and video server to integrate an online editing function for audio and video insert clips directly on the server. No additional third-party software or equipment is needed, which reduces the size of the equipment, lowers its operating cost, enriches the functions of the embedded audio and video server, and improves its operational reliability.
According to one aspect of the present invention, there is provided an editable embedded audio and video server comprising: ARM main control chip and FPGA processing chip, FPGA processing chip is connected with ARM main control chip.
The ARM main control chip provides a front-end web page and back-end processing software, wherein the front-end web page is used for receiving video editing instructions of a user and transmitting video data to be edited and operations to be executed to the back-end processing software; and the back-end processing software is used for calling the FPGA processing chip through the ARM main control chip to edit and process the video data according to the operation required to be executed, obtaining a processed video code stream, and transmitting the video code stream to the front-end web page for video preview.
Given the strong computing power of the ARM and FPGA chips used by the embedded audio and video server, these chips have spare capacity beyond their audio and video encoding and decoding tasks. The editable embedded audio and video server of this scheme makes full use of that spare capacity: audio and video insert clips can be edited directly on the server and the edited video offered to users for preview through a web page, improving equipment compatibility and operational reliability.
Optionally, in the above-mentioned editable embedded audio and video server, the front-end web page may include: the system comprises a material management unit, a time axis editing unit, a special effect processing unit and a video preview unit.
The material management unit manages the materials stored on the embedded audio and video server, the materials including audio, video, pictures, text and the like; the time axis editing unit performs operations such as mixing, cutting, combining and inserting on the audio, the video and the materials; the special effect processing unit adds subtitle effects, transition effects, filter effects and the like to the edited video; and the video preview unit previews the video after time axis editing and special effect processing.
Optionally, in the above-mentioned editable embedded audio and video server, the FPGA processing chip includes: the video processing system comprises a key frame extraction unit, a video frame editing unit, a duration control unit and an encoding encapsulation unit.
The key frame extraction unit is used for extracting key frames from the source video to obtain video frames; the video frame editing unit is used for carrying out mixing, cutting, merging, inserting and special effect processing operations on the video frames; the time length control unit is used for dynamically adjusting the playing time length of the video frames edited by the video frame editing unit and adjusting the code rate and the key frame interval of the video according to the ratio between the actual playing time length and the template playing time length; the coding encapsulation unit is used for recoding and encapsulating the edited video frames into a video code stream.
Optionally, in the above-mentioned editable embedded audio-video server, the key frame extraction unit is configured to extract a key frame from the source video by using a feature domain extraction algorithm, where the feature domain extraction algorithm includes:
performing RGB space conversion on a frame image in a source video to generate a histogram;
comparing the histograms of 24 consecutive frame images and calculating the histogram similarity change value between adjacent frames;
carrying out statistical sorting on the histogram similarity change values between adjacent frames, determining an intermediate threshold value, and outputting a previous frame of a frame image with the highest similarity change as a key frame if the similarity change value is higher than a first preset threshold value; if the similarity change value is kept consistent or is lower than a second preset threshold value, extracting the histogram as texture features by using a binary feature method, and generating a texture histogram;
and calculating the similarity of the texture histograms between the adjacent frames, and outputting the adjacent frames as key frames when the similarity of the texture histograms between the adjacent frames is lower than the intermediate threshold value of the similarity change value.
Optionally, in the above-mentioned editable embedded audio-video server, the duration control unit is configured to calculate a ratio between a video playing duration after editing and a source video playing duration, if the ratio is greater than 1, reduce the code rate and the key frame interval, and if the ratio is less than 1, increase the code rate and the key frame interval.
Optionally, in the above-mentioned editable embedded audio and video server, the FPGA processing chip is further configured to collect, multiplex, and demultiplex audio and video data; the ARM main control chip is used for encoding, decoding, transmitting and storing audio and video data.
Optionally, in the above-mentioned editable embedded audio and video server, further includes: the SSD memory and the peripheral interface are connected with the FPGA chip, and the peripheral interface comprises an SDI interface, an HDMI interface, a digital video output interface, a network interface and a debugging interface; the SSD memory is connected with the ARM main control chip and is used for storing audio and video data.
According to a second aspect of the present invention, there is provided an operation method based on an editable embedded audio-video server, applied to the editable embedded audio-video server as described above.
Firstly, receiving editing operation of a user on a front-end web page on a source video and a material, and transmitting the source video and the material and the corresponding editing operation to an FPGA processing chip;
then, extracting key frames from the source video by the FPGA processing chip, editing the key frames by mixing, cutting, combining, inserting and special effect processing, optimizing the edited video duration, compressing the processed video frames into video, and transmitting the video to back-end processing software;
and finally, the back-end processing software transmits the video to the front-end web page to preview the video.
According to a third aspect of the present invention there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the above-described method of operation based on the editable embedded audio video server.
According to a fourth aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the above-described method of operating an editable embedded audio-video server.
According to the scheme of the invention, the audio and video insert editing function is added on the basis of the main functions of audio and video acquisition, decoding, encoding and transmission commonly used by an audio and video server, and no additional third party equipment and third party software are needed. Based on the ARM and the FPGA processor with high performance, the FPGA is used for carrying out hardware acceleration on the video editing processing process, so that the real-time performance of video editing can be improved, and the operation and maintenance cost of a client in an actual scene can be greatly reduced.
The foregoing is only an overview of the technical solution of the invention. So that the technical means of the invention may be more clearly understood and implemented in accordance with the contents of the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, the detailed description of the invention follows.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a schematic diagram of an editable embedded audio video server according to one embodiment of the invention;
FIG. 2 shows a schematic diagram of an audio video editing front-end web page in accordance with one embodiment of the invention;
FIG. 3 illustrates a block diagram of a computing device 300 according to one embodiment of the invention;
FIG. 4 illustrates a flow diagram of a method 400 of operation based on an editable embedded audio video server in accordance with one embodiment of the invention;
fig. 5 shows a schematic diagram of an audio/video editing software and FPGA chip scheduling process according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embedded video server is an embedded device for providing network video transmission and sharing, and is used for converting audio and video signals into network IP packets after acquisition, compression and recombination, and realizing real-time network transmission of compressed data streams by adopting a network protocol.
The video editing and video encoding and decoding functions of the existing embedded video server are in a separated state, and in the audio and video rebroadcasting and live broadcasting scenes, various different equipment combinations such as video acquisition equipment, video editing equipment, video inserting equipment, video encoding and transmission equipment and the like are often used for meeting the actual demands. There may be problems of incompatibility of interfaces between different devices, low operational reliability, etc.
To meet the demand for richer functions and more reliable equipment operation, this scheme provides an editable embedded audio and video server that integrates audio and video acquisition, encoding, decoding, transmission and editing. It reduces the size of the equipment, improves its compatibility and operational reliability, and greatly reduces operation and maintenance costs in actual scenarios.
Fig. 1 shows a schematic structure of an editable embedded audio-video server according to an embodiment of the invention. As shown in fig. 1, the editable embedded audio and video server includes: the ARM main control chip and the FPGA processing chip are connected with the ARM main control chip, and the ARM main control chip provides a front-end web page and back-end processing software.
In addition, the system also comprises an SSD memory and a peripheral interface, wherein the peripheral interface is connected with the FPGA chip and comprises an SDI interface, an HDMI interface, a digital video output interface, a network interface and a debugging interface; the SSD memory is connected with the ARM main control chip and is used for storing audio and video data.
And the ARM chip runs main control software for encoding, decoding and transmitting the audio and video data and runs background processing software. The FPGA processing chip can run FPGA software for collecting, multiplexing and demultiplexing audio and video data, managing each interface in the server and executing an algorithm in the audio and video editing software.
The audio and video editing software is divided into a front-end web page and a back-end processing software. The front-end web page is used for receiving a video editing instruction of a user and transmitting video data to be edited and operations to be executed to the back-end processing software; the back-end processing software is used for calling the FPGA processing chip through the ARM main control chip, editing and processing the video data to obtain a processed video code stream, and transmitting the video code stream to the front-end web page for video preview.
The audiovisual editing front-end web page may be integrated in a device management page. FIG. 2 shows a schematic diagram of an audio video editing front-end web page in accordance with one embodiment of the invention. As shown in fig. 2, the front-end web page includes: the system comprises a material management unit, a time axis editing unit, a special effect processing unit and a video preview unit.
The material management unit is used for managing the video files stored on the embedded audio and video server, including the currently acquired source file, the various material files uploaded by the user (video, pictures, sound effects, text), and the like.
And the time axis editing unit is used for carrying out operations such as mixing, cutting, combining, inserting and the like on the audio or video and the materials. The timeline editing unit may separate audio and video, scale the time track, and globally preview the time track.
The special effect processing unit is used for adding filter effects (mosaic, black screen, blur, oil painting and the like), transition effects (fade in/out, freeze frame and dissolve), subtitle effects (scrolling, sliding, pop-up, flip, zoom and the like) and other functions to the video.
And the video preview unit is used for previewing the edited and special effect processed video. The edited video can be stored on an audio-video server, and a corresponding network protocol can be selected for video transmission according to the requirements of users.
The back-end processing software is configured in the ARM main control chip and is mainly used for realizing real-time editing of the video by calling the FPGA processing chip, wherein the FPGA processing chip mainly comprises a key frame extraction unit, a video frame editing unit, a duration control unit and a coding encapsulation unit.
The key frame extraction unit is used for extracting key frames from the source video. In one embodiment of the present invention, the key frame extraction unit may perform key frame extraction on the source video by using a feature domain extraction algorithm, where a main flow of the feature domain extraction algorithm includes:
1) Performing RGB space conversion on a frame image in a source video to generate a histogram;
2) Comparing the histograms of 24 consecutive frame images and calculating the histogram similarity change value between adjacent frames;
3) Carrying out statistical sorting on the histogram similarity change values between adjacent frames, determining an intermediate threshold value, and outputting a previous frame of a frame image with the highest similarity change as a key frame if the similarity change value is higher than a first preset threshold value; if the similarity change value is kept consistent or is lower than a second preset threshold value, extracting the histogram as texture features by using a binary feature method, and generating a texture histogram;
4) And calculating the similarity of the texture histograms between the adjacent frames, and outputting the adjacent frames as key frames when the similarity of the texture histograms between the adjacent frames is lower than the intermediate threshold value of the similarity change value.
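The first three steps of the feature-domain extraction algorithm can be sketched in software as follows. This is a minimal illustration, not the FPGA implementation: frames are modelled as iterables of (r, g, b) pixels, the threshold value is illustrative, and the texture-histogram fallback of step 4 is omitted for brevity.

```python
def rgb_histogram(frame, bins=8):
    """Coarse per-channel RGB histogram of an iterable of (r, g, b) pixels."""
    hist = [0] * (bins * 3)
    step = 256 // bins
    for r, g, b in frame:
        hist[r // step] += 1             # red channel bins
        hist[bins + g // step] += 1      # green channel bins
        hist[2 * bins + b // step] += 1  # blue channel bins
    return hist

def histogram_change(h1, h2):
    """Normalised L1 distance between two histograms, in [0, 1]."""
    total = sum(h1) + sum(h2)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total if total else 0.0

def pick_key_frames(frames, t_high=0.5):
    """Emit the index of the frame *preceding* each large histogram change
    (step 3 of the algorithm); the texture fallback (step 4) is omitted."""
    hists = [rgb_histogram(f) for f in frames]
    changes = [histogram_change(hists[i], hists[i + 1])
               for i in range(len(hists) - 1)]
    return [i for i, c in enumerate(changes) if c > t_high]
```

A cut between a dark and a bright scene produces a large change value, so the last frame before the cut is reported as a key frame; a static sequence produces no key frames and would fall through to the texture-histogram path.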
Other key frame extraction algorithms can be used to extract key frames from the source video, for example, key frames are extracted based on convolutional neural networks, key frames are extracted based on k-means clusters, and the like, and the scheme is not limited in this regard.
The video frame editing unit is used for carrying out operations such as mixing, cutting, merging, inserting, special effect processing and the like on the video frames. The video frame editing unit can perform various operations such as splicing pictures, making GIF images, digging images, replacing background, adding music background and the like.
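The timeline operations of the video frame editing unit can be modelled as list splices. This toy sketch treats frames as opaque items in a Python list; the real unit operates on decoded frames in FPGA memory.

```python
def cut(frames, start, end):
    """Drop frames[start:end] from the clip."""
    return frames[:start] + frames[end:]

def merge(*clips):
    """Concatenate clips back to back."""
    return [frame for clip in clips for frame in clip]

def insert(frames, position, clip):
    """Splice an insert clip into the main clip at `position`."""
    return frames[:position] + clip + frames[position:]
```

An insert/interstitial cut, for example, is `insert(main_clip, cut_point, advert_clip)` followed by re-encoding of the resulting frame sequence.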
The time length control unit is used for dynamically adjusting the playing time length of the video frames edited by the video frame editing unit and adjusting the code rate and the key frame interval of the video according to the ratio between the actual playing time length and the template playing time length.
Specifically, the duration control unit may first calculate a ratio between the edited playing duration and the playing duration of the source video stream, and if the ratio is greater than 1, it indicates that the playing duration of the video exceeds the expected duration. Based on the magnitude of the ratio, the code rate and the amount of key frame spacing that need to be adjusted can be determined.
For example, if the ratio is greater than 1.2, both the code rate and the key frame interval may be reduced by 20% to accommodate the actual playout duration. If the ratio is less than 0.8, the code rate and key frame interval are increased by 20% accordingly. The larger the code rate is, the higher the video quality is, and conversely, the smaller the code rate is, the lower the video quality is, and the smoothness and quality of video playing can be improved by adjusting the code rate and the key frame interval.
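The adjustment rule in this example can be stated compactly. The 1.2/0.8 trigger points and the 20% step come from the text above; the units of bitrate and key frame interval are left abstract, and the duration-control unit in hardware may of course use finer-grained scaling.

```python
def adjust(bitrate, keyframe_interval, edited_len, source_len):
    """Scale bitrate and key frame interval by the duration ratio."""
    ratio = edited_len / source_len
    if ratio > 1.2:   # edited video runs long: cut rate and GOP by 20%
        return bitrate * 0.8, keyframe_interval * 0.8
    if ratio < 0.8:   # edited video runs short: raise both by 20%
        return bitrate * 1.2, keyframe_interval * 1.2
    return bitrate, keyframe_interval
```

For instance, a clip edited to 130% of the source duration would have its code rate and key frame interval reduced by 20%.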
The coding encapsulation unit is used for recoding and encapsulating the key frames into a video code stream. For example, when video is clipped, it may be encapsulated in an MOV file format in an intra-frame compressed video coding format. When deriving video for network transmission, the video coding format h.264 of inter-frame compression can be adopted, and encapsulated into MP4 file format.
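The two encapsulation modes can be reproduced on a workstation with the ffmpeg command line, shown here only as an illustration of the distinction (the server itself encodes in the FPGA/ARM chips). With libx264, `-g 1` makes every frame a key frame (intra-only coding, convenient for frame-accurate editing), while the export command uses ordinary inter-frame H.264 with a longer GOP. The file names and default parameter values are placeholders.

```python
def edit_command(src, dst="clip.mov"):
    """ffmpeg argument list for an intra-frame-coded MOV used while editing."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-g", "1", dst]

def export_command(src, dst="out.mp4", bitrate="4M", gop=48):
    """ffmpeg argument list for inter-frame H.264 in MP4 for network delivery."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264",
            "-b:v", bitrate, "-g", str(gop), dst]
```

Either list can be handed to `subprocess.run` to perform the actual transcode.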
The FPGA processing chip can transmit the edited and processed video code stream to back-end processing software in the ARM main control chip, and then the back-end processing software transmits the video code stream to a front-end web page for video preview.
The editable embedded audio and video server described above can be integrated into the computing device provided by this scheme. FIG. 3 illustrates a block diagram of a computing device 300 according to one embodiment of the invention. As shown in FIG. 3, in a basic configuration 102, computing device 300 typically includes a system memory 106 and one or more processors 104. The memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including, but not limited to: microprocessor (μp), microcontroller (μc), digital information processor (DSP), or any combination thereof. The processor 104 may include one or more levels of caches, such as a first level cache 110 and a second level cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations, the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. Physical memory in a computing device is often referred to as volatile memory, RAM, and data in disk needs to be loaded into physical memory in order to be read by processor 104. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124.
In some implementations, the application 122 may be arranged to execute instructions on an operating system by the one or more processors 104 using the program data 124. The operating system 120 may be, for example, linux, windows or the like, which includes program instructions for handling basic system services and performing hardware-dependent tasks. The application 122 includes program instructions for implementing various functions desired by the user, and the application 122 may be, for example, a browser, instant messaging software, a software development tool (e.g., integrated development environment IDE, compiler, etc.), or the like, but is not limited thereto. When an application 122 is installed into computing device 300, a driver module may be added to operating system 120.
When the computing device 300 starts up running, the processor 104 reads the program instructions of the operating system 120 from the memory 106 and executes them. Applications 122 run on top of operating system 120, utilizing interfaces provided by operating system 120 and underlying hardware to implement various user-desired functions. When a user launches the application 122, the application 122 is loaded into the memory 106, and the processor 104 reads and executes the program instructions of the application 122 from the memory 106.
Computing device 300 also includes storage device 132, storage device 132 including removable storage 136 and non-removable storage 138, both removable storage 136 and non-removable storage 138 being connected to storage interface bus 134.
Computing device 300 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to basic configuration 102 via bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices such as a display or speakers via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 via one or more communication ports 164 over a network communication link.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media in a modulated data signal, such as a carrier wave or other transport mechanism. A "modulated data signal" may be a signal that has one or more of its data set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio Frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media. In a computing device 300 according to the invention, the application 122 comprises instructions for performing the method 400 of operation of the invention based on an editable embedded audio video server.
The running method based on the editable embedded audio and video server is characterized in that the functions of audio and video acquisition, online editing, encoding, decoding, transmission, storage and the like are achieved through an ARM main control chip and an FPGA processing chip. Fig. 4 shows a flow diagram of a method 400 of operation based on an editable embedded audio video server, according to one embodiment of the invention. As shown in fig. 4, step S410 is first executed, receiving editing operations of a user on a front-end web page on source videos and materials, and transmitting the source videos and materials and the corresponding editing operations to an FPGA processing chip;
and then, executing step S420, wherein the FPGA processing chip extracts key frames from the source video, performs editing of mixing, cutting, merging, inserting and special effect processing on the key frames, performs optimization processing on the edited video duration, compresses the processed video frames into videos, and transmits the videos to the back-end processing software.
Finally, step S430 is executed, where the back-end processing software transmits the video to the front-end web page for video preview.
Fig. 5 shows a schematic diagram of the scheduling flow between the audio/video editing software and the FPGA chip according to an embodiment of the present invention. As shown in fig. 5,
(1) the user edits the video through the audio/video editing front-end web page, which transmits the relevant steps to the audio/video editing back-end software;
(2) the audio/video editing back-end software transmits the relevant source videos and materials (video, pictures, text, audio and the like) to the FPGA processing chip;
(3) the key frame extraction module in the FPGA processing chip extracts key frames from the source video, decomposes the material video into frames and stores them in a cache;
(4) the video frame editing module in the FPGA processing chip performs the editing operations specified by the user, completing mixing, cutting, merging, inserting, special-effect processing and other operations on the video frames;
(5) the duration control module in the FPGA processing chip optimizes the duration of the edited video and outputs the processed video frames;
(6) the encoding and encapsulation module in the FPGA processing chip compresses the processed video frames into a video and transmits it to the audio/video editing back-end software in the ARM main control chip;
(7) the audio/video editing back-end software transmits the video to the front-end web page to realize video preview.
In the above audio/video editing scheduling flow, the FPGA processing chip executes the editing and processing operations that the user requires on the source videos and materials, such as extracting key frames from the videos, editing the video frames, and dynamically adjusting the code rate and key frame interval of the edited video frames; this accelerates the video editing operations and improves the real-time performance of video editing.
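For illustration only, the scheduling flow described above can be sketched as a simple software pipeline. All function names below are hypothetical stand-ins for the FPGA modules (key frame extraction, frame editing, duration control, encoding/encapsulation) and do not correspond to an actual firmware API:

```python
# Illustrative sketch of the editing/scheduling pipeline; in the real device
# these stages run in FPGA hardware, this only mirrors the data flow.

def extract_key_frames(source_video):
    """Stand-in for the key frame extraction module: here, every 24th frame."""
    return [f for i, f in enumerate(source_video) if i % 24 == 0]

def edit_frames(frames, operations):
    """Stand-in for the video frame editing module (mix/cut/merge/insert/effects)."""
    for op in operations:
        frames = op(frames)
    return frames

def control_duration(frames, template_len):
    """Stand-in for the duration control module: fit frames to a template length."""
    return frames[:template_len]

def encode(frames):
    """Stand-in for the encoding/encapsulation module."""
    return "|".join(frames)

def run_pipeline(source_video, operations, template_len):
    key_frames = extract_key_frames(source_video)
    edited = edit_frames(key_frames, operations)
    fitted = control_duration(edited, template_len)
    return encode(fitted)

video = [f"frame{i}" for i in range(96)]      # a toy 96-frame source video
ops = [lambda fs: fs + ["inserted"]]          # a single "insert" edit
stream = run_pipeline(video, ops, template_len=4)
print(stream)  # frame0|frame24|frame48|frame72
```

The sketch only shows how the output of each module feeds the next; in the described server the back-end software on the ARM chip dispatches these stages to the FPGA and returns the encoded stream to the web page.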
According to the technical scheme of the invention, an online audio and video editing function is added on top of the main functions commonly provided by an audio/video server (audio and video acquisition, decoding, encoding and transmission), without any additional third-party equipment or third-party software. Built on a high-performance ARM processor and FPGA, the FPGA hardware-accelerates the video editing process, which improves the real-time performance of video editing and greatly reduces the operation and maintenance cost for customers in practical scenarios.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods, or as combinations of method elements, that may be implemented by a processor of a computer system or by other means of performing the function. Thus, a processor with the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal terms "first," "second," "third," etc., to describe a common object merely denotes different instances of like objects, and is not intended to imply that the objects so described must be in a given order, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (10)

1. An editable embedded audio and video server, comprising an ARM main control chip and an FPGA processing chip connected with the ARM main control chip, characterized in that the ARM main control chip provides a front-end web page and back-end processing software, the front-end web page being used for receiving a user's video editing instructions and transmitting the video data to be edited and the operations to be executed to the back-end processing software; and the back-end processing software being used for calling, through the ARM main control chip, the FPGA processing chip to edit and process the video data according to the operations to be executed, obtaining a processed video code stream, and transmitting the video code stream to the front-end web page for video preview.
2. The editable embedded audio video server of claim 1, wherein the front-end web page comprises: a material management unit, a time axis editing unit, a special effect processing unit and a video preview unit,
the material management unit is used for managing materials stored on the embedded audio and video server, wherein the materials comprise audio, video, pictures and text;
the time axis editing unit is used for carrying out operations of mixing, cutting, combining and inserting on the audio, the video and the materials;
the special effect processing unit is used for adding special effect functions to the edited video, wherein the special effect functions comprise a subtitle special effect, a transition special effect and a filter special effect;
the video preview unit is used for previewing the video subjected to the time axis editing and special effect processing.
3. The editable embedded audio video server of claim 1, wherein the FPGA processing chip comprises: a key frame extraction unit, a video frame editing unit, a duration control unit and an encoding encapsulation unit,
the key frame extraction unit is used for extracting key frames from the source video to obtain video frames;
the video frame editing unit is used for carrying out mixing, cutting, merging, inserting and special effect processing operations on video frames;
the time length control unit is used for dynamically adjusting the playing time length of the video frames edited by the video frame editing unit and adjusting the code rate and the key frame interval of the video according to the ratio between the actual playing time length and the template playing time length;
the coding and packaging unit is used for recoding and packaging the edited video frames into a video code stream.
4. The editable embedded audio-video server according to claim 3, wherein the key frame extraction unit is configured to extract key frames from the source video by a feature field extraction algorithm, the feature field extraction algorithm comprising:
performing RGB space conversion on the frame image in the source video to generate a histogram;
comparing the histograms of 24 consecutive frame images, and calculating a histogram similarity change value between adjacent frames;
statistically sorting the similarity change values and determining an intermediate threshold value; if a similarity change value is higher than a first preset threshold value, outputting the frame preceding the frame image with the highest similarity change as a key frame; if the similarity change values remain consistent or are lower than a second preset threshold value, extracting texture features from the histogram by a binary feature method and generating a texture histogram;
and calculating the texture histogram similarity between adjacent frames, and outputting the adjacent frame as a key frame when the texture histogram similarity between adjacent frames is lower than the intermediate threshold value of the similarity change values.
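As an illustration of the first branch of this algorithm, the following sketch selects as a key frame the frame preceding an abrupt histogram change. The 4-bin histograms, the distance metric and the threshold value are simplified assumptions for demonstration; the texture-histogram fallback branch of the claim is omitted:

```python
# Simplified sketch of histogram-difference key frame selection. Real frames
# would be RGB images converted to full colour histograms; here each "frame"
# is reduced to a toy 4-bin normalised histogram.

def hist_diff(h1, h2):
    """Sum of absolute bin differences between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def select_key_frames(histograms, high_threshold=0.5):
    """Output frame i as a key frame when the histogram change from frame i
    to frame i+1 exceeds the threshold, i.e. the frame *preceding* the
    abrupt change, as in the claim."""
    keys = []
    for i in range(len(histograms) - 1):
        if hist_diff(histograms[i], histograms[i + 1]) > high_threshold:
            keys.append(i)
    return keys

# Ten nearly identical histograms with one abrupt scene change after index 4.
hists = [[0.25, 0.25, 0.25, 0.25]] * 5 + [[0.7, 0.1, 0.1, 0.1]] * 5
print(select_key_frames(hists))  # [4]
```

In the claimed design this comparison runs over sliding windows of 24 frames and the thresholds are derived from statistically sorted change values; the fixed threshold above is only a placeholder for that step.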
5. The editable embedded audio and video server according to claim 3, wherein the duration control unit is configured to calculate the ratio between the edited video playing duration and the source video playing duration, decrease the code rate and the key frame interval if the ratio is greater than 1, and increase the code rate and the key frame interval if the ratio is less than 1.
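The duration-control rule of this claim can be sketched as follows. The 0.8/1.2 scaling factors are illustrative assumptions, since the claim only specifies the direction of the adjustment, not its magnitude:

```python
# Sketch of the claim-5 rule: bitrate and key frame interval are scaled down
# when the edited video runs longer than the source, and scaled up when it
# runs shorter. Scaling factors 0.8 / 1.2 are assumed for illustration.

def adjust_encoding(bitrate_kbps, keyframe_interval, edited_s, source_s):
    ratio = edited_s / source_s
    if ratio > 1:   # edited video longer than source: decrease both
        return bitrate_kbps * 0.8, round(keyframe_interval * 0.8)
    if ratio < 1:   # edited video shorter than source: increase both
        return bitrate_kbps * 1.2, round(keyframe_interval * 1.2)
    return bitrate_kbps, keyframe_interval

print(adjust_encoding(4000, 25, edited_s=120, source_s=100))  # (3200.0, 20)
print(adjust_encoding(4000, 25, edited_s=80, source_s=100))
```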
6. The editable embedded audio and video server of claim 1, wherein the FPGA processing chip is further configured to collect, multiplex, and demultiplex audio and video data; the ARM main control chip is used for encoding, decoding and transmitting audio and video data.
7. The editable embedded audio and video server of claim 6, further comprising an SSD memory and a peripheral interface, wherein the peripheral interface is connected with the FPGA processing chip and comprises an SDI interface, an HDMI interface, a digital video output interface, a network interface and a debugging interface; and the SSD memory is connected with the ARM main control chip and is used for storing audio and video data.
8. An operating method based on an editable embedded audio-video server, which is applied to the editable embedded audio-video server as claimed in any one of claims 1 to 7, and comprises the following steps:
receiving editing operation of a user on a front-end web page on a source video and a material, and transmitting the source video and the material and the corresponding editing operation to an FPGA processing chip;
the FPGA processing chip extracts key frames from the source video, edits the key frames by mixing, cutting, merging, inserting and special effect processing, optimizes the duration of the edited video, compresses the processed video frames into video, and transmits the video to back-end processing software;
and the back-end processing software transmits the video to a front-end web page to preview the video.
9. A computing device, comprising:
at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, and the program instructions comprise instructions for performing the operating method of the editable embedded audio and video server of claim 8.
10. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the method of operating an editable embedded audio video server as claimed in claim 8.
CN202310308793.2A 2023-03-28 2023-03-28 Editable embedded audio and video server and operation method thereof Pending CN116347173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310308793.2A CN116347173A (en) 2023-03-28 2023-03-28 Editable embedded audio and video server and operation method thereof


Publications (1)

Publication Number Publication Date
CN116347173A true CN116347173A (en) 2023-06-27

Family

ID=86880237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310308793.2A Pending CN116347173A (en) 2023-03-28 2023-03-28 Editable embedded audio and video server and operation method thereof

Country Status (1)

Country Link
CN (1) CN116347173A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination