CN116861470B

CN116861470B - Encryption and decryption method, encryption and decryption device, computer readable storage medium and server

Info

Publication number: CN116861470B
Application number: CN202311138950.6A
Authority: CN
Inventors: 孙忠祥; 张闯; 刘科
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2023-09-05
Filing date: 2023-09-05
Publication date: 2024-01-26
Anticipated expiration: 2043-09-05
Also published as: CN116861470A

Abstract

Embodiments of the present application provide an encryption and decryption method, an encryption and decryption device, a computer readable storage medium and a server. The method comprises: upon receiving data to be processed, performing high-level synthesis on kernel function code to obtain a hardware algorithm, wherein the data to be processed is sent by a target device; running the hardware algorithm to perform predetermined processing on the data to be processed to obtain processed data, wherein the predetermined processing comprises encryption processing or decryption processing, and the kernel function code comprises code of an encryption function or a decryption function; and feeding back the processed data to the target device, so that the target device stores the processed data in a first memory when a predetermined instruction comprises an instruction to write the data to be processed, and sends the processed data to a terminal when the predetermined instruction comprises an instruction to read the data to be processed. The method solves the problem that hardware encryption and decryption require complex logic circuits to be written in a hardware description language, which makes the design complex.

Description

Encryption and decryption method, encryption and decryption device, computer readable storage medium and server

Technical Field

The embodiment of the application relates to the field of computers, in particular to an encryption and decryption method, an encryption and decryption device, a computer readable storage medium and a server.

Background

In today's digital information age, data security and privacy protection are vital. To ensure the confidentiality and integrity of sensitive data, encryption techniques are widely used in fields such as communication, storage, and cloud computing. However, conventional software-implemented encryption algorithms tend to be inefficient when processing large-scale data, which increases the latency of encryption and decryption operations and reduces overall system performance.

To solve this problem, hardware-accelerated encryption and decryption techniques have been developed. FPGAs (Field-Programmable Gate Arrays) offer significant advantages over general-purpose processors in data processing speed, so encryption and decryption operations can be implemented on FPGAs to increase encryption speed and improve system response time. However, conventional hardware design methods require the encryption and decryption algorithms to be coded in a hardware description language, together with complex logic design and verification work, which increases design complexity and lengthens the development cycle.

Disclosure of Invention

The embodiment of the application provides an encryption and decryption method, an encryption and decryption device, a computer readable storage medium and a server, which at least solve the problem that hardware encryption and decryption in the related art needs to write a complex logic circuit by using a hardware description language, so that design is complex.

According to one embodiment of the present application, there is provided an encryption and decryption method, including: under the condition of receiving data to be processed, carrying out high-level synthesis on kernel function codes to obtain a hardware algorithm, wherein the data to be processed is sent by target equipment under the condition of receiving a preset instruction sent by a terminal; running the hardware algorithm to perform preset processing on the data to be processed to obtain processed data, wherein the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function code comprises code of an encryption function, and the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function code comprises code of a decryption function; and feeding back the processed data to the target device, so that the target device stores the processed data in a first memory under the condition that the preset instruction comprises an instruction for writing the data to be processed, and sends the processed data to the terminal under the condition that the preset instruction comprises an instruction for reading the data to be processed.

In an exemplary embodiment, performing high-level synthesis on the kernel function code to obtain a hardware algorithm upon receiving the data to be processed includes: when a communication connection with the target device has been established and the data to be processed and the kernel function code sent by the target device have been received, performing the high-level synthesis on the kernel function code to obtain the hardware algorithm. The kernel function code is written and compiled by the target device using a high-level programming language. When the predetermined instruction comprises an instruction to write the data to be processed, the data to be processed is in the plaintext form carried by the predetermined instruction; when the predetermined instruction comprises an instruction to read the data to be processed, the data to be processed is in the ciphertext form read by the target device from a first memory. The communication connection with the target device is established by the target device calling an OpenCL (Open Computing Language) execution model upon receiving the predetermined instruction.

In an exemplary embodiment, performing the high-level synthesis on the kernel function code to obtain the hardware algorithm includes: running the kernel function code, calling a high-level synthesis tool, and compiling the running kernel function code into a hardware description language; and sequentially performing register transfer level synthesis and place and route on the hardware description language to obtain the hardware algorithm.

In an exemplary embodiment, after sequentially performing register transfer level synthesis and place and route on the hardware description language to obtain the hardware algorithm, the method includes: performing error detection on the hardware algorithm to determine whether the hardware algorithm contains an error; and, when the hardware algorithm contains an error, sending error information to the target device, so that the target device debugs the kernel function code according to the error information.

In an exemplary embodiment, running the hardware algorithm to perform the predetermined processing on the data to be processed to obtain processed data includes: generating a key and a modulus for the data to be processed using the hardware algorithm; sequentially generating a plurality of pipeline tasks according to the predetermined processing and initializing a count value corresponding to each pipeline task, wherein each pipeline task comprises a plurality of subtasks to be executed in a loop, and each subtask performs a modular exponentiation on the data to be processed, the key and the modulus and then replaces the data to be processed with the current operation result; when the count value is smaller than a first predetermined multiple of a clock cycle, executing the subtask of the corresponding pipeline task once and incrementing the count value by one; starting execution of the next pipeline task when the execution time of the current pipeline task reaches a second predetermined multiple of the clock cycle, wherein the second predetermined multiple is smaller than the first predetermined multiple; and outputting the processed data when the count value of every pipeline task is greater than or equal to the first predetermined multiple of the clock cycle.
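The staggered pipeline described above can be illustrated with a small software simulation. The following is a minimal sketch only, not the hardware algorithm itself: it assumes one subtask per clock cycle, treats each subtask as a single call to Python's built-in pow, and uses the hypothetical names PipelineTask, run_pipeline, first_multiple and second_multiple, none of which appear in the application.

```python
from dataclasses import dataclass


@dataclass
class PipelineTask:
    data: int          # block being processed; overwritten by every subtask
    start_cycle: int   # clock cycle at which this task is launched
    count: int = 0     # number of subtasks executed so far


def run_pipeline(blocks, key, modulus, first_multiple, second_multiple):
    """Simulate the pipeline tasks cycle by cycle (illustrative model only)."""
    if not second_multiple < first_multiple:
        raise ValueError("second_multiple must be smaller than first_multiple")

    # Task i is launched second_multiple cycles after task i - 1.
    tasks = [PipelineTask(data=b, start_cycle=i * second_multiple)
             for i, b in enumerate(blocks)]

    cycle = 0
    # Output is produced only once every count value has reached first_multiple.
    while any(t.count < first_multiple for t in tasks):
        for t in tasks:
            if cycle >= t.start_cycle and t.count < first_multiple:
                # Subtask: modular exponentiation, result replaces the data.
                t.data = pow(t.data, key, modulus)
                t.count += 1
        cycle += 1
    return [t.data for t in tasks], cycle


if __name__ == "__main__":
    results, cycles = run_pipeline([2, 3, 4, 5], key=3, modulus=33,
                                   first_multiple=4, second_multiple=1)
    print(results, cycles)
```

Under these assumptions, k tasks finish after (k - 1) * second_multiple + first_multiple cycles, compared with k * first_multiple cycles if the tasks ran back to back, which is the benefit of overlapping them.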

In one exemplary embodiment, feeding back the processed data to the target device includes: feeding back the processed data and the key to the target device.

In an exemplary embodiment, the kernel function code is function code obtained according to an asymmetric encryption algorithm, and the high-level programming language includes at least one of the C++ language and the C language.
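As an illustration of the kind of computation such a kernel function might contain, the following is a minimal sketch of square-and-multiply modular exponentiation. It is written in Python for brevity, whereas the kernel function code of the application is written in C or C++. The function name mod_exp, the bits parameter and the fixed 32-iteration loop are assumptions made for this example; a loop with a fixed trip count is used because that form is generally convenient for high-level synthesis tools.

```python
def mod_exp(base, exponent, modulus, bits=32):
    """Right-to-left square-and-multiply with a fixed number of iterations.

    Illustrative only: `exponent` must fit in `bits` bits.
    """
    if exponent < 0 or exponent >= (1 << bits):
        raise ValueError("exponent does not fit in the configured bit width")
    result = 1 % modulus
    base %= modulus
    for _ in range(bits):            # fixed trip count, independent of the data
        if exponent & 1:             # multiply step for a set exponent bit
            result = (result * base) % modulus
        base = (base * base) % modulus   # square step
        exponent >>= 1
    return result


if __name__ == "__main__":
    # Toy RSA-style parameters (p = 61, q = 53); far too small for real use.
    cipher = mod_exp(65, 17, 3233)
    print(cipher, mod_exp(cipher, 2753, 3233))
```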

In one exemplary embodiment, feeding back the processed data to the target device includes: writing the processed data into a second memory, so that the target device calls a first predetermined API (Application Programming Interface) to read the processed data from the second memory. Performing high-level synthesis on the kernel function code upon receiving the data to be processed includes: when the second memory receives the data to be processed, reading the data to be processed from the second memory and performing the high-level synthesis on the kernel function code.

According to another embodiment of the present application, there is provided an encryption and decryption method, including: under the condition that a preset instruction sent by a terminal is received, sending data to be processed to an FPGA; receiving processed data fed back by the FPGA, wherein the processed data are obtained by running a hardware algorithm by the FPGA to perform preset processing on the data to be processed, the hardware algorithm is obtained by performing high-level synthesis on kernel function codes by the FPGA, the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function codes comprise encryption function codes, the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function codes comprise decryption function codes; storing the processed data in a first memory in case the predetermined instruction comprises an instruction to write the data to be processed, and transmitting the processed data to the terminal in case the predetermined instruction comprises an instruction to read the data to be processed.

In an exemplary embodiment, in a case of receiving a predetermined instruction sent by a terminal, sending data to be processed to an FPGA, including: under the condition that the preset instruction is received, an OpenCL execution model is called to establish communication connection with the FPGA; writing and compiling the kernel function codes by using a high-level programming language; when the preset instruction comprises an instruction for writing the data to be processed, sending the data to be processed and the kernel function code in a plaintext form carried by the preset instruction to the FPGA; and under the condition that the preset instruction comprises an instruction for reading the data to be processed, reading the data to be processed in a ciphertext form from the first memory, and sending the data to be processed and the kernel function code to the FPGA.

In an exemplary embodiment, invoking an OpenCL execution model to establish a communication connection with the FPGA includes: initializing the OpenCL execution model to obtain a list of bindable devices; determining the FPGA from the list of bindable devices; and creating an OpenCL context with the FPGA through the OpenCL execution model so as to establish communication connection with the FPGA.

In an exemplary embodiment, when the predetermined instruction is received, invoking an OpenCL execution model to establish a communication connection with the FPGA includes: under the condition that the preset instruction is received, a second API is called to switch the working mode of the operating system from a user mode to a kernel mode; and under the condition that the working mode is the kernel mode, calling the OpenCL execution model, and establishing communication connection with the FPGA.

In an exemplary embodiment, the hardware algorithm is obtained by the FPGA sequentially performing register transfer level synthesis and place and route on a hardware description language, wherein the hardware description language is obtained by running the kernel function code, calling a high-level synthesis tool, and compiling the running kernel function code.

In an exemplary embodiment, the processed data is output by the FPGA when the count value corresponding to each of a plurality of pipeline tasks is greater than or equal to a first predetermined multiple of a clock cycle. The plurality of pipeline tasks are sequentially generated by the FPGA according to the predetermined processing, and each count value corresponds to one pipeline task. Each pipeline task comprises a plurality of subtasks to be executed in a loop; each subtask performs a modular exponentiation on the data to be processed, a key of the data to be processed and a modulus, and then replaces the data to be processed with the current operation result. The FPGA executes a subtask when the corresponding count value is smaller than the first predetermined multiple of the clock cycle, and increments the count value by one each time a subtask is executed. The execution interval between two adjacent pipeline tasks is a second predetermined multiple of the clock cycle, and the key and the modulus are generated by the FPGA using the hardware algorithm.

In one exemplary embodiment, receiving the processed data fed back by the FPGA includes: receiving the processed data and the key fed back by the FPGA.

According to still another embodiment of the present application, there is provided an encryption and decryption apparatus including: a synthesis unit, configured to perform high-level synthesis on the kernel function code upon receiving the data to be processed, to obtain a hardware algorithm, wherein the data to be processed is sent by the target device upon receiving a predetermined instruction sent by the terminal; an operation unit, configured to run the hardware algorithm to perform predetermined processing on the data to be processed to obtain processed data, wherein, when the predetermined instruction includes an instruction to write the data to be processed, the predetermined processing includes encryption processing and the kernel function code includes code of an encryption function, and, when the predetermined instruction includes an instruction to read the data to be processed, the predetermined processing includes decryption processing and the kernel function code includes code of a decryption function; and a feedback unit, configured to feed back the processed data to the target device, so that the target device stores the processed data in the first memory when the predetermined instruction includes an instruction to write the data to be processed, and sends the processed data to the terminal when the predetermined instruction includes an instruction to read the data to be processed.

According to still another embodiment of the present application, there is further provided an encryption and decryption apparatus, including: the first sending unit is used for sending the data to be processed to the FPGA under the condition that a preset instruction sent by the terminal is received; the first receiving unit is used for receiving processed data fed back by the FPGA, wherein the processed data are obtained by the FPGA running a hardware algorithm to perform preset processing on the data to be processed, the hardware algorithm is obtained by the FPGA through high-level synthesis on kernel function codes, the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function codes comprise encryption function codes, and the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function codes comprise decryption function codes; and the storing unit is used for storing the processed data into the first memory when the preset instruction comprises an instruction for writing the data to be processed, and sending the processed data to the terminal when the preset instruction comprises an instruction for reading the data to be processed.

According to another embodiment of the present application, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments when run.

According to still another embodiment of the present application, there is also provided a server including: an FPGA comprising a second memory, a first processor and a first computer program stored on the second memory and executable on the first processor, the first processor implementing the steps of any of the methods when executing the first computer program; a host system comprising a first memory, a second processor and a second computer program stored on the first memory and executable on the second processor, the second processor implementing the steps of any of the methods when executing the second computer program.

According to the present application, high-level synthesis is performed on the kernel function code to obtain the hardware description language of the encryption and decryption algorithm, which automates the design of the encryption and decryption hardware description language. Developers do not need to write the hardware description language themselves, which reduces the complexity of the encryption and decryption design, reduces design time and workload, and improves development efficiency.

Drawings

FIG. 1 is a block diagram of a hardware structure of a mobile terminal that performs an encryption and decryption method according to an embodiment of the present application;

FIG. 2 is a flow chart of an encryption and decryption method according to an embodiment of the present application;

FIG. 3 is a flow diagram of high-level synthesis according to an embodiment of the present application;

FIG. 4 is a flow diagram of an encryption process using a hardware algorithm according to an embodiment of the present application;

FIG. 5 is a flow diagram of decryption processing using a hardware algorithm according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a four-stage pipelined task model according to an embodiment of the present application;

FIG. 7 is a flow chart of another encryption and decryption method according to an embodiment of the present application;

FIG. 8 is a flow diagram of invoking an OpenCL execution model execution operation according to an embodiment of the present application;

FIG. 9 is a key pair computation flow diagram corresponding to kernel function code according to an embodiment of the present application;

FIG. 10 is a block diagram of an encryption and decryption device according to an embodiment of the present application;

FIG. 11 is a block diagram of another encryption and decryption device according to an embodiment of the present application;

FIG. 12 is a schematic diagram of an OpenCL execution model connection relationship according to an embodiment of the present application;

FIG. 13 is a specific workflow diagram of encryption processing by a server according to an embodiment of the present application;

FIG. 14 is a system level diagram of a host system according to an embodiment of the present application;

FIG. 15 is a schematic diagram of an overall framework of a hardware acceleration transparent encryption and decryption system based on an HLS design according to an embodiment of the present application;

FIG. 16 is a schematic diagram of an operation mode of encrypting and decrypting by the transparent file system according to an embodiment of the present application.

Wherein the figures include the following reference numerals:

102. processor; 104. memory; 106. transmission device; 108. input/output device.

Detailed Description

Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and in the drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.

The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or similar computing device. Taking the mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of a mobile terminal of an encryption and decryption method according to an embodiment of the present application. As shown in fig. 1, a mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, wherein the mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting on the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.

The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to an encryption and decryption method in the embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.

In this embodiment, an encryption and decryption method running in an FPGA is provided, and fig. 2 is a flowchart of the encryption and decryption method according to an embodiment of the present application, as shown in fig. 2, where the flowchart includes the following steps:

Step S102, upon receiving data to be processed, performing high-level synthesis on kernel function code to obtain a hardware algorithm, wherein the data to be processed is sent by a target device upon receiving a predetermined instruction sent by a terminal;

Specifically, the data to be processed may be data stored in the target device or data sent by the terminal, and may be plaintext data or ciphertext data. High-level synthesis (HLS) is the process of converting a description in a high-level programming language (such as C/C++) into a hardware circuit. It automatically converts the high-level language description into an equivalent hardware circuit, thereby automating hardware design.

Step S104, running the hardware algorithm to perform preset processing on the data to be processed to obtain processed data, wherein the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function code comprises code of an encryption function, and the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function code comprises code of a decryption function;

Specifically, the kernel function code may be an encryption and decryption function code pre-stored in a memory of the FPGA, or may be an encryption and decryption function code sent to the FPGA by the target device.

Step S106, feeding back the processed data to the target device, so that the target device stores the processed data in a first memory when the predetermined instruction comprises an instruction to write the data to be processed, and sends the processed data to the terminal when the predetermined instruction comprises an instruction to read the data to be processed.

Specifically, the first memory is a memory of the target device, and may specifically be a hard disk, a DDR (Double Data Rate) memory, or the like.

First, the data to be processed, which the target device sends upon receiving a predetermined instruction, is received, and high-level synthesis is performed on the kernel function code containing the encryption or decryption function to obtain a hardware algorithm. Then, the hardware algorithm is run to encrypt or decrypt the data to be processed, yielding the processed data. Finally, the processed data is fed back to the target device, so that the target device stores the processed data in the first memory or sends it to the terminal. In the prior art, hardware encryption and decryption require complex logic circuits to be written in a hardware description language, which makes the design complex. By contrast, the present application performs high-level synthesis on the kernel function code to obtain the hardware description language of the encryption and decryption algorithm, which automates the design of the encryption and decryption hardware description language, so that developers do not need to write the hardware description language themselves. This reduces the complexity of the encryption and decryption design, reduces design time and workload, and improves development efficiency.
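The write and read paths summarized above can be sketched in software as follows. This is a hedged illustration under several assumptions: the FPGA is replaced by an ordinary function, the asymmetric algorithm is modeled with toy RSA-style parameters that are far too small for real use, and the names accelerator_process, TargetDevice, handle and first_memory are hypothetical and are not taken from the application.

```python
# Toy RSA-style parameters (p = 61, q = 53); illustrative only.
MODULUS = 3233
PUBLIC_EXP = 17
PRIVATE_EXP = 2753


def accelerator_process(blocks, instruction):
    """Stand-in for the FPGA: encrypt on 'write', decrypt on 'read'."""
    if instruction == "write":
        exponent = PUBLIC_EXP
    elif instruction == "read":
        exponent = PRIVATE_EXP
    else:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    return [pow(block, exponent, MODULUS) for block in blocks]


class TargetDevice:
    """Stand-in for the target device that owns the first memory."""

    def __init__(self):
        self.first_memory = {}   # name -> list of ciphertext blocks

    def handle(self, instruction, name, payload=None):
        if instruction == "write":
            # Plaintext carried by the instruction is encrypted, then stored.
            plaintext_blocks = list(payload)
            self.first_memory[name] = accelerator_process(plaintext_blocks, "write")
            return None
        if instruction == "read":
            # Ciphertext is read from the first memory, decrypted, then returned.
            ciphertext_blocks = self.first_memory[name]
            return bytes(accelerator_process(ciphertext_blocks, "read"))
        raise ValueError(f"unsupported instruction: {instruction!r}")


if __name__ == "__main__":
    device = TargetDevice()
    device.handle("write", "report.txt", b"hello")
    print(device.first_memory["report.txt"])
    print(device.handle("read", "report.txt"))
```

In this sketch the terminal only ever supplies or receives plaintext, while the first memory only ever holds ciphertext, which mirrors the division of roles in steps S102 to S106.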

The execution subject of the steps may be a hardware acceleration module such as an FPGA, but is not limited thereto.

In addition, the encryption and decryption method of the present application offloads the encryption and decryption computation to the FPGA, which frees computing resources on the target device and improves its overall performance.

The basic principle of high-level synthesis is to convert a high-level language description into a data flow graph (DFG), and then convert the DFG into an equivalent hardware circuit through a series of optimizations and transformations. A high-level synthesis tool can perform various optimizations according to the designer's requirements, such as resource optimization, timing optimization and power optimization, to achieve a hardware design with higher performance and lower power consumption. High-level synthesis greatly simplifies the hardware design flow and shortens the design cycle. It lets the designer express the hardware design in a high-level language, which makes the design easier to modify and reuse. In addition, high-level synthesis tools provide a set of advanced optimization and debugging facilities and can perform various optimizations automatically, improving the performance, power consumption and reliability of the hardware design.
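To make the notion of a data flow graph concrete, the following minimal sketch converts a single arithmetic expression into a list of nodes and producer-to-consumer edges using Python's standard ast module. It is only an analogy: real high-level synthesis tools operate on C or C++ code and perform scheduling, binding and many optimizations that are not modeled here. The function name build_dfg and the output layout are assumptions made for this example.

```python
import ast


def build_dfg(expression):
    """Return (nodes, edges, root) for one arithmetic expression.

    nodes: list of labels, indexed by node number
    edges: list of (producer, consumer) node-number pairs
    root:  node number that produces the final result
    """
    nodes = []
    edges = []
    inputs = {}   # variable name -> node number, so each input appears once

    def visit(node):
        if isinstance(node, ast.Name):
            if node.id not in inputs:
                inputs[node.id] = len(nodes)
                nodes.append(node.id)
            return inputs[node.id]
        if isinstance(node, ast.Constant):
            nodes.append(repr(node.value))
            return len(nodes) - 1
        if isinstance(node, ast.BinOp):
            left = visit(node.left)
            right = visit(node.right)
            nodes.append(type(node.op).__name__)   # e.g. 'Mult', 'Mod'
            index = len(nodes) - 1
            edges.append((left, index))
            edges.append((right, index))
            return index
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    root = visit(ast.parse(expression, mode="eval").body)
    return nodes, edges, root


if __name__ == "__main__":
    # One multiply-and-reduce step of the kind used in modular arithmetic.
    print(build_dfg("(a * b) % n"))
```

Each operator node in such a graph corresponds to a candidate hardware operator, and the edges show which results must be available before an operator can start, which is the information a synthesis tool needs in order to schedule operations across clock cycles.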

In one exemplary embodiment, step S102, performing high-level synthesis on the kernel function code to obtain the hardware algorithm upon receiving the data to be processed, may be implemented as follows: when a communication connection with the target device has been established and the data to be processed and the kernel function code sent by the target device have been received, the high-level synthesis is performed on the kernel function code to obtain the hardware algorithm. The kernel function code is written and compiled by the target device using a high-level programming language. When the predetermined instruction comprises an instruction to write the data to be processed, the data to be processed is in the plaintext form carried by the predetermined instruction; when the predetermined instruction comprises an instruction to read the data to be processed, the data to be processed is in the ciphertext form read by the target device from a first memory. The communication connection with the target device is established by the target device calling an OpenCL execution model upon receiving the predetermined instruction.

This embodiment introduces transparent encryption and decryption into the high-level synthesis flow. When a user writes data to the target device through the terminal, the data is encrypted and then stored in the first memory as ciphertext. When the user reads the data from the target device through the terminal, the corresponding ciphertext in the first memory is first decrypted and then returned to the user as plaintext. Once the user leaves the usage environment, the data can no longer be decrypted automatically and cannot be opened, which protects the data content. At the same time, the user's existing operating habits are unaffected, so the user can perform efficient and secure data encryption and decryption through ordinary operations.

Specifically, the kernel function code is written by the target device in a high-level programming language, and the written code is compiled into binary code so that the FPGA can readily recognize it.

Moreover, the heterogeneous system formed by the FPGA and the target device is interconnected through the OpenCL execution model. The OpenCL execution model provides API function calls with which the target device transfers data to the FPGA, starts heterogeneous computation on the FPGA, and receives the FPGA's processing results, thereby enabling parallel computation across different computing devices.

Under the heterogeneous acceleration framework, the OpenCL execution model enables parallel computation across different computing devices. It is generally designed as a middleware program, named hls_host, that runs in the application layer of the target device and is called in heterogeneous acceleration scenarios.

Specifically, the communication connection with the target device is established by the target device creating, through the OpenCL execution model, an OpenCL context with the FPGA, where the FPGA is a device selected by the target device from a list of bindable devices, and the list of bindable devices is obtained by the target device initializing the OpenCL execution model. In this embodiment, an appropriate computing platform is selected to obtain the available device list, the FPGA is selected from that list, and the target device creates an OpenCL context associated with the FPGA, which further enables parallel computing across different computing devices.

The OpenCL context includes information of an execution environment such as a device, a command queue, and the like. The devices included in the available device list may be, but are not limited to, a CPU, a GPU, an FPGA, and the like.

In addition, the target device runs an operating system. The communication connection with the target device is established by the target device calling the OpenCL execution model while the working mode is kernel mode, where the kernel mode is the working mode obtained when the target device, upon receiving the predetermined instruction, calls a second API to switch the operating system from user mode. Before the communication connection with the FPGA is established through the OpenCL execution model, a library function is called to switch the working mode of the operating system to kernel mode and to open the program permissions of the target device, which provides the execution environment for interconnecting the target device with the FPGA and for parallel computing.

In order to further reduce the design complexity of hardware encryption and decryption, according to other alternatives of the present application, as shown in fig. 3, performing the high-level synthesis on the kernel function code to obtain the hardware algorithm includes: running the kernel function code, calling a high-level synthesis (HLS) tool, and compiling the run kernel function code into a hardware description language; and sequentially performing register transfer level (Register Transfer Level, RTL for short) synthesis and placement and routing on the hardware description language to obtain the hardware algorithm. Running the kernel function code first verifies that its algorithmic description is correct. The HLS tool then automatically converts the high-level programming language into synthesizable RTL-level code, after which synthesis and placement and routing are executed to bring the design to closure. The hardware algorithm therefore does not need to be described by hand in a hardware description language such as VHDL, Verilog, or SystemVerilog, which further lowers the development threshold. Moreover, accurate timing evaluation and scheduling of the kernel function code can be achieved through the HLS tool.

In yet another embodiment of the present application, after sequentially performing register transfer level synthesis and layout routing on the hardware description language to obtain the hardware algorithm, the method includes: detecting errors of the hardware algorithm, and determining whether the hardware algorithm has errors or not; and sending error information to the target equipment under the condition that the hardware algorithm has errors, so that the target equipment debugs the kernel function code according to the error information. By executing the test of the system level hardware algorithm, the correctness of the hardware algorithm can be further ensured.

Specifically, the compilation into the hardware description language, the register transfer level synthesis, the placement and routing, and the error detection may together form an iterative process.

Specifically, compiling the run kernel function code into a hardware description language includes: compiling the run kernel function code and the code of a test file (testbench) of the kernel function code into a hardware description language. Detecting errors in the hardware algorithm includes: checking the hardware algorithm corresponding to the kernel function code against the hardware algorithm corresponding to the code of the test file. Because the test file can be reused in this process, the complexity of verification is reduced, which further reduces the complexity of the overall encryption and decryption development.

In one exemplary embodiment, as shown in fig. 4 and 5, step S104: running the hardware algorithm to perform predetermined processing on the data to be processed to obtain processed data, wherein the method specifically comprises the following steps:

step S1041: generating a key and a modulus of the data to be processed by adopting the hardware algorithm;

specifically, the encryption and decryption function may be an encryption and decryption function obtained by adopting a symmetric encryption and decryption algorithm, or an encryption and decryption function obtained by adopting an asymmetric encryption and decryption algorithm, and the corresponding secret key may be a single secret key or may include a private key and a public key.

Step S1042: generating a plurality of pipeline tasks in sequence according to the preset processing, and initializing count values corresponding to the pipeline tasks, wherein the pipeline tasks comprise a plurality of subtasks for cyclic execution, and the subtasks are obtained by replacing the data to be processed with current operation results after performing modular exponentiation on the data to be processed, the secret key and the modulus;

Specifically, the count value i is typically initialized to 0. One pipeline task cyclically executes the following subtask: perform a modular exponentiation on the data to be processed, the key, and the modulus, and then replace the data to be processed with the current operation result. For example, when a pipeline task comprises two subtasks, executing the pipeline task means performing the modular exponentiation once on the data to be processed, the key, and the modulus and replacing the data to be processed with the result, and then performing the modular exponentiation once more on the new data to be processed, the key, and the modulus and again replacing the data to be processed with the result.
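The subtask loop described above can be sketched in a few lines. This is only an illustrative software model, not the synthesized hardware: the function name `run_pipeline_task` and the toy operand values are assumptions introduced here, and Python's built-in three-argument `pow` stands in for the modular exponentiation unit.

```python
def run_pipeline_task(data: int, key: int, modulus: int, subtasks: int) -> int:
    """Model one pipeline task as a loop of modular-exponentiation subtasks."""
    for _ in range(subtasks):
        # One subtask: compute data^key mod modulus, then overwrite the
        # data to be processed with the current operation result.
        data = pow(data, key, modulus)
    return data


# A task with two subtasks, as in the example in the text (toy values).
result = run_pipeline_task(data=5, key=3, modulus=23, subtasks=2)
print(result)
```

With these toy values the first subtask gives 5^3 mod 23 = 10 and the second gives 10^3 mod 23 = 11.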

Step S1043: when the count value is smaller than a first preset multiple MAX_PIPELINES of a clock period, executing the subtask in the corresponding pipeline task once and incrementing the count value once;

Specifically, the clock period and the first predetermined multiple are both preset values.

Step S1044: starting to execute the next pipeline task under the condition that the current execution time of the pipeline task reaches a second preset multiple of the clock cycle, wherein the second preset multiple is smaller than the first preset multiple;

specifically, the second predetermined multiple is also a preset value, in an alternative solution, the second predetermined multiple is 1, that is, the execution interval of two adjacent pipeline tasks is one clock cycle, that is, a new pipeline task is started in each clock cycle, so that the maximum parallelism and throughput of the pipeline tasks are realized, and the hardware performance of the FPGA can be improved.

Step S1045: and outputting the processed data under the condition that the counted value of each pipeline task is larger than or equal to the first preset multiple of the clock period.

Specifically, when the count values corresponding to the pipeline tasks are all greater than or equal to a first predetermined multiple of the clock period, the encryption and decryption processing is described as being completed.

In the embodiment, the pipeline technology is added in the hardware algorithm, so that the parallel execution of the pipeline is realized, the time delay can be reduced, and the operation speed is improved.
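Steps S1042 to S1045 can be modelled cycle by cycle in software. The sketch below is one possible reading of those steps under stated assumptions: `MAX_PIPELINES = 4` and `START_INTERVAL = 1` are example values for the first and second predetermined multiples, the function name `simulate_pipeline` is hypothetical, and the inner `for` loop stands in for work that the hardware would perform in parallel within one clock cycle.

```python
from typing import List, Tuple

MAX_PIPELINES = 4    # assumed first predetermined multiple (subtasks per task)
START_INTERVAL = 1   # assumed second predetermined multiple (cycles between starts)


def simulate_pipeline(inputs: List[int], key: int, modulus: int) -> Tuple[List[int], int]:
    """Return the processed values and the number of clock cycles used."""
    values = list(inputs)
    counts = [0] * len(values)      # S1042: one count value per task, set to 0
    cycle = 0
    # S1045: stop once every task's count value has reached MAX_PIPELINES.
    while any(c < MAX_PIPELINES for c in counts):
        for t in range(len(values)):
            # S1044: task t starts START_INTERVAL cycles after task t - 1.
            started = cycle >= t * START_INTERVAL
            # S1043: run one subtask and count once while count < MAX_PIPELINES.
            if started and counts[t] < MAX_PIPELINES:
                values[t] = pow(values[t], key, modulus)
                counts[t] += 1
        cycle += 1
    return values, cycle


outputs, cycles = simulate_pipeline([2, 3, 4, 5], key=3, modulus=23)
print(outputs, cycles)
```

In this model the tasks overlap in time, so the total cycle count grows by `START_INTERVAL` per additional task rather than by `MAX_PIPELINES`.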

In an alternative scheme, as shown in fig. 6, the implementation flow of the hardware algorithm requires four-stage pipeline tasks. If executing one pipeline task takes four clock cycles, executing four pipeline tasks one after another takes 16 clock cycles. With the four-stage pipeline scheme of the present application, in which a new pipeline task starts every clock cycle, only 7 clock cycles are needed (4 cycles for the first task plus 1 additional cycle for each of the remaining 3 tasks). This greatly reduces the latency of the accelerated computation and further improves the efficiency and performance of the encryption and decryption operations.
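The 16-cycle and 7-cycle figures follow from simple arithmetic, shown here as a small sketch. The function names are introduced only for illustration, and the pipelined formula assumes that a new task starts every `interval` clock cycles, which corresponds to the second predetermined multiple being 1.

```python
def sequential_cycles(tasks: int, cycles_per_task: int) -> int:
    """Cycles needed when each task waits for the previous one to finish."""
    return tasks * cycles_per_task


def pipelined_cycles(tasks: int, cycles_per_task: int, interval: int = 1) -> int:
    """Cycles needed when a new task starts every `interval` cycles."""
    # The first task takes its full length; each later task adds `interval`.
    return cycles_per_task + (tasks - 1) * interval


print(sequential_cycles(4, 4))   # 16
print(pipelined_cycles(4, 4))    # 7
```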

Further, feeding back the processed data to the target device, including: and feeding back the processed data and the secret key to the target equipment.

In a specific embodiment, the kernel function code is a function code obtained according to an asymmetric encryption algorithm, and the high-level programming language includes at least one of a c++ language and a C language. Of course, the kernel function code may be a function code obtained according to a symmetric encryption algorithm, in addition to the asymmetric encryption algorithm. The high-level programming language may be System C language, etc. in addition to the c++ language and the C language.

Optionally, feeding back the processed data to the target device, including: and writing the processed data into a second memory, so that the target device calls a first preset API to read the processed data from the second memory. And under the condition of receiving the data to be processed, carrying out high-level synthesis on the kernel function code, wherein the high-level synthesis comprises the following steps: and under the condition that the second memory receives the data to be processed, reading the data to be processed from the second memory, and carrying out the high-level synthesis on the kernel function code.

And under the condition that the FPGA writes the processed data into the second memory, the FPGA is further used for sending reminding information to the target equipment so as to remind the target equipment to read the processed data from the second memory.

The second memory is a memory of the FPGA, and may specifically be a ROM, a RAM, a FLASH memory, or a DDR (Double Data Rate) memory.

In this embodiment, there is also provided an encryption and decryption method running on a target device, and fig. 7 is a flowchart of the encryption and decryption method according to an embodiment of the present application, as shown in fig. 7, where the flowchart includes the following steps:

step S202, under the condition that a preset instruction sent by a terminal is received, sending data to be processed to an FPGA;

specifically, the data to be processed may be data stored in the target device or data sent by the terminal. The data to be processed can be plaintext data or ciphertext data.

Step S204, receiving processed data fed back by the FPGA, wherein the processed data are obtained by running a hardware algorithm by the FPGA to perform preset processing on the data to be processed, the hardware algorithm is obtained by performing high-level synthesis on kernel function codes by the FPGA, the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function codes comprise encryption function codes, and the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function codes comprise decryption function codes;

Specifically, the high-level synthesis refers to a process of converting the description of the high-level programming language into a hardware circuit, and the high-level synthesis can automatically convert the description of the high-level programming language into an equivalent hardware circuit, so that the automatic design of hardware is realized. The kernel function code can be an encryption and decryption function code pre-stored in a memory of the FPGA, or an encryption and decryption function code sent to the FPGA by the target equipment.

Step S206, storing the processed data in the first memory in case that the predetermined instruction includes an instruction to write the data to be processed, and transmitting the processed data to the terminal in case that the predetermined instruction includes an instruction to read the data to be processed.

Specifically, the first memory is a memory of the target device, and may specifically be a hard disk, a DDR memory, or the like.

First, the data to be processed is sent to the FPGA according to a predetermined instruction sent by the terminal. Then, the processed data is received, which the FPGA obtains by running a hardware algorithm that encrypts or decrypts the data to be processed, the hardware algorithm being obtained by the FPGA performing high-level synthesis on the kernel function code. Finally, the processed data is sent to the terminal or stored in the first memory. In the prior art, hardware encryption and decryption require complex logic circuits to be written in a hardware description language, which makes the design complex. In the present application, high-level synthesis of the kernel function code removes the need for developers to write the hardware description language themselves, which reduces the complexity of the encryption and decryption design.
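The target-device flow of steps S202 to S206 can be illustrated with a small dispatch routine. Everything in this sketch is an assumption made for illustration: `handle_instruction`, the `"write"`/`"read"` strings, the dictionary used as the first memory, and `toy_accelerator`, which stands in for the FPGA and uses a deliberately tiny RSA key (n = 3233, e = 17, d = 2753) that offers no real security.

```python
from typing import Callable, Dict, Optional

Accelerator = Callable[[str, int], int]


def toy_accelerator(mode: str, value: int) -> int:
    """Stand-in for the FPGA: toy RSA with n = 61 * 53 (illustration only)."""
    n, e, d = 3233, 17, 2753
    return pow(value, e if mode == "encrypt" else d, n)


def handle_instruction(op: str, name: str, first_memory: Dict[str, int],
                       accelerator: Accelerator,
                       plaintext: Optional[int] = None) -> Optional[int]:
    """Dispatch a write or read instruction received from the terminal."""
    if op == "write":
        if plaintext is None:
            raise ValueError("a write instruction must carry plaintext data")
        # S202/S204: send the plaintext out and receive the ciphertext back.
        # S206: store the ciphertext in the first memory.
        first_memory[name] = accelerator("encrypt", plaintext)
        return None
    if op == "read":
        # S202: read the ciphertext from the first memory and send it out.
        # S204/S206: receive the plaintext and return it to the terminal.
        return accelerator("decrypt", first_memory[name])
    raise ValueError(f"unsupported instruction: {op}")


memory: Dict[str, int] = {}
handle_instruction("write", "doc", memory, toy_accelerator, plaintext=65)
print(memory["doc"], handle_instruction("read", "doc", memory, toy_accelerator))
```

The caller only ever supplies and receives plaintext, while the first memory only ever holds ciphertext, which is the transparent behaviour the embodiment describes.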

The execution subject of the steps may be a processor, a server, or the like, but is not limited thereto.

In one exemplary embodiment, step S202: the specific implementation manner of sending the data to be processed to the FPGA under the condition of receiving the predetermined instruction sent by the terminal may include:

step S2021: under the condition that the preset instruction is received, an OpenCL execution model is called to establish communication connection with the FPGA;

step S2022: writing and compiling the kernel function codes by using a high-level programming language;

step S2023: when the preset instruction comprises an instruction for writing the data to be processed, sending the data to be processed and the kernel function code in a plaintext form carried by the preset instruction to the FPGA;

step S2024: and under the condition that the preset instruction comprises an instruction for reading the data to be processed, reading the data to be processed in a ciphertext form from the first memory, and sending the data to be processed and the kernel function code to the FPGA.

Specifically, the kernel function code is written and compiled using a high-level programming language, including: and writing the kernel function by using a high-level programming language, and compiling the written codes into binary codes, so that the FPGA is convenient to identify.

In an actual application process, sending the data to be processed and the kernel function code to the FPGA includes: and sending request information, the data to be processed and the kernel function code to the FPGA, wherein the request information is used for triggering the FPGA to perform the preset processing on the data to be processed.

Specifically, as shown in fig. 8, invoking an OpenCL execution model to establish a communication connection with the FPGA includes: initializing the OpenCL execution model to obtain a list of bindable devices; determining the FPGA from the list of bindable devices; and creating an OpenCL context with the FPGA through the OpenCL execution model so as to establish communication connection with the FPGA. In this embodiment, by selecting an appropriate computing platform, an available device list is obtained, then selecting an FPGA from the available device list, and creating an OpenCL context related to the FPGA by the target device, parallel computing across different computing devices is further implemented.

In addition, the target device runs an operating system, and under the condition that the predetermined instruction is received, an OpenCL execution model is called to establish communication connection with the FPGA, including: under the condition that the preset instruction is received, a second API is called to switch the working mode of the operating system from a user mode to a kernel mode; and under the condition that the working mode is the kernel mode, calling the OpenCL execution model, and establishing communication connection with the FPGA. Before communication connection with the FPGA is established through an OpenCL execution model, library functions are called to adjust the working mode of an operating system into a kernel mode, program permission of target equipment is opened, and an implementation environment is provided for realizing interconnection of the target equipment and the FPGA and parallel computing.

Moreover, the interconnection of the heterogeneous system formed by the FPGA and the target device is established through the OpenCL execution model. The OpenCL execution model provides API function calls with which the target device can transmit data to the FPGA, start the FPGA to execute heterogeneous computation, and receive the FPGA's processing results, so that parallel computation across different computing devices is realized. Under the heterogeneous acceleration framework, the OpenCL execution model is generally designed as a middleware program, named hls_host, that runs at the application layer of the target device and is called in heterogeneous acceleration scenarios.

As shown in fig. 8, after the OpenCL execution model is called to establish a communication connection with the FPGA, the method further includes: allocating a memory space; copying the data to be processed from the first memory into the memory space, and then sending the data to be processed in the memory space to a second memory of the FPGA; creating a command queue, adding the kernel function code to the command queue, and setting execution parameters, where the execution parameters control the degree of parallelism with which the kernel function code is executed and include a global work size or a local work size; and executing the command queue to send the kernel function code to the FPGA. After the processed data is stored in the first memory or sent to the terminal, the method further includes: releasing the memory space, the command queue, and the OpenCL context so that the next encryption or decryption processing can be executed.

It should be noted that, the process is a general OpenCL execution model processing flow, and may be adjusted according to specific requirements and device characteristics in an actual application process. In particular applications, work in debugging, performance optimization, and parallelization may also be involved to achieve more efficient computational acceleration.

In order to further reduce the design complexity of hardware encryption and decryption, according to other alternatives of the present application, the hardware algorithm is obtained by the FPGA sequentially performing register transfer level synthesis and placement and routing on a hardware description language, where the hardware description language is obtained by running the kernel function code, calling a high-level synthesis (HLS) tool, and compiling the run kernel function code. The FPGA first runs the kernel function code, which verifies that its algorithmic description is correct. The HLS tool then automatically converts the high-level programming language into synthesizable RTL-level code, after which synthesis and placement and routing are executed to bring the design to closure. The hardware algorithm therefore does not need to be described by hand in a hardware description language such as VHDL, Verilog, or SystemVerilog, which further lowers the development threshold. Moreover, accurate timing evaluation and scheduling of the kernel function code can be achieved through the HLS tool.

Specifically, the method further comprises: when error information is received, debugging the kernel function code according to the error information, where the error information is generated and sent by the FPGA when it performs error detection on the hardware algorithm and determines that the hardware algorithm has an error. By executing the test of the system-level hardware algorithm, the correctness of the hardware algorithm can be further ensured.

Specifically, the compilation into the hardware description language, the register transfer level synthesis, the placement and routing, and the error detection performed by the FPGA may together form an iterative process.

Specifically, the hardware description language is obtained by compiling the running kernel function codes and the codes of the test files of the kernel function codes by the FPGA. And determining whether the hardware algorithm has errors or not is determined by the FPGA by detecting the errors of the hardware algorithm corresponding to the kernel function code according to the hardware algorithm corresponding to the code of the test file. In the process, the test file can be reused, so that the complexity of verification is reduced, and the complexity of whole encryption and decryption development can be further reduced.

Specifically, the encryption and decryption function may be obtained by adopting a symmetric encryption and decryption algorithm or an asymmetric encryption and decryption algorithm, and the corresponding key may be a single key or may include a private key and a public key. One pipeline task cyclically executes the following subtask: perform a modular exponentiation on the data to be processed, the key, and the modulus, and then replace the data to be processed with the current operation result. For example, when a pipeline task comprises two subtasks, executing the pipeline task means performing the modular exponentiation once on the data to be processed, the key, and the modulus and replacing the data to be processed with the result, and then performing the modular exponentiation once more on the new data to be processed, the key, and the modulus and again replacing the data to be processed with the result. In this embodiment, pipelining is added to the hardware algorithm so that pipeline tasks execute in parallel, which reduces the latency of the FPGA and increases its operation speed.

In one exemplary embodiment, receiving processed data of the FPGA feedback includes: and receiving the processed data fed back by the FPGA and the secret key. The clock period, the first predetermined multiple, and the second predetermined multiple are all preset values. In an alternative scheme, the second predetermined multiple is 1, that is, the execution interval of two adjacent pipeline tasks is one clock cycle, that is, a new pipeline task is started in each clock cycle, so that the maximum parallelism and throughput of the pipeline tasks are realized, and the hardware performance of the FPGA can be improved. And under the condition that the count values corresponding to the pipeline tasks are all larger than or equal to the first preset multiple of the clock period, the encryption and decryption processing is finished.

Further, receiving the processed data fed back by the FPGA, including: writing the processed data and the secret key obtained by the FPGA into the memory space; and transmitting the processed data and the secret key from the memory space to the first memory.

The security of the asymmetric encryption algorithm rests on the difficulty of factoring the product of two large prime numbers: multiplying two large primes is easy, but factoring their product is difficult, so the product can be made public as part of the encryption key. In a more specific embodiment, as shown in fig. 9, the specific operation process of the kernel function code is as follows:

First, the key pairs (e, n) and (d, n) are generated. To calculate the modulus n, a random function is used to generate two large prime numbers p and q, and the modulus is obtained by multiplying them: n = p × q. The public key e and the private key d are derived with the help of n, and the value of n is also used in encryption and decryption;

then, the Euler function φ(n) of n is calculated: φ(n) = (p - 1) × (q - 1);

then, the public key e is selected, which must satisfy two conditions: 1 < e < φ(n), and gcd(e, φ(n)) = 1, i.e., e and φ(n) are coprime;

finally, the private key d is calculated such that (d × e) mod φ(n) = 1, i.e., d = e^(-1) mod φ(n). This yields P = (e, n) and S = (d, n), where P is used to encrypt data and S is used to decrypt data. The encryption formula is C = M^e mod n and the decryption formula is M = C^d mod n, where C denotes the processed data in ciphertext form and M denotes the data to be processed in plaintext form. That is, the ciphertext is the remainder obtained by raising the data to be processed to the power e and reducing it modulo n; decryption is the inverse of encryption.

The values of p and q are chosen to be sufficiently large.
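The key generation and the two formulas can be checked with a short worked example. This is a minimal sketch, not the kernel function code of the application: the function names are introduced here for illustration, the primes p = 61 and q = 53 are far too small for real use, and `pow(e, -1, phi)` (available in Python 3.8 and later) is used to compute the modular inverse.

```python
from math import gcd
from typing import Tuple

Key = Tuple[int, int]


def make_keys(p: int, q: int, e: int) -> Tuple[Key, Key]:
    """Derive P = (e, n) and S = (d, n) from primes p, q and a chosen e."""
    n = p * q
    phi = (p - 1) * (q - 1)                 # Euler function of n
    if not (1 < e < phi and gcd(e, phi) == 1):
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)                     # d = e^(-1) mod phi(n)
    return (e, n), (d, n)


def encrypt(m: int, public: Key) -> int:
    e, n = public
    return pow(m, e, n)                     # C = M^e mod n


def decrypt(c: int, private: Key) -> int:
    d, n = private
    return pow(c, d, n)                     # M = C^d mod n


public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
print(public_key, private_key, ciphertext, decrypt(ciphertext, private_key))
```

With these toy values n = 3233, φ(n) = 3120, and d = 2753, and decrypting the ciphertext recovers the original plaintext 65.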

Optionally, the processed data is written into the second memory by the FPGA, and receiving the processed data fed back by the FPGA includes: calling a first preset API to read the processed data from the second memory. The hardware algorithm is obtained by the FPGA reading the data to be processed from the second memory, once the second memory has received the data to be processed, and performing the high-level synthesis on the kernel function code.

From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the described embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.

In this embodiment, an encryption and decryption device is further provided, and this device is used to implement the embodiment and the preferred implementation, which are already described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

Fig. 10 is a block diagram of an encryption and decryption apparatus according to an embodiment of the present application, as shown in fig. 10, the apparatus includes:

the integrating unit 10 is configured to perform high-level integration on the kernel function code under the condition that data to be processed is received, so as to obtain a hardware algorithm, where the data to be processed is sent by the target device under the condition that a predetermined instruction sent by the terminal is received;

specifically, the data to be processed may be data stored in the target device or data sent by the terminal. The data to be processed can be plaintext data or ciphertext data. The high-level synthesis refers to a process of converting the description of the high-level programming language into a hardware circuit, and can automatically convert the description of the high-level programming language into an equivalent hardware circuit, so that the automatic design of hardware is realized.

An operation unit 20, configured to execute the hardware algorithm to perform a predetermined process on the data to be processed to obtain processed data, where the predetermined instruction includes an instruction to write the data to be processed, the predetermined process includes an encryption process, the kernel function code includes a code of an encryption function, and where the predetermined instruction includes an instruction to read the data to be processed, the predetermined process includes a decryption process, and the kernel function code includes a code of a decryption function;

And a feedback unit 30, configured to feed back the processed data to the target device, so that the target device stores the processed data in the first memory if the predetermined instruction includes an instruction to write the data to be processed, and sends the processed data to the terminal if the predetermined instruction includes an instruction to read the data to be processed.

According to the embodiment, the to-be-processed data sent by the target equipment under the condition of receiving the preset instruction is received through the integrating unit, and the kernel function codes comprising the encryption and decryption functions are integrated in a high level to obtain a hardware algorithm; the hardware algorithm is operated through an operation unit to encrypt and decrypt the data to be processed to obtain the processed data; and feeding the obtained processed data back to the target equipment through a feedback unit, so that the target equipment stores the processed data into the first memory or sends the processed data to the terminal. Compared with the prior art that hardware encryption and decryption require writing a complex logic circuit by using a hardware description language, the problem of complex design is caused, the method and the device perform high-level synthesis on kernel function codes to obtain the hardware description language of an encryption and decryption algorithm, realize automatic design of the encryption and decryption hardware description language, enable developers to not need to write the hardware description language by themselves, reduce complexity of encryption and decryption design, reduce design time and workload, and improve development efficiency.

The execution body of the device may be a hardware acceleration module such as an FPGA, but is not limited thereto.

In addition, the encryption and decryption device disclosed by the application can be used for processing the encryption and decryption calculation task by the FPGA, so that the calculation resource of the target equipment side is effectively released, and the overall performance of the target equipment can be improved.

In an exemplary embodiment, the integration unit includes a synthesis module, configured to perform the high-level synthesis on the kernel function code to obtain the hardware algorithm when a communication connection with the target device has been established and the data to be processed and the kernel function code sent by the target device have been received. The kernel function code is written and compiled by the target device using a high-level programming language. When the predetermined instruction includes an instruction to write the data to be processed, the data to be processed is plaintext data carried by the predetermined instruction; when the predetermined instruction includes an instruction to read the data to be processed, the data to be processed is ciphertext data read by the target device from a first memory. The communication connection with the target device is established by the target device calling an OpenCL execution model when it receives the predetermined instruction.

In order to further reduce the design complexity of hardware encryption and decryption, according to other alternatives of the present application, as shown in fig. 3, the synthesis module includes: a running sub-module, configured to run the kernel function code, call a high-level synthesis (HLS) tool, and compile the kernel function code that has been run into a hardware description language; and a synthesis sub-module, configured to sequentially perform register transfer level (RTL) synthesis and place-and-route on the hardware description language to obtain the hardware algorithm. Running the kernel function code first verifies that its description of the algorithm is correct. The HLS tool then automatically converts the high-level programming language into synthesizable RTL code, after which synthesis and place-and-route are carried out until the design converges. The hardware algorithm therefore does not need to be described in a hardware description language such as VHDL, Verilog, or SystemVerilog, which further lowers the development threshold. Moreover, the HLS tool enables accurate timing evaluation and scheduling of the kernel function code.
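The idea of running the kernel function code before synthesis, so that the algorithm description is checked in software first, can be illustrated as follows. This is a minimal sketch, not the patented kernel: the application's kernel would be written in C or C++, Python is used here only for illustration, and the names `modexp_kernel` and `failing_cases` are hypothetical. The kernel is a square-and-multiply modular exponentiation, and the check compares it against Python's built-in `pow`.

```python
def modexp_kernel(base, exponent, modulus):
    """Right-to-left square-and-multiply modular exponentiation.

    Written as a plain loop over the exponent bits, the kind of structure
    that maps directly onto a C/C++ kernel function.
    """
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                     # current exponent bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus       # square for the next bit
        exponent >>= 1
    return result


def failing_cases(cases):
    """Return the (base, exponent, modulus) cases where the kernel
    disagrees with the reference; an empty list means the check passed."""
    return [c for c in cases if modexp_kernel(*c) != pow(*c)]
```

Only a kernel that passes such a software check would be handed to the HLS tool; the same cases can serve again later as the test file.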

In yet another embodiment of the present application, the apparatus includes: an error detection unit, configured to perform error detection on the hardware algorithm, after register transfer level synthesis and place-and-route have been sequentially performed on the hardware description language to obtain the hardware algorithm, and to determine whether the hardware algorithm has errors; and a second sending unit, configured to send error information to the target device when the hardware algorithm has errors, so that the target device debugs the kernel function code according to the error information. Testing the hardware algorithm at the system level further ensures its correctness.

Specifically, the running sub-module is further configured to compile the kernel function code that has been run, together with the code of the test file of the kernel function code, into a hardware description language. The debugging unit includes a debugging module, configured to debug the hardware algorithm corresponding to the kernel function code according to the hardware algorithm corresponding to the code of the test file. Because the test file can be reused in this process, verification becomes simpler, which further reduces the complexity of the overall encryption and decryption development.

In an exemplary embodiment, as shown in fig. 4 and 5, the operation unit specifically includes:

the first generation module is used for generating a key and a modulus of the data to be processed by adopting the hardware algorithm;

The second generating module is used for sequentially generating a plurality of pipeline tasks according to the preset processing and initializing count values corresponding to the pipeline tasks, wherein the pipeline tasks comprise a plurality of subtasks for cyclic execution, and the subtasks are obtained by performing modular exponentiation on the data to be processed, the secret key and the modulus and then replacing the data to be processed with current operation results;

Specifically, one pipeline task cyclically executes the following process: perform modular exponentiation on the data to be processed, the key, and the modulus, and then replace the data to be processed with the current operation result. For example, when the pipeline task includes two subtasks, executing the pipeline task means performing one modular exponentiation on the data to be processed, the key, and the modulus and replacing the data to be processed with the result, and then performing one modular exponentiation on the new data to be processed, the key, and the modulus and again replacing the data to be processed with the result.

The first execution module is used for executing the subtasks in the corresponding pipeline tasks once under the condition that the count value is smaller than a first preset multiple of a clock cycle, and counting the count value once;

The second execution module is used for starting to execute the next pipeline task under the condition that the current execution duration of the pipeline task reaches a second preset multiple of the clock period, and the second preset multiple is smaller than the first preset multiple;

And the output module is used for outputting the processed data under the condition that the counted value corresponding to each pipeline task is greater than or equal to the first preset multiple of the clock period.
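The scheduling rule described by these modules can be modelled in software. The following is a minimal Python sketch rather than the patented implementation: it assumes that one subtask completes per clock cycle, treats each data block as a small integer, and uses the hypothetical names `PipelineTask` and `run_pipeline`; `first_multiple` and `second_multiple` stand for the first and second predetermined multiples.

```python
from dataclasses import dataclass


@dataclass
class PipelineTask:
    data: int              # block being processed; overwritten after each subtask
    count: int = 0         # number of subtasks executed so far
    started: bool = False  # whether the task has been launched


def run_pipeline(blocks, key, modulus, first_multiple, second_multiple):
    """Simulate the pipeline cycle by cycle.

    Returns (processed blocks, number of simulated clock cycles).
    """
    if not second_multiple < first_multiple:
        raise ValueError("second multiple must be smaller than the first")
    tasks = [PipelineTask(block) for block in blocks]
    cycle = 0
    # Output is produced only once every count has reached first_multiple.
    while any(task.count < first_multiple for task in tasks):
        # Launch task i once the previous task has run for second_multiple cycles.
        for i, task in enumerate(tasks):
            if not task.started and cycle >= i * second_multiple:
                task.started = True
        # Each running task executes one subtask: modexp, then replace the data.
        for task in tasks:
            if task.started and task.count < first_multiple:
                task.data = pow(task.data, key, modulus)
                task.count += 1
        cycle += 1
    return [task.data for task in tasks], cycle
```

Under the one-subtask-per-cycle assumption, T tasks finish after (T - 1) × `second_multiple` + `first_multiple` cycles, so the smaller the second predetermined multiple, the more the tasks overlap and the sooner the output is available.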

Further, the feedback unit includes a feedback module, configured to feed the processed data and the key back to the target device.

Optionally, the feedback unit includes a first writing module, configured to write the processed data into a second memory so that the target device calls a first predetermined API to read the processed data from the second memory. The integration unit includes a first reading module, configured to read the data to be processed from the second memory and perform the high-level synthesis on the kernel function code when the second memory receives the data to be processed.

The second memory is a memory of an FPGA, and may specifically be ROM, RAM, FLASH, a DDR memory, or the like.

Fig. 11 is a block diagram of an encryption and decryption apparatus according to an embodiment of the present application, as shown in fig. 11, the apparatus includes:

a first transmitting unit 40, configured to transmit data to be processed to the FPGA when receiving a predetermined instruction transmitted by the terminal;

A receiving unit 50, configured to receive processed data fed back by the FPGA, where the processed data is obtained by running a hardware algorithm by the FPGA to perform predetermined processing on the data to be processed, the hardware algorithm is obtained by performing high-level synthesis on kernel function codes by the FPGA, the predetermined processing includes encryption processing when the predetermined instruction includes an instruction to write the data to be processed, the kernel function codes include encryption function codes, and the predetermined processing includes decryption processing when the predetermined instruction includes an instruction to read the data to be processed, and the kernel function codes include decryption function codes;

A storing unit 60, configured to store the processed data in a first memory if the predetermined instruction includes an instruction to write the data to be processed, and send the processed data to the terminal if the predetermined instruction includes an instruction to read the data to be processed.
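The division of work among the three units can be sketched as a small dispatch routine. This is an illustrative Python model under stated assumptions, not the apparatus itself: `ToyAccelerator` is a hypothetical stand-in for the FPGA that applies textbook RSA with a tiny, insecure key to a single integer, `first_memory` is modelled as a dictionary, and the instruction kinds `"write"` and `"read"` are assumed names.

```python
class ToyAccelerator:
    """Hypothetical stand-in for the FPGA (textbook RSA on one small integer)."""

    def __init__(self, n=3233, e=17, d=2753):  # toy key built from p=61, q=53
        self.n, self.e, self.d = n, e, d

    def encrypt(self, plaintext):
        return pow(plaintext, self.e, self.n)

    def decrypt(self, ciphertext):
        return pow(ciphertext, self.d, self.n)


def handle_instruction(kind, name, accelerator, first_memory, plaintext=None):
    """Dispatch one predetermined instruction of kind 'write' or 'read'."""
    if kind == "write":
        # Plaintext arrives with the instruction; only ciphertext is stored.
        first_memory[name] = accelerator.encrypt(plaintext)
        return None
    if kind == "read":
        # Ciphertext comes from the first memory; plaintext goes to the terminal.
        return accelerator.decrypt(first_memory[name])
    raise ValueError(f"unknown instruction kind: {kind!r}")
```

A write followed by a read of the same name returns the original value, while the first memory only ever holds the encrypted form.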

According to this embodiment, the first sending unit sends the data to be processed to the FPGA according to the predetermined instruction sent by the terminal; the receiving unit receives the processed data obtained by the FPGA running a hardware algorithm to encrypt or decrypt the data to be processed, the hardware algorithm being obtained by the FPGA performing high-level synthesis on the kernel function code; and the storing unit sends the processed data to the terminal or stores it in the first memory. In the prior art, hardware encryption and decryption require a complex logic circuit to be written in a hardware description language, which makes the design complex; here the hardware algorithm is instead obtained by high-level synthesis of the kernel function code, so that this complexity is avoided.

The execution subject of the apparatus may be a processor, a server, or the like, but is not limited thereto.

In an exemplary embodiment, the first transmitting unit may specifically include:

the first calling module is used for calling an OpenCL execution model to establish communication connection with the FPGA under the condition that the preset instruction is received;

the writing module is used for writing and compiling the kernel function codes by using a high-level programming language;

the sending module is used for sending the data to be processed and the kernel function code in a plaintext form carried by the preset instruction to the FPGA under the condition that the preset instruction comprises the instruction for writing the data to be processed;

and the second reading module is used for reading the data to be processed in the ciphertext form from the first memory and sending the data to be processed and the kernel function code to the FPGA under the condition that the predetermined instruction comprises an instruction for reading the data to be processed.

In an actual application process, the second reading module includes: the sending sub-module is used for sending request information, the data to be processed and the kernel function code to the FPGA, and the request information is used for triggering the FPGA to conduct the preset processing on the data to be processed.

Specifically, as shown in fig. 8, the first calling module includes: an initialization sub-module, configured to initialize the OpenCL execution model to obtain a list of bindable devices; a determining sub-module, configured to determine the FPGA from the list of bindable devices; and a creation sub-module, configured to create an OpenCL context of the FPGA through the OpenCL execution model so as to establish the communication connection with the FPGA. In this embodiment, the target device selects an appropriate computing platform to obtain a list of available devices, selects the FPGA from that list, and creates an OpenCL context associated with the FPGA, which enables parallel computing across different computing devices.

In addition, the target device runs an operating system, and the first calling module includes: a first calling sub-module, configured to call a second API upon receiving the predetermined instruction so as to switch the working mode of the operating system from a user mode to a kernel mode; and a second calling sub-module, configured to call the OpenCL execution model and establish the communication connection with the FPGA when the working mode is the kernel mode. Before the communication connection with the FPGA is established through the OpenCL execution model, a library function is called to switch the working mode of the operating system to the kernel mode. This grants the program on the target device the necessary permissions and provides the environment needed for interconnecting the target device with the FPGA and for parallel computing.

As shown in fig. 8, the apparatus further includes: a distribution unit, configured to allocate memory space after the OpenCL execution model is called to establish the communication connection with the FPGA; a copying unit, configured to copy the data to be processed from the first memory into the memory space and then send the data to be processed in the memory space to the second memory of the FPGA; a creation unit, configured to create a command queue, add the kernel function code to the command queue, and set execution parameters, where the execution parameters control the degree of parallelism with which the kernel function code is executed and include a global work size or a local work size; and an execution unit, configured to execute the command queue so as to send the kernel function code to the FPGA. The apparatus further includes a releasing unit, configured to release the memory space, the command queue, and the OpenCL context so that the next encryption or decryption processing can be executed.
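The order in which these units acquire and release resources can be modelled as follows. This is a pure-Python sketch of the lifecycle only and makes no OpenCL calls: `Resource` and `run_once` are hypothetical names, and the `kernel` argument is an ordinary function standing in for the work done on the FPGA. In the OpenCL C API the corresponding steps would use functions such as `clCreateContext`, `clCreateCommandQueue`, `clCreateBuffer`, `clEnqueueWriteBuffer`, `clEnqueueNDRangeKernel`, and `clEnqueueReadBuffer`, followed by the matching release calls.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical stand-in for an OpenCL object (context, queue, buffer)."""

    def __init__(self, name, log):
        self.name, self.log = name, log
        log.append(f"create {name}")

    def release(self):
        self.log.append(f"release {self.name}")


def run_once(data, kernel, log):
    """Run one encryption or decryption pass and always release resources."""
    with ExitStack() as stack:
        def acquire(name):
            resource = Resource(name, log)
            stack.callback(resource.release)  # released in reverse order
            return resource

        acquire("context")
        acquire("command queue")
        acquire("buffer")                     # the allocated memory space
        log.append("copy data to device")
        log.append("enqueue kernel")
        result = kernel(data)                 # stands in for the FPGA computation
        log.append("read back result")
    return result
```

Registering each release at creation time means the memory space, the command queue, and the context are released in that order even if the kernel step raises, so the next pass starts from a clean state.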

Specifically, the apparatus further includes a debugging unit, configured to debug the kernel function code according to error information when the error information is received, where the error information is generated and sent by the FPGA when it performs error detection on the hardware algorithm and finds errors. Testing the hardware algorithm at the system level further ensures its correctness.

In an exemplary embodiment, the receiving unit includes a receiving module, configured to receive the processed data and the key fed back by the FPGA. The clock period, the first predetermined multiple, and the second predetermined multiple are all preset values. In an alternative scheme, the second predetermined multiple is 1, that is, the interval between the starts of two adjacent pipeline tasks is one clock cycle and a new pipeline task is started in every clock cycle. This maximizes the parallelism and throughput of the pipeline tasks and can improve the hardware performance of the FPGA. When the count values corresponding to all the pipeline tasks are greater than or equal to the first predetermined multiple of the clock period, the encryption or decryption processing is finished.

Further, the receiving unit includes: a second writing module, configured to write the processed data and the key obtained by the FPGA into the memory space; and a transmission module, configured to transmit the processed data and the key from the memory space to the first memory.

Then, the Euler function φ(n) of n is computed: φ(n) = (p - 1) × (q - 1);

The values of p and q are chosen to be sufficiently large.
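The role of φ(n) in key generation can be shown with a short example. The sketch below assumes textbook RSA, which the formula above appears to describe; the public exponent `e`, the private exponent `d`, and the function name `generate_keypair` are assumptions not stated in this passage, and the tiny primes are for illustration only.

```python
from math import gcd


def generate_keypair(p, q, e=65537):
    """Derive a textbook RSA key pair from two distinct primes p and q."""
    n = p * q                      # modulus shared by both keys
    phi = (p - 1) * (q - 1)        # Euler function of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with phi(n)")
    d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)          # (public key, private key)
```

With p = 61, q = 53 and e = 17 this gives n = 3233, φ(n) = 3120 and d = 2753. Real keys use primes of a thousand bits or more, which is why p and q must be sufficiently large.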

Optionally, the processed data is written into the second memory by the FPGA, and the receiving unit includes a second calling module, configured to call a first predetermined API to read the processed data from the second memory. The hardware algorithm is obtained by the FPGA, when the second memory receives the data to be processed, reading the data to be processed from the second memory and performing the high-level synthesis on the kernel function code.

It should be noted that the respective modules may be implemented by software or hardware, and for the latter, may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the modules may be located in different processors in any combination.

Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments when run.

In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.

Embodiments of the present application also provide an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments.

In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor and an input/output device connected to the processor.

The application also provides a server, comprising:

an FPGA comprising a second memory, a first processor and a first computer program stored on the second memory and executable on the first processor, the first processor implementing the steps of any of the methods when executing the first computer program;

A host system comprising a first memory, a second processor and a second computer program stored on the first memory and executable on the second processor, the second processor implementing the steps of any of the methods when executing the second computer program.

Specifically, as shown in fig. 12, the server of the present application uses an OpenCL execution model to implement parallel computation across different computing devices, namely the host system and the FPGA, under a heterogeneous acceleration framework, where the host system and the FPGA establish a PCIe (Peripheral Component Interconnect Express) connection through the OpenCL execution model. The OpenCL execution model provides API function call interfaces that can transmit data to the FPGA, start the FPGA to execute heterogeneous computation, receive processing results from the FPGA side, and so on. The OpenCL execution model is typically designed as a middleware program, named hls_host, that runs at the application layer of the host system and is called in heterogeneous acceleration scenarios.

In a specific embodiment, as shown in fig. 13, a specific workflow of the encryption processing using the server is as follows:

Step (1): a plaintext file is written at the application layer of the host system;

Step (2): the operating system of the host system calls an API so that its working mode switches from the user mode to the kernel mode;

Step (3): the transparent file system of the host system creates an empty public key file and an empty private key file, sends request information to the middleware program hls_host to start the encryption operation, and copies the plaintext file to hls_host;

Step (4): hls_host receives the request information, configures and initializes the heterogeneous acceleration environment required by the OpenCL execution model, sends the plaintext file to the DDR of the FPGA, writes the kernel function code into a command queue, and starts the command queue so that the FPGA begins hardware encryption;

Step (5): the FPGA generates the hardware algorithm corresponding to the kernel function code;

Step (6): the FPGA generates a key pair and stores it in the DDR on the FPGA side;

Step (7): the FPGA acquires the plaintext file and encrypts it with the public key of the key pair to obtain a ciphertext file;

Step (8): the FPGA stores the ciphertext file in the DDR on the FPGA side;

Step (9): hls_host calls the relevant APIs to copy the key pair and the ciphertext file from the DDR on the FPGA side to the DDR on the host system side;

Step (10): the host system writes the key pair and the ciphertext file from its DDR to a storage disk on the host system side.

Through this process, the file is encrypted automatically without the user being aware of it. Decryption is the inverse of encryption and is not described again here.

To make encryption and decryption of files transparent, a transparent file system is designed and added to the operating system of the host system. When a user starts writing or reading a file in the user mode of the operating system, the operating system first enters the kernel mode and, through the transparent file system, sends encryption or decryption instruction information to hls_host, the middleware program of the OpenCL execution model. After receiving the information, hls_host configures and initializes the OpenCL heterogeneous acceleration environment and then starts the FPGA to perform the accelerated encryption or decryption computation. Once the computation is complete, hls_host reads the result and passes it back to the transparent file system in the kernel mode of the operating system, and the encrypted ciphertext file is finally written to a storage device on the host system side.

The overall HLS-based hardware-accelerated transparent encryption and decryption system is shown in fig. 15. On the left is the host system of the server, on which a Linux operating system is deployed with the transparent file system embedded in the Linux kernel; the CPU of the host system runs the specific software programs, and a magnetic disk serves as the storage device. On the right is the acceleration board card, which has a PCIe interface and consists of an FPGA and DDR on-board memory. The FPGA implements the specific encryption and decryption algorithm as a dedicated logic circuit, which accelerates the encryption and decryption service, and the DDR on-board memory temporarily stores the plaintext files, ciphertext files, and key pairs used in the encryption and decryption service logic. The host system and the acceleration board card are connected by a PCIe bus for high-speed data communication.

The transparent file system performs encryption and decryption in the operation mode shown in fig. 16. A transparent operation is one that the user cannot perceive while operating on (reading or writing) a file. When a user writes a file at the user layer, the operating system (OS) automatically calls the kernel-mode transparent file system, which triggers the FPGA to execute the encryption operation, and the encrypted file is finally written to the storage device. When a user reads a file, the file in the storage device is in encrypted form; the operating system likewise automatically calls the transparent file system in the kernel to trigger the decryption operation of the FPGA, and the user finally reads the decrypted plaintext file. Thus, by adding the transparent file system in the kernel mode of the operating system and implementing the encryption and decryption functions in the FPGA through the HLS design method, transparent encryption and decryption of data files is achieved. The advantage of transparent encryption and decryption is that reading and writing a file works exactly as it does under normal conditions, and the user does not perceive the implementation details of file encryption and decryption. The encryption and decryption are carried out by the operating system and the FPGA acceleration board card, and the whole process is transparent to the user.

The server can be suitable for any industry sensitive to data security, and can improve the business work efficiency of related industries while guaranteeing the data security.

Specific examples in this embodiment may refer to examples described in the embodiments and the exemplary implementation manners, and this embodiment is not described herein.

It will be appreciated by those skilled in the art that the modules or steps of the application described may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may be implemented in program code that is executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principles of the present application should be included in the protection scope of the present application.

Claims

1. An encryption and decryption method, comprising:

under the condition of receiving data to be processed, carrying out high-level synthesis on kernel function codes to obtain a hardware algorithm, wherein the data to be processed is sent by target equipment under the condition of receiving a preset instruction sent by a terminal;

running the hardware algorithm to perform preset processing on the data to be processed to obtain processed data, wherein the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function code comprises code of an encryption function, and the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function code comprises code of a decryption function;

feeding back the processed data to the target device, so that the target device stores the processed data in a first memory in the case that the predetermined instruction includes an instruction to write the data to be processed, sends the processed data to the terminal in the case that the predetermined instruction includes an instruction to read the data to be processed,

Under the condition of receiving data to be processed, carrying out high-level synthesis on kernel function codes to obtain a hardware algorithm, wherein the hardware algorithm comprises the following steps:

under the condition that communication connection with the target equipment is established and the data to be processed and the kernel function code sent by the target equipment are received, the high-level synthesis is carried out on the kernel function code to obtain the hardware algorithm,

wherein the kernel function code is written and compiled by the target device by using a high-level programming language, the data to be processed is data in a plaintext form carried by the predetermined instruction when the predetermined instruction comprises an instruction for writing the data to be processed, the data to be processed is data in a ciphertext form read by the target device from a first memory when the predetermined instruction comprises an instruction for reading the data to be processed,

the communication connection with the target device is established by calling an OpenCL execution model by the target device under the condition that the target device receives the preset instruction,

running the hardware algorithm to perform predetermined processing on the data to be processed to obtain processed data, including:

Generating a key and a modulus of the data to be processed by adopting the hardware algorithm;

generating a plurality of pipeline tasks in sequence according to the preset processing, and initializing count values corresponding to the pipeline tasks, wherein the pipeline tasks comprise a plurality of subtasks for cyclic execution, and the subtasks are obtained by replacing the data to be processed with current operation results after performing modular exponentiation on the data to be processed, the secret key and the modulus;

executing the subtasks in the corresponding pipeline tasks once under the condition that the count value is smaller than a first preset multiple of a clock period, and counting the count value once;

starting to execute the next pipeline task under the condition that the current execution time of the pipeline task reaches a second preset multiple of the clock cycle, wherein the second preset multiple is smaller than the first preset multiple;

outputting the processed data when the counted count value corresponding to each pipeline task is greater than or equal to a first preset multiple of the clock period,

the kernel function code is the code of the encryption function or the code of the decryption function pre-stored in a memory, or the method further comprises: and receiving the code of the encryption function or the code of the decryption function sent by the target equipment to obtain the kernel function code.

2. The method of claim 1, wherein performing the high-level synthesis on the kernel function code results in the hardware algorithm, comprising:

running the kernel function code, calling a high-level synthesis tool, and compiling the kernel function code that has been run into a hardware description language;

and sequentially carrying out register transfer level synthesis and place-and-route on the hardware description language to obtain the hardware algorithm.

3. The method of claim 2, wherein after sequentially register transfer level synthesis and place and route the hardware description language to obtain the hardware algorithm, the method comprises:

detecting errors of the hardware algorithm, and determining whether the hardware algorithm has errors or not;

and sending error information to the target equipment under the condition that the hardware algorithm has errors, so that the target equipment debugs the kernel function code according to the error information.

4. The method of claim 1, wherein feeding back the processed data to the target device comprises:

and feeding back the processed data and the secret key to the target equipment.

5. A method according to any one of claims 1 to 3, wherein the kernel function code is a function code derived from an asymmetric encryption algorithm, and the high-level programming language comprises at least one of the C++ language and the C language.

6. The method according to any one of claims 1 to 4, wherein,

feeding back the processed data to the target device, comprising: writing the processed data to a second memory, so that the target device calls a first predetermined API to read the processed data from the second memory,

and under the condition of receiving the data to be processed, carrying out high-level synthesis on the kernel function code, wherein the high-level synthesis comprises the following steps: and under the condition that the second memory receives the data to be processed, reading the data to be processed from the second memory, and carrying out the high-level synthesis on the kernel function code.

7. An encryption and decryption method, comprising:

under the condition that a preset instruction sent by a terminal is received, sending data to be processed to an FPGA;

receiving processed data fed back by the FPGA, wherein the processed data are obtained by running a hardware algorithm by the FPGA to perform preset processing on the data to be processed, the hardware algorithm is obtained by performing high-level synthesis on kernel function codes by the FPGA, the preset processing comprises encryption processing when the preset instruction comprises an instruction for writing the data to be processed, the kernel function codes comprise encryption function codes, the preset processing comprises decryption processing when the preset instruction comprises an instruction for reading the data to be processed, and the kernel function codes comprise decryption function codes;

Storing the processed data in a first memory in case the predetermined instruction comprises an instruction to write the data to be processed, transmitting the processed data to the terminal in case the predetermined instruction comprises an instruction to read the data to be processed,

under the condition that a preset instruction sent by a terminal is received, sending data to be processed to the FPGA, wherein the method comprises the following steps:

under the condition that the preset instruction is received, an OpenCL execution model is called to establish communication connection with the FPGA;

writing and compiling the kernel function codes by using a high-level programming language;

when the preset instruction comprises an instruction for writing the data to be processed, sending the data to be processed and the kernel function code in a plaintext form carried by the preset instruction to the FPGA;

in the case where the predetermined instruction includes an instruction to read the data to be processed, the data to be processed in the form of ciphertext is read from the first memory, and the data to be processed and the kernel function code are sent to the FPGA,

wherein the processed data is output by the FPGA when the count value corresponding to each pipeline task is greater than or equal to a first preset multiple of a clock cycle, wherein the FPGA generates a plurality of pipeline tasks in sequence according to the preset processing, each pipeline task has a corresponding count value, and each pipeline task comprises a plurality of subtasks that are executed cyclically, wherein each subtask consists of performing a modular exponentiation on the data to be processed, a key, and a modulus, and replacing the data to be processed with the current operation result, wherein the FPGA executes a subtask when the corresponding count value is less than the first preset multiple of the clock cycle, and the count value is incremented once each time the FPGA executes a subtask, wherein the execution interval between two adjacent pipeline tasks is a second preset multiple of the clock cycle, and the key and the modulus are generated by the FPGA using the hardware algorithm,

wherein the kernel function code is the code of the encryption function or the code of the decryption function pre-stored in a memory of the FPGA, or the method further comprises: sending the kernel function code to the FPGA.
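For illustration only, the counting scheme recited in claim 7 can be modeled in software. The Python sketch below is one possible reading, not the claimed FPGA implementation: it assumes that each subtask is one step of right-to-left square-and-multiply modular exponentiation, that the first preset multiple equals the number of loop iterations per pipeline task, and that the key and modulus are fixed textbook values rather than values generated by the FPGA. The names `run_pipeline_task`, `pipeline_latency`, `FIRST_MULTIPLE`, and `SECOND_MULTIPLE` are hypothetical.

```python
FIRST_MULTIPLE = 16   # hypothetical: subtask executions (clock cycles) per pipeline task
SECOND_MULTIPLE = 1   # hypothetical: clock cycles between starts of adjacent pipeline tasks


def run_pipeline_task(data: int, key: int, modulus: int) -> int:
    """Model one pipeline task as a counted loop of modular-exponentiation subtasks."""
    if key < 0 or key >= (1 << FIRST_MULTIPLE):
        raise ValueError("key must fit in FIRST_MULTIPLE bits")
    result = 1
    count = 0
    while count < FIRST_MULTIPLE:           # a subtask runs only while count is below the threshold
        if (key >> count) & 1:              # multiply step when the current key bit is set
            result = (result * data) % modulus
        data = (data * data) % modulus      # replace the data with the current operation result
        count += 1                          # count once per executed subtask
    return result                           # output once count >= FIRST_MULTIPLE


def pipeline_latency(num_tasks: int) -> int:
    """Total clock cycles when adjacent tasks start SECOND_MULTIPLE cycles apart."""
    if num_tasks <= 0:
        return 0
    return (num_tasks - 1) * SECOND_MULTIPLE + FIRST_MULTIPLE


# Textbook RSA toy parameters (modulus = 61 * 53); assumed values for illustration only.
MODULUS, PUBLIC_KEY, PRIVATE_KEY = 3233, 17, 2753

ciphertext = run_pipeline_task(65, PUBLIC_KEY, MODULUS)          # encryption path
plaintext = run_pipeline_task(ciphertext, PRIVATE_KEY, MODULUS)  # decryption path
```

Under these assumptions, a new pipeline task can start every `SECOND_MULTIPLE` cycles while earlier tasks are still looping, so a batch of tasks finishes in `(num_tasks - 1) * SECOND_MULTIPLE + FIRST_MULTIPLE` cycles instead of `num_tasks * FIRST_MULTIPLE`.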

8. The method of claim 7, wherein invoking an OpenCL execution model to establish a communication connection with the FPGA comprises:

initializing the OpenCL execution model to obtain a list of bindable devices;

determining the FPGA from the list of bindable devices;

and creating, through the OpenCL execution model, an OpenCL context with the FPGA, so as to establish the communication connection with the FPGA.

9. The method of claim 7, wherein, upon receiving the preset instruction, invoking an OpenCL execution model to establish a communication connection with the FPGA comprises:

upon receiving the preset instruction, calling a second API to switch the working mode of the operating system from user mode to kernel mode;

and, when the working mode is the kernel mode, invoking the OpenCL execution model to establish the communication connection with the FPGA.

10. The method according to claim 7, wherein the hardware algorithm is obtained by the FPGA sequentially performing register transfer level synthesis and placement and routing on hardware description language code, and the hardware description language code is obtained by running the kernel function code and invoking a high-level synthesis tool to compile the running kernel function code.

11. The method of claim 7, wherein receiving the processed data fed back by the FPGA comprises:

receiving the processed data and the key fed back by the FPGA.

12. The method according to any one of claims 7 to 10, wherein the kernel function code is function code obtained according to an asymmetric encryption algorithm, and the high-level programming language includes at least one of the C++ language and the C language.

13. An encryption and decryption apparatus, comprising:

a synthesis unit, configured to perform high-level synthesis on kernel function code upon receiving data to be processed, so as to obtain a hardware algorithm, wherein the data to be processed is sent by a target device upon receiving a preset instruction sent by a terminal;

an operation unit, configured to run the hardware algorithm to perform preset processing on the data to be processed to obtain processed data, wherein, when the preset instruction includes an instruction to write the data to be processed, the preset processing includes encryption processing and the kernel function code includes code of an encryption function, and, when the preset instruction includes an instruction to read the data to be processed, the preset processing includes decryption processing and the kernel function code includes code of a decryption function;

a feedback unit, configured to feed back the processed data to the target device, so that the target device stores the processed data in a first memory when the preset instruction includes an instruction to write the data to be processed, and sends the processed data to the terminal when the preset instruction includes an instruction to read the data to be processed,

wherein the synthesis unit includes:

a synthesis module, configured to perform the high-level synthesis on the kernel function code to obtain the hardware algorithm when a communication connection with the target device is established and the data to be processed and the kernel function code sent by the target device are received,

the operation unit includes:

an output module, configured to output the processed data when the count value corresponding to each pipeline task is greater than or equal to a first preset multiple of a clock cycle,

wherein the kernel function code is the code of the encryption function or the code of the decryption function pre-stored in a memory, or the apparatus is further configured to: receive the code of the encryption function or the code of the decryption function sent by the target device to obtain the kernel function code.

14. An encryption and decryption apparatus, comprising:

a first sending unit, configured to send data to be processed to an FPGA upon receiving a preset instruction sent by a terminal;

a first receiving unit, configured to receive processed data fed back by the FPGA, wherein the processed data is obtained by the FPGA running a hardware algorithm to perform preset processing on the data to be processed, and the hardware algorithm is obtained by the FPGA performing high-level synthesis on kernel function code, wherein, when the preset instruction includes an instruction to write the data to be processed, the preset processing includes encryption processing and the kernel function code includes code of an encryption function, and, when the preset instruction includes an instruction to read the data to be processed, the preset processing includes decryption processing and the kernel function code includes code of a decryption function;

a storing unit, configured to store the processed data in a first memory when the preset instruction includes an instruction to write the data to be processed, and to send the processed data to the terminal when the preset instruction includes an instruction to read the data to be processed,

wherein the first sending unit includes:

a second reading module, configured to, when the preset instruction includes an instruction to read the data to be processed, read the data to be processed in ciphertext form from the first memory and send the data to be processed and the kernel function code to the FPGA,

wherein the kernel function code is the code of the encryption function or the code of the decryption function pre-stored in a memory of the FPGA, or the apparatus is further configured to: send the kernel function code to the FPGA.
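For illustration only, the host-side flow formed by the first sending unit, the first receiving unit, and the storing unit of claim 14 can be sketched in Python. This is a minimal model under stated assumptions, not the claimed apparatus: the class `Host`, the callable type `FpgaCall`, and the function `fake_fpga` are hypothetical, the first memory is modeled as a dictionary, and `fake_fpga` is a placeholder byte shift that stands in for the FPGA round trip and provides no security.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the FPGA round trip: (data, kernel name) -> processed data.
FpgaCall = Callable[[bytes, str], bytes]


class Host:
    """Models the host: send data to the FPGA, receive the result, store or return it."""

    def __init__(self, fpga: FpgaCall) -> None:
        self.fpga = fpga
        self.first_memory: Dict[str, bytes] = {}   # assumed model of the first memory

    def handle(self, op: str, name: str, payload: Optional[bytes] = None) -> Optional[bytes]:
        if op == "write":
            if payload is None:
                raise ValueError("a write instruction must carry plaintext data")
            # Write path: plaintext goes out with the encryption kernel, ciphertext is stored.
            self.first_memory[name] = self.fpga(payload, "encrypt")
            return None
        if op == "read":
            # Read path: ciphertext is read from the first memory and sent with the
            # decryption kernel; the decrypted result is returned to the terminal.
            return self.fpga(self.first_memory[name], "decrypt")
        raise ValueError(f"unsupported instruction: {op}")


def fake_fpga(data: bytes, kernel: str) -> bytes:
    """Placeholder transform (byte shift); NOT real encryption, illustration only."""
    shift = 1 if kernel == "encrypt" else -1
    return bytes((b + shift) % 256 for b in data)


host = Host(fake_fpga)
host.handle("write", "record", b"hello")   # stores the processed (shifted) bytes
restored = host.handle("read", "record")   # returns the original bytes
```

Passing the FPGA round trip in as a callable keeps the write and read branching separate from the accelerator interface, so the same dispatch logic could in principle sit in front of a real device binding.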

15. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 12.

16. A server, comprising:

an FPGA comprising a second memory, a first processor, and a first computer program stored on the second memory and executable on the first processor, the first processor implementing the steps of the method of any one of claims 1 to 6 when executing the first computer program;

a host system comprising a first memory, a second processor and a second computer program stored on the first memory and executable on the second processor, the second processor implementing the steps of the method as claimed in any one of claims 7 to 12 when the second computer program is executed.