CN118642826A - API calling method and device
- Publication number: CN118642826A
- Application number: CN202410804859.1A
- Authority: CN (China)
- Prior art keywords: api, processing engine, target, heterogeneous, call
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
Abstract
The application relates to an API calling method and device. The method is applied to a heterogeneous system comprising a plurality of processing engines, including a first processing engine and a second processing engine. The method may be executed by a CPU in the heterogeneous system, where the CPU may be the first or the second processing engine, or a processing engine in the heterogeneous system other than those two. The CPU determines a target API to be called, and then selects the first processing engine for calling the target API based on API call information, where the API call information indicates the efficiency with which the first and second processing engines each call the target API. The CPU in the heterogeneous system can thus select a suitable processing engine to call the target API according to the efficiency with which different processing engines call it, realizing efficient API calling.
Description
The present application is a divisional application. The application number of the original application is 202010656379.7, the original filing date is July 9, 2020, and the entire content of the original application is incorporated herein by reference.
Technical Field
The present application relates to the field of communications technologies, and in particular, to an API calling method and apparatus.
Background
With the deepening adoption of computers and intelligent devices in different application fields, many different processing engines have emerged to cope with the data-processing demands of those fields, besides the central processing unit (CPU): for example, the graphics processing unit (GPU), the image processor (IP), the digital signal processor (DSP), the neural-network processing unit (NPU), and the field-programmable gate array (FPGA). For different data-processing scenarios, different types of processing engines may offer better data-processing capability in their corresponding scenarios.
To enhance the overall data-processing capacity of a computing system, the computing system may include processing engines such as a GPU, IP, DSP, or NPU in addition to the CPU. Such a computing system with two or more types of processors may also be referred to as a heterogeneous system.
In a heterogeneous system, the CPU can act as a scheduler that exchanges data with the other processing engines to assist them in processing data.
Currently, because different processing engines in a heterogeneous system use different types of program instructions, a compiler needs to send the program instructions to be executed by each processing engine (including the CPU), such as application programming interface (API) functions, to the CPU in advance. The CPU then sends the program instructions to be executed by the other processing engines to the corresponding engines.
Before these program instructions are sent to the CPU, it must be configured in the compiler in advance, during the development of the program instructions, which program instructions are executed by which processing engine.
This configuration depends on the developer: there is no guarantee that the configured processing engine is suitable for the program instructions, nor that it can execute them more efficiently than the other processing engines could.
Disclosure of Invention
The application provides an API calling method and device, which are used to determine a processing engine capable of calling an API efficiently.
In a first aspect, an embodiment of the present application provides an API calling method. The method is applied to a heterogeneous system that may include a plurality of processing engines, among them a first processing engine and a second processing engine. The method may be performed by a CPU in the heterogeneous system, where the CPU may be the first or the second processing engine, or a processing engine in the heterogeneous system other than those two. The CPU first determines a target API to be called; it then selects the first processing engine for calling the target API based on API call information, where the API call information indicates the efficiency with which the first and second processing engines each call the target API. The target API may be a heterogeneous API or another type of API.
With this method, the CPU in the heterogeneous system can select a suitable processing engine to call the target API according to the efficiency with which different processing engines call it, realizing efficient API calling.
In one possible implementation, when selecting the first processing engine for calling the target API based on the API call information, the CPU may first determine a target parameter size of the target API. For example, the CPU may first obtain a signature of the target API, which indicates the target parameter size. The CPU then selects the first processing engine according to the API call information and the target parameter size, where the API call information indicates the efficiency with which the first and second processing engines call the target API at candidate parameter sizes, the candidate parameter sizes including the target parameter size.
In this way, the CPU in the heterogeneous system can select a suitable processing engine to call the target API at the target parameter size, according to the efficiency with which different processing engines call the target API at the different candidate parameter sizes.
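Reduced to the two engines of this aspect, the selection might look like the following sketch; the call_time lookup into the API call information is an assumed primitive, not part of the described method:

```cpp
#include <cstddef>
#include <cstdint>

// Assumed lookup into the API call information: the time the given processing
// engine requires to call the target API at the given parameter size.
double call_time(int engine, std::uint64_t api_id, std::size_t param_size);

// Returns 1 to select the first processing engine, 2 to select the second.
int select_engine(std::uint64_t target_api, std::size_t target_param_size) {
    return call_time(1, target_api, target_param_size)
           <= call_time(2, target_api, target_param_size) ? 1 : 2;
}
```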
In one possible implementation, the embodiments of the present application do not limit the manner in which the CPU obtains the signature of the target API. For example, the CPU may first obtain the identifier of the target API, and then determine the signature of the target API from a preconfigured API signature set according to that identifier, where the signature of the target API includes the identifier of the target API.
In this way, the CPU can conveniently determine the signature of the target API from the preconfigured API signature set.
In one possible implementation, after the first processing engine is selected to call the target API, the CPU may acquire, from a preconfigured API function library of the first processing engine, the program instructions the first processing engine requires to call the target API, where the library includes the program instructions the first processing engine requires to call each of one or more APIs, the one or more APIs including the target API. The CPU then sends those program instructions to the first processing engine.
Because the preconfigured API function library of the first processing engine already includes the program instructions the first processing engine requires to call each of the one or more APIs, the CPU can obtain the instructions for the target API more quickly through the library, which improves the efficiency with which the first processing engine calls the target API.
In one possible implementation, after the first processing engine is selected to call the target API, the CPU may instead obtain a pre-stored intermediate representation of the target API, compile it into the program instructions the first processing engine requires to call the target API, and then send those program instructions to the first processing engine.
In this way, the CPU can generate the required program instructions through rapid compilation of the intermediate representation of the target API, improving the efficiency with which the first processing engine calls the target API.
In one possible implementation, the API call information may further indicate a storage address of the intermediate representation of the target API. When acquiring the pre-stored intermediate representation, the CPU may determine its storage address from the API call information and then obtain the intermediate representation according to that storage address.
In this way, the CPU can conveniently obtain the intermediate representation of the target API through the API call information, which shortens the time needed to generate the program instructions the first processing engine requires and improves the efficiency with which it calls the target API.
In one possible implementation, the API call information is stored in a table format, which is more intuitive and makes it convenient for the CPU to obtain the relevant information from it.
In one possible implementation, the API call information may further indicate the cache addresses of the program instructions the first and second processing engines each require to call the target API. After selecting the first processing engine to call the target API, the CPU may obtain the required program instructions according to the cache address recorded in the API call information for the first processing engine, and then send those program instructions to the first processing engine.
In this way, the CPU can conveniently obtain from the API call information the cache address of the program instructions the first processing engine requires, fetch them more quickly, and thereby enable the first processing engine to call the target API efficiently.
In one possible implementation, the cache address of the program instructions the first processing engine requires to call the target API includes the cache addresses of the instructions for the target API at candidate parameter sizes, the candidate parameter sizes including the target parameter size.
When obtaining the program instructions according to the cache address in the API call information, the CPU may obtain, from the API call information and according to the target parameter size of the target API, the cache address of the program instructions the first processing engine requires to call the target API at the target parameter size, and then obtain those program instructions according to that cache address.
In this way, the API call information can indicate the cache addresses of the program instructions the first processing engine requires to call the target API at the different candidate parameter sizes, making it convenient for the CPU to select the program instructions for the target API at the target parameter size.
In a second aspect, embodiments of the present application further provide an API calling device; for details, reference may be made to the description of the first aspect, which is not repeated herein. The device has the functionality to implement the actions in the method examples of the first aspect. The functions may be realized by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the functions. In one possible design, the structure of the device includes a determining unit and a selecting unit, and optionally an instruction determining unit and a sending unit; these units can perform the corresponding functions in the method examples of the first aspect, for which see the detailed descriptions in the method examples, not repeated herein.
In a third aspect, embodiments of the present application further provide a computing device; for details, reference may be made to the description of the first aspect, which is not repeated herein. The computing device includes a processor and a memory. The processor is configured to support the device in performing the corresponding functions of the method of the first aspect. The memory is coupled to the processor and holds the program instructions and data necessary for the computing device. The computing device also includes a communication interface for communicating with other devices.
In a fourth aspect, the application also provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a fifth aspect, the application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a sixth aspect, the present application also provides a computer chip, the chip being connected to a memory, the chip being configured to read and execute a software program stored in the memory, to perform the method of the first aspect.
Drawings
FIG. 1A is a schematic diagram of a system according to the present application;
FIG. 1B is a schematic diagram of a system according to the present application;
FIG. 1C is a schematic diagram of a system according to the present application;
FIG. 1D is a schematic diagram of a system according to the present application;
FIG. 2 is a schematic diagram of a heterogeneous API call method provided by the present application;
FIG. 3 is a schematic diagram of a heterogeneous system according to the present application;
FIG. 4 is a schematic diagram of a heterogeneous system according to the present application;
FIG. 5 is a schematic diagram of a heterogeneous API call device according to the present application;
FIG. 6 is a schematic structural diagram of an apparatus according to the present application.
Detailed Description
FIG. 1A is a schematic diagram of the structure of a system according to an embodiment of the present application; the system includes a compiler 100 and a heterogeneous system 200. The heterogeneous system 200 includes a plurality of processing engines 210. In the embodiments of the present application, a processing engine 210 is a unit capable of performing data-processing operations; the embodiments do not limit the specific type and form of the processing engines 210, and any unit capable of performing data-processing operations may serve as a processing engine 210.
The plurality of processing engines 210 may include at least one CPU, and the remaining processing engines 210 may be of types different from the CPU; for example, the remaining processing engines 210 may include some or all of the following: a CPU, GPU, IP, DSP, NPU, or FPGA.
One processing engine 210 of the plurality may act as a scheduler, exchanging data with the remaining processing engines 210 to assist them in data processing. In the embodiments of the present application, a CPU acting as the scheduler is taken as an example; for convenience of description, the CPU acting as the scheduler is referred to as the scheduling CPU, or host CPU.
The embodiment of the present application is not limited to the deployment manner of the heterogeneous system 200, for example, the heterogeneous system 200 may be deployed on one computing node in a centralized deployment manner, or may be deployed on a plurality of computing nodes in a distributed deployment manner.
Compiler 100 is capable of compiling a source program that includes a heterogeneous API into program instructions that can run in heterogeneous system 200 (e.g., on the host CPU), or into an intermediate representation (IR) of the heterogeneous API. Compiler 100 may be deployed independently of heterogeneous system 200 on a different computing node, or together with heterogeneous system 200 on the same computing node.
Specifically, after obtaining the prototype declaration files of one or more heterogeneous APIs, compiler 100 may identify those prototype declaration files and edit a heterogeneous API header file that includes the signature of each heterogeneous API. For ease of description, the set of signatures of one or more heterogeneous APIs is referred to as a heterogeneous API signature set.
The prototype declaration file of each heterogeneous API may indicate a parameter size of the heterogeneous API, which describes the size of the parameters (e.g., the type and number of parameters) required when calling it. The prototype declaration file may also indicate related information of the heterogeneous API other than its function body, such as the number of parameters required to call it, the types of those parameters, and their names.
It should be noted that, in the embodiments of the present application, the size of a parameter does not refer to its numerical value, but to the memory it occupies, or the number of bytes it occupies, in the processing engine 210.
The embodiments of the present application do not limit the manner in which the prototype declaration file indicates the parameter size of the heterogeneous API. For example, identifiers corresponding to different parameter sizes may be preset, with different identifiers corresponding to different parameter sizes: identifier A might correspond to 2 to 4 parameters with a parameter size of 2 to 4 bits, while identifier B corresponds to 5 to 7 parameters with a parameter size of 5 to 10 bits. Compiler 100 determines the parameter size of the heterogeneous API by recognizing the identifier corresponding to the parameter size in the prototype declaration file.
For another example, an expression for calculating the parameter size may be predefined, into which the parameter information required for calling the heterogeneous API is substituted to obtain the parameter size. Compiler 100 determines the parameter size of a heterogeneous API by recognizing the parameter-size expression in its prototype declaration file.
By way of example, the following statements may be added to the prototype declaration file of the heterogeneous API:
#pragma HAPI_PARA_SIZE(hapi_para_size_expr)
Here, #pragma HAPI_PARA_SIZE declares the parameter size of the heterogeneous API declared at this point, and hapi_para_size_expr is the expression for that parameter size.
For example, the parameter size may be expressed as max(A.size(), B.size(), C.size()), meaning that the largest of the sizes of the three parameters A, B, and C is taken as the parameter size.
The embodiments of the present application do not limit the specific content of the parameter-size expression; in general, an operand appearing in the expression may be a positive-integer constant, a positive-integer parameter in the corresponding heterogeneous API prototype declaration file, a positive-integer member variable of a parameter, or a member function whose return value is a positive integer.
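For illustration, a minimal prototype declaration file using the pragma above might look as follows. Only the #pragma HAPI_PARA_SIZE form and the max(A.size(), B.size(), C.size()) expression come from this description; the Tensor type, the function name, and the file layout are assumptions. A standard C++ compiler ignores unknown pragmas, so the file remains compilable:

```cpp
// hapi_matmul.h -- hypothetical prototype declaration file of one heterogeneous API.
#pragma once
#include <cstddef>
#include <vector>

// Assumed parameter type; size() returns the space the parameter occupies.
struct Tensor {
    std::vector<float> data;
    std::size_t size() const { return data.size() * sizeof(float); }
};

// Parameter size of this heterogeneous API: the largest of the three
// parameter sizes, per the max(A.size(), B.size(), C.size()) example above.
#pragma HAPI_PARA_SIZE(max(A.size(), B.size(), C.size()))
Tensor hapi_matmul(const Tensor& A, const Tensor& B, const Tensor& C);
```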
Compiler 100 is capable of generating the signature of the heterogeneous API. In the embodiments of the present application, the signature includes an identifier of the heterogeneous API (which uniquely identifies it) and may further include related content of its prototype declaration, such as the number of parameters required for calling it, the type of each parameter, the size of the parameters, the type of the return value (i.e., the result value produced after the heterogeneous API is called), and the number of bytes that return value occupies in the host CPU. The signature may also indicate the parameter size of the heterogeneous API.
The manner in which the signature indicates the parameter size of the heterogeneous API is similar to the manner in which the prototype declaration file does, for which see the foregoing; it is not repeated herein. The signature may indicate the parameter size in the same manner as the prototype declaration file or in a different manner; the embodiments of the present application do not limit this.
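As one possible in-memory form, the signature fields enumerated above might be grouped as in the sketch below; the field set follows the description, while every name is an assumption:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical representation of one heterogeneous API signature.
struct HapiSignature {
    std::uint64_t api_id;                  // uniquely identifies the heterogeneous API
    std::vector<std::string> param_types;  // type of each parameter
    std::vector<std::size_t> param_bytes;  // bytes each parameter occupies
    std::string param_size_expr;           // e.g. "max(A.size(), B.size(), C.size())"
    std::string return_type;               // type of the return value
    std::size_t return_bytes;              // bytes the return value occupies in the host CPU
};

// The heterogeneous API signature set is then a lookup table keyed by identifier.
using HapiSignatureSet = std::unordered_map<std::uint64_t, HapiSignature>;
```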
The heterogeneous API signature set may be pre-deployed in the heterogeneous system 200, such as pre-loaded in a host CPU. In addition to heterogeneous API sets of signatures, in embodiments of the present application, heterogeneous API function libraries of the various processing engines 210 in the heterogeneous system 200 may also be pre-deployed in the heterogeneous system 200, such as pre-loaded in host CPUs.
The heterogeneous API function library of any processing engine 210 includes the program instructions that processing engine 210 requires to call heterogeneous APIs. Different processing engines 210 may use different types of program instructions: the instructions in a processing engine 210's heterogeneous API function library are the result of converting a source program comprising heterogeneous APIs into program instructions the engine can call directly, so the heterogeneous API function libraries of different types of processing engines 210 may differ.
The source program comprising the heterogeneous API is the original code of the heterogeneous API, the most original program instructions written when the heterogeneous API was programmed, and cannot be directly called by a processing engine.
If heterogeneous system 200 is not preconfigured with the heterogeneous API function library of each processing engine 210, heterogeneous system 200 may have a dynamic compiling function; for example, the host CPU has a dynamic compiling function and can compile the intermediate representations of one or more heterogeneous APIs obtained from compiler 100 into the program instructions different processing engines 210 require to call those APIs.
Here, the intermediate representation of a heterogeneous API is the program instructions that compiler 100 compiles from the heterogeneous API source program and that can be recognized by different heterogeneous systems 200 (e.g., by the host CPU in heterogeneous system 200).
In the embodiments of the present application, the code generated by compiler 100 triggers the host CPU in heterogeneous system 200 to call the heterogeneous API; for example, a statement in the generated code that calls the heterogeneous API is changed into a call to the host CPU, and the host CPU then calls the heterogeneous API, thereby determining the processing engine in the heterogeneous system that executes the API. This heterogeneous API may be referred to as the target heterogeneous API.
After determining that the target heterogeneous API needs to be called, the host CPU in the heterogeneous system 200 may select a processing engine 210 (the processing engine 210 may also be referred to as a target processing engine) corresponding to the target heterogeneous API from the plurality of processing engines 210 in the heterogeneous system 200 to call the target heterogeneous API.
In the embodiments of the present application, the host CPU in heterogeneous system 200 can determine by itself the target processing engine for calling the target heterogeneous API, without human participation, which can improve the efficiency of heterogeneous API calling in heterogeneous system 200.
Two possible deployment modes of the system in a practical scenario are listed below.
As shown in FIG. 1B, a schematic diagram of another system to which embodiments of the present application are applicable, compiler 100 may run on a development machine, and heterogeneous system 200 is deployed on a computing node. Heterogeneous system 200 includes a plurality of processing engines 210, such as a CPU, GPU, and DSP, where the CPU is the host CPU.
Compiler 100 compiles a source program that includes a heterogeneous API to generate program instructions that can run on the host CPU, identified here as "application executable code". This code is deployed in heterogeneous system 200, and based on it, the host CPU in the heterogeneous system can determine the processing engine (i.e., the target processing engine) that calls the heterogeneous API.
It should be noted that a module for determining an operation of the target processing engine that calls the heterogeneous API may be included in the host CPU, for example, the module may be a heterogeneous runtime (heterogeneous runtime, HRT). The heterogeneous runtime may be a module in the host CPU that is responsible for executing a heterogeneous program (the heterogeneous program is a program executed by one or more processing engines in the heterogeneous system, such as a heterogeneous API, etc.), implementing heterogeneous program scheduling (such as determining a processing engine that executes the heterogeneous program, sending the heterogeneous program to a corresponding processing engine to trigger the heterogeneous program to execute, etc.). The name of the module is not limited herein, and "heterogeneous runtime" is merely an example.
As shown in FIG. 1C, a schematic diagram of another system to which embodiments of the present application are applicable, compiler 100 may run on a development machine, and heterogeneous system 200 is deployed on a computing node. Heterogeneous system 200 includes a plurality of processing engines 210, such as a CPU, GPU, and DSP, where the CPU is the host CPU.
The compiler 100 can compile a source program including a heterogeneous API to generate an intermediate representation of the heterogeneous API, deploy (or store) the intermediate representation of the heterogeneous API in the heterogeneous system 200 (e.g., host CPU), and when determining that the heterogeneous API needs to be called, the host CPU in the heterogeneous system 200 may determine a processing engine (i.e., a target processing engine) that calls the heterogeneous API, and then compile the intermediate representation of the heterogeneous API into program instructions required by the target processing engine to call the heterogeneous API.
The host CPU in the system shown in FIG. 1C differs from the host CPU in the system shown in FIG. 1B in that, besides executing heterogeneous programs and implementing heterogeneous program scheduling, it (or the heterogeneous runtime in it) further has a dynamic compiling function, so it can compile the intermediate representation of one heterogeneous API in real time into the program instructions different processing engines require to call that API.
As shown in FIG. 1D, another system according to an embodiment of the present application includes a compiler 100 and a heterogeneous cluster 20. The heterogeneous cluster 20 includes a plurality of heterogeneous systems 200, whose structure can be seen in the heterogeneous system 200 shown in FIG. 1A; it is not described again herein.
One heterogeneous system 200 in heterogeneous cluster 20, or one processing engine 210 in one heterogeneous system 200, can act as a scheduler that exchanges data with the processing engines 210 in the remaining heterogeneous systems 200 (and with the remaining processing engines 210 in its own heterogeneous system 200) to assist them in data processing.
The function of compiler 100 can be seen in the foregoing and is not described again here. The operations performed by the heterogeneous system 200 acting as the scheduler, or by the processing engine 210 in it that acts as the scheduler, may refer to the operations performed by the host CPU shown in FIG. 1B; for details, see the foregoing.
Taking the systems shown in FIG. 1A to FIG. 1C as examples, the heterogeneous API calling method provided by the embodiments of the present application is described below with reference to FIG. 2. When the processing engine 210 serving as the scheduler in heterogeneous system 200 is another processing engine 210, the approach of the embodiments still applies; only the executing entity differs. The method includes the following steps:
Step 201: the host CPU determines the target heterogeneous API that needs to be called.
The embodiments of the present application do not limit the manner in which the host CPU determines the target heterogeneous API that needs to be called. For example, compiler 100 may hand the calling right of the target heterogeneous API to the host CPU: specifically, compiler 100 may change the call statement of the target heterogeneous API so that its caller becomes the host CPU (the call statement may be invokeHRT(…)), and compiler 100 may also change the caller to a module in the host CPU, where the module may be the heterogeneous runtime or another module.
Optionally, the host CPU may also determine information about the target heterogeneous API, which compiler 100 may notify to it. This information may include the identifier of the target heterogeneous API, and may further include the cache address of the parameters required for calling the target heterogeneous API and the storage address of its return value.
The cache address of the parameters may indicate a cache pointer pointing to the parameters required for calling the target heterogeneous API; if the heterogeneous API has no return value, the storage address of the return value may be indicated by a null address.
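A sketch of the rewritten call site follows; the invokeHRT(…) statement is named in the description above, while the signature, the API identifier, and the packing of the parameters are assumptions:

```cpp
#include <cstdint>

// Assumed signature: the heterogeneous runtime receives the identifier of the
// target heterogeneous API, the cache address of the buffered call parameters,
// and the storage address for the return value (nullptr when there is none).
extern void invokeHRT(std::uint64_t api_id, const void* params_addr, void* ret_addr);

// What the compiler might emit in place of the original statement
//     result = hapi_matmul(A, B, C);
void rewritten_call_site(const void* packed_params, void* result_buffer) {
    constexpr std::uint64_t kTargetApiId = 42;  // assumed identifier of the target API
    invokeHRT(kTargetApiId, packed_params, result_buffer);
}
```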
Step 202: the host CPU selects a target processing engine from the plurality of processing engines 210 to call a target heterogeneous API based on the heterogeneous API call information.
The heterogeneous API call information can indicate the efficiency with which the multiple processing engines 210 in the heterogeneous system call the target heterogeneous API, and this efficiency can be characterized in various ways.
For example, the heterogeneous API call information may characterize the efficiency with which a processing engine 210 calls the target heterogeneous API by the call time (which may be a relative time or an absolute time).
For another example, the heterogeneous API call information may also characterize the efficiency of the processing engine 210 with respect to the target heterogeneous API by the sequence numbers of the plurality of processing engines 210. The sequence number of the processing engine 210 may be determined according to the time when the multiple processing engines 210 call the target heterogeneous API, or may be determined according to a preset calling sequence of the multiple processing engines 210, or may be determined according to the performance of the multiple processing engines 210, which is not limited to the setting manner of the sequence number of the processing engines 210.
As another example, heterogeneous API call information may also characterize the efficiency of processing engine 210 with respect to a target heterogeneous API by call speed.
Any way of characterizing the efficiency of the multiple processing engines 210 to call the target heterogeneous API is applicable to the embodiments of the present application, and the present application is not limited to the way of characterizing the efficiency of the processing engines to call the target heterogeneous API.
It should be noted that the embodiments of the present application take as an example heterogeneous API call information that indicates the efficiency with which all processing engines 210 in the heterogeneous system call the target heterogeneous API; in some application scenarios, the call information may indicate the efficiency of only some of the processing engines 210.
In the embodiments of the present application, characterizing the efficiency of calling the target heterogeneous API by call time is taken as an example in the heterogeneous API call information. Other ways of characterizing efficiency may refer to this description; the difference is that the time value used in the heterogeneous API call information would be replaced by another parameter characterizing efficiency.
The heterogeneous API call information includes the time each of the plurality of processing engines 210 requires to call each of one or more heterogeneous APIs; the time a processing engine 210 requires to call a heterogeneous API can also be understood as the time it takes to execute it. The one or more heterogeneous APIs include the target heterogeneous API.
Based on the heterogeneous API call information, the host CPU may select, from the plurality of processing engines 210, the processing engine 210 that requires the shortest time to call the target heterogeneous API and is currently available as the target processing engine. The host CPU may also determine, from the plurality of processing engines 210, one or more processing engines 210 whose time to call the target heterogeneous API is less than a time threshold and that are currently available, and choose the target processing engine from among them. A currently available processing engine 210 is one that is currently idle and capable of executing the target heterogeneous API.
The heterogeneous API call information may be preconfigured in heterogeneous system 200, and the times it records for each processing engine 210 to call the one or more heterogeneous APIs may be determined from empirical values, through test statistics, or in other ways. The embodiments of the present application do not limit the specific information included; any information that can indicate the time each of the plurality of processing engines 210 requires to call one or more heterogeneous APIs may serve as the heterogeneous API call information.
The manner in which the host CPU selects the target processing engine differs according to the specific information included in the heterogeneous API call information; the manners are listed below:
(One): the heterogeneous API call information includes identifiers of heterogeneous APIs and the time each of the plurality of processing engines 210 requires to call each of one or more heterogeneous APIs.
The information included in the heterogeneous API call information can be found in table 1:
TABLE 1

| Heterogeneous API identifier | Parameter size | Processing engine 1 | … | Processing engine N |
|---|---|---|---|---|
| API 1 | size 1 | T11 | … | TN1 |
| API 2 | size 2 | T12 | … | TN2 |
| … | … | … | … | … |
| API M | size M | T1M | … | TNM |
As can be seen from Table 1, the heterogeneous API call information includes the time each processing engine 210 requires to call each of the M heterogeneous APIs. For example, it includes the times processing engine 1 requires to call the M heterogeneous APIs, which are T11, T12, …, T1M, respectively.
Depending on the type of heterogeneous API, the time a processing engine 210 requires to call it may differ. For example, for a heterogeneous API implementing a convolution operation, the GPU calls it in a shorter time and the CPU in a longer time; for a heterogeneous API implementing a logical operation, the GPU needs a longer time and the CPU a shorter time.
As can also be seen from Table 1, the heterogeneous API call information may include the parameter size of each heterogeneous API. In this manner, the parameter size is for reference only and is not used as a basis for selecting the target processing engine; for example, the parameter sizes of all heterogeneous APIs in Table 1 may be set to the same value, i.e., the difference in time for a processing engine 210 to call the same heterogeneous API at different parameter sizes is not considered.
Based on the heterogeneous API call information shown in Table 1, the host CPU may determine, according to the identifier of the target heterogeneous API, the time information corresponding to the target heterogeneous API, e.g., by locating the row of Table 1 in which that identifier appears.
After determining that time information, the host CPU selects the target processing engine according to it; for example, the processing engine 210 requiring the shortest time to call the target heterogeneous API may be selected. If the target heterogeneous API implements a convolution operation, the host CPU may select the GPU, which requires the shorter time; if it implements a logical operation, the host CPU may select the CPU.
The host CPU may also select the target processing engine based on this time information together with other selection policies (e.g., a load-balancing policy, which seeks to keep the number or duration of heterogeneous API call operations performed by the respective processing engines 210 consistent).
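A minimal sketch of the shortest-time selection over a Table-1-style structure; the container layout and all names are assumptions:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Table-1-style call information: per heterogeneous API identifier, the time
// each processing engine requires to call it (vector index = engine number).
using CallTimes = std::unordered_map<std::uint64_t, std::vector<double>>;

// Select the currently available engine with the shortest call time for the
// target heterogeneous API; returns -1 when no engine is available.
int select_target_engine(const CallTimes& info, std::uint64_t api_id,
                         const std::vector<bool>& available) {
    auto row = info.find(api_id);  // locate the row of the target heterogeneous API
    if (row == info.end()) return -1;
    int best = -1;
    double best_time = std::numeric_limits<double>::max();
    for (std::size_t e = 0; e < row->second.size() && e < available.size(); ++e) {
        if (available[e] && row->second[e] < best_time) {
            best_time = row->second[e];
            best = static_cast<int>(e);
        }
    }
    return best;
}
```

A load-balancing variant would replace the plain comparison with a policy that also weighs how much work each engine has already been given.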
(Two): the heterogeneous API call information includes identifiers of heterogeneous APIs and the time each of the plurality of processing engines 210 requires to call each of one or more heterogeneous APIs, where the time a processing engine 210 requires to call any heterogeneous API includes the time required at each of one or more parameter sizes (which may be understood as candidate parameter sizes). The number of candidate parameter sizes is not limited, and they may include the target parameter size of the target heterogeneous API.
The information included in the heterogeneous API call information can be found in table 2:
TABLE 2

| Heterogeneous API identifier | Parameter size | Processing engine 1 | … | Processing engine N |
|---|---|---|---|---|
| API 1 | size 1 | T11_1 | … | TN1_1 |
| API 1 | … | … | … | … |
| API 1 | size S | T11_S | … | TN1_S |
| … | … | … | … | … |
| API M | size 1 | T1M_1 | … | TNM_1 |
| API M | … | … | … | … |
| API M | size S | T1M_S | … | TNM_S |
As can be seen from Table 2, the heterogeneous API call information includes the time each processing engine 210 requires to call each of the M heterogeneous APIs, where the time for any heterogeneous API is recorded per parameter size. For example, for processing engine 1 the required times are T11_1, …, T11_S, …, T1M_S, one entry per heterogeneous API and parameter size.
As can also be seen from Table 2, the heterogeneous API call information records one or more parameter sizes for each heterogeneous API. In this manner, the parameter size of each heterogeneous API can serve as a basis for selecting the target processing engine, and the times shown in Table 2 reflect the difference in time for a processing engine 210 to call the same heterogeneous API at different parameter sizes.
In Table 2, the parameter sizes recorded for one heterogeneous API may be all, or some, of the values its parameter size can take when the API is actually called; for example, Table 2 may record S values of the parameter size for each heterogeneous API, together with the time at which a processing engine 210 calls the heterogeneous API at each value. Table 2 takes S possible values per heterogeneous API as an example; in fact, the number of possible parameter-size values may differ between heterogeneous APIs.
Based on the heterogeneous API call information shown in Table 2, the host CPU may obtain the signature of the target heterogeneous API according to its identifier, determine the target parameter size from that signature, and then determine the time information corresponding to the target heterogeneous API according to its identifier and the target parameter size, e.g., by locating the row of Table 2 in which the identifier and the target parameter size appear. After determining that time information, the host CPU selects the target processing engine according to it, as described for manner (one) above; this is not repeated herein.
When obtaining the signature of the target heterogeneous API, the host CPU may determine it from the preconfigured heterogeneous API signature set according to the identifier of the target heterogeneous API.
Alternatively, in Table 2 the parameter sizes recorded for a heterogeneous API may be value ranges: all values the parameter size can take when the API is actually called are classified into S classes, each class corresponding to one value range, and Table 2 records the time at which a processing engine 210 calls the heterogeneous API for each parameter-size class. Table 2 takes S classes as an example; in fact, the possible parameter-size values of each heterogeneous API may be classified into different numbers of classes according to the specific scenario.
Based on such heterogeneous API call information, the host CPU may obtain the signature of the target heterogeneous API according to its identifier, determine the target parameter size from that signature, and then determine the corresponding time information according to the identifier and the class to which the target parameter size belongs, e.g., by locating the row of Table 2 in which the identifier and that class appear. After determining the time information, the host CPU selects the target processing engine according to it, as described for manner (one) above; this is not repeated herein.
The manner in which the host CPU obtains the signature of the target heterogeneous API is described in the foregoing and is not repeated herein.
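When the call times are recorded per parameter-size class, the row lookup gains a second key, as in the sketch below; the bucketing of parameter sizes into S classes, and all names, are assumptions:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Table-2-style call information, keyed by (API identifier, parameter-size class).
using ScaledCallTimes =
    std::map<std::pair<std::uint64_t, std::size_t>, std::vector<double>>;

// Assumed bucketing: class i covers parameter sizes [i * step, (i + 1) * step),
// with everything beyond the last boundary folded into class S - 1.
std::size_t size_class(std::size_t target_size, std::size_t step, std::size_t S) {
    std::size_t c = target_size / step;
    return c < S ? c : S - 1;
}

// Locate the row for the target heterogeneous API at its target parameter size;
// engine selection over the returned row proceeds as in the previous sketch.
const std::vector<double>* lookup_row(const ScaledCallTimes& info,
                                      std::uint64_t api_id, std::size_t target_size) {
    auto it = info.find({api_id, size_class(target_size, 1024, 8)});
    return it == info.end() ? nullptr : &it->second;
}
```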
(Three): the heterogeneous API call information includes the time each of the plurality of processing engines 210 requires to call a heterogeneous API, recorded as a single time value per processing engine.
The information included in the heterogeneous API call information can be found in table 3:
TABLE 3

| Processing engine | Time required to call a heterogeneous API |
|---|---|
| Processing engine 1 | T1 |
| Processing engine 2 | T2 |
| … | … |
| Processing engine N | TN |
As can be seen from Table 3, the heterogeneous API call information includes the time each processing engine 210 requires to call a heterogeneous API; for example, the time processing engine 1 requires to call any heterogeneous API is T1, which may be an average or an empirical value.
In this manner, the heterogeneous API call information includes neither the parameter size of each heterogeneous API nor the identifier of the heterogeneous API, so neither is used as a basis for selecting the target processing engine; i.e., the times shown in Table 3 do not account for the differences between heterogeneous APIs or between parameter sizes.
Based on the heterogeneous API call information shown in Table 3, the host CPU may select the processing engine 210 that requires the shortest time to call a heterogeneous API as the target processing engine, or select the target processing engine based on the call information together with other selection policies (e.g., a load-balancing policy, which seeks to keep the number or duration of heterogeneous API call operations performed by the respective processing engines 210 consistent).
It should be noted that the times recorded in the heterogeneous API call information may be absolute or relative; for example, the absolute time a certain processing engine 210 requires to call one heterogeneous API may be taken as a reference, and the relative times the other processing engines 210 require are determined against it. These times essentially characterize the efficiency with which a processing engine 210 calls heterogeneous APIs, and any time value that characterizes this efficiency can be used to construct the heterogeneous API call information.
After selecting the target processing engine, the host CPU may execute step 203.
Optionally, after selecting the target processing engine, the host CPU may mark its state as unavailable, so that the target processing engine is not subsequently selected to call other heterogeneous APIs until it has finished calling the target heterogeneous API, after which the host CPU marks its state as available again.
Step 203: the host CPU triggers the target processing engine to call the target heterogeneous API.
When executing step 203, the host CPU performs the processing operations required for the target processing engine to call the target heterogeneous API: it may send the target processing engine the program instructions it requires to call the target heterogeneous API, and may also notify it of the cache address of the parameters required for the call and the storage address of the return value, so that the target processing engine can fetch the parameters from the cache address and store the return value of the target heterogeneous API at the corresponding storage address.
Before the host CPU can send those program instructions to the target processing engine, it first needs to determine them; two modes are described below.
Mode one: the heterogeneous API function libraries of the processing engines 210 are preconfigured in heterogeneous system 200.
As shown in fig. 3, a signature set of heterogeneous APIs and heterogeneous API function libraries of the respective processing engines 210 are preconfigured in the heterogeneous system 200.
The host CPU may select program instructions required by the target processing engine to call the target heterogeneous API from the heterogeneous API function library of the target processing engine.
The heterogeneous API function library of a processing engine 210 includes identifiers of heterogeneous APIs and the program instructions that processing engine 210 requires to call them; the host CPU selects the program instructions the target processing engine requires to call the target heterogeneous API from the target processing engine's function library according to the identifier of the target heterogeneous API.
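Mode one might be realized as a per-engine map from API identifier to ready-made program instructions, as sketched below; the byte-blob representation of instructions and the send_to_engine primitive are assumptions:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using ProgramInstructions = std::vector<std::uint8_t>;  // opaque engine-specific code

// Heterogeneous API function library of one processing engine:
// identifier of the heterogeneous API -> program instructions that engine
// requires to call it.
using HapiFunctionLibrary = std::unordered_map<std::uint64_t, ProgramInstructions>;

// Assumed transport primitive that hands program instructions to an engine.
void send_to_engine(int engine, const ProgramInstructions& code);

// One preconfigured library per processing engine, keyed by engine number.
bool dispatch_from_library(
        const std::unordered_map<int, HapiFunctionLibrary>& libraries,
        int target_engine, std::uint64_t api_id) {
    auto lib = libraries.find(target_engine);
    if (lib == libraries.end()) return false;
    auto code = lib->second.find(api_id);   // select by the API identifier
    if (code == lib->second.end()) return false;
    send_to_engine(target_engine, code->second);
    return true;
}
```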
Mode two: the heterogeneous API function libraries of the processing engines 210 are not configured in heterogeneous system 200, and the host CPU stores the intermediate representation of the target heterogeneous API in advance.
As shown in FIG. 4, compiler 100 may compile a source program including the target heterogeneous API in advance to generate its intermediate representation, and the host CPU in heterogeneous system 200 may store that intermediate representation in advance. The embodiments of the present application do not limit the manner in which the host CPU is preconfigured with it: for example, the intermediate representations of one or more heterogeneous APIs, including that of the target heterogeneous API, may be pre-stored in the host CPU; for another example, the host CPU may configure the intermediate representations of one or more heterogeneous APIs, including that of the target heterogeneous API, upon the user's trigger.
When determining that the target heterogeneous API needs to be called, the host CPU can compile the intermediate representation of the target heterogeneous API into the program instructions the target processing engine requires to call it, and send them to the target processing engine.
In this mode, the host CPU (which may also be understood as a heterogeneous runtime) has a dynamic compiling function. In the embodiments of the present application, the storage address of the intermediate representation of the target heterogeneous API may additionally be indicated in the heterogeneous API call information, taking the first characterization manner of the heterogeneous API call information described above as an example.
Referring to Table 4, the heterogeneous API call information also includes storage addresses for intermediate representations of one or more heterogeneous APIs.
Table 4
As can be seen from Table 4, the heterogeneous API call information includes the time required for each processing engine 210 to call each of the M heterogeneous APIs, together with the storage addresses of the intermediate representations of the M heterogeneous APIs; these storage addresses are storage address 1, storage address 2, …, storage address M, respectively.
After determining the target processing engine that needs to call the target heterogeneous API, the host CPU can obtain the storage address of the intermediate representation of the target heterogeneous API from the heterogeneous API call information, and then fetch the intermediate representation from that storage address.
It should be noted that the above is merely one example of indicating the storage address of the intermediate representation of a heterogeneous API in the heterogeneous API call information; the embodiments of the present application do not limit the manner in which it is indicated.
In general, for the same heterogeneous API, the program instructions used by the same processing engine may differ across parameter scales. This is because, when compiling the intermediate representation of the heterogeneous API, the host CPU adjusts according to the parameter scale and applies different memory layouts and loop optimizations to the program instructions, so the same processing engine may need different program instructions to call the same heterogeneous API at different parameter scales.
In the embodiment of the application, after compiling the intermediate representation of the target heterogeneous API into the program instructions required by the target processing engine to call the target heterogeneous API, the host CPU may also cache those program instructions, so that the next time it determines that the target processing engine needs to call the target heterogeneous API, it can fetch the cached program instructions directly.
More generally, each time the host CPU compiles the intermediate representation of a heterogeneous API into the program instructions required by a processing engine to call it, the host CPU may store the compiled program instructions; optionally, it may also record the identification and parameter scale of the heterogeneous API alongside them.
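One plausible shape for such a cache, keyed by API identification and parameter scale (all names are illustrative assumptions):

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Compiled-instruction cache for one processing engine: a repeat call to the
// same heterogeneous API at the same parameter scale skips recompilation.
class InstructionCache {
public:
    // Returns the cached instructions, or nullptr on a miss.
    const std::vector<uint8_t>* lookup(const std::string& apiId,
                                       std::size_t paramScale) const {
        auto it = cache_.find({apiId, paramScale});
        return it == cache_.end() ? nullptr : &it->second;
    }

    // Records freshly compiled instructions under (API id, parameter scale).
    void store(const std::string& apiId, std::size_t paramScale,
               std::vector<uint8_t> instructions) {
        cache_[{apiId, paramScale}] = std::move(instructions);
    }

private:
    std::map<std::pair<std::string, std::size_t>, std::vector<uint8_t>> cache_;
};
```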
Alternatively, the host CPU may record the compiled program instructions required by the different processing engines 210 to call heterogeneous APIs in the heterogeneous API call information. That is, the heterogeneous API call information may indicate the cache addresses of the program instructions required by the plurality of processing engines to call heterogeneous APIs, including the target heterogeneous API. Because the instructions differ across parameter scales, the cache addresses recorded for a given processing engine and heterogeneous API include one cache address per parameter scale.
Taking the second characterization manner of the heterogeneous API call information described above as an example, see Table 5: the heterogeneous API call information further includes the cache addresses of the program instructions required by the processing engines to call the heterogeneous APIs.
Table 5
As can be seen from Table 5, the heterogeneous API call information includes, for each processing engine 210, the time required to call each of the M heterogeneous APIs and the cache addresses of the program instructions required to do so. Taking processing engine 1 as an example: the time required for processing engine 1 to call any heterogeneous API covers the different parameter scales of that API, and the required times may be written T111, T112, …, T1MS, where the three indices denote the processing engine, the heterogeneous API, and the parameter scale, respectively. Likewise, the cache addresses of the program instructions required by processing engine 1 to call a heterogeneous API include one cache address per parameter scale, written cache address 111, cache address 112, …, cache address 1MS.
The above description takes as an example heterogeneous API call information that includes the cache address of the program instructions required by every processing engine to call every heterogeneous API. In practice, the heterogeneous API call information may cover only some of the processing engines, and for those engines it may cover all or only some of the heterogeneous APIs. In addition, the contents of the heterogeneous API call information (e.g., the time required by each processing engine 210 to call the different heterogeneous APIs, the cache addresses of the program instructions required by each processing engine 210 to call the different heterogeneous APIs, etc.) may be updated in real time.
When the heterogeneous API call information includes the cache addresses of the program instructions required by each processing engine to call the heterogeneous APIs, then after determining the target processing engine that needs to call the target heterogeneous API, the host CPU checks whether the call information contains a cache address for the program instructions the target processing engine needs to call the target heterogeneous API at the target parameter scale. If it does not, the host CPU compiles the intermediate representation of the target heterogeneous API into those program instructions, caches them, and stores their cache address in the heterogeneous API call information.
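Combining the hypothetical `InstructionCache` and `compileForEngine` from the sketches above, the miss-then-compile-then-record flow could look like this (illustrative only):

```cpp
// On a call request, consult the cache first; on a miss, compile the target
// API's intermediate representation, cache the result, and return it.
const std::vector<uint8_t>& getOrCompile(InstructionCache& cache,
                                         const std::string& apiId,
                                         std::size_t paramScale,
                                         const IntermediateRepresentation& ir,
                                         int targetEngineId) {
    if (const std::vector<uint8_t>* hit = cache.lookup(apiId, paramScale)) {
        return *hit;  // cached address would already be in the call information
    }
    cache.store(apiId, paramScale, compileForEngine(ir, targetEngineId));
    return *cache.lookup(apiId, paramScale);
}
```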
Based on the same inventive concept as the method embodiments, an embodiment of the present application further provides an API calling device configured to execute the method performed by the host CPU in the method embodiments; for related features, reference may be made to the method embodiments, which are not repeated here. As shown in fig. 5, the API calling device 500 includes a determining unit 501 and a selecting unit 502, and optionally an instruction determining unit 503 and a sending unit 504.
A determining unit 501, configured to determine a target API that needs to be called.
A selecting unit 502, configured to select the first processing engine to call the target API based on API call information, where the API call information is used to indicate the efficiency with which the first processing engine and the second processing engine respectively call the target API.
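The description only requires that the selection be based on the efficiency information; one plausible policy, sketched below with assumed types, is to pick the engine with the smallest recorded call time for the target API at the target parameter scale.

```cpp
#include <limits>
#include <map>

// Selects the engine with the lowest recorded call time. Keys are engine
// identifiers; values are the times recorded in the API call information.
int selectEngine(const std::map<int, double>& timePerEngine) {
    int bestEngine = -1;
    double bestTime = std::numeric_limits<double>::infinity();
    for (const auto& [engineId, time] : timePerEngine) {
        if (time < bestTime) {
            bestTime = time;
            bestEngine = engineId;
        }
    }
    return bestEngine;  // -1 if no engine has recorded data
}
```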
The API calling device 500 described above may be used to perform the method performed by the host CPU shown in fig. 2: the determining unit 501 may perform step 201 in the embodiment shown in fig. 2; the selecting unit 502 may perform step 202; the instruction determining unit 503 may perform the part of step 203 that determines the program instructions required by the target processing engine to call the target heterogeneous API; and the sending unit 504 may perform the part of step 203 that sends those program instructions to the target processing engine.
It should be noted that the division into units in the embodiments of the present application is schematic and is merely a logical functional division; other divisions are possible in practice. The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In a simple embodiment, those skilled in the art will appreciate that the API calling device 500 in the above embodiment may take the form shown in fig. 6.
The computing device 600 shown in fig. 6 includes at least one processor 610 and a memory 620; optionally, it may also include a communication interface 630.
Memory 620 may be a volatile memory, such as a random access memory; it may also be a non-volatile memory, such as, but not limited to, a read-only memory, a flash memory, a hard disk drive (HDD), or a solid state drive (SSD); or memory 620 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Memory 620 may also be a combination of the above.
The specific connection medium between the processor 610 and the memory 620 is not limited in the embodiment of the present application. Processor 610 may be a processing engine, such as a CPU, in heterogeneous system 200.
The computing device of fig. 6 also includes a communication interface 630; when the processor 610 communicates with other devices, such as the compiler 100 or another processing engine 210, it may transfer data through the communication interface 630.
When the API calling device 500 takes the form shown in fig. 6, the processor 610 in fig. 6 may, by calling the computer-executable instructions stored in the memory 620, cause the computing device 600 to execute the method executed by the host CPU in any of the above method embodiments; for example, the computing device 600 may perform the method performed by the host CPU in steps 201-203 of the method embodiment shown in fig. 2.
Specifically, the functions/implementation procedures of the determining unit 501, the selecting unit 502, the instruction determining unit 503, and the sending unit 504 in fig. 5 may all be implemented by the processor 610 in fig. 6 calling the computer-executable instructions stored in the memory 620. Alternatively, the functions/implementation procedures of the determining unit 501, the selecting unit 502, and the instruction determining unit 503 in fig. 5 may be implemented by the processor 610 calling the computer-executable instructions stored in the memory 620, while the functions/implementation procedures of the sending unit 504 in fig. 5 may be implemented through the communication interface 630 in fig. 6.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from the scope of the embodiments of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is also intended to include such modifications and variations.
Claims (19)
1. A heterogeneous application program interface (API) calling method, the method being applied to a heterogeneous system, the heterogeneous system comprising a first processing engine and a second processing engine, the method comprising:
selecting the first processing engine to call a target API based on API call information, wherein the API call information is used for indicating the efficiency of the first processing engine and the second processing engine in respectively calling the target API.
2. The method of claim 1, wherein the selecting the first processing engine to call the target API based on the API call information comprises:
acquiring a signature of the target API, wherein the signature of the target API is used for indicating a target parameter scale of the target API; and
selecting the first processing engine according to the API call information and the target parameter scale of the target API, wherein the API call information indicates the efficiency of the first processing engine and the second processing engine in calling the target API at candidate parameter scales, and the candidate parameter scales comprise the target parameter scale.
3. The method of claim 2, wherein the acquiring the signature of the target API comprises:
determining an identification of the target API; and
determining the signature of the target API from a preset API signature set according to the identification of the target API.
4. The method according to any one of claims 1-3, wherein the method further comprises:
acquiring program instructions required by the first processing engine to call the target API from a preconfigured API function library of the first processing engine, wherein the API function library of the first processing engine comprises program instructions required by the first processing engine to call one or more APIs, and the one or more APIs comprise the target API; and
sending, to the first processing engine, the program instructions required by the first processing engine to call the target API.
5. The method according to any one of claims 1-3, wherein the method further comprises:
acquiring a pre-stored intermediate representation of the target API;
compiling the intermediate representation of the target API into program instructions required by the first processing engine to call the target API; and
sending, to the first processing engine, the program instructions required by the first processing engine to call the target API.
6. The method of claim 5, wherein the API call information is further used for indicating a storage address of the intermediate representation of the target API, and the acquiring the pre-stored intermediate representation of the target API comprises:
acquiring the intermediate representation of the target API according to the storage address of the intermediate representation of the target API.
7. The method according to any one of claims 1 to 6, wherein the API call information is stored in tabular form.
8. The method according to any one of claims 1 to 3, wherein the API call information is further used for indicating cache addresses of program instructions required by the first processing engine and the second processing engine to respectively call the target API, and the method further comprises:
acquiring the program instructions required by the first processing engine to call the target API according to the cache address, in the API call information, of the program instructions required by the first processing engine to call the target API; and
sending the program instructions to the first processing engine.
9. The method of claim 8, wherein the acquiring the program instructions required by the first processing engine to call the target API according to the cache address, in the API call information, of the program instructions required by the first processing engine to call the target API comprises:
acquiring the program instructions required by the first processing engine to call the target API at the target parameter scale according to the cache addresses of the program instructions required by the first processing engine to call the target API and the target parameter scale of the target API.
10. An API calling device, configured to select a processing engine that calls a target API from a heterogeneous system, the heterogeneous system comprising a first processing engine and a second processing engine, the device comprising:
a selection unit, configured to select the first processing engine to call the target API based on API call information, wherein the API call information is used for indicating the efficiency of the first processing engine and the second processing engine in respectively calling the target API.
11. The apparatus of claim 10, wherein the selection unit, when selecting the first processing engine to call the target API based on the API call information, is specifically configured to:
acquire a signature of the target API, wherein the signature of the target API is used for indicating a target parameter scale of the target API; and
select the first processing engine according to the API call information and the target parameter scale of the target API, wherein the API call information indicates the efficiency of the first processing engine and the second processing engine in calling the target API at candidate parameter scales, and the candidate parameter scales comprise the target parameter scale.
12. The apparatus of claim 11, wherein the selection unit, when acquiring the signature of the target API, is specifically configured to:
determine an identification of the target API; and
determine the signature of the target API from a preset API signature set according to the identification of the target API, wherein the signature of the target API comprises the identification of the target API.
13. The apparatus according to any one of claims 10-12, wherein the apparatus further comprises an instruction determining unit and a sending unit, wherein:
the instruction determining unit is configured to acquire, from a preconfigured API function library of the first processing engine, program instructions required by the first processing engine to call the target API, wherein the API function library of the first processing engine comprises program instructions required by the first processing engine to call one or more APIs, and the one or more APIs comprise the target API; and
the sending unit is configured to send, to the first processing engine, the program instructions required by the first processing engine to call the target API.
14. The apparatus according to any one of claims 10-12, wherein the apparatus further comprises an instruction determining unit and a sending unit, wherein:
the instruction determining unit is configured to acquire a pre-stored intermediate representation of the target API, and compile the intermediate representation of the target API into program instructions required by the first processing engine to call the target API; and
the sending unit is configured to send, to the first processing engine, the program instructions required by the first processing engine to call the target API.
15. The apparatus of claim 14, wherein the API call information is further used for indicating a storage address of the intermediate representation of the target API, and the instruction determining unit, when acquiring the pre-stored intermediate representation of the target API, is specifically configured to:
acquire the intermediate representation of the target API according to the storage address of the intermediate representation of the target API.
16. The apparatus of any of claims 10 to 15, wherein the API call information is stored in a tabular form.
17. The apparatus according to any one of claims 10 to 12, wherein the API call information indicates cache addresses of program instructions required by the first processing engine and the second processing engine to respectively call the target API, and the apparatus further comprises an instruction determining unit and a sending unit, wherein:
the instruction determining unit is configured to acquire the program instructions required by the first processing engine to call the target API according to the cache address, in the API call information, of the program instructions required by the first processing engine to call the target API; and
the sending unit is configured to send, to the first processing engine, the program instructions required by the first processing engine to call the target API.
18. The apparatus of claim 17, wherein the instruction determining unit is specifically configured to:
acquire the program instructions required by the first processing engine to call the target API at the target parameter scale according to the cache addresses of the program instructions required by the first processing engine to call the target API and the target parameter scale of the target API.
19. A computing device, comprising a memory and a processor, wherein the memory is configured to store computer instructions, and the processor invokes the computer instructions stored in the memory to perform the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410804859.1A CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010656379.7A CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
CN202410804859.1A CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010656379.7A Division CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118642826A true CN118642826A (en) | 2024-09-13 |
Family
ID=79231945
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010656379.7A Pending CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
CN202410804859.1A Pending CN118642826A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010656379.7A Pending CN113918290A (en) | 2020-07-09 | 2020-07-09 | API calling method and device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113918290A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115915457B * | 2023-01-30 | 2023-05-23 | Alibaba (China) Co., Ltd. | Resource scheduling method, vehicle control method, device and system |
Also Published As
Publication number | Publication date |
---|---|
CN113918290A (en) | 2022-01-11 |
Similar Documents
Publication | Title |
---|---|
KR102541295B1 (en) | Operating system customization in an on-demand networked code execution system | |
US10338956B2 (en) | Application profiling job management system, program, and method | |
AU2018310287A1 (en) | Smart contract processing method and apparatus | |
US6988139B1 (en) | Distributed computing of a job corresponding to a plurality of predefined tasks | |
CN111831287B (en) | Method, apparatus and program product for determining resources required to execute a code segment | |
EP3035191B1 (en) | Identifying source code used to build executable files | |
CN111078323B (en) | Data processing method and device based on coroutine, computer equipment and storage medium | |
US20140006751A1 (en) | Source Code Level Multistage Scheduling Approach for Software Development and Testing for Multi-Processor Environments | |
CN112685410B (en) | Business rule checking method, device, computer equipment and storage medium | |
CN111176717B (en) | Method and device for generating installation package and electronic equipment | |
JP2010113482A (en) | Method of allocating resource, program, and apparatus for allocating resource | |
US11893367B2 (en) | Source code conversion from application program interface to policy document | |
EP3534266B1 (en) | Method, apparatus and system for prefetching data | |
JP2012514791A (en) | Parallel task application framework | |
CN110362356A (en) | Function data processing method, device, computer equipment and storage medium | |
US11010144B2 (en) | System and method for runtime adaptable applications | |
CN104794095A (en) | Distributed computation processing method and device | |
US11467946B1 (en) | Breakpoints in neural network accelerator | |
CN113296788B (en) | Instruction scheduling method, device, equipment and storage medium | |
CN118642826A (en) | API calling method and device | |
JP2008250838A (en) | Software generation device, method and program | |
CN112764897B (en) | Task request processing method, device and system and computer readable storage medium | |
CN110333870B (en) | Simulink model variable distribution processing method, device and equipment | |
CN114860204A (en) | Program processing method, program operating device, terminal, smart card and storage medium | |
KR101558807B1 (en) | Processor scheduling method for the cooperation processing between host processor and cooperation processor and host processor for performing the method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |