
CN112445855B - Visual analysis method and visual analysis device for graphic processor chip - Google Patents


Info

Publication number
CN112445855B
Authority
CN
China
Prior art keywords
processor chip
visual analysis
instruction information
graphic processor
design
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282862.XA
Other languages
Chinese (zh)
Other versions
CN112445855A (en)
Inventor
崔恒冠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202011282862.XA
Publication of CN112445855A
Application granted
Publication of CN112445855B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A visual analysis method and a visual analysis device for a graphics processor chip. The visual analysis method for the graphics processor chip comprises the following steps: acquiring event record data of a task scheduler stored in a cache of the graphics processor chip; parsing the event record data according to an encoding protocol to obtain an event database; and classifying a plurality of data items in the event database and then displaying them in the form of a time map based on a time axis. The visual analysis method provided by the embodiments of the disclosure can present the low-level runtime behavior of the graphics processor chip to a developer in the form of a time map, so that the developer can adjust the design of the graphics processor chip and the hardware resources of the graphics processor chip can be utilized more fully.

Description

Visual analysis method and visual analysis device for graphic processor chip
Technical Field
Embodiments of the present disclosure relate to a visual analysis method for a graphics processor chip, a design method for a graphics processor chip, a visual analysis apparatus for a graphics processor chip, a design apparatus for a graphics processor chip, a visual analysis and design apparatus, and a storage medium.
Background
General-purpose graphics processor chips offer strong parallel computing capability, high throughput, and an excellent energy efficiency ratio, which has made them the preferred computing acceleration component for building high-performance platforms. For a given architecture, the performance of applications running on a general-purpose graphics processor chip in the field of high-performance computing is mainly influenced by the programming model, memory access efficiency, and the parallelism of computing tasks. The performance of a software program accelerated by a general-purpose graphics processor chip depends heavily on how it uses the hardware resources of that chip.
Disclosure of Invention
At least one embodiment of the present disclosure provides a visual analysis method for a graphics processor chip, including: acquiring event record data of a task scheduler stored in a cache of the graphics processor chip; parsing the event record data according to an encoding protocol to obtain an event database; and classifying a plurality of data items in the event database and displaying the data items in the form of a time map based on a time axis.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, acquiring the event record data of the task scheduler stored in the cache of the graphics processor chip includes: running a function program corresponding to the graphics processor chip on the graphics processor chip; and after the function program has finished running, reading the event record data of the task scheduler stored in the cache.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, acquiring the event record data of the task scheduler stored in the cache of the graphics processor chip includes: running a function program corresponding to the graphics processor chip on the graphics processor chip; and reading the event record data of the task scheduler stored in the cache while the function program is running.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, acquiring the event record data of the task scheduler stored in the cache of the graphics processor chip includes: running a function program corresponding to the graphics processor chip in a simulation design corresponding to the graphics processor chip; and after the function program has finished running, acquiring the event record data from the simulation design corresponding to the graphics processor chip.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, the function program includes a function program that performs a matrix operation.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, the plurality of data items in the event database include a plurality of pieces of instruction information, each piece of instruction information including a time stamp and a corresponding chip module.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, classifying the plurality of data items in the event database and then displaying them in the form of a time map based on a time axis includes: classifying the plurality of pieces of instruction information by instruction type, and displaying them in different rows in the form of time maps based on a first time axis according to the classification result.
For example, in a visual analysis method provided in at least one embodiment of the present disclosure, classifying the plurality of data items in the event database and then displaying them in the form of a time map based on a time axis includes: classifying the plurality of pieces of instruction information by their corresponding chip modules, and displaying them in different rows in the form of time maps based on a first time axis according to the classification result.
For example, in the visual analysis method provided in at least one embodiment of the present disclosure, the plurality of data items further include register information and hardware scheduling information.
For example, the visual analysis method provided in at least one embodiment of the present disclosure further includes: acquiring application programming interface (API) call information corresponding to the graphics processor chip while the function program is running, wherein the API call information includes a time stamp; and displaying the API call information in the form of a time map based on a second time axis different from the first time axis.
For example, the visual analysis method provided in at least one embodiment of the present disclosure further includes: mapping the API call information displayed on the second time axis onto the first time axis according to its time stamps, so that the API call information is marked among the plurality of pieces of instruction information displayed on the first time axis.
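The mapping from the second time axis onto the first can be illustrated with a minimal sketch. It assumes, purely for illustration, that API call time stamps are host times in nanoseconds, that the first time axis counts GPU clock cycles, and that the host time corresponding to cycle 0 (`origin_ns`) and the GPU clock frequency (`clock_mhz`) are known; the API names in the usage lines are hypothetical placeholders, not real runtime calls.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ApiCall:
    name: str          # hypothetical API name, for illustration only
    timestamp_ns: int  # second time axis: host time in nanoseconds (assumed unit)


def ns_to_cycle(timestamp_ns: int, origin_ns: int, clock_mhz: float) -> int:
    """Map a host time stamp onto the first time axis (GPU clock cycles).

    origin_ns is the host time assumed to coincide with cycle 0 of the trace;
    clock_mhz is the assumed GPU clock frequency.
    """
    # cycles = elapsed_ns * 1e-9 s/ns * clock_mhz * 1e6 Hz = elapsed_ns * clock_mhz / 1000
    return round((timestamp_ns - origin_ns) * clock_mhz / 1000.0)


def mark_api_calls(calls: List[ApiCall], origin_ns: int, clock_mhz: float) -> List[Tuple[int, str]]:
    """Return (cycle, api_name) markers sorted along the first time axis."""
    return sorted((ns_to_cycle(c.timestamp_ns, origin_ns, clock_mhz), c.name) for c in calls)


if __name__ == "__main__":
    calls = [ApiCall("hypotheticalLaunchKernel", 1_000_500), ApiCall("hypotheticalMemcpy", 1_002_000)]
    for cycle, name in mark_api_calls(calls, origin_ns=1_000_000, clock_mhz=1500):
        print(f"cycle {cycle:>6}: {name}")
```

With a 1500 MHz clock, a call 500 ns after the origin lands at cycle 750, where it can be drawn as a marker over the instruction rows. A real tool would also have to correct for drift between the host and GPU clocks, which this sketch ignores.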
At least one embodiment of the present disclosure also provides a method for designing a graphics processor chip, including: performing visual analysis on the graphics processor chip according to any of the visual analysis methods provided by the embodiments of the present disclosure, to obtain a visualization result displayed in the form of a time map; and adjusting the design of the graphics processor chip according to the visualization result.
For example, in the design method provided in at least one embodiment of the present disclosure, the visualization result includes execution densities of the instruction information corresponding to a plurality of chip modules included in the graphics processor chip; and adjusting the design of the graphics processor chip according to the visualization result includes: adjusting the design of the plurality of chip modules of the graphics processor chip according to the visualization result.
For example, in the design method provided in at least one embodiment of the present disclosure, the visualization result includes the execution status of the instruction information while the function program corresponding to the graphics processor chip is running; and adjusting the design of the graphics processor chip according to the visualization result includes: adjusting parameters in the function program according to the visualization result.
At least one embodiment of the present disclosure also provides a visual analysis apparatus for a graphics processor chip, including: an acquisition module configured to acquire event record data of a task scheduler stored in a cache of the graphics processor chip; an analysis module configured to parse the event record data according to an encoding protocol to obtain an event database; and a classification display module configured to classify the plurality of data items in the event database and then display them in the form of a time map based on a time axis.
At least one embodiment of the present disclosure also provides a design apparatus for a graphics processor chip, including: a visual analysis module configured to perform visual analysis on the graphics processor chip according to the visual analysis method of any one of claims 1 to 11 to obtain a visualization result displayed in the form of a time map; and an adjustment module configured to adjust the design of the graphics processor chip according to the visualization result.
At least one embodiment of the present disclosure also provides a visual analysis and design apparatus, comprising: a processor; and a memory including one or more computer program modules, wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules include instructions for implementing any of the methods provided by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a storage medium storing non-transitory computer-readable instructions that, when executed by a computer, may implement any of the methods provided by the embodiments of the present disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a schematic block diagram of a developer tool for a graphics processor chip;
FIG. 2 is a schematic diagram of a visual analysis method 10 for a graphics processor chip provided in accordance with at least one embodiment of the present disclosure;
FIG. 3 is a block diagram of a graphics processor chip;
FIG. 4 is a schematic diagram showing a plurality of pieces of instruction information in the form of a time map;
FIG. 5 is another schematic diagram showing a plurality of pieces of instruction information in the form of a time map;
FIG. 6 is a schematic diagram of another visual analysis method 10 for a graphics processor chip provided in accordance with at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram showing instruction information and application programming interface call information in the form of a time map;
FIG. 8 is a schematic diagram of a method 20 for designing a graphics processor chip according to at least one embodiment of the present disclosure;
FIG. 9 is a schematic block diagram of a visual analysis apparatus 100 for a graphics processor chip provided in accordance with at least one embodiment of the present disclosure;
FIG. 10 is a schematic block diagram of a design apparatus 200 for a graphics processor chip provided in accordance with at least one embodiment of the present disclosure;
FIG. 11 is a schematic block diagram of a visual analysis and design apparatus 400 provided by at least one embodiment of the present disclosure;
FIG. 12 is a schematic block diagram of a visual analysis and design apparatus 800 provided by at least one embodiment of the present disclosure; and
FIG. 13 is a schematic diagram of a storage medium according to at least one embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments without inventive effort are within the scope of the present disclosure.
Unless otherwise defined, technical or scientific terms used in this disclosure have the ordinary meaning understood by a person of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like used in this disclosure do not denote any order, quantity, or importance, but are used to distinguish one element from another. Likewise, the terms "a," "an," "the," and similar terms do not denote a limitation of quantity, but rather the presence of at least one. The word "comprising," "comprises," or the like means that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items.
To maximize the utilization of the hardware resources of general-purpose graphics processor chips (GPGPU, General-Purpose computing on Graphics Processing Units), manufacturers of mainstream general-purpose graphics processor chips provide developer tools that help users understand how their programs run on the hardware and find ways to improve them. For example, FIG. 1 shows a schematic block diagram of a developer tool.
As shown in FIG. 1, the developer tool includes a toolbar, information display areas 1 to 5, and the like. For example, the toolbar displays commonly used development tools for the developer; information display area 1 displays detailed statistics on hardware resource usage; information display area 2 displays statistics on the usage of the computing units and memory units in the GPGPU; information display area 3 displays statistics on the amount of computing work; information display area 4 displays statistics on the execution of kernel programs (Kernel), i.e., programs that run on the GPGPU; and information display area 5 displays an occupancy analysis of chip resources, for example comparing theoretical values with actual values.
It should be noted that the developer tool shown in FIG. 1 is only an example: the same general-purpose graphics processor chip may be used with different developer tools, and the developer tools provided by different manufacturers may be the same or different.
Developer tools can generally be divided into two major functional parts: system analysis tools and kernel analysis tools. Before the kernel code is analyzed, a system analysis tool can help the developer find basic performance-limiting factors, such as unnecessary GPU-CPU synchronization (here and below, GPU refers to a graphics processor chip and CPU refers to a central processor chip), CPU-bound execution, or poor job scheduling algorithms on the CPU side. When a kernel is being optimized, the functions commonly provided by a kernel analysis tool include collecting monitoring data on GPGPU hardware resources, displaying memory usage statistics, and kernel debugging.
With the widespread use of general-purpose graphics processor chips, developers are increasingly concerned with how the low-level modules of the chip operate. However, current mainstream developer tools do not support visualizing the execution of the various instructions inside a running GPGPU as a time map based on a time axis.
At least one embodiment of the present disclosure provides a visual analysis method for a graphics processor chip, the visual analysis method including: acquiring event record data of a task scheduler stored in a cache of the graphics processor chip; parsing the event record data according to an encoding protocol to obtain an event database; and classifying a plurality of data items in the event database and then displaying them in the form of a time map based on a time axis.
Embodiments of the present disclosure also provide a method of designing a graphics processor chip corresponding to the above visual analysis method, a visual analysis apparatus for a graphics processor chip, a design apparatus for a graphics processor chip, a visual analysis and design apparatus, and a storage medium.
The visual analysis method for the graphics processor chip provided by the embodiments of the disclosure can present the low-level runtime behavior of the graphics processor chip to a developer in the form of a time map based on a time axis. This gives chip architecture designers and kernel developers more intuitive information, from which they can further adjust the chip architecture design or optimize the kernel code, so that the hardware resources of the graphics processor chip can be utilized more fully.
It should be noted that, the graphics processor chip described in the embodiments of the present disclosure may include, in addition to the general purpose graphics processor chip (GPGPU) described above, a special purpose graphics processor chip specifically designed for a certain application scenario, which is not limited in the embodiments of the present disclosure.
At least one embodiment of the present disclosure provides a visual analysis method 10 for a graphics processor chip, which may be applied in a kernel analysis tool in a developer tool, for example. As shown in fig. 2, the visual analysis method 10 includes the following operational steps.
Step S101: event record data in a task scheduler stored in a cache of the graphics processor chip is acquired.
Step S102: the event record data is parsed according to the encoding protocol to obtain an event database.
Step S103: the plurality of data items in the event database are classified and then displayed in the form of a time map based on a time axis.
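The three steps can be read as a small pipeline. The sketch below is a hypothetical skeleton, not the disclosed implementation: the cache reader is passed in by the caller, and `decode_toy` is an invented text format standing in for the real encoding protocol, which the disclosure does not specify. Each event is assumed to be a (timestamp, chip module, instruction type) tuple.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# One data item of the event database: (timestamp in clock cycles, chip module, instruction type).
Event = Tuple[int, str, str]


def acquire(read_cache: Callable[[], bytes]) -> bytes:
    """Step S101: fetch raw event record data through a caller-supplied cache reader."""
    return read_cache()


def parse(raw: bytes, decode: Callable[[bytes], Iterable[Event]]) -> List[Event]:
    """Step S102: decode raw bytes per the encoding protocol and sort by timestamp."""
    return sorted(decode(raw))


def classify(events: List[Event], key: Callable[[Event], str]) -> Dict[str, List[int]]:
    """Step S103, first half: group event timestamps into display rows."""
    rows: Dict[str, List[int]] = defaultdict(list)
    for event in events:
        rows[key(event)].append(event[0])
    return dict(rows)


def decode_toy(raw: bytes) -> Iterable[Event]:
    """Toy stand-in for the encoding protocol: one 'cycle,module,type' record per line."""
    for line in raw.decode("ascii").splitlines():
        cycle, module, instr_type = line.split(",")
        yield int(cycle), module, instr_type


def run_pipeline(read_cache, decode, key) -> Dict[str, List[int]]:
    """Chain S101 -> S102 -> S103 (classification only; drawing is left to the caller)."""
    return classify(parse(acquire(read_cache), decode), key)


if __name__ == "__main__":
    raw = b"12,SIMD1,Vec mem\n3,SIMD0,Scalar\n7,SIMD0,Scalar"
    rows = run_pipeline(lambda: raw, decode_toy, key=lambda e: e[2])
    for name, cycles in rows.items():
        print(f"{name:<8} {cycles}")
```

Passing the classification key as a function lets the same pipeline produce rows per instruction type (`e[2]`) or per chip module (`e[1]`), matching the two classification variants described later in this section.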
For example, in the above embodiments, the graphics processor chip may be a general-purpose graphics processor chip (GPGPU). For example, FIG. 3 shows an architecture diagram of a graphics processor chip. As shown in FIG. 3, the graphics processor chip includes a stream processor, a task distributor, and a frame buffer, where the stream processor includes general vector registers, a local data storage unit, a task scheduler, and a plurality of parallel computing modules. To use the graphics processor chip in a given application scenario, that is, to make it perform a given function, a kernel function program (Kernel) implementing that function must be written and run on the graphics processor chip.
When the graphics processor chip runs a kernel, the task scheduler generates event record data, commonly referred to as thread trace data; this is the raw data produced by running the kernel. For example, after being generated, the event record data of the task scheduler may be stored in the frame buffer, and in step S101 the event record data of the task scheduler is read from the frame buffer. It should be noted that this and the following embodiments take the frame buffer as an example of the cache, but embodiments of the disclosure are not limited thereto: in different graphics processor chips, the cache may be another unit, as long as it stores the event record data of the task scheduler.
For example, in one embodiment, step S101 includes: running a function program corresponding to the graphics processor chip on the graphics processor chip; and after the function program has finished running, reading the event record data of the task scheduler stored in the cache. For example, a kernel function program (Kernel) corresponding to the graphics processor chip may be run once to completion on the graphics processor chip, and after the kernel has finished, the event record data of the task scheduler is read from the frame buffer. It should be noted that this and the following embodiments take a kernel function program as an example of the function program, but embodiments of the present disclosure are not limited thereto; the function program may be any function program executable by the graphics processor chip.
For another example, in another embodiment, step S101 includes: running a function program corresponding to the graphics processor chip on the graphics processor chip; and reading the event record data of the task scheduler stored in the cache while the function program is running. For example, a kernel function program (Kernel) corresponding to the graphics processor chip may be started on the graphics processor chip, and the event record data of the task scheduler is read from the frame buffer while the kernel is running, that is, read while running.
For another example, in another embodiment, step S101 includes: running a function program corresponding to the graphics processor chip in a simulation design corresponding to the graphics processor chip; and after the function program has finished running, acquiring the event record data from the simulation design corresponding to the graphics processor chip. In the design process of a graphics processor chip, a simulation design is typically required before tape-out, and it may be carried out, for example, in simulation software. The event record data can therefore also be acquired through simulation: in this embodiment, the function program runs in the simulation design corresponding to the graphics processor chip rather than on a fabricated (taped-out) graphics processor chip, and the cache in step S101 refers to the cache in that simulation design, not to the cache in a fabricated chip.
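The difference between reading after the run and reading while running can be illustrated with a hypothetical model. The sketch assumes, which the disclosure does not state, that the trace area in the cache behaves as a fixed-size ring of records with a monotonically increasing write counter advanced by the task scheduler, while the host-side reader keeps its own read counter. `TraceRing` and `Reader` are invented names.

```python
from typing import List


class TraceRing:
    """Hypothetical model of the trace area in the cache as a fixed-size ring of records."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: List[int] = [0] * capacity
        self.write_count = 0  # total records written so far (never wraps)

    def produce(self, record: int) -> None:
        """Producer side (the task scheduler in this model): overwrite the oldest slot."""
        self.slots[self.write_count % self.capacity] = record
        self.write_count += 1


class Reader:
    """Host-side reader that tracks how many records it has already consumed."""

    def __init__(self, ring: TraceRing):
        self.ring = ring
        self.read_count = 0

    def drain(self) -> List[int]:
        """Return records written since the last drain; fail if the ring lapped the reader."""
        pending = self.ring.write_count - self.read_count
        if pending > self.ring.capacity:
            raise OverflowError(f"lost {pending - self.ring.capacity} records; drain more often")
        out = [self.ring.slots[i % self.ring.capacity]
               for i in range(self.read_count, self.ring.write_count)]
        self.read_count = self.ring.write_count
        return out


if __name__ == "__main__":
    ring = TraceRing(capacity=4)
    reader = Reader(ring)
    for r in range(3):
        ring.produce(r)
    print(reader.drain())  # first drain while "running"
    for r in range(3, 6):
        ring.produce(r)
    print(reader.drain())  # second drain picks up only the new records
```

Under this model, reading after the run corresponds to a single `drain()` at the end, which only works if the whole trace fits in the ring; reading while running calls `drain()` periodically, so a small buffer can hold a long trace as long as the reader keeps up.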
For example, a graphics processor chip may be used to perform matrix operations to accelerate computation, so in embodiments of the present disclosure the function program may include a function program that performs a matrix operation; for example, when performing a matrix operation, the function program may call a library. It should be noted that although the embodiments of the disclosure are described by taking matrix operations performed by the graphics processor chip as an example, they are not limited thereto, and the graphics processor chip may perform corresponding operations in other application scenarios to achieve corresponding functions.
As described above, the event record data is acquired in step S101; then, in step S102, the acquired event record data is parsed according to the encoding protocol to obtain an event database including a plurality of data items. It should be noted that the encoding protocol in step S102 can be understood as a convention designed and defined in advance between the graphics processor chip and the developer tool; the embodiments of the present disclosure do not limit the specific encoding form adopted.
In some embodiments of the present disclosure, for example, the plurality of data items in the event database obtained in step S102 include a plurality of pieces of instruction information, each including a time stamp and a corresponding chip module. That is, from the contents of a piece of instruction information, it can be determined which chip module of the graphics processor chip executed it and at what time.
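Because the encoding protocol is only described as a predefined convention between the chip and the developer tool, the following is a sketch under an invented record layout: fixed 8-byte little-endian records holding a 32-bit time stamp in clock cycles, an 8-bit module id, an 8-bit instruction-type code, and 16 reserved bits. The layout and the id-to-name tables are assumptions for illustration; the type names are taken from the instruction types listed for FIG. 4. Each decoded record carries a time stamp and a chip module, as the instruction information in the event database does.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical record layout (NOT the protocol of the disclosure):
# little-endian uint32 timestamp (clock cycles), uint8 module id,
# uint8 instruction-type code, uint16 reserved  -> 8 bytes per record.
RECORD = struct.Struct("<IBBH")

MODULES = {0: "SIMD0", 1: "SIMD1", 2: "SIMD2", 3: "SIMD3"}
INSTR_TYPES = {0: "Scalar", 1: "Vec mem", 2: "Branch", 3: "LDS",
               4: "Export", 5: "Immed", 6: "Internal"}


@dataclass(frozen=True)
class InstructionInfo:
    timestamp: int   # position on the first time axis, in clock cycles
    module: str      # chip module that executed the instruction
    instr_type: str  # instruction type, used for classification


def parse_event_records(raw: bytes) -> List[InstructionInfo]:
    """Decode raw event record data into an event database (a list of InstructionInfo)."""
    if len(raw) % RECORD.size:
        raise ValueError(f"{len(raw) % RECORD.size} trailing bytes: truncated record")
    database = []
    for ts, module_id, type_code, _reserved in RECORD.iter_unpack(raw):
        database.append(InstructionInfo(
            ts,
            MODULES.get(module_id, f"module{module_id}"),      # unknown ids stay visible
            INSTR_TYPES.get(type_code, f"type{type_code}"),
        ))
    return database


if __name__ == "__main__":
    raw = RECORD.pack(120, 0, 3, 0) + RECORD.pack(96, 2, 0, 0)
    for item in parse_event_records(raw):
        print(item)
```

Rejecting a buffer whose length is not a whole number of records catches truncated reads early; mapping unknown codes to placeholder names rather than failing keeps a partly unrecognized trace displayable.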
For example, in at least one embodiment, step S103 includes: classifying the plurality of pieces of instruction information by instruction type, and displaying them in different rows in the form of time maps based on the first time axis according to the classification result.
For example, FIG. 4 shows an example in which a plurality of pieces of instruction information are displayed in the form of a time map based on the first time axis T1. The time unit of the first time axis T1 is the clock cycle of the graphics processor chip; different graphics processor chips may have the same or different clock cycles, which is not limited by the embodiments of the present disclosure.
For example, FIG. 4 illustrates instruction information of seven different instruction types: scalar computations (Scalar), vector memory operations (Vec mem), branch operations (Branch), local memory operations (LDS), global memory operations (Export), immediate instructions (Immed), and other internal instructions (Internal). Instruction information of each type is displayed in its own row in the form of a time map. In the time map of any one type of instruction information, a white area indicates that no instruction information of that type is executed in the corresponding time period.
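A text-mode approximation of such a time map can be sketched as follows. It assumes that each piece of instruction information occupies a single clock cycle on the first time axis T1 and draws `#` where something executed and `.` for the "white area"; the `by` parameter also supports classifying by chip module. The function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (timestamp in clock cycles, chip module, instruction type); each instruction is
# assumed to occupy exactly one cycle on the first time axis T1.
Instr = Tuple[int, str, str]


def classify_rows(instrs: Iterable[Instr], by: str = "type") -> Dict[str, List[int]]:
    """Group instruction timestamps into display rows by instruction type or chip module."""
    index = {"module": 1, "type": 2}[by]
    rows: Dict[str, List[int]] = defaultdict(list)
    for instr in instrs:
        rows[instr[index]].append(instr[0])
    return {name: sorted(ts) for name, ts in rows.items()}


def render(rows: Dict[str, List[int]], t_end: int, cycles_per_char: int = 1) -> str:
    """Draw one text row per class: '#' if anything executed in a bucket, '.' (white area) otherwise."""
    width = (t_end + cycles_per_char - 1) // cycles_per_char
    lines = []
    for name in sorted(rows):
        cells = ["."] * width
        for t in rows[name]:
            if 0 <= t < t_end:
                cells[t // cycles_per_char] = "#"
        lines.append(f"{name:<8}|{''.join(cells)}|")
    return "\n".join(lines)


if __name__ == "__main__":
    instrs = [(0, "SIMD0", "Scalar"), (1, "SIMD0", "Vec mem"),
              (4, "SIMD1", "Scalar"), (5, "SIMD1", "LDS")]
    print(render(classify_rows(instrs, by="type"), t_end=8))
```

Increasing `cycles_per_char` is a crude form of zooming out: several cycles share one character, so a `#` only says that at least one instruction ran somewhere in that bucket.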
In the visual analysis method provided by the embodiments of the present disclosure, the plurality of pieces of instruction information are classified by instruction type and displayed in different rows in the form of time maps according to the classification result. Because the instruction information is displayed on a time axis that reflects when each piece was executed, the developer can obtain details about instruction execution more intuitively from the visualized time maps.
For example, the time map for local memory operations (LDS) can reflect local memory access performance while the graphics processor chip is running; the time map for global memory operations (Export) can reflect global memory access performance while the graphics processor chip is running; and the time maps of the various types of instruction information can also reflect instruction execution throughput (the higher, the better). The developer can adjust the design of the graphics processor chip according to the visualized time maps. For example, a kernel developer can tune parameters in the kernel to improve the performance of the graphics processor chip; and if costly instructions are found to be executed frequently, the designer of the graphics processor chip can adjust the chip architecture to reduce how often such instructions are executed.
For another example, in at least one embodiment, step S103 includes: classifying the plurality of pieces of instruction information by their corresponding chip modules, and displaying them in different rows in the form of time maps based on the first time axis according to the classification result.
For example, FIG. 4 schematically illustrates the instruction information corresponding to four parallel computing modules (SIMD0, SIMD1, SIMD2, SIMD3) of a graphics processor chip, with the instruction information of each chip module displayed in its own row in the form of a time map. Likewise, in the time map of any one chip module, a white area indicates that the module executes no instruction information in the corresponding time period.
In the visual analysis method provided by the embodiments of the present disclosure, the plurality of pieces of instruction information are classified according to their corresponding chip modules and displayed in different rows in the form of time maps according to the classification result. Because the instruction information is displayed against a time axis and reflects the order in which it was executed, a developer can see the details of instruction execution more intuitively through the visualized time maps.
For example, through the visualized time maps, a designer of the graphics processor chip can check whether the underlying chip modules behave according to the designed logic; as another example, the execution density of the instruction information corresponding to a chip module may reflect how efficiently that chip module is used. FIG. 5 shows another example of displaying a plurality of pieces of instruction information in the form of time maps based on the first time axis T1. Comparing FIG. 4 and FIG. 5, the execution density of the instruction information for the parallel computing modules in FIG. 5 is higher than in FIG. 4; that is, in the case shown in FIG. 5, the chip modules are used more efficiently.
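One way to put a number on "execution density" is the fraction of cycles in a window during which a module has at least one instruction in flight. The Python sketch below assumes each instruction is recorded as a half-open `[start, end)` cycle interval per module; that representation and the busy-fraction definition are assumptions, not something the disclosure specifies.

```python
def busy_cycles(intervals, t0, t1):
    """Count cycles in [t0, t1) covered by at least one [start, end) interval.

    Overlapping intervals are merged so that concurrent instructions are
    not double-counted.
    """
    clipped = sorted(
        (max(s, t0), min(e, t1)) for s, e in intervals if s < t1 and e > t0
    )
    total = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Gap before this interval: close the current merged run.
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def execution_density(per_module, t0, t1):
    """Map each module name to its busy fraction over the window [t0, t1)."""
    if t1 <= t0:
        raise ValueError("window must be non-empty")
    return {m: busy_cycles(iv, t0, t1) / (t1 - t0) for m, iv in per_module.items()}
```

Under this definition, a denser time map such as the one in FIG. 5 yields values closer to 1.0 than a sparser one such as FIG. 4.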
It should be noted that FIG. 4 and FIG. 5 show only some of the chip modules and instruction types by way of example, and the embodiments of the present disclosure are not limited thereto. When the visual analysis method provided by the embodiments of the present disclosure is used, every chip module and instruction type in the graphics processor chip can be visually analyzed as needed.
In addition, FIG. 4 and FIG. 5 show a plurality of rectangular blocks, each of which may represent a set of multiple pieces of instruction information; for example, if a time map is zoomed in, the individual pieces of instruction information contained in a rectangular block may be shown. Also, for example, in the four parallel computing modules shown in FIG. 4 and FIG. 5, light rectangular blocks represent 32-bit instruction information and dark rectangular blocks represent 64-bit instruction information.
It should be noted that in FIG. 4 and FIG. 5, instruction information of different instruction types is represented by rectangular blocks of different shades; for example, instruction information for local memory operations is shown darker than instruction information for immediate instructions. Embodiments of the present disclosure include, but are not limited to, this display scheme; for example, different instruction types may also be distinguished by different shapes or fill patterns.
The above embodiments take the case where the plurality of data items in the event database include a plurality of pieces of instruction information as an example. Embodiments of the present disclosure include, but are not limited to, this case; for example, the data items in the event database may further include register information and/or hardware scheduling information. It should be noted that the basic unit of hardware scheduling may differ between platforms.
The visual analysis method provided by the embodiments of the present disclosure can display not only instruction information but also register information and/or hardware scheduling information in the form of time maps, thereby providing the developer with more information about the low-level operation of the chip and helping the developer complete the design of the graphics processor chip or the development of the Kernel.
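The zoomed-out rectangular blocks can be thought of as runs of nearby instructions merged into one drawable block. This Python sketch assumes instructions are given as `(timestamp, width_bits)` pairs and that a new block starts whenever the width changes or the gap exceeds a hypothetical `max_gap` threshold; the disclosure does not state how its blocks are formed.

```python
def aggregate_blocks(events, max_gap):
    """Merge instructions into blocks for a zoomed-out view.

    events:  iterable of (timestamp, width_bits), e.g. width 32 or 64.
    max_gap: hypothetical largest gap (in cycles) still merged into one block.
    Returns a list of (start, end, width_bits, count), where end is the
    timestamp of the last instruction in the block.
    """
    blocks = []
    for t, width in sorted(events):
        if blocks and blocks[-1][2] == width and t - blocks[-1][1] <= max_gap:
            start, _, _, count = blocks[-1]
            blocks[-1] = (start, t, width, count + 1)
        else:
            blocks.append((t, t, width, 1))
    return blocks
```

Rendering each 32-bit block lightly and each 64-bit block darkly would reproduce the convention used in FIG. 4 and FIG. 5; zooming in corresponds to lowering `max_gap` until blocks split back into individual instructions.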
As shown in fig. 6, the visual analysis method provided in at least one embodiment of the present disclosure further includes the following operation steps.
Step S104: acquiring application programming interface (API) call information of the function program corresponding to the graphics processor chip while it is running, the API call information including a time stamp.
Step S105: displaying the API call information in the form of a time map based on a second time axis T2 different from the first time axis T1.
For step S104, Kernel code may generate API call information when running on the graphics processor chip. For example, when the visual analysis method described above is executed, the API call information may be collected by a developer tool; like the instruction information described above, the API call information also includes a time stamp.
Since the API call information obtained in step S104 includes a time stamp, in step S105 it can be displayed in the form of a time map based on the second time axis T2. For example, as shown in FIG. 7, the upper half displays the plurality of pieces of instruction information in the form of time maps based on the first time axis T1, and the lower half displays the plurality of pieces of API call information in the form of time maps based on the second time axis T2.
Note that the time unit of the second time axis T2 may be the same as or different from that of the first time axis T1; the embodiments of the present disclosure do not limit this.
The visual analysis method provided by the embodiments of the present disclosure can display not only instruction information but also API call information in the form of time maps, thereby providing the developer with more information about Kernel code execution and helping the developer complete Kernel development.
In the visual analysis method provided in at least one embodiment of the present disclosure, as shown in FIG. 7, the API call information displayed based on the second time axis T2 may also be mapped onto the first time axis T1 according to the time stamps, so that the API calls are marked on the pieces of instruction information displayed based on the first time axis T1. In this way, the developer can see at which point during instruction execution an API call occurs, or which instruction information is being executed when a given API call occurs.
With this visual analysis method, the instruction information generated while the graphics processor chip is working can be displayed together with the API call information, providing the developer with richer information about the low-level operation of the chip and helping the developer better optimize Kernel code.
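When the two axes use different units, the API timestamps have to be converted before they can be compared with instruction timestamps. The Python sketch below assumes T2 is host time in nanoseconds, T1 is GPU cycles, the GPU clock runs at a constant `clock_mhz`, and a shared reference instant `origin_ns` corresponds to cycle 0; all of these are illustrative assumptions. It relies only on the identity that 1 MHz is one cycle per microsecond, i.e. one cycle per 1000 ns.

```python
def host_ns_to_gpu_cycles(t_ns: int, origin_ns: int, clock_mhz: int) -> int:
    """Convert a host timestamp in ns (axis T2) to GPU cycles (axis T1), rounded down.

    Assumes cycle 0 on T1 coincides with host time origin_ns and that the
    GPU clock frequency is constant: cycles = elapsed_ns * clock_mhz / 1000.
    """
    return (t_ns - origin_ns) * clock_mhz // 1000


def gpu_cycles_to_host_ns(cycles: int, origin_ns: int, clock_mhz: int) -> int:
    """Inverse mapping from GPU cycles (T1) back to host ns (T2), rounded down."""
    return origin_ns + cycles * 1000 // clock_mhz
```

For example, at 1500 MHz an API call 2000 ns after the reference instant lands at cycle 3000 on T1. Real profilers usually correlate clocks by sampling both at known points, because GPU clock frequency can change while a Kernel is running.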
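Marking API calls on the instruction axis amounts to an interval-stabbing query: for each call time, find the instructions whose execution interval contains it. This Python sketch assumes the API timestamps have already been mapped onto T1 and that instructions are recorded as `(start, end, name)` with `end` exclusive; the instruction and API names in the usage note are placeholders.

```python
import bisect


def annotate(instrs, api_calls):
    """Mark API calls on the instruction time axis T1.

    instrs:    (start, end, name) tuples on T1, end exclusive.
    api_calls: (t, api_name) tuples whose timestamps are already on T1.
    Returns (t, api_name, [names of instructions executing at t]) per call,
    in time order.
    """
    instrs = sorted(instrs)
    starts = [s for s, _, _ in instrs]
    marked = []
    for t, api in sorted(api_calls):
        # Only instructions that started at or before t can be executing at t.
        hi = bisect.bisect_right(starts, t)
        # Linear scan of the candidates; an interval tree would scale better.
        active = [name for s, e, name in instrs[:hi] if e > t]
        marked.append((t, api, active))
    return marked
```

With instructions `v_add` over [0, 4), `ds_read` over [2, 6) and `exp` over [8, 9), a call at cycle 3 is marked against `v_add` and `ds_read`, while a call at cycle 7 falls into an idle gap.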
At least one embodiment of the present disclosure also provides a design method 20 of a graphics processor chip, as shown in fig. 8, the design method 20 including the following operation steps.
Step S201: performing visual analysis on the graphics processor chip using any of the visual analysis methods provided by the embodiments of the present disclosure to obtain a visualization result displayed in the form of a time map.
Step S202: adjusting the design of the graphics processor chip according to the visualization result.
For example, in some embodiments, the visualization result includes the execution density of the instruction information corresponding to a plurality of chip modules included in the graphics processor chip; in this case, step S202 includes: adjusting the design of the plurality of chip modules of the graphics processor chip according to the visualization result.
As described in the embodiments of the visual analysis method, the execution density of the instruction information corresponding to a chip module may reflect how efficiently that chip module is used. After obtaining a visualization result that reflects module usage efficiency, a developer may, for example, adjust the design of the chip modules of the graphics processor chip so that its hardware resources are more fully utilized.
As another example, in some embodiments, the visualization result includes the execution status of the instruction information of the function program corresponding to the graphics processor chip while it is running; when the function program is a kernel function program (Kernel), step S202 includes: adjusting parameters in the Kernel according to the visualization result.
As described in the embodiments of the visual analysis method, the time map corresponding to local memory operations (LDS) may reflect local memory access performance while the graphics processor chip is working, the time map corresponding to global memory operations (Export) may reflect global memory access performance, and the time maps of the various pieces of instruction information may also reflect instruction execution throughput (higher is better). Based on how the instruction information is executed, a developer can adjust various parameters in the Kernel to improve the working performance of the graphics processor chip.
For example, when matrix operations are performed on the graphics processor chip, the Kernel may call a math library. When using the math library, corresponding parameters may be set, such as the parameters of general matrix multiplication (GEMM); as another example, the computation type may be set to single precision or double precision.
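A designer could reduce per-module execution densities to a few summary figures before deciding which modules to rework. In this Python sketch, the input is assumed to be a mapping from module name to busy fraction (0 to 1), and both the `low` threshold and the max-over-mean imbalance metric are hypothetical choices, not criteria from the disclosure.

```python
from statistics import mean


def utilization_report(density, low=0.5):
    """Summarize per-module execution density (busy fraction, 0..1).

    low: hypothetical threshold below which a module is flagged as underused.
    """
    if not density:
        raise ValueError("no modules given")
    avg = mean(density.values())
    underused = sorted(m for m, d in density.items() if d < low)
    # max/mean well above 1 means work is concentrated on a few modules.
    imbalance = max(density.values()) / avg if avg > 0 else float("inf")
    return {"mean": avg, "underused": underused, "imbalance": imbalance}
```

For densities of 0.9, 0.8, 0.3 and 0.4 on SIMD0 to SIMD3, the report flags SIMD2 and SIMD3 and gives an imbalance of about 1.5, suggesting that work is not spread evenly across the parallel computing modules.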
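To make the GEMM parameters concrete: BLAS-style GEMM computes C := alpha·A·B + beta·C, and the precision setting selects single or double arithmetic. The Python sketch below is a naive reference for illustration only. `GemmParams` is a hypothetical container, and single precision is emulated by rounding every value to float32 through the standard `struct` module; this is not the math library's API.

```python
import struct
from dataclasses import dataclass


def to_f32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


@dataclass
class GemmParams:
    """Hypothetical subset of GEMM settings a Kernel developer might tune."""
    alpha: float = 1.0
    beta: float = 0.0
    precision: str = "double"  # "single" or "double"


def gemm(a, b, c, p: GemmParams):
    """Naive reference C := alpha * A @ B + beta * C on row-major nested lists.

    In "single" mode every input and intermediate result is rounded to
    float32, which emulates single-precision arithmetic on Python doubles.
    """
    if p.precision not in ("single", "double"):
        raise ValueError(f"unknown precision {p.precision!r}")
    rnd = to_f32 if p.precision == "single" else float
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    A = [[rnd(x) for x in row] for row in a]
    B = [[rnd(x) for x in row] for row in b]
    alpha, beta = rnd(p.alpha), rnd(p.beta)
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for t in range(k):
                acc = rnd(acc + rnd(A[i][t] * B[t][j]))
            row.append(rnd(rnd(alpha * acc) + rnd(beta * rnd(c[i][j]))))
        out.append(row)
    return out
```

Switching `precision` to `"single"` changes results in the low-order bits (0.1 becomes 0.10000000149011612), which is the accuracy side of the speed and accuracy trade-off a Kernel developer weighs when choosing the computation type.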
The design method of the graphic processor chip provided by the embodiment of the disclosure can present the bottom running condition related to the graphic processor chip to a developer in the form of a time map based on a time axis, so that more visual information is provided for a designer of a chip architecture or a Kernel developer, and the developer can further adjust the chip architecture design or further optimize Kernel codes according to the information, so that the hardware resources of the graphic processor chip can be more fully utilized.
At least one embodiment of the present disclosure also provides a visual analysis apparatus 100 for a graphics processor chip. As shown in FIG. 9, the visual analysis apparatus 100 includes an acquisition module 110, a parsing module 120, and a classification display module 130.
For example, the acquisition module 110 is configured to acquire the event record data in the task scheduler stored in the cache of the graphics processor chip, that is, to perform step S101 above.
The parsing module 120 is configured to parse the event record data according to the encoding protocol to obtain an event database, i.e. to complete the above step S102.
The classification display module 130 is configured to classify the plurality of data items in the event database and then display the classified data items in the form of a time map based on a time axis, that is, to complete the above step S103.
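The parsing and classification steps of the apparatus can be sketched as a small pipeline. The encoding protocol itself is not given in this section, so the 8-byte record layout below (little-endian uint32 timestamp, uint8 module id, uint8 instruction-type id, uint16 reserved) and the id tables are hypothetical. Acquisition is hardware-specific, so it is represented only by the raw `bytes` buffer passed to `parse`.

```python
import struct
from collections import defaultdict

# Hypothetical 8-byte little-endian record: uint32 timestamp (cycles),
# uint8 chip-module id, uint8 instruction-type id, uint16 reserved.
RECORD = struct.Struct("<IBBH")
MODULES = {0: "SIMD0", 1: "SIMD1", 2: "SIMD2", 3: "SIMD3"}  # illustrative ids
TYPES = {0: "VALU", 1: "LDS", 2: "Export"}                  # illustrative ids


def parse(raw: bytes):
    """Parsing step: decode raw event record data into an event database."""
    if len(raw) % RECORD.size:
        raise ValueError("event record data ends with a partial record")
    return [
        {
            "timestamp": ts,
            "module": MODULES.get(m, f"module{m}"),
            "type": TYPES.get(t, f"type{t}"),
        }
        for ts, m, t, _reserved in RECORD.iter_unpack(raw)
    ]


def classify(db, key):
    """Classification step: one time-ordered row of timestamps per class.

    key is "module" or "type"; each row would be drawn as one line of the
    time map on the shared time axis.
    """
    rows = defaultdict(list)
    for item in sorted(db, key=lambda d: d["timestamp"]):
        rows[item[key]].append(item["timestamp"])
    return dict(rows)
```

Because records may arrive out of order, `classify` sorts by timestamp before building rows, so each row can be drawn left to right along the time axis.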
At least one embodiment of the present disclosure also provides a design apparatus 200 for a graphics processor chip, as shown in fig. 10, the design apparatus 200 including a visual analysis module 210 and an adjustment module 220.
For example, the visual analysis module 210 is configured to perform visual analysis on the graphic processor chip according to any of the visual analysis methods provided in the embodiments of the present disclosure to obtain a visual result displayed in the form of a time map, that is, to complete the above step S201.
The adjustment module 220 is configured to adjust the design of the graphics processor chip according to the visualization result, i.e. the above step S202 is completed.
Since details of the operations of the apparatus 100, 200 are described in the course of describing the visual analysis method 10 and the design method 20, the details are not described herein for brevity, and reference may be made to the descriptions of fig. 2 to 8.
It should be noted that each module in the apparatus shown in FIG. 9 and FIG. 10 may be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these modules may correspond to application-specific integrated circuits, to pure software code, or to a combination of software and hardware. By way of example, the apparatus described with reference to FIG. 9 and FIG. 10 may be, but is not limited to, a personal computer, tablet device, personal digital assistant, smartphone, web application, or other device capable of executing program instructions.
In addition, although the apparatus 100, 200 is described above as being divided into modules for performing the respective processes, it is apparent to those skilled in the art that the processes performed by the respective modules may be performed without any specific division of the modules in the apparatus or without explicit demarcation between the respective modules. Furthermore, the apparatus described above with reference to fig. 9-10 is not limited to include the above-described modules, but may also add some other modules (e.g., memory modules, data processing modules, etc.) as needed, or the above modules may be combined.
At least one embodiment of the present disclosure also provides a visual analysis and design apparatus comprising a processor and a memory. The memory includes one or more computer program modules, which are stored in the memory and configured to be executed by the processor; the one or more computer program modules include instructions for implementing the visual analysis method 10 and the design method 20 of the embodiments of the present disclosure described above.
FIG. 11 is a schematic block diagram of a visual analysis and design apparatus provided in accordance with at least one embodiment of the present disclosure. As shown in fig. 11, the visual analysis and design apparatus 400 includes a processor 410 and a memory 420. Memory 420 is used to store non-transitory computer-readable instructions (e.g., one or more computer program modules). The processor 410 is configured to execute non-transitory computer readable instructions that, when executed by the processor 410, may perform one or more of the steps of the visual analysis method 10 and the design method 20 described above. The memory 420 and the processor 410 may be interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, the processor 410 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or other form of processing unit having data processing and/or program execution capabilities, such as a Field Programmable Gate Array (FPGA), or the like; for example, the Central Processing Unit (CPU) may be an X86 or ARM architecture, or the like. Processor 410 may be a general purpose processor or a special purpose processor that may control the visual analysis and other components in design apparatus 400 to perform the desired functions.
For example, the memory 420 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by the processor 410 to implement various functions of the visual analysis and design apparatus 400. Various applications and data, as well as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
It should be noted that, in the embodiments of the present disclosure, the specific functions and technical effects of the visual analysis and design apparatus 400 may refer to the description of the visual analysis method 10 and the design method 20 above, and are not repeated herein.
Fig. 12 is a schematic block diagram of another visual analysis and design apparatus provided by some embodiments of the present disclosure. The visual analysis and design apparatus 800 is, for example, suitable for use in implementing the visual analysis method 10 and the design method 20 provided by embodiments of the present disclosure. It should be noted that the visual analysis and design apparatus 800 shown in fig. 12 is merely an example, and does not impose any limitation on the functionality and scope of use of the disclosed embodiments.
As shown in fig. 12, the visual analysis and design apparatus 800 may include a processing device (e.g., a central processor, a graphics processor, etc.) 810 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 820 or a program loaded from a storage device 880 into a Random Access Memory (RAM) 830. In the RAM 830, various programs and data required for the operation of the visual analysis and design apparatus 800 are also stored. The processing device 810, the ROM 820, and the RAM 830 are connected to each other by a bus 840. An input/output (I/O) interface 850 is also connected to bus 840.
In general, the following devices may be connected to the I/O interface 850: input devices 860 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 870 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 880 including, for example, magnetic tape, hard disk, etc.; and communication device 890. Communication device 890 may allow visual analysis and design apparatus 800 to communicate wirelessly or by wire with other electronic devices to exchange data. While fig. 12 shows the visual analysis and design apparatus 800 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided, and that the visual analysis and design apparatus 800 may alternatively be implemented or provided with more or fewer devices.
For example, the visual analysis method 10 and the design method 20 provided by the embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the above-described visual analysis method 10 and design method 20. In such an embodiment, the computer program may be downloaded and installed from a network via communications device 890, or from storage 880, or from ROM 820. The visual analysis method 10 and the design method 20 provided by the embodiments of the present disclosure may be performed when the computer program is executed by the processing device 810.
At least one embodiment of the present disclosure also provides a storage medium storing non-transitory computer readable instructions that, when executed by a computer, may implement any of the visual analysis methods 10 and design methods 20 provided by embodiments of the present disclosure.
FIG. 13 is a schematic diagram of a storage medium according to some embodiments of the present disclosure. As shown in FIG. 13, the storage medium 600 is used to store non-transitory computer-readable instructions 610. For example, the non-transitory computer-readable instructions 610, when executed by a computer, may perform one or more steps of the visual analysis method 10 and the design method 20 described above.
For example, the storage medium 600 may be applied to the visual analysis and design apparatus 400 described above; for example, the storage medium 600 may be the memory 420 in the visual analysis and design apparatus 400 shown in FIG. 11. For the relevant description of the storage medium 600, reference may be made to the corresponding description of the memory 420 in the visual analysis and design apparatus 400 shown in FIG. 11, which is not repeated here.
The following points need to be described:
(1) The drawings of the embodiments of the present disclosure relate only to the structures to which the embodiments of the present disclosure relate, and reference may be made to the general design for other structures.
(2) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely specific embodiments of the disclosure, but the scope of the disclosure is not limited thereto, and the scope of the disclosure should be determined by the claims.

Claims (16)

1. A visual analysis method for a graphics processor chip, comprising:
acquiring event record data in a task scheduler stored in a cache of the graphics processor chip;
parsing the event record data according to an encoding protocol to obtain an event database; and
classifying a plurality of data items in the event database and then displaying them in the form of time maps based on a time axis;
wherein the plurality of data items in the event database include a plurality of pieces of instruction information;
wherein classifying the plurality of data items in the event database and then displaying them in the form of time maps based on a time axis comprises:
classifying the plurality of pieces of instruction information according to instruction types, and displaying them in different rows in the form of time maps based on a first time axis according to the classification result; or
classifying the plurality of pieces of instruction information according to their corresponding chip modules, and displaying them in different rows in the form of time maps based on a first time axis according to the classification result;
wherein the display is used to adjust the design of the graphics processor chip.
2. The visual analysis method of claim 1, wherein obtaining event record data in a task scheduler stored in a cache in the graphics processor chip comprises:
causing a function program corresponding to the graphics processor chip to run on the graphics processor chip; and
after the function program has finished running, reading the event record data stored in the cache in the task scheduler.
3. The visual analysis method of claim 1, wherein obtaining event record data in a task scheduler stored in a cache in the graphics processor chip comprises:
causing a function program corresponding to the graphics processor chip to run on the graphics processor chip; and
reading the event record data stored in the cache in the task scheduler while the function program is running.
4. The visual analysis method of claim 1, wherein obtaining event record data in a task scheduler stored in a cache in the graphics processor chip comprises:
causing a function program corresponding to the graphics processor chip to run in a simulation design corresponding to the graphics processor chip; and
after the function program has finished running, acquiring the event record data from the simulation design corresponding to the graphics processor chip.
5. The visual analysis method according to any one of claims 2 to 4, wherein the function program includes a function program for performing a matrix operation.
6. The visual analysis method of any of claims 2-4, wherein the plurality of data items in the event database comprise a plurality of pieces of instruction information, each piece of instruction information comprising a timestamp and a corresponding chip module.
7. The visual analysis method of claim 6, wherein the plurality of data items further comprises register information and hardware scheduling information.
8. The visual analysis method according to claim 1, further comprising:
acquiring application programming interface call information of a function program corresponding to the graphics processor chip while the function program is running, wherein the application programming interface call information comprises a time stamp; and
displaying the application programming interface call information in the form of a time map based on a second time axis different from the first time axis.
9. The visual analysis method of claim 8, further comprising:
mapping the application programming interface call information displayed based on the second time axis onto the first time axis according to the time stamps, so that the application programming interface call information is marked on the plurality of pieces of instruction information displayed based on the first time axis.
10. A method of designing a graphics processor chip, comprising:
performing visual analysis on the graphics processor chip according to the visual analysis method of any one of claims 1 to 9 to obtain a visualization result displayed in the form of a time map; and
adjusting the design of the graphics processor chip according to the visualization result.
11. The design method according to claim 10, wherein the visualization result includes the execution density of instruction information corresponding to a plurality of chip modules included in the graphics processor chip;
wherein adjusting the design of the graphics processor chip according to the visualization result comprises:
adjusting the design of the plurality of chip modules of the graphics processor chip according to the visualization result.
12. The design method according to claim 10, wherein the visualization result includes the execution status of instruction information of the function program corresponding to the graphics processor chip while it is running;
wherein adjusting the design of the graphics processor chip according to the visualization result comprises:
adjusting parameters in the function program according to the visualization result.
13. A visual analysis apparatus for a graphics processor chip, comprising:
an acquisition module configured to acquire event record data in a task scheduler stored in a cache of the graphics processor chip;
a parsing module configured to parse the event record data according to an encoding protocol to obtain an event database; and
a classification display module configured to classify a plurality of data items in the event database and then display the classified data items in the form of time maps based on a time axis;
Wherein the plurality of data items in the event database include a plurality of pieces of instruction information;
the classification display module is further configured to:
classify the plurality of pieces of instruction information according to instruction types, and display them in different rows in the form of time maps based on a first time axis according to the classification result; or
Classifying the instruction information according to the corresponding chip modules, and displaying the instruction information in different rows in the form of a time map based on a first time axis according to the classification result;
wherein the display is used to adjust the design of the graphics processor chip.
14. A design apparatus for a graphics processor chip, comprising:
a visual analysis module configured to perform visual analysis on the graphic processor chip according to the visual analysis method of any one of claims 1 to 9 to obtain a visual result displayed in the form of a time map; and
an adjustment module configured to adjust the design of the graphics processor chip according to the visualization result.
15. A visual analysis and design apparatus comprising:
a processor; and
a memory including one or more computer program modules;
Wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the method of any of claims 1-12.
16. A storage medium storing non-transitory computer readable instructions which, when executed by a computer, implement the method of any one of claims 1-12.
CN202011282862.XA 2020-11-17 2020-11-17 Visual analysis method and visual analysis device for graphic processor chip Active CN112445855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282862.XA CN112445855B (en) 2020-11-17 2020-11-17 Visual analysis method and visual analysis device for graphic processor chip

Publications (2)

Publication Number Publication Date
CN112445855A CN112445855A (en) 2021-03-05
CN112445855B true CN112445855B (en) 2024-05-17

Family

ID=74738260

Country Status (1)

Country Link
CN (1) CN112445855B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204338A (en) * 2021-03-22 2021-08-03 杭州微纳核芯电子科技有限公司 Visual programming method, device, equipment and medium for chip
CN114048086A (en) * 2021-11-09 2022-02-15 北京字节跳动网络技术有限公司 Analyzer performance analysis method, apparatus, device, medium and program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470599A (en) * 2007-12-28 2009-07-01 富士通株式会社 Processing unit
CN104969144A (en) * 2013-03-15 2015-10-07 起元技术有限责任公司 Recording program execution
US10073764B1 (en) * 2015-03-05 2018-09-11 National Technology & Engineering Solutions Of Sandia, Llc Method for instruction sequence execution analysis and visualization

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080276252A1 (en) * 2007-05-04 2008-11-06 Microsoft Corporation Kernel event visualization
US9471456B2 (en) * 2013-05-15 2016-10-18 Nvidia Corporation Interleaved instruction debugger
US10402931B2 (en) * 2015-06-07 2019-09-03 Apple Inc. Systrace visualization tool

Also Published As

Publication number Publication date
CN112445855A (en) 2021-03-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant