US20130046912A1 - Methods of monitoring operation of programmable logic - Google Patents

Methods of monitoring operation of programmable logic

Info

Publication number
US20130046912A1
US20130046912A1 US13/212,907 US201113212907A US2013046912A1 US 20130046912 A1 US20130046912 A1 US 20130046912A1 US 201113212907 A US201113212907 A US 201113212907A US 2013046912 A1 US2013046912 A1 US 2013046912A1
Authority
US
United States
Prior art keywords
data
graph
hardware
nodes
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/212,907
Inventor
Oliver Pell
Itay Greenspon
James Barry Spooner
Robert Gwilym Dimond
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maxeler Technologies Ltd
Original Assignee
Maxeler Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maxeler Technologies Ltd filed Critical Maxeler Technologies Ltd
Priority to US13/212,907 priority Critical patent/US20130046912A1/en
Assigned to MAXELER TECHNOLOGIES, LTD. reassignment MAXELER TECHNOLOGIES, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DIMOND, ROBERT GWILYM, GREENSPON, ITAY, PELL, OLIVER, SPOONER, JAMES BARRY
Priority to US13/725,345 priority patent/US8930876B2/en
Publication of US20130046912A1 publication Critical patent/US20130046912A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/362Software debugging
    • G06F11/3636Software debugging by tracing the execution of the program
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3027Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3089Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/36Handling requests for interconnection or transfer for access to common bus or bus system

Definitions

  • the present invention relates to methods of monitoring operation of programmable logic as may be used, for example, in a process of debugging a streaming processor.
  • the invention relates to a method for monitoring operation, and optionally then debugging, a field programmable gate array (FPGA).
  • FPGA field programmable gate array
  • a streaming processor such as might be provided by the assignee, Maxeler Technologies Ltd., consists of an FPGA, connected to multiple memories or other external data sources/sinks. On the FPGA, the circuit is made up of a manager containing one or more blocks including kernels. Typically the streaming processor may be used as a hardware accelerator for certain computing applications.
  • Kernels are hardware data-paths implementing the arithmetic and logical computations needed within an algorithm.
  • a “manager” is the collective term for the FPGA logic which orchestrates or controls data flow between Kernels and off-chip input/output (I/O) in the form of streams.
  • I/O input/output
  • managers are able to achieve high utilization of available bandwidth in off-chip communication channels.
  • a user when designing or configuring an FPGA, controls the designs of the kernels and the configuration of the manager so as to ensure that the FPGA performs the desired processing steps on data passing through it.
  • FIG. 1 shows a schematic representation of such a graph.
  • the graph 2 comprises nodes 4 , 6 , 8 and 10 each node being a kernel within the streaming processor.
  • Each node in the graph executes a specific function on incoming data and outputs the result, which becomes the input to another node in the graph.
  • the data being processed “flows” through the graph from one node to the next, without requiring writing back to memory.
  • This graph may then be implemented as an application-specific circuit within an FPGA accelerator.
  • kernel 8 may be a multiplexer arranged to select one of the outputs from nodes 4 and 6 and provide this value to the kernel 10 .
  • FIG. 1 is a simplified example of what such a streaming processor may typically look like. In practice such a graph is likely to have up to thousands or even more nodes with connecting edges. Thus, to map data flow and identify errors in the flow of data in such a large graph represents a significant technical problem.
  • Streaming accelerators implemented using FPGAs or other similar processing technology can offer increased performance on many useful applications compared to conventional microprocessors. See for example our co-pending applications, U.S. Ser. No. 12/636,906, U.S. Ser. No. 12/792,197, U.S. Ser. No. 12/823,432, U.S. Ser. No. 13/023,275 and U.S. Ser. No. 13/029,696, the entire contents of all of which are hereby incorporated by reference. In our co-pending application Ser. No. 13/166,565, the entire contents of which are hereby incorporated by reference, there is described a method for debugging the control flow on an FPGA.
  • Although streaming processors themselves are immensely useful for various types of computer applications, when constructing a streaming processor, problems can be encountered.
  • One such problem is that there can be no visibility as to why a streaming processor fails to operate as expected. It is often very difficult to determine where investigations into such a failure should start. Indeed, it can be extremely difficult to find the source of data corruption only by observing the inputs and outputs of a streaming processor.
  • the graph 2 represents a streaming processor with each of the nodes in the graph representing a kernel within the streaming processor.
  • a method of monitoring operation of programmable logic for a streaming processor comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; and, inserting, on each edge, monitoring hardware to monitor flow of data along the edge.
  • the method provides a means by which hardware can be used to enable problems or faults within a streaming processor to be easily and quickly identified or diagnosed. Given the scale of current streaming processors the method provides a useful means by which faults can quickly and automatically be identified. Once found, such faults can be fixed or debugged in the usual way. Furthermore the method provides a way by which data can easily be collected about the operation of a streaming processor so that this gathered information can be used to reconstruct the state of the data flow graph at a given point in time.
  • the graph may represent an entire stream processor as would be implemented on an FPGA. Alternatively, the graph may represent some subset of the features as would be included on the FPGA.
  • each edge comprises flow control signals and a data bus for the flow of data, and the method comprises coupling the monitoring hardware to both the flow control signals and the data bus.
  • the method comprises reading parameters associated with the data with the monitoring hardware, the parameters including the number of valid data cycles.
  • the method comprises performing a checksum on passing data with the monitoring hardware.
  • the method comprises performing a checksum on at least two consecutive edges and comparing the checksum values. By doing this it can be possible to check whether the node between the two edges is functioning correctly, in the particular case of a node which is not supposed to modify the data, e.g. a FIFO buffer. If the checksum varies, then clearly the data will have been modified and so the FIFO will not have functioned correctly.
  • the method comprises determining the number of valid cycles along every edge in the graph thereby identifying one or more routes taken by data through the graph. This enables the route taken by data through the data flow graph to be easily determined.
  • the method comprises determining the number of valid cycles along at least two consecutive edges and comparing the numbers. By comparing the number of valid cycles along two consecutive edges it is possible to establish whether or not data has been lost in a node in a manner that might not have been expected.
  • At least one of the nodes comprises a FIFO memory.
  • a kernel 12 is arranged to output data to storage 14 . After the stream has completed, it would be possible to inspect the external data storage 14 . However, without knowing exactly what data was written it would be very difficult to arrive at any conclusions.
  • U.S. Patent Application Publication No. 2002/082269 relates to a system observation bus and provides a method and mechanism for configuring a node in a computing system to route data to a predetermined observation point.
  • U.S. Pat. No. 6,678,861 relates to a FIFO with cyclic redundancy check (CRC) in a programmable logic device (PLD).
  • a PLD is provided comprising one or more memory circuits configured to check a CRC value of an input and generate a CRC for an output.
  • U.S. Pat. No. 7,543,216 relates to cyclic redundancy checking of a field programmable gate array having an SRAM memory architecture.
  • a method of monitoring operation of programmable logic for a streaming processor comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting the nodes; inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data for onward transmission to a connected node.
  • the actual data received by the data generating hardware is not passed on to the next connected node but rather dummy data having the same flow control pattern is passed on.
  • the flow control pattern of data is important to determine operation of the streaming processor so that by emulating this pattern using dummy data, the effect of the flow control pattern on the streaming processor can effectively be isolated since the effect of the data itself is removed.
  • the data-generating hardware is provided on each edge in the graph.
  • the data-generating hardware is arranged to generate a count signal.
  • the known values of data generated are simply a count which can be arranged to increment uniformly. This means that the effect on the data of the nodes is known and so if any differences are encountered between the expected output of the nodes and the actual outputs then it can easily be determined that there is some error with the node.
  • each edge comprises a data bus for the flow of data and lines for the transmission of flow control signals, and the method comprises coupling the data-generating hardware to both the flow control signals and the data bus.
  • the method comprises incrementing the counter when the flow control signals indicate that data should transfer between the nodes.
  • the data-generating hardware is arranged to receive an input from the data bus and to provide as an output a count signal having the same flow control pattern as the data received on the data bus.
  • the method comprises coupling the control signals to a data generator within the count-generating hardware, and in dependence on the flow control signals generating the count signal.
  • the method comprises operating the data-generating hardware at the same clock rate as the data received from the upstream node.
  • a streaming processor comprising: plural nodes for processing streaming data; at least one edge connecting the one or more nodes; monitoring hardware provided on each of the edges to monitor flow of data along the respective edge.
  • a streaming processor comprising: plural nodes for processing streaming data; at least one edge connecting pairs of the one or more nodes; data-generating hardware arranged to receive data from an upstream node in a pair of nodes and generate data at known values having the same flow control pattern as the received data for onward transmission to a downstream node in the pair of nodes.
  • the data-generating hardware comprises a data generator arranged to generate a count signal.
  • the streaming processor is provided on an FPGA. It will be appreciated (and clear from the detailed description below) that the streaming processors of the above-mentioned third and fourth aspects of the present disclosure are preferably configured to be capable of performing the method including any features mentioned above as being provided “in an embodiment”.
  • a computer system comprising a processor and memory and a streaming processor, e.g. a hardware accelerator, according to the third or fourth aspects of the present disclosure.
  • a method of monitoring operation of programmable logic for a streaming processor comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting the nodes, the edges including control signals and a data bus; inserting, on at least one edge monitoring hardware coupled to both the control signals and the data bus.
  • a tool for enabling monitoring of the operation of programmable logic for a streaming processor comprising: a graph generator for generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; a monitoring hardware generator, for generating monitoring hardware on each edge of the graph, the monitoring hardware being configured to monitor flow of data along the edge.
  • a tool for monitoring operation of programmable logic for a streaming processor comprising: a graph generator for generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; a hardware generator for generating and inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data, for onward transmission to a connected node.
  • the tool may be used where the graph has been generated independently. In other words the tool would simply comprise the monitoring hardware generator and/or the hardware generator for generating and inserting the data-generating hardware.
  • the tool may be software optionally provided on a computer-readable medium such as a disk or other form of memory.
  • FIG. 1 is a schematic representation of a graph representing a streaming processor
  • FIG. 2 is a schematic representation of a graph representing a streaming processor
  • FIG. 3 is a schematic representation of a graph representing a streaming processor, comprising a single node arranged to output data to memory;
  • FIG. 4 is a schematic representation of a graph representing a streaming processor including stream status blocks
  • FIG. 5 is a schematic representation of a graph representing a streaming processor comprising 2 nodes and arranged to demonstrate the data and control connections between the nodes;
  • FIG. 6 is a schematic representation of the graph of FIG. 5 including a stream status block
  • FIG. 7 is a schematic representation of the graph of FIG. 4 including detailed view of the output of stream status blocks
  • FIG. 8 is a schematic representation of a graph representing a streaming processor including stream status blocks
  • FIG. 9 is a schematic representation of a graph representing a streaming processor including stream status blocks
  • FIGS. 10 and 11 are schematic representations of the graph of FIG. 9 including detailed views of the outputs of stream status blocks;
  • FIG. 12 is a schematic representation of the graph of FIG. 2 including stream status blocks
  • FIG. 13 is a schematic representation of the graph of FIG. 12 including a more detailed view of the stream status blocks;
  • FIGS. 14A to 14C are schematic representations of various data runs within a streaming processor represented as a 3 node graph
  • FIG. 15 is a schematic representation of a graph representing a streaming processor including stream status blocks and a known FIFO checker;
  • FIGS. 16A and 16B show schematic representations of graphs representing a streaming processor comprising 2 nodes and arranged to demonstrate the data and control connections between the nodes using each of two flow control methodologies;
  • FIGS. 17 and 18 show the connections between the data and control paths between the nodes in the graphs of FIGS. 16A and 16B and stream status blocks;
  • FIGS. 19 , 20 A and 20 B show schematic representations of graphs representing streaming processors, including counterizers
  • FIGS. 21 and 22 show schematic representations of graphs representing a streaming processor comprising 2 nodes and arranged to demonstrate the data and control connections between the nodes;
  • FIG. 23 is a timing diagram of a stream status block with its counters in operation.
  • FIG. 24 is a representation of a streaming processor's performance using a pie chart.
  • a method and apparatus is provided by which the problems discussed above are addressed.
  • the means can include either or both of stream status blocks and counterizers.
  • a stream status block is a piece of hardware provided between two kernels within a streaming processor. The stream status block is able to monitor the stream along the edge between two kernels and thereby provide information that enables debugging.
  • a counterizer is, similarly, hardware provided within a streaming processor. The counterizer provides a way of injecting known data into any point of the data flow graph while maintaining exact flow control patterns.
  • stream status blocks and counterizer blocks when used together form a debugging suite for hardware that processes data by implementing a data-flow graph.
  • Stream status blocks are a tool for debugging data flow, flow control and performance issues inside stream computers. They provide visibility into what essentially is a black box, and hence can dramatically shorten the time for finding problems that would otherwise take a very long time to figure out.
  • a typical real-life streaming processor such as an FPGA
  • the use of stream status blocks and/or counterizers provides an efficient and simple means by which faults can be identified and therefore by which the processor or its design can be debugged.
  • Stream status blocks are designed to be a zero-effort (for the hardware designer) diagnostic tool, that can be enabled whenever visibility into the data flow graph is needed.
  • Counterizer blocks provide a way of injecting known data into any point of the data flow graph while maintaining exact flow control patterns. Maintaining the same flow control patterns is crucial to reproducing problems. Having known data makes debugging much more efficient, as errors can easily be spotted and makes it possible to determine how the problem that is being solved affects the data.
  • Stream status blocks can be used together with counterizers or the two can be used separately, i.e., one without the other.
  • FIG. 4 shows a schematic representation of a graph of a streaming processor including kernels 16 , 18 , 20 and 22 .
  • kernel 20 is a multiplexer arranged to output the data received from either kernel 16 or kernel 18 on each clock cycle.
  • each kernel is a “node” within the graph.
  • the graph may represent an entire stream processor as would be implemented on an FPGA. Alternatively, the graph may represent some subset of the features as would be included on the FPGA.
  • Edges connect each of the kernels.
  • a first edge 24 connects kernels 16 and 20 .
  • a second edge 26 connects kernels 18 and 20 and a third edge 28 connects kernels 20 and 22 .
  • Stream status blocks 30 , 32 and 34 are provided.
  • the stream status blocks serve to detect and register automatically flow control violations between the kernels and thereby provide the information required to reconstruct the state of the data flow graph at a given point in time.
  • a user is able to stop the stream and read back the values stored in the stream status blocks. From that information, the user is able to reconstruct the state of the data flow graph at that given point in time.
  • the stream status blocks are within the manager of the stream processor.
  • FIG. 7 shows a reconstructed data flow graph corresponding to that of FIG. 4 .
  • the stream status blocks maintain counters and flow control violation flags that have been read out of the streaming processor. It is reconstructed in that the nodes and edges are the same but the data gathered by the stream status blocks is also presented.
  • the total run time for the stream status block was 100 cycles and, out of this, there were 83 valid cycles. There were 17 invalid cycles (cycles where no data is transferred), of which kernel A 16 was throttling for 14 and the multiplexer M was stalling for three. Neither type 1 nor type 2 flow control violations (to be described below) were seen by the stream status block during the 100 cycles. For the 88 cycles between kernel 18 and multiplexer 20, there were five invalid cycles, of which kernel B 18 was throttling for two and the multiplexer M was stalling for three. Again, no flow control violations of type 1 or type 2 were seen.
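  • The arithmetic behind such a reconstruction is simple enough to sketch in software. The following Python fragment is an illustration only, not part of the patent; the function name and dictionary keys are invented. It derives the per-edge breakdown of FIG. 7 from the three counter values a stream status block exposes.

```python
def edge_breakdown(total, valid, stall):
    """Summarise an edge from the three counters a stream status block exposes.

    total - cycles the stream status block was counting
    valid - cycles on which a data item actually transferred
    stall - cycles on which the downstream node was stalling
    The remaining invalid cycles are attributed to the upstream node throttling.
    """
    invalid = total - valid
    return {
        "total": total,
        "valid": valid,
        "invalid": invalid,
        "stalling (downstream)": stall,
        "throttling (upstream)": invalid - stall,
    }

# Edge between kernel A (16) and multiplexer M (20) in FIG. 7:
print(edge_breakdown(total=100, valid=83, stall=3))
# -> 17 invalid cycles: 3 stalling, 14 throttling

# Edge between kernel B (18) and multiplexer M (20):
print(edge_breakdown(total=88, valid=83, stall=3))
# -> 5 invalid cycles: 3 stalling, 2 throttling
```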
  • FIG. 5 shows a schematic representation of an edge between two kernels 36 and 38 .
  • the edge 40 is the physical and logical connection between two connected kernels.
  • An edge therefore includes a combination of flow control signals and a data bus.
  • a stall signal 42 and a valid signal 44 are provided and a data bus 46 provides a route for data from the kernel A 36 to the kernel B 38 .
  • a stream status block 48 is provided having connections to each of the flow control connections 42 and 44 and the data bus 46 . By these connections, the stream status block is able to collect the data to enable reconstruction as shown in FIG. 7 .
  • a number of kernels 50 , 52 , 54 and 56 are connected by edges 58 , 60 , 62 and 64 .
  • Stream status blocks 66 , 68 , 70 and 80 are arranged connected to edges, 58 , 60 , 62 and 64 , respectively.
  • Kernel 82 is, in this case, a switch S.
  • the stream status blocks with collected data provide an insight into the path that data actually took when it passed through the streaming processor.
  • FIGS. 9 and 10 show another example of the use of stream status blocks.
  • stream status blocks are used to provide insight into misbehaving nodes or kernels.
  • the stream status blocks are able to provide insight in terms of data swallowing, over-producing or wrong switching.
  • three kernels 84 , 86 and 88 are connected in series with edges 90 and 92 .
  • Stream status blocks 94 and 96 are connected to edges 90 and 92 , respectively.
  • the valid cycle count from stream status block 94 with respect to edge 90 is 16, whereas the valid cycle count from stream status block 96 with respect to edge 92 is zero. Since kernel 86 is a FIFO, this indicates that the node is misbehaving: the FIFO should have passed through all data but clearly did not. In other words, the FIFO 86 is "swallowing" data.
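  • As a rough illustration of that check, the following Python sketch (the function name and the buffering allowance are assumptions, not taken from the patent) compares the valid cycle counts reported on a node's input and output edges and flags a node that is swallowing data.

```python
def check_node(name, valid_in, valid_out, items_still_buffered=0):
    """Compare the valid cycle counts on a node's input and output edges.

    A pass-through node such as a FIFO should eventually emit every item it
    consumed, allowing only for items legitimately still buffered inside it.
    """
    missing = valid_in - valid_out
    if missing > items_still_buffered:
        print(f"{name}: swallowing data ({missing} item(s) unaccounted for)")
    else:
        print(f"{name}: input and output counts are consistent")

# The FIG. 9 example: 16 valid cycles into the FIFO, none out of it.
check_node("FIFO 86", valid_in=16, valid_out=0)
```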
  • FIG. 10 shows a further example of a similar arrangement.
  • the stream status blocks provide more information enabling the actual efficiency of the streaming processor to be evaluated.
  • kernel A 84 provided all the data it could in the first 60 cycles, whereas it took the FIFO 95 cycles to output the same amount of data.
  • Kernel C 88 was responsible for only five cycles of throttling. Therefore, it can be concluded that for the period that the stream status blocks were monitoring the FIFO was not performing as fast as it could.
  • the efficiency of the processor can be evaluated.
  • FIG. 11 shows a further example of the operation of a stream status block.
  • the stream status block provides a checksum value of the data passing along the edge 90 between kernel 84 and FIFO 86 .
  • a checksum is generated by the stream status block 96 on the data passing along the consecutive edge 92 between FIFO 86 and kernel C 88 .
  • the edge 90 between kernel 84 and FIFO 86 and the edge 92 between FIFO 86 and kernel C 88 can be referred to as consecutive edges. Since a FIFO should not modify data that passes through it, it is easy to spot when there is an error or fault with the FIFO due to the change in checksum value. Thus, by comparing the checksum value provided by the two stream status blocks 94 and 96 , it is easy to identify whether or not a FIFO has introduced errors into data passing through it.
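  • A minimal software sketch of this comparison is given below; the patent does not prescribe a particular checksum algorithm, so a CRC-32 over the data words and the example word values are assumed here purely for illustration.

```python
import zlib

def stream_checksum(words):
    """Running CRC-32 over the data words observed on one edge."""
    crc = 0
    for w in words:
        crc = zlib.crc32(int(w).to_bytes(8, "little"), crc)
    return crc

data_on_edge_90 = [3, 1, 4, 1, 5, 9, 2, 6]   # words entering the FIFO
data_on_edge_92 = [3, 1, 4, 1, 5, 0, 2, 6]   # words leaving the FIFO

if stream_checksum(data_on_edge_90) != stream_checksum(data_on_edge_92):
    print("checksums differ: the FIFO modified data passing through it")
```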
  • a number of input kernels 98, 100 and 102 are provided and connected via various edges, eventually (through other kernels 104 and 106), to an output kernel 108. If the output kernel 108 does not produce any output data, it can be impossible to easily tell which of the upstream kernels is responsible. With the use of stream status blocks, it is possible to observe the state of the data flow and therefore diagnose the problem. As shown in FIG. 12, stream status blocks are provided to determine the number of valid cycles on each of the edges within the processor. A processor designer knows how many valid cycles should be expected for a given input. In the present example, kernel D is expected to output eight data items. There ought, therefore, to be eight valid cycles on the edge between kernels 104 and 108. The stream status block 110 coupled to this edge, in fact, shows zero valid cycles. Therefore, kernel 104 is where debugging investigations would commence.
  • FIG. 13 shows the same basic streaming processor as in FIG. 12 .
  • the valid cycle count from stream status block 110 is eight.
  • the basic problem of no valid cycles appearing on an edge no longer applies. It is still necessary to determine if the kernels are operating correctly.
  • the use of a checksum within the stream status block enables this problem to be solved by calculating a checksum value for the data stream passing through the processor at each edge. Since a processor designer will typically know the checksum value to expect, it is possible to find data corruptions using simple comparisons, i.e., comparing the determined value with the expected value.
  • In FIG. 13, as the user streamed data in to kernel 98, the expected checksum value on this edge is known and can be compared with the value recorded by the stream status block.
  • checksums may be calculated on plural or even all of the edges of a streaming processor. This means that if there is intermittent data corruption it is possible to detect where it occurred by streaming the same input data multiple times.
  • FIGS. 14A to 14C show an example of this.
  • a simple streaming processor design comprises three kernels 112 , 114 and 116 .
  • Stream status blocks 118 and 120 are provided on the edges connecting the various kernels.
  • the checksum value is different in run 2 (FIG. 14B) as compared to that in each of runs 1 and 3 (FIGS. 14A and 14C).
  • the kernel B 114 is intermittently corrupting data.
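  • The comparison across runs can be sketched as follows; the per-edge checksum values are invented for the example and would in practice be read back from the stream status blocks after each run.

```python
# Per-edge checksums read back after streaming the same input three times
# (the hexadecimal values are invented for this example).
runs = {
    "run 1": {"A->B": 0x1A2B, "B->C": 0x3C4D},
    "run 2": {"A->B": 0x1A2B, "B->C": 0x9E7F},   # differs downstream of kernel B
    "run 3": {"A->B": 0x1A2B, "B->C": 0x3C4D},
}

reference = runs["run 1"]
for run, checksums in runs.items():
    for edge, value in checksums.items():
        if value != reference[edge]:
            print(f"{run}: checksum on edge {edge} differs from run 1 -> "
                  f"the kernel feeding this edge is intermittently corrupting data")
```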
  • Considering the function and effect of stream status blocks, it is clear that there are significant distinctions from known means for monitoring data flows. Considering, for example, use of a known system observation bus, the use of stream status blocks is beneficial in that there is no change to the routing of data. In other words, the flow control pattern of the stream is unchanged, which means that it is possible simply to reconstruct the flow graph of a streaming processor using data accumulated by the stream status blocks. No routing or re-routing of data is required with the use of stream status blocks since they simply monitor data passing along the normal established edges within a streaming processor.
  • cyclic redundancy checks are performed on FIFOs within a programmable logic device.
  • the method for detecting data corruption inside a FIFO is provided by calculating CRC values on the input and output of the FIFO and then comparing them.
  • the use of stream status blocks with checksums provides a more general implementation of this functionality.
  • the FIFO is merely a node or kernel on the data flow graph but could have been any other node as well.
  • stream status blocks provide a generalised approach for calculating checksums on any edge of a data flow graph and are not limited to a specific node type like SRAM.
  • FIG. 15 shows a schematic representation of a streaming processor comprising kernels 122 , 124 , 126 and 128 .
  • Kernel 126 is a FIFO.
  • Stream status blocks 130 , 132 and 134 are provided. Their function, as described above, is to determine checksum values along the edges between the various connected pairs of kernels. In contrast, where a known FIFO checker would be used, this is specific to a FIFO and does not provide the general ability to monitor and model data flow within a streaming processor.
  • FIGS. 16A and 16B show examples of the flow control methodologies that would typically be used within a streaming processor. Two kernels are provided with data flowing from first node A 136 to second node B 138 . A data flow 140 is therefore provided irrespective of the flow control methodology.
  • FIG. 16A shows an example of a push stream flow control methodology in which “valid” and “stall” flow control signals are used to control data flow between the kernels. When the valid flow control signal is asserted, data is defined as transferring from kernel A 136 to kernel B 138. If kernel B 138 cannot accept new data, it asserts the “stall” signal and valid will therefore stop after a number of cycles defined as the stall latency (SL).
  • SL stall latency
  • a pull stream control flow methodology is utilised.
  • data is defined as transferring or moving from kernel A 136 to kernel B 138 exactly RL (read latency) cycles after the read flow control signal has been asserted. If kernel A 136 has no more data to transfer, it will assert an empty signal and the read signal will then de-assert EL (empty latency) cycles afterwards.
  • the manner in which the stream status blocks are coupled to these inter-kernel connections will now be described with reference to FIGS. 17 and 18 .
  • the connections between a stream status block and the edge are shown for the PUSH stream control stream methodology.
  • the stream status block 140 has inputs from the stall and valid signals and also from the data stream on the data bus itself.
  • a de-assert signal may be hardwired into read and empty inputs on the stream status block 140 since they are not required when a PUSH stream flow control methodology is utilised.
  • the connections for a stream status block 140 are shown when a PULL stream flow control methodology is utilised.
  • the read and empty signals are connected to corresponding inputs on the stream status block as is the data. Stall and valid inputs are de-asserted.
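  • The two transfer conditions that the stream status block must observe can be summarised in a short sketch; this is Python pseudocode of the protocol semantics described above, not hardware, and the function names are illustrative only.

```python
def push_transfer(valid):
    """PUSH stream: a data word moves on every cycle on which valid is asserted."""
    return valid

def pull_transfer(read_history, cycle, read_latency):
    """PULL stream: a data word moves exactly RL cycles after read was asserted."""
    return cycle >= read_latency and read_history[cycle - read_latency]

# Example: read asserted on cycles 0-3 with a read latency of 2,
# so data moves on cycles 2-5.
read_history = [True, True, True, True, False, False, False]
print([c for c in range(len(read_history)) if pull_transfer(read_history, c, 2)])
# -> [2, 3, 4, 5]
```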
  • FIG. 23 shows a timing diagram for data signals between two kernels when operating as a PUSH stream.
  • a clock 142 defines the clock domain for the input data stream. Initially, at time T0, valid and stall are both de-asserted.
  • the stream status block is required to provide an accurate picture of how data moves inside the data flow graph, i.e., between kernels and along the edge connecting the kernels in question. This will enable reconstruction of the data flow graph. Therefore, it is preferably arranged to provide values from three cycle counters: a valid counter, a stall counter and a total counter.
  • FIG. 23 shows the behaviour of each of these counters.
  • the counter values can be read back from the hardware through some known mechanism.
  • the counter values are exposed using readable registers.
  • the valid counter represents the number of data items moved.
  • the stall counter represents the number of cycles that the destination was stalling.
  • the time the source node was throttling is derived by subtracting the valid and stall counters from the total counter.
  • the present stream's performance can be represented in a pie chart as shown in FIG. 24 .
  • the stream was running for a total of 18 cycles, as derivable from the fact that the value for the total counter was 18.
  • Nine of the 18 cycles had data moving as demonstrated by the fact that the value of the valid counter is nine.
  • for some of the remaining cycles the data was stalled by the destination and the rest were therefore throttled by the source.
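  • A cycle-level software model of these counters is sketched below. It is an illustration only: the trace values are invented, and the assumption that the stall counter only advances on cycles where no data moves (so that valid, stalled and throttled cycles sum to the total) is inferred from the breakdowns above rather than stated explicitly in the text.

```python
class StreamStatusCounters:
    """Software model of the valid, stall and total counters of one block."""

    def __init__(self):
        self.total = 0
        self.valid = 0
        self.stall = 0

    def clock(self, valid, stall):
        """Called once per clock cycle with the sampled flow control signals."""
        self.total += 1
        if valid:
            self.valid += 1      # a data item moved on this cycle
        elif stall:
            self.stall += 1      # no data moved because the destination stalled

    @property
    def throttled(self):
        return self.total - self.valid - self.stall

# An 18-cycle trace in the spirit of FIGS. 23 and 24: 9 valid cycles, the rest
# split between destination stalling and source throttling (split invented here).
trace = [(1, 0)] * 9 + [(0, 1)] * 5 + [(0, 0)] * 4
counters = StreamStatusCounters()
for valid, stall in trace:
    counters.clock(valid, stall)
print(counters.total, counters.valid, counters.stall, counters.throttled)  # 18 9 5 4
```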
  • FIG. 21 shows an example of a checksum calculator wiring inside a stream status block.
  • Stream status blocks are not limited to a specific checksum algorithm. However, they are most suited to algorithms which can be applied to data streams.
  • the checksum calculator recognises this and determines a checksum based on the data passing along the data bus.
  • the checksum calculator determines a checksum based on the data passing along the data bus.
  • the checksum calculation is, effectively, turned off. There would at this point be no data passing along the data bus.
  • a counterizer block is hardware attached to an edge (within the manager) at the output of a kernel and is controlled to replace the output data from the kernel with known data but to maintain precisely the same data flow, i.e., stall pattern, as the original output.
  • Thus the stall pattern, i.e., the pattern of data flowing from the kernel in question, is not changed, but the actual values of the data are at known levels. This enables any unexpected variations in subsequent outputs from the streaming processor to be identified and debugged as appropriate.
  • the processor includes kernels 144 , 146 and 148 .
  • a counterizer 150 is provided between the first and second kernels 144 and 146 .
  • the counterizer enables it to be known exactly when a node has started to consume data and what part of the data was output.
  • the kernel B 146 is a FIFO. Assuming it has been determined that there is a problem with the FIFO using stream status blocks as described above, it is still not possible to know which data items are missing. In particular, it is desired to know if a first, last or middle data item is missing from the output from the FIFO 146 .
  • the counterizer 150 serves to inject known data values into the FIFO 146 .
  • the data output from the FIFO 146 is then observed and it can be seen at what stage the operation of FIFO B 146 is failing.
  • With the counterizer block 150, there is a guaranteed input to FIFO 146, so it is possible to calculate what to expect at the output from the FIFO.
  • Table 1 below shows an example of a data capture window, both with and without the counterizer block.
  • FIG. 20A shows a further example of a streaming processor including a counterizer.
  • counterizer 150 serves to provide a counter data stream which is written to storage 14, thereby enabling a user to inject known data into the storage and therefore to know what to expect when the storage is examined. It is significant that the counterizer block 150 maintains the same flow pattern as the kernel 12, only substituting the data, since errors will usually only be triggered when a certain sequence of events happens. Without following the exact flow pattern behaviour of the upstream kernel, it is most likely that the error being debugged will not be triggered.
  • FIG. 20B shows a further example of a data flow graph including a counterizer 160 .
  • a kernel 152 is arranged to provide an output to a further kernel 154 which is, in turn, connected to kernel 156 .
  • Stream status blocks 158 are provided connected to the various edges within the data flow graph.
  • a counterizer 160 is provided arranged to receive the output from the kernel U 152 and provide a counted input stream to the kernel 154 .
  • the counterizer block 160 attaches to the output of the kernel 152 and replaces the output data with known data, i.e., a count. Since the counterizer block 160 always outputs known data values, it is possible to calculate what checksum to expect at the output of the “multiply ×2” kernel 154, and indeed verify that this is in fact the value that came out of this kernel.
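  • Because the injected data is a simple count, the expected checksum downstream of a known transformation can be precomputed. The sketch below assumes a ×2 kernel, a CRC-32 checksum and a 16-item stream purely for illustration; none of these specifics come from the patent.

```python
import zlib

def stream_checksum(words):
    """Running CRC-32 over the data words observed on one edge (as sketched earlier)."""
    crc = 0
    for w in words:
        crc = zlib.crc32(int(w).to_bytes(8, "little"), crc)
    return crc

n_items = 16
counter_input = range(n_items)                      # what the counterizer injects
expected_output = [2 * v for v in counter_input]    # what the x2 kernel should emit

print(hex(stream_checksum(expected_output)))
# Compare this precomputed value with the checksum read back from the stream
# status block on the kernel's output edge.
```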
  • FIG. 22 shows a schematic representation of how a counterizer block would typically be wired into a streaming processor.
  • a counterizer block 166 is coupled to the lines between the kernels 162 and 164 . The wiring of the connection between the counterizer block and the kernels 162 and 164 is clearly shown.
  • the counterizer block 166 includes a data generator 168 arranged to receive input from each of the valid and stall connections between the kernels 162 and 164 .
  • the data generator is able to emulate the exact data flow pattern between the kernels.
  • the actual data bus 170 between the kernels 162 and 164 is broken by the data generator such that data output from the kernel 162 is discarded within the counterizer block 166 .
  • the flow control signals are passed through so as to provide precise flow-control pattern preservation.
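  • The behaviour just described can be modelled in a few lines of software. The following is a Python model, not HDL, with invented class and signal names: the flow control signals pass through untouched, the upstream data word is dropped, and a counter value is substituted whenever the flow control indicates a transfer.

```python
class Counterizer:
    """Software model of the counterizer block of FIG. 22."""

    def __init__(self):
        self.count = 0

    def clock(self, valid, stall, data_in):
        """One clock cycle; returns (valid, stall, data_out) for the downstream node."""
        del data_in                  # upstream data is discarded inside the block
        data_out = None
        if valid:                    # PUSH stream: data moves when valid is asserted
            data_out = self.count
            self.count += 1          # increment only when a word actually transfers
        return valid, stall, data_out

# The flow control pattern of the upstream kernel is preserved exactly;
# only the data values are replaced by the count.
counterizer = Counterizer()
upstream_trace = [(1, 0, 0xDEAD), (0, 1, None), (1, 0, 0xBEEF), (1, 0, 0xF00D)]
for valid, stall, data in upstream_trace:
    print(counterizer.clock(valid, stall, data))
# -> (1, 0, 0)  (0, 1, None)  (1, 0, 1)  (1, 0, 2)
```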
  • counterizer blocks provide a way of injecting known data into any point of the data flow graph while maintaining exact flow control patterns. Maintaining the same flow control patterns can be crucial to reproducing problems and thereby enabling their identification and debugging. Having known data makes debugging significantly more efficient as errors can easily be spotted and it can similarly be easily determined how the problem that is being debugged affects data. In contrast to known attempts at providing means for diagnosing problems within streaming processors and debugging them, a counterizer block replaces the data whilst maintaining data flow patterns.
  • the present method and apparatus provides a useful tool for debugging streaming processors in an efficient and precise manner.
  • Embodiments of the present invention have been described with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described and are within the scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Disclosed is a method of monitoring operation of programmable logic for a streaming processor, the method comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; inserting, on each edge, monitoring hardware to monitor flow of data along the edge. Also disclosed is a method of monitoring operation of programmable logic for a streaming processor, the method comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting the nodes in the graph; inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data for onward transmission to a connected node.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • Not applicable.
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • Not applicable.
  • The present invention relates to methods of monitoring operation of programmable logic as may be used, for example, in a process of debugging a streaming processor. In examples, the invention relates to a method for monitoring operation, and optionally then debugging, a field programmable gate array (FPGA).
  • Typically, a streaming processor such as might be provided by the assignee, Maxeler Technologies Ltd., consists of an FPGA, connected to multiple memories or other external data sources/sinks. On the FPGA, the circuit is made up of a manager containing one or more blocks including kernels. Typically the streaming processor may be used as a hardware accelerator for certain computing applications.
  • Kernels are hardware data-paths implementing the arithmetic and logical computations needed within an algorithm. A “manager” is the collective term for the FPGA logic which orchestrates or controls data flow between Kernels and off-chip input/output (I/O) in the form of streams. By using a streaming model for off-chip I/O to the associated external components, e.g. PCI Express bus and DRAM memory, managers are able to achieve high utilization of available bandwidth in off-chip communication channels. A user, when designing or configuring an FPGA, controls the designs of the kernels and the configuration of the manager so as to ensure that the FPGA performs the desired processing steps on data passing through it.
  • Typically dataflow hardware accelerators implement a streaming model of computation in which computations are described structurally (computing in space) rather than specifying a sequence of processor instructions (computing in time). In this model of computation, a high-level language is used to generate a graph of operations. FIG. 1 shows a schematic representation of such a graph. The graph 2 comprises nodes 4, 6, 8 and 10 each node being a kernel within the streaming processor. Each node in the graph executes a specific function on incoming data and outputs the result, which becomes the input to another node in the graph. The data being processed “flows” through the graph from one node to the next, without requiring writing back to memory. This graph may then be implemented as an application-specific circuit within an FPGA accelerator. In this example, kernel 8 may be a multiplexer arranged to select one of the outputs from nodes 4 and 6 and provide this value to the kernel 10. It will be appreciated that the example of FIG. 1 is a simplified example of what such a streaming processor may typically look like. In practice such a graph is likely to have up to thousands or even more nodes with connecting edges. Thus, to map data flow and identify errors in the flow of data in such a large graph represents a significant technical problem.
  • Streaming accelerators implemented using FPGAs or other similar processing technology can offer increased performance on many useful applications compared to conventional microprocessors. See for example our co-pending applications, U.S. Ser. No. 12/636,906, U.S. Ser. No. 12/792,197, U.S. Ser. No. 12/823,432, U.S. Ser. No. 13/023,275 and U.S. Ser. No. 13/029,696, the entire contents of all of which are hereby incorporated by reference. In our co-pending application Ser. No. 13/166,565, the entire contents of which are hereby incorporated by reference, there is described a method for debugging the control flow on an FPGA.
  • Although streaming processors themselves are immensely useful for various types of computer applications, when constructing a streaming processor, problems can be encountered. One such problem is that there can be no visibility as to why a streaming processor fails to operate as expected. It is often very difficult to determine where investigations into such a failure should start. Indeed, it can be extremely difficult to find the source of data corruption only by observing the inputs and outputs of a streaming processor. For example, consider the graph shown in FIG. 2. The graph 2 represents a streaming processor with each of the nodes in the graph representing a kernel within the streaming processor. When streaming data in through nodes I1, I2 and I3, if the output node O does not produce any output data, it can be impossible to tell which upstream node is faulty.
  • According to a first aspect of the present disclosure, there is provided a method of monitoring operation of programmable logic for a streaming processor, the method comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; and, inserting, on each edge, monitoring hardware to monitor flow of data along the edge.
  • The method provides a means by which hardware can be used to enable problems or faults within a streaming processor to be easily and quickly identified or diagnosed. Given the scale of current streaming processors the method provides a useful means by which faults can quickly and automatically be identified. Once found, such faults can be fixed or debugged in the usual way. Furthermore the method provides a way by which data can easily be collected about the operation of a streaming processor so that this gathered information can be used to reconstruct the state of the data flow graph at a given point in time. The graph may represent an entire stream processor as would be implemented on an FPGA. Alternatively, the graph may represent some subset of the features as would be included on the FPGA.
  • In one embodiment, each edge comprises flow control signals and a data bus for flow of data, and wherein the method comprises coupling the monitoring hardware to both the flow control signals and the data bus. By coupling the hardware to the flow control signals as well as to the data bus it is possible to ensure that the hardware is aware of when data is passing on the data bus.
  • In one embodiment, the method comprises reading parameters associated with the data with the monitoring hardware, the parameters including the number of valid data cycles.
  • In one embodiment, the method comprises performing a checksum on passing data with the monitoring hardware.
  • In one embodiment, the method comprises performing a checksum on at least two consecutive edges and comparing the checksum values. By doing this it can be possible to check whether the node between the two edges is functioning correctly, in the particular case of a node which is not supposed to modify the data, e.g. a FIFO buffer. If the checksum varies, then clearly the data will have been modified and so the FIFO will not have functioned correctly.
  • In one embodiment, the method comprises determining the number of valid cycles along every edge in the graph thereby identifying one or more routes taken by data through the graph. This enables the route taken by data through the data flow graph to be easily determined.
  • In one embodiment, the method comprises determining the number of valid cycles along at least two consecutive edges and comparing the numbers. By comparing the number of valid cycles along two consecutive edges it is possible to establish whether or not data has been lost in a node in a manner that might not have been expected.
  • In one embodiment, at least one of the nodes comprises a FIFO memory.
  • In some situations, it is difficult to find the point in time relative to the beginning of a streaming processor where a failure occurred. For example, where a FIFO is swallowing data, i.e., not outputting the required amount of data, it is relatively straightforward to identify that there is a problem, e.g. by counting how much data comes out of the FIFO and comparing this to the amount of data that has gone in. However, it is extremely difficult to know which data items are missing. In particular, it would be desirable to know if the missing data is from the start, the end or the middle of the input data.
  • One further problem is that it is difficult to debug a problem when the data stream consists of unknown (or difficult to determine) values. Referring to FIG. 3, a kernel 12 is arranged to output data to storage 14. After the stream has completed, it would be possible to inspect the external data storage 14. However, without knowing exactly what data was written it would be very difficult to arrive at any conclusions.
  • U.S. Patent Application Publication No. 2002/082269 relates to a system observation bus and provides a method and mechanism for configuring a node in a computing system to route data to a predetermined observation point. U.S. Pat. No. 6,678,861 relates to a FIFO with cyclic redundancy check (CRC) in a programmable logic device (PLD). A PLD is provided comprising one or more memory circuits configured to check a CRC value of an input and generate a CRC for an output. U.S. Pat. No. 7,543,216 relates to cyclic redundancy checking of a field programmable gate array having an SRAM memory architecture.
  • According to a second aspect of the present disclosure, there is provided a method of monitoring operation of programmable logic for a streaming processor, the method comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting the nodes; inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data for onward transmission to a connected node.
  • Thus the actual data received by the data generating hardware is not passed on to the next connected node but rather dummy data having the same flow control pattern is passed on. In some cases the flow control pattern of data is important to determine operation of the streaming processor so that by emulating this pattern using dummy data, the effect of the flow control pattern on the streaming processor can effectively be isolated since the effect of the data itself is removed.
  • In an embodiment, the data-generating hardware is provided on each edge in the graph.
  • In an embodiment, the data-generating hardware is arranged to generate a count signal. In other words, the known values of data generated are simply a count which can be arranged to increment uniformly. This means that the effect on the data of the nodes is known and so if any differences are encountered between the expected output of the nodes and the actual outputs then it can easily be determined that there is some error with the node.
  • In an embodiment, each edge comprises a data bus for the flow of data and lines for the transmission of flow control signals, and wherein the method comprises coupling the data-generating hardware to both the flow control signals and the data bus.
  • In an embodiment the method comprises incrementing the counter when the flow control signals indicate that data should transfer between the nodes. Thus, connecting the data-generating hardware to the flow control signals along the edge provides an easy way of ensuring that the flow control pattern is maintained.
  • In an embodiment, the data-generating hardware is arranged to receive an input from the data bus and to provide as an output a count signal having the same flow control pattern as the data received on the data bus.
  • In an embodiment, the method comprises coupling the control signals to a data generator within the count-generating hardware, and in dependence on the flow control signals generating the count signal.
  • In an embodiment, the method comprises operating the data-generating hardware at the same clock rate as the data received from the upstream node.
  • According to a third aspect of the present disclosure, there is provided a streaming processor comprising: plural nodes for processing streaming data; at least one edge connecting the one or more nodes; monitoring hardware provided on each of the edges to monitor flow of data along the respective edge.
  • According to a fourth aspect of the present disclosure, there is provided a streaming processor comprising: plural nodes for processing streaming data; at least one edge connecting pairs of the one or more nodes; data-generating hardware arranged to receive data from an upstream node in a pair of nodes and generate data at known values having the same flow control pattern as the received data for onward transmission to a downstream node in the pair of nodes.
  • In an embodiment, the data-generating hardware comprises a data generator arranged to generate a count signal.
  • In an embodiment, the streaming processor is provided on an FPGA. It will be appreciated (and clear from the detailed description below) that the streaming processors of the above-mentioned third and fourth aspects of the present disclosure are preferably configured to be capable of performing the method including any features mentioned above as being provided “in an embodiment”.
  • According to a further aspect of the present disclosure, there is provided a computer system comprising a processor and memory and a streaming processor, e.g. a hardware accelerator, according to the third or fourth aspects of the present disclosure.
  • According to a further aspect of the present disclosure, there is provided a method of monitoring operation of programmable logic for a streaming processor, the method comprising: generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting the nodes, the edges including control signals and a data bus; inserting, on at least one edge monitoring hardware coupled to both the control signals and the data bus.
  • According to a further aspect of the present disclosure, there is provided a tool for enabling monitoring of the operation of programmable logic for a streaming processor, the tool comprising: a graph generator for generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; a monitoring hardware generator, for generating monitoring hardware on each edge of the graph, the monitoring hardware being configured to monitor flow of data along the edge.
  • According to a further aspect of the present disclosure, there is provided a tool for monitoring operation of programmable logic for a streaming processor, the tool comprising: a graph generator for generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph; a hardware generator for generating and inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data, for onward transmission to a connected node.
  • In some examples, the tool may be used where the graph has been generated independently. In other words the tool would simply comprise the monitoring hardware generator and/or the hardware generator for generating and inserting the data-generating hardware. The tool may be software optionally provided on a computer-readable medium such as a disk or other form of memory.
BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments of the present invention will now be described in detail with reference to the accompanying drawings, in which:
  • FIG. 1 is a schematic representation of a graph representing a streaming processor;
  • FIG. 2 is a schematic representation of a graph representing a streaming processor;
  • FIG. 3 is a schematic representation of a graph representing a streaming processor, comprising a single node arranged to output data to memory;
  • FIG. 4 is a schematic representation of a graph representing a streaming processor including stream status blocks;
  • FIG. 5 is a schematic representation of a graph representing a streaming processor comprising 2 nodes and arranged to demonstrate the data and control connections between the nodes;
  • FIG. 6 is a schematic representation of the graph of FIG. 5 including a stream status block;
  • FIG. 7 is a schematic representation of the graph of FIG. 4 including detailed view of the output of stream status blocks;
  • FIG. 8 is a schematic representation of a graph representing a streaming processor including stream status blocks;
  • FIG. 9 is a schematic representation of a graph representing a streaming processor including stream status blocks;
  • FIGS. 10 and 11 are schematic representations of the graph of FIG. 9 including detailed views of the outputs of stream status blocks;
  • FIG. 12 is a schematic representation of the graph of FIG. 2 including stream status blocks;
  • FIG. 13 is a schematic representation of the graph of FIG. 12 including a more detailed view of the stream status blocks;
  • FIGS. 14A to 14C are schematic representations of various data runs within a streaming processor represented as a 3 node graph;
  • FIG. 15 is a schematic representation of a graph representing a streaming processor including stream status blocks and a known FIFO checker;
  • FIGS. 16A and 16B show schematic representations of graphs representing a streaming processor comprising 2 nodes and arranged to demonstrate the data and control connections between the nodes using each of two flow control methodologies;
  • FIGS. 17 and 18 show the connections between the data and control paths between the nodes in the graphs of FIGS. 16A and 16B and stream status blocks;
  • FIGS. 19, 20A and 20B show schematic representations of graphs representing streaming processors, including counterizers;
  • FIGS. 21 and 22 show schematic representations of graphs representing a streaming processor comprising 2 nodes and arranged to demonstrate the data and control connections between the nodes;
  • FIG. 23 is a timing diagram of a stream status block with its counters in operation; and
  • FIG. 24 is a representation of a streaming processor's performance using a pie chart.
DETAILED DESCRIPTION OF DISCLOSED EMBODIMENTS
  • A method and apparatus are provided by which the problems discussed above are addressed. In particular, means are provided within a streaming processor to facilitate its debugging. The means can include either or both of stream status blocks and counterizers. A stream status block is a piece of hardware provided between two kernels within a streaming processor. The stream status block is able to monitor the stream along the edge between the two kernels and thereby provide information that enables debugging. A counterizer is, similarly, hardware provided within a streaming processor. The counterizer provides a way of injecting known data into any point of the data flow graph while maintaining exact flow control patterns.
  • It will therefore be appreciated that stream status blocks and counterizer blocks, when used together, form a debugging suite for hardware that processes data by implementing a data-flow graph.
  • Stream status blocks are a tool for debugging data flow, flow control and performance issues inside stream computers. They provide visibility into what essentially is a black box, and hence can dramatically shorten the time for finding problems that would otherwise take a very long time to figure out. In the case of a typical real-life streaming processor such as an FPGA, when represented as a graph in the manner described above the size of the graph is large. Therefore the use of stream status blocks and/or counterizers provides an efficient and simple means by which faults can be identified and therefore by which the processor or its design can be debugged.
  • Stream status blocks are designed to be a zero-effort (for the hardware designer) diagnostic tool that can be enabled whenever visibility into the data flow graph is needed. Counterizer blocks provide a way of injecting known data into any point of the data flow graph while maintaining exact flow control patterns. Maintaining the same flow control patterns is crucial to reproducing problems. Having known data makes debugging much more efficient, as errors can easily be spotted, and makes it possible to determine how the problem that is being debugged affects the data.
  • A number of detailed but non-limiting examples of the use of stream status blocks and counterizers will now be described. Stream status blocks can be used together with counterizers or the two can be used separately, i.e., one without the other.
  • FIG. 4 shows a schematic representation of a graph of a streaming processor including kernels 16, 18, 20 and 22. The precise role of the kernels in this example is not important. However, kernel 20 is a multiplexer arranged to output the data received from either kernel 16 or kernel 18 on each clock cycle. It will be appreciated that each kernel is a “node” within the graph. For brevity and clarity in the description the nodes will simply be referred to as “kernels”. The graph may represent an entire stream processor as would be implemented on an FPGA. Alternatively, the graph may represent some subset of the features as would be included on the FPGA.
  • Edges connect each of the kernels. A first edge 24 connects kernels 16 and 20. A second edge 26 connects kernels 18 and 20 and a third edge 28 connects kernels 20 and 22. Stream status blocks 30, 32 and 34 are provided. The stream status blocks serve to detect and register automatically flow control violations between the kernels and thereby provide the information required to reconstruct the state of the data flow graph at a given point in time. At a given point in time, a user is able to stop the stream and read back the values stored in the stream status blocks. From that information, the user is able to reconstruct the state of the data flow graph at that given point in time. The stream status blocks are within the manager of the stream processor.
  • FIG. 7 shows a reconstructed data flow graph corresponding to that of FIG. 4. As can be seen, the stream status blocks maintain counters and flow control violation flags that have been read out of the streaming processor. The graph is reconstructed in that the nodes and edges are the same but the data gathered by the stream status blocks is also presented. Referring to the stream status block between the kernels 16 and 20, the total run time for the stream status block was 100 cycles and, out of this, there were 83 valid cycles. There were 17 invalid cycles (cycles on which no data is transferred): on 14 of these kernel A 16 was throttling and on three the multiplexer M 20 was stalling. Neither type 1 nor type 2 flow control violations (to be described below) were seen by the stream status block during the 100 cycles. For the 88 cycles between kernel 18 and multiplexer 20, there were five invalid cycles: kernel B 18 was throttling for two and the multiplexer M 20 was stalling for three. Again, no flow control violations of type 1 or type 2 were seen.
  • To understand the operation of the stream status blocks in greater detail, the arrangement of the edges will now be described with reference to FIGS. 5 and 6. FIG. 5 shows a schematic representation of an edge between two kernels 36 and 38. The edge 40 is the physical and logical connection between two connected kernels. An edge therefore includes a combination of flow control signals and a data bus. In this example, a stall signal 42 and a valid signal 44 are provided and a data bus 46 provides a route for data from the kernel A 36 to the kernel B 38. Referring now to FIG. 6, a stream status block 48 is provided having connections to each of the flow control connections 42 and 44 and the data bus 46. By these connections, the stream status block is able to collect the data to enable reconstruction as shown in FIG. 7.
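To make the roles of the counters and signal taps concrete, the following is a minimal behavioural sketch, in Python, of a stream status block observing the stall and valid signals of a PUSH-stream edge. It is purely illustrative and not the hardware implementation described above; the class name, the per-cycle sampling interface, and the treatment of valid and stall as mutually exclusive in a given cycle are assumptions.

    class StreamStatusBlock:
        """Passive monitor of one PUSH-stream edge (illustrative model only)."""

        def __init__(self):
            self.total = 0   # total cycles the stream has been running
            self.valid = 0   # cycles on which data actually moved
            self.stall = 0   # cycles on which the destination was stalling

        def sample(self, valid, stall):
            """Call once per clock cycle with the edge's flow control values."""
            self.total += 1
            if valid:
                self.valid += 1
            elif stall:
                # simplification: valid and stall treated as mutually exclusive
                self.stall += 1

        def throttled(self):
            # Cycles with no data movement and no destination stall are
            # attributed to the source node throttling.
            return self.total - self.valid - self.stall

Because the block only samples the existing signals, the flow control pattern on the edge is left untouched, which mirrors the passive tap shown in FIG. 6.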
  • By analysis and review of the data collected by the stream status blocks, it is possible to determine useful information regarding the activity within a streaming processor. In the example of FIG. 8, a number of kernels 50, 52, 54 and 56 are connected by edges 58, 60, 62 and 64. Stream status blocks 66, 68, 70 and 80 are connected to edges 58, 60, 62 and 64, respectively. Kernel 82 is, in this case, a switch S. The stream status blocks with collected data provide an insight into the path that data actually took when it passed through the streaming processor. In this example there are 16 valid cycles between kernels 50 and 82 and 16 valid cycles between kernels 82 and 56, but no valid cycles between the switch 82 and either of kernels 52 and 54. It is therefore clear that the data passing through the switch 82 from kernel A 50 continued into kernel D 56.
  • FIGS. 9 and 10 show another example of the use of stream status blocks. In this example, stream status blocks are used to provide insight into misbehaving nodes or kernels, in terms of data swallowing, over-production or wrong switching. In FIG. 9, three kernels 84, 86 and 88 are connected in series with edges 90 and 92. Stream status blocks 94 and 96 are connected to edges 90 and 92, respectively. The valid cycle count from stream status block 94 with respect to edge 90 is 16 whereas the valid cycle count from stream status block 96 with respect to edge 92 is zero. Since the kernel 86 is a FIFO, this indicates that the node is misbehaving: the FIFO should have passed through all data but clearly did not. In other words, the FIFO 86 is "swallowing" data.
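A hypothetical read-back comparison along the lines of the data-swallow case just described might look as follows; the helper name is an assumption and the arguments are the counter objects from the sketch above, read from the edges into and out of the node under suspicion.

    def diagnose_pass_through_node(in_counters, out_counters):
        """Compare valid-cycle counts on the edges into and out of a node
        that should pass all of its input data through (e.g. a FIFO)."""
        if in_counters.valid > out_counters.valid:
            return "node appears to be swallowing data ({} in, {} out)".format(
                in_counters.valid, out_counters.valid)
        if in_counters.valid < out_counters.valid:
            return "node appears to be over-producing data"
        return "node passed on all data it received"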
  • FIG. 10 shows a further example of a similar arrangement. However, in this case the stream status blocks provide more information, enabling the actual efficiency of the streaming processor to be evaluated. Looking at the example of FIG. 10, it can be seen that kernel A 84 provided all the data it could in the first 60 cycles, whereas it took the FIFO 95 cycles to output the same amount of data. Kernel C 88 was responsible for only five cycles of throttling. Therefore, it can be concluded that, for the period that the stream status blocks were monitoring, the FIFO was not performing as fast as it could. Thus, by analysis of the data generated by the stream status blocks, the efficiency of the processor can be evaluated.
  • FIG. 11 shows a further example of the operation of a stream status block. In this case, the stream status block provides a checksum value of the data passing along the edge 90 between kernel 84 and FIFO 86. Similarly, a checksum is generated by the stream status block 96 on the data passing along the consecutive edge 92 between FIFO 86 and kernel C 88. In other words the edge 90 between kernel 84 and FIFO 86 and the edge 92 between FIFO 86 and kernel C 88 can be referred to as consecutive edges. Since a FIFO should not modify data that passes through it, it is easy to spot when there is an error or fault with the FIFO due to the change in checksum value. Thus, by comparing the checksum value provided by the two stream status blocks 94 and 96, it is easy to identify whether or not a FIFO has introduced errors into data passing through it.
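The checksum comparison just described can be sketched as follows. The patent does not fix a particular checksum algorithm, so a simple 32-bit additive checksum is assumed here purely for illustration, and the class and function names are likewise assumptions.

    class EdgeChecksum:
        """Running checksum over the data moving along one edge."""

        def __init__(self):
            self.value = 0

        def sample(self, valid, data):
            if valid:  # only accumulate on cycles where data actually transfers
                self.value = (self.value + data) & 0xFFFFFFFF


    def fifo_introduced_errors(checksum_in, checksum_out):
        # A FIFO should not modify the data passing through it, so differing
        # checksums on its input and output edges indicate corruption.
        return checksum_in.value != checksum_out.value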
  • Considering, for example, the streaming processor of FIG. 2, a number of input kernels 98, 100 and 102 are provided and connected via various edges eventually (through other kernels 104 and 106) to an output kernel 108. If the output kernel 108 does not produce any output data, it can be very difficult to tell which of the upstream kernels is responsible. With the use of stream status blocks, it is possible to observe the state of the data flow and therefore diagnose the problem. As shown in FIG. 12, stream status blocks are provided to determine the number of valid cycles on each of the edges within the processor. As a processor designer, it is possible to know how many valid cycles should be expected for a given input. In the present example, kernel D is expected to output eight data items. There ought, therefore, to be eight valid cycles on the edge between kernels 104 and 108. The stream status block 110 coupled to this edge, in fact, shows zero valid cycles. Therefore, kernel 104 is where debugging investigations would commence.
  • FIG. 13 shows the same basic streaming processor as in FIG. 12. However, in this case, the valid cycle count from stream status block 110 is eight. The basic problem of no valid cycles appearing on an edge no longer applies, but it is still necessary to determine whether the kernels are operating correctly. The use of a checksum within the stream status block enables this problem to be solved by calculating a checksum value for the data stream passing through the processor at each edge. Since a processor designer will typically know the checksum value to expect, it is possible to find data corruptions using simple comparisons, i.e., comparing the determined value with the expected value. In FIG. 13, as the user streamed data into kernel 98, the expected checksum value on this edge is known and can be compared with the value recorded by the stream status block.
  • In a further example, checksums may be calculated on plural or even all of the edges of a streaming processor. This means that if there is intermittent data corruption it is possible to detect where it occurred by streaming the same input data multiple times. FIGS. 14A to 14C show an example of this. In this case, a simple streaming processor design comprises three kernels 112, 114 and 116. Stream status blocks 118 and 120 are provided on the edges connecting the kernels. As can be seen, the checksum value is different in run 2 (FIG. 14B) as compared to that in each of runs 1 and 3 (FIGS. 14A and 14C). Thus, it appears likely that the kernel B 114 is intermittently corrupting data.
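An illustrative multi-run comparison in the spirit of FIGS. 14A to 14C is sketched below. The checksum values and edge labels are invented for illustration only; the point is simply that an edge whose checksum varies between runs of identical input data implicates the node driving that edge.

    # Hypothetical checksum read-backs from two monitored edges over three
    # runs of the same input data (values invented for illustration).
    runs = [
        {"A->B": 0x1A2B, "B->C": 0x3C4D},   # run 1
        {"A->B": 0x1A2B, "B->C": 0x9E8F},   # run 2: only B->C differs
        {"A->B": 0x1A2B, "B->C": 0x3C4D},   # run 3
    ]

    for edge in runs[0]:
        values = {run[edge] for run in runs}
        if len(values) > 1:
            print(edge, "is unstable across runs -> suspect the node driving it")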
  • Considering the function and effect of stream status blocks, it is clear that there are significant distinctions from known means for monitoring data flows. Considering, for example, use of a known system observation bus, the use of stream status blocks is beneficial in that there is no change to the routing of data. In other words, the flow control pattern of the stream is unchanged, which means that it is possible simply to reconstruct the flow graph of a streaming processor using data accumulated by the stream status blocks. No routing or re-routing of data is required with the use of stream status blocks since they simply monitor data passing along the normal established edges within a streaming processor.
  • In another known method, cyclic redundancy checks are performed on FIFOs within a programmable logic device. In such an arrangement, data corruption inside a FIFO is detected by calculating CRC values on the input and output of the FIFO and then comparing them. In contrast, the use of stream status blocks with checksums provides a more general implementation of this functionality. In other words, the FIFO is merely a node or kernel on the data flow graph but could have been any other node as well. Thus, stream status blocks provide a generalised approach for calculating checksums on any edge of a data flow graph and are not limited to a specific node type like SRAM.
  • Stream status blocks can be automatically inserted into any edge of the data flow graph and are not kernel-type specific. Clearly, the numbers that the stream status block outputs make sense when considered in the context of the kernel that the stream status block is attached to. FIG. 15 shows a schematic representation of a streaming processor comprising kernels 122, 124, 126 and 128. Kernel 126 is a FIFO. Stream status blocks 130, 132 and 134 are provided. Their function, as described above, is to determine checksum values along the edges between the various connected pairs of kernels. In contrast, a known FIFO checker is specific to a FIFO and does not provide the general ability to monitor and model data flow within a streaming processor.
  • FIGS. 16A and 16B show examples of the flow control methodologies that would typically be used within a streaming processor. Two kernels are provided, with data flowing from a first node A 136 to a second node B 138. A data flow 140 is therefore provided irrespective of the flow control methodology. FIG. 16A shows an example of a push stream flow control methodology in which "valid" and "stall" flow control signals are used to control data flow between the kernels. When the valid flow control signal is asserted, data is defined as transferring from kernel A 136 to kernel B 138. If kernel B 138 cannot accept new data, it asserts the "stall" signal and valid will therefore stop after a number of cycles defined as the stall latency (SL).
  • In FIG. 16B, a pull stream control flow methodology is utilised. In this case, data is defined as transferring or moving from kernel A 136 to kernel B 138 exactly RL (real latency) cycles after the read flow control signal has been asserted. If kernel A 136 has no more data to transfer, it will assert an empty signal and the read signal will then de-assert EL cycles afterwards (Empty Latency). The manner in which the stream status blocks are coupled to these inter-kernel connections will now be described with reference to FIGS. 17 and 18.
  • In FIG. 17, the connections between a stream status block and the edge are shown for the PUSH stream flow control methodology. In this example, the stream status block 140 has inputs from the stall and valid signals and also from the data bus itself. A de-assert signal may be hardwired into the read and empty inputs on the stream status block 140 since they are not required when a PUSH stream flow control methodology is utilised.
  • In the example of FIG. 18, the connections for a stream status block 140 are shown when a PULL stream flow control methodology is utilised. As can be seen, in this case, the read and empty signals are connected to corresponding inputs on the stream status block as is the data. Stall and valid inputs are de-asserted.
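The wiring choice in FIGS. 17 and 18 can be summarised by the following sketch, which simply records which edge signals the monitor taps and which unused inputs are tied de-asserted for each methodology; the enum and dictionary form are illustrative assumptions, not part of the described hardware.

    from enum import Enum

    class FlowControl(Enum):
        PUSH = "push"   # valid/stall handshake (FIG. 17)
        PULL = "pull"   # read/empty handshake (FIG. 18)

    def status_block_wiring(methodology):
        """Signals the stream status block taps, and inputs tied de-asserted."""
        if methodology is FlowControl.PUSH:
            return {"tap": ["valid", "stall", "data"],
                    "tie_deasserted": ["read", "empty"]}
        return {"tap": ["read", "empty", "data"],
                "tie_deasserted": ["valid", "stall"]}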
  • To provide a more detailed understanding of the operation of a stream status block, reference is now made to FIG. 23 which shows a timing diagram for data signals between two kernels when operating as a PUSH stream. A clock 142 defines the clock domain for the input data stream. Initially, at time T0 valid and stall are both de-asserted.
  • The stream status block is required to provide an accurate picture of how data moves inside the data flow graph, i.e., between kernels and along the edge connecting the kernels in question. This will enable reconstruction of the data flow graph. Therefore, it is preferably arranged to provide values from three cycle counters: a valid counter, a stall counter and a total counter.
  • FIG. 23 shows the behaviour of each of these counters. When an analysis of the streaming processor is required, the counter values can be read back from the hardware through some known mechanism. In one example, the counter values are exposed using readable registers.
  • For a PUSH stream, the valid counter represents the number of data items moved. The stall counter represents the number of cycles that the destination was stalling. The time the source node was throttling is derived by subtracting the valid and stall counters from the total counter. Thus, the present stream's performance can be represented in a pie chart as shown in FIG. 24.
  • As can be seen, the stream was running for a total of 18 cycles, as derivable from the fact that the value of the total counter was 18. Nine of the 18 cycles had data moving, as demonstrated by the fact that the value of the valid counter is nine. On five cycles, the data was stalled by the destination and the remaining four cycles were therefore throttled by the source. Thus, by the simple use of valid, stall and total counters, it is possible to determine the operation of the flow control and data flow along the edge between the respective kernels.
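The arithmetic behind the FIG. 24 pie chart, using the counter values read back above (total = 18, valid = 9, stall = 5), works out as follows; the Python form is purely illustrative.

    total, valid, stall = 18, 9, 5        # counter values read back (FIG. 24)
    throttled = total - valid - stall      # 4 cycles attributed to the source

    print("data moving:          {:.0%}".format(valid / total))      # 50%
    print("destination stalling: {:.0%}".format(stall / total))      # ~28%
    print("source throttling:    {:.0%}".format(throttled / total))  # ~22%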
  • Last, with respect to stream status blocks, FIG. 21 shows an example of the checksum calculator wiring inside a stream status block. Stream status blocks are not limited to a specific checksum algorithm; however, they are best suited to algorithms which can be applied to data streams. In the present example, when the valid signal is asserted, the checksum calculator recognises this and determines a checksum based on the data passing along the data bus. When the valid signal is de-asserted the checksum calculator is, effectively, turned off. There would at this point be no data passing along the data bus.
  • Considering now a further aspect of the present method and apparatus, the concept of a counterizer block will now be described in detail. As explained above, a counterizer block is hardware attached to an edge (within the manager) at the output of a kernel and is controlled to replace the output data from the kernel with known data while maintaining precisely the same data flow, i.e., stall pattern, as the original output. This means that the pattern of data flowing from the kernel in question is not changed, but the actual values of the data are at known levels. This enables any unexpected variations in subsequent outputs from the streaming processor to be identified and debugged as appropriate.
  • Referring to FIG. 19, an example of a streaming processor including a counterizer is shown. The processor includes kernels 144, 146 and 148. A counterizer 150 is provided between the first and second kernels 144 and 146. The counterizer enables it to be known exactly when a node has started to consume data and what part of the data was output. In the example of FIG. 19, the kernel B 146 is a FIFO. Assuming it has been determined that there is a problem with the FIFO using stream status blocks as described above, it is still not possible to know which data items are missing. In particular, it is desired to know if a first, last or middle data item is missing from the output from the FIFO 146. The counterizer 150 serves to inject known data values into the FIFO 146. The data output from the FIFO 146 is then observed and it can be seen at what stage the operation of FIFO B 146 is failing. With the counterizer block 150, there is a guaranteed input to FIFO 146 so it is possible to calculate what to expect at the output from the FIFO. Table 1 below shows an example of a data capture window, both with and without the counterizer block.
    TABLE 1
    Without Counterizer:  0xA7  0xA7  0xA7  0xA7  0xA7  0xA7  0xA7  0xA7  0xA7  0xA7
    With Counterizer:       16    17    18    19    20    21    22    23    24    25

    As can be seen, the data captured at the output from B has started with the value 16, indicating a problem with the first data items (prior to 16) that were streamed into B. Thus the use of a counterizer presents a simple and robust means by which the effective operation of kernels within a streaming processor can be determined.
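Because the counterizer injects a simple count, a capture window taken downstream can be checked mechanically, as in the illustrative snippet below; the assumption that the injected count starts from zero, and the capture list mirroring Table 1, are for illustration only.

    # Data capture window taken downstream of the counterizer (as in Table 1).
    capture = [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

    # If the injected count starts from zero (an assumption here), the first
    # captured value tells how many leading items were lost.
    if capture[0] != 0:
        print("first", capture[0], "injected items never reached this point")

    # The window should also be contiguous; any gap points at items lost
    # mid-stream rather than at the start.
    for a, b in zip(capture, capture[1:]):
        if b != a + 1:
            print("gap in the count between", a, "and", b)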
  • FIG. 20A shows a further example of a streaming processor including a counterizer. In this example, as explained above, it would always have been possible to inspect the content of storage 14 to determine exactly what has been written to it. However, without knowing exactly what data was written from kernel 12, it is difficult to arrive at any conclusions. In other words, it is difficult to know whether the errors in writing data to storage 14 have arisen due to the writing process or due to the data output from kernel 12 itself.
  • In this example, counterizer 150 serves to provide a counter data stream which is written to storage 14 and thereby enables a user to inject known data into the storage and therefore to know what to expect when the storage is examined. It is significant that the counterizer block 150 maintains the same flow pattern as the kernel 12, only substituting the data, since errors will usually only be triggered when a certain sequence of events happens. Without following the exact flow pattern behaviour of the upstream kernel, it is most likely that the error that is being debugged will not be triggered.
  • FIG. 20B shows a further example of a data flow graph including a counterizer 160. In this example, a kernel 152 is arranged to provide an output to a further kernel 154 which is, in turn, connected to kernel 156. Stream status blocks 158 are provided connected to the various edges within the data flow graph. A counterizer 160 is provided arranged to receive the output from the kernel U 152 and provide a counted input stream to the kernel 154. In other words, the counterizer block 160 attaches to the output of the kernel 152 and replaces the output data with known data, i.e., a count. Since the counterizer block 160 always outputs known data values, it is possible to calculate what checksum to expect at the output of the “multiply×2” kernel 154, and indeed verify that this is in fact the value that came out of this kernel.
  • The combination of the use of a counterizer 160 with the stream status blocks 158 enables easy and convenient checking of the data flow graph and debugging, if necessary.
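The expected checksum at the output of the "multiply x 2" kernel can be precomputed from the known counterizer output, along the following lines. The additive checksum and the zero-based count match the illustrative assumptions used earlier and are not prescribed by the description.

    def expected_checksum_after_multiply_by_two(n_items):
        """Checksum expected on the output edge of the 'multiply x 2' kernel
        when the counterizer has injected the values 0 .. n_items - 1."""
        checksum = 0
        for value in range(n_items):
            checksum = (checksum + 2 * value) & 0xFFFFFFFF
        return checksum

    # Compare this against the value read back from the stream status block
    # on that edge to verify the kernel's output.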
  • FIG. 22 shows a schematic representation of how a counterizer block would typically be wired into a streaming processor. As can be seen, in this case, there are two kernels provided, an input kernel 162 and an output kernel 164. These may be any two kernels within a streaming processor. A counterizer block 166 is coupled to the lines between the kernels 162 and 164. The wiring of the connection between the counterizer block and the kernels 162 and 164 is clearly shown. As can be seen, the counterizer block 166 includes a data generator 168 arranged to receive input from each of the valid and stall connections between the kernels 162 and 164.
  • Thus, by receiving inputs from the flow control messages going in both directions between the kernels, the data generator is able to emulate the exact data flow pattern between the kernels. The actual data bus 170 between the kernels 162 and 164 is broken by the data generator such that data output from the kernel 162 is discarded within the counterizer block 166. Thus, the flow control signals are passed through so as to provide precise flow-control pattern preservation.
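A minimal behavioural sketch of this wiring, assuming a PUSH stream and illustrative names (not the hardware itself), is given below: the flow control signals pass through untouched, the upstream data is discarded, and a count is substituted on every cycle on which the flow control indicates a transfer.

    class Counterizer:
        """Substitutes a count for the data while passing the flow control
        signals through untouched (behavioural model only)."""

        def __init__(self):
            self.count = 0

        def cycle(self, valid_in, stall_in, data_in):
            """One clock cycle on the monitored edge.

            Returns (valid_out, stall_out, data_out): valid travels downstream
            and stall upstream, both unmodified, so the exact flow control
            pattern is preserved; the upstream data is simply discarded.
            """
            data_out = None
            if valid_in:                  # flow control says data is moving
                data_out = self.count     # inject the known value instead
                self.count += 1
            return valid_in, stall_in, data_out

Because nothing in the flow control path is altered, the downstream kernel sees exactly the stall pattern it would have seen without the counterizer, which is what makes intermittent problems reproducible.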
  • It can be seen then that counterizer blocks provide a way of injecting known data into any point of the data flow graph while maintaining exact flow control patterns. Maintaining the same flow control patterns can be crucial to reproducing problems and thereby enabling their identification and de-bugging. Having known data makes debugging significantly more efficient as errors can easily be spotted and it can similarly be easily determined how the problem that is being debugged affects data. In contrast to known attempts at providing means for diagnosing problems within streaming processors and debugging them, a counterizer block replaces the data whilst maintaining data flow patterns.
  • Thus, the present applicant has recognised that it is important to maintain data flow patterns whilst the values of the data themselves can, at times, be unimportant. Thus, in embodiments described herein, actual data is replaced with a counter that is incremented whenever the flow control signals indicate that data should transfer. The counterizer block therefore operates on the same clock domain as the input stream, which enables control flow patterns to be maintained. Furthermore, the flow control signals themselves are passed through the counterizer block without interference. Whereas the example of FIG. 22 is for a PUSH stream control flow methodology, it will be appreciated that a similar arrangement can be used for a PULL stream control flow methodology.
  • The present method and apparatus provides a useful tool for debugging streaming processors in an efficient and precise manner. Embodiments of the present invention have been described with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described and are within the scope of the present invention.

Claims (23)

1. A method of monitoring operation of programmable logic for a streaming processor, the method comprising:
generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph;
inserting, on each edge of the graph, monitoring hardware to monitor flow of data along the edge.
2. The method according to claim 1, in which each edge comprises flow control signals and a data bus for flow of data, and wherein the method comprises coupling the monitoring hardware to both the flow control signals and the data bus.
3. The method according to claim 1, comprising reading parameters associated with the data with the monitoring hardware, the parameters including the number of valid data cycles.
4. The method according to claim 1, comprising performing a checksum on passing data with the monitoring hardware.
5. The method according to claim 4, comprising performing a checksum on at least two consecutive edges and comparing the checksum values.
6. The method according to claim 1, comprising determining the number of valid cycles along every edge in the graph thereby identifying one or more routes taken by data through the graph.
7. The method according to claim 1, comprising determining the number of valid cycles along at least two consecutive edges and comparing the numbers.
8. The method according to claim 1, in which at least one of the nodes comprises a FIFO memory.
9. A method of monitoring operation of programmable logic for a streaming processor, the method comprising:
generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph;
inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data, for onward transmission to a connected node.
10. A method according to claim 9, in which the data-generating hardware is provided on each edge in the graph.
11. A method according to claim 9 or 10, in which the data-generating hardware is arranged to generate a count signal.
12. A method according to any of claims 9 to 11, in which each edge comprises a data bus for flow of data and flow control signals for the transmission of flow control signals, and wherein the method comprises coupling the data-generating hardware to both the flow control signals and the data bus.
13. A method according to claim 12 when dependent on claim 11, comprising incrementing the counter when the flow control signals indicate that data should transfer between the nodes.
14. A method according to claim 12 or 13, in which the data-generating hardware is arranged to receive an input from the data bus and to provide as an output a count signal having the same flow control pattern as the data received on the data bus.
15. A method according to any of claims 9 to 14, comprising coupling the control signals to a data generator within the count-generating hardware, and in dependence on the flow control signals generating the count signal.
16. A method according to any of claims 9 to 14, comprising operating the data-generating hardware at the same clock rate as the data received from the upstream node.
17. A streaming processor comprising:
plural nodes for processing streaming data;
at least one edge connecting the one or more nodes;
monitoring hardware provided on each of the edges to monitor flow of data along the respective edge.
18. A streaming processor comprising:
plural nodes for processing streaming data;
at least one edge connecting each pair of the one or more nodes;
data-generating hardware arranged to receive data from an upstream node in a pair of nodes and generate data at known values having the same flow control pattern as the received data for onward transmission to a downstream node in the pair of nodes.
19. A streaming processor according to claim 18, in which the data-generating hardware comprises a data generator arranged to generate a count signal.
20. A streaming processor according to any of claims 17 to 19, in which the streaming processor is provided on an FPGA.
21. A tool for enabling the monitoring of operation of programmable logic for a streaming processor, the tool comprising:
a graph generator for generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph;
a monitoring hardware generator, for generating monitoring hardware on each edge of the graph, the monitoring hardware being configured to monitor flow of data along the edge.
22. A tool for enabling the monitoring of operation of programmable logic for a streaming processor, the tool comprising:
a graph generator for generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting nodes in the graph;
a hardware generator for generating and inserting, on at least one edge, data-generating hardware arranged to receive data from an upstream node and generate data at known values having the same flow control pattern as the received data, for onward transmission to a connected node.
23. A method of monitoring operation of programmable logic for a streaming processor, the method comprising:
generating a graph representing the programmable logic to be implemented in hardware, the graph comprising nodes and edges connecting the nodes, the edges including control signals and a data bus;
inserting, on at least one edge, monitoring hardware coupled to both the control signals and the data bus.