Parallel Algorithms For Logic Synthesis

Parallel Algorithms for Logic Synthesis using the MIS Approach
Kaushik Det
t LSI Logic Corporation Milpitas, CA 95035 kaushik@lsil.com
John A. Chandyt
Sumit Royg
Steven Parke8
Prithviraj Banerjee~
5 Sierra Vista Research Santa Clara, CA 95053758 parkesQsierravista.com
$ University of Illinois IJrbana, IL 61801 {jchandy,sroy,banerjee}@crhc.uiuc.edu
Abstract
Combinational logic synthesis is a very zmportant but compututionally expensive phase of VLSI system design. Parallel processing offers an attructive solution to reduce this design cycle time. In this paper; we describe ProperMIS, a portable parallel algorithm for logic synthesis based on the MIS multi-level logic synthesis system. As part of this work, we have developed novel parallel algorithms for the d@rent logic transformations of theMIS system. Our algorithm uses an asynchronous messagedriven computing modei with no synchronizing barriers separating phases of parallel computation. The algorithm is portable aclnss a wide van ety ofparallel architectures, and is built around a well-defied sequential algorithm interface, so that we can beneJi!from future expansion oj the sequentral algorithm. We present results on several MCNC and ISCAS benchmark circuits for a varlc ty of shared memory and distributed processing architectures. Our implementation produces speedups oj lzn average o.f 4 on 8 processors.
synthesis algorithmisMIS-II, which is based on iterativefactoring and simplification of nodes in a Boolean network. This algorithm forms the core of numerous university and industrial logic synthesis systems. A previous attempt to parallelize MIS - I I resulted in poor speedups and significant loss in quality, because the MIS-II algorithm is inherently sequential in nature and extremely hard to parallelize [7]. In this paper, we therefore present ProperMIS, a new parallel MIS-I I based algorithm for logic synthesis that uses an asynchronous message-driven computing model with no synchronizing barriers separating phases of parallel computation. Using the ProperCAD II system, the algorithm is portable across a wide variety of parallel architectures.
ProperCAD II Overview
Introduction
Combinational logic synthesis is the optimization of a logic design to realize a specific combinational function in either two level or multilevel form, and typically optimizes the area or delay of the resultant circuit. Efficient algorithms for two-level logic minimization include ESPRESSO [l] and for multilevel logic optimization, SOCRATES 121, MIS [3], SYLON-XTRANS [4], and BOLD [5l. Since logic synthesis is very compute Intensive, parallel processing is fast becoming a desirable solution to reduce the large amounts of time spent in VLSI circuit d esign. This has been recognized by several researchers in VLSI CAD, as many have started to investigate parallel algorithms for problems in logic synthesis and verification [6,7, 8,9]. We recently developed a portable parallel algorithm for the transduction method [4] of logic synthesis [IO], and results ot the parallel algorithm were presented for a variety of parallel platforms. Even though we obtained reasonably good speedups using that algorithm, the original sequential algorithm using the transduction method has very large run times. The more popular logic %s researchwas supportedin part by the National Science Foundation under grant MIP-9320854,the SemiconductorResearch Corporationunder grant SRC 94-DP-109. and the AdvancedResearch ProjectsAgency undercontract DAA-H04-94-G-0273administered by the Army Research Oflice.
Much of the work in parallel CAD reported to date suffers from a major limitation in that these proposed parallel algorithms are designed with a specific underlying architecture in mind. As a result, these applications perform poorly on architectures other than the one for which they were designed. Just as importantly, incompatibilities in programming environments make it difficult to port these programs across different parallel architectures. This limitation has serious consequences, since a parallel algorithm needs to be developed afresh for every target MIMD architecture. One of the primary concerns of the ProperCAD project [ 111 is to address this portability problem by designing algorithms to run on a range of parallel machines including shared memory multiprocessors, distributed memory multicomputers, and networks of workstations. The ProperCAD approach to the design of parallel CAD algorithms is illustrated in Figure 1. A parallel algorithm is designed around an existing uniprocessor algorithm by identifying modules in the uniprocessor code and designing a well-defined interface between the parallel and sequential code. The project has undergone two distinct phases, the first of which, ProperCAD I, involved the use of the C-based Charm language and runtime system [12]. The second phase, ProperCAD Il [13, 141, entailed the creation of a CH library which provided an object-riented parallel interface based on the actor model of concurrent object-oriented computing. The ProperCAD U library provides the mechanisms necessary for parallel execution in through the use of a fundamental object called an actor [ 151. An actor object consists of a thread of control that communicates with other actors by sending messages,and all actor actions are in response to these messages. Specific actor
579 1063.7133/95 $4.00 0 1995 IEEE
Proceedings of the 9th International Parallel Processing Symposium (IPPS '95) 1063-7133/95 $10.00 1995 IEEE
Existing Sequential
The aim of kernel extraction is to searchfor multiple-cube comHITECIPROOFS mon divisors and extract those divisors. There exists a multiplenmberwdfsc
SYLON-XlRANS
Algorimmr MIS-II
PACE
3.2
Types of Logic hansformations
cube common divisor b for 1 nodes 91, 42. . . ., 711 if 3 k,, E ~K(vz),...k~, l K(~~),suchthatk,,nk,,...n K(~l),k,, k,, = 6. For example, given the equationsin a Boolean network
F = af + bf + ag + cg + ade -t bde + cde, G = af+bf+ace+bce, H = ade+cde (1)
the best multiple-cube divisor that can be found is a + b. After creating a new node X for the common divisor iul the network, the equationsin the network become Figure 1: An overvlew of the ProperCAD project methodsare invoked to processeach type of message,and actors are not allowed to block or explicitly make receive requestsfrom other processors.The runtime systemon each processorpicks the next available actor thread with some prrority and that thread is then allowed to run to completion without interruption. As part of ProperCAD, a suite of parallel applications have beendevelopedthat addressthe most sigmficant tasksin VLSI design automation including Lircuit extraction, test generation,fault simulation, cell placement, and logic synthesis. 3 3.1 MISII
OVerVieW
F = deX + fX + ag + cg + cde, G=ceX+fX,
H=ade+cde,
X=a+b
(2)
The original network had 33 liter&, and the modified network after the kernel extraction has 25 literals. Cube extraction searches for single-cubecommon divisors and extracts those divisors. There exists a single cube common divisor 6 for m nodes, ~1, ~2, ., q,,, if cl E F(vl), c2 ci F(r)z), , , cm E F(+,), such that cl ncz . . .r)~?,~ = 5, where F(vi) representsthe Boolean expressionfor the node vi. For example, given the following Boolean network equations
F = abc+ abd + eg: G = abfg (3.)
Definitions
the best single cube common divisor that can be found is ab. After creating a new node X in the network, the equationsin the modified network become
F = Xc+Xd +eg, G = Xfg, X = ab (4
ln this section, we will review some basic definitions as given in [3] to be used later in this paper. A variable is a symbol representinga single coordinateof the Boolean space. A literal is a variable or its negation. A cube is a set C of literals such that 3E C implies f 6 C. An expression is a set f of cubes. The support of an expression f is the set of literals sup(f) which are presentin the sum-of-product expressionof f, An expressionf is cube:free if no cube divides f evenly. Theprimav divisocv of an expressionf form a set of expressions o(f) = { f/C(G is a cube). The kernels of an expression f are the expressionsK(f) = {!glg E o(f) and g is cube-free}. In other words, the kernels of arkexpressionf are the cube-free primary divisors of f The cube C used to obtain kernel k = f,/C is called the cok(mel of k.
This transformation reducesthe number of literals from 12 to 11. Resubsritutionis used to check if an existing function itself is a divisor of anotherfunction. For example, consider the network:
3: = ac+ad+bc+bd+e
and y = a+b
(5)
The function y itself is a divisor of the function Z. Therefore, it can be used to simplify the function 2, which can be rewritten as:
z = y(c+d)
-t e
(6)
Each node of a Boolean network is a Boolean function. expressedin the sum-of-productsformat (two-level logic). Hence. two-level minimization algorithms such as espresso [l] can be usedto minimize eachnode of the Boolean network That process is called simplification.
A Boolean network is a directed acyclic graph (DAG) where each node i is associatedwith 1) a variable yt and 2) a reprcsentarion f; of a logic function. In the graph, an arc connectsnode i in o f il node i is the sei of all to node j if ?/i E sllp(f, ). Th e .fnn-. ot the nodes pointing to node i. Thefun-our of a node i ISthe set of all nodes which node i points to. The satisjiabzity don t wre set of a node i is defined to b< DS4Ti = y%f, + ?$fi = y; @ f,. where y, is the variable representingthe node i and fi is the logic function for that nrlde.
Parallelization
Methodology
An immediately apparentapproachto parallel logic synthesis is to divide the logic circuit into several partitions and then synthesize thosepartitions independentlyin parallel [ 161.IJnfortunately, that results in loss in quality of the synthesizedcircuit, because of the lack of global information. Instead of using such a naive method, in ProperMIS, we partition the circuit for the purpose
of division of work among different processors,however, we do not synthesizethosepartitions independently.Logic minimization is performedon the entire network, so as not to lose the quality of the synthesizedlogic. Partitioning is used only for distribution of work among different processors, so it doesnot have any effect on the quality of the synthesizedcircuit. After reading in the circuit, a very simple strategybased on the input conesof the primary outputs is usedto createthe partitions. For each partition K, an actor object denoted as partition(7r) is created. We createmany small partitions such that load balancing is good, but we make surethat the amountof computationrequired by each partition is roughly an order of magnitudehigher than that of communication time for sending a message between objects. The ProperCAD II run-time systemautomaticallydistributesthese objectsto different processors. Each partition(7r) actor then initiates the optimization procedure by starting with the simplification procedurewhich is detailed in Section 4.3. When all of the nodes in a partition x are simplified, the partition(x) actor is sent a messageto generate kernels for the nodesin P. At this point the other transformations, kemel extraction, cube extraction, and resubstitutionare begun as explainedin Sections4.1 and 4.2. We assignpriorities to different transformations,with initial priorities ordered as follows (highest to 1c)west): simplification, kernel extraction, cube extraction and resubstitution.For example,a processorwill not pick up any messageregarding kernel extraction if someother message regarding simplification is waiting to be processedin that processor.These pnorities are usedto guide the synthesisprocesssuchthat it avoids 10~1 mitima. In order to allow different processorsto simultaneouslypcrform conflicting optimizations on the network, we provide coherenceamong the parallel applicationsof theseoptimizationsby using versionnumbersfor the nodes. Every node in the Booleannetwork has a version number that is incrementedby one whenever the functionality of the node changesdue to sometransformation. Therefore, when a processorfinds a possibletransformationon a node 1, it asks permissionfrom a designatedmaster processorto make the transformation and it provides the version xmmberof 71 in the request.The muster processorchecksif the versionnumber provided in the requestis the sameas the current version number the functionality of 71 has of 9. If so, permissionis grantedbecause not changedsince it was checkedfor the possibletransformation. Otherwise, permissionis denied. 4.1 Parallel Algorithm for Extraction
a I Fa Fb2, F de Ff FcS.. Fg6 Ga

Gb G ce
1. 3 4 I
8 9 IO
b 2 _.
de
9 ; 4. .
4567 513.
6
g
.
ce
.
5 6 12 3
1 1
. 4 :
. : lb
II
: i
9
f
de
i H
10 11
8 12
9 1; (4 --I-
ce I
G G G Fa6. Fb7 a b ce I 2 3 IO
f 2
8
a 3 lb
b 4 1 1
9
de
6
g
1
11
9
: I.. 8 12 13 ;; 6 s 6 12. : 1 : I 3 4 L 4 .
Gf H de F
Ff F F de c g
4
5 -2. 8 9.: 10 II
(b)
Figure 2: Two co-kernel cube matrices for Eq. I
In Section 3.2, we defined the serial kernel extraction process.In our parallel algorithm, all of the possiblekernelsfor eachnode are generated concurrently. Using the kernel generationalgorithm described in [ 11,each partition actor will in parallel generatethe kernels for all the nodes in its partition. and will then broadcast tha! information to all of the processors. Consider the equations given in Eq. I, The kernels (co-kernels) of the equation F are dc: + f + g (u), de + f (b), a t t + c (de), a + b (f), de + g (c) and a + c (9). The kernels of G ;brece + f (a, b), a f b (f, ce), and the only kernel of H is n +- c (de). Finding useful intersectionsof kernels is facilitated with a data structurecalled the co.kernel cube matrix, i1sdescribed
in [ 171. A row in this matrix correspondsto a kernel (and its associatedco-kernel), and a column correspondsto a cube which is presentin some kernel. The entry at position (i, j) is nonzero if kernel i containsthe cube j. For reference,the cubes of the original expressionsin Eq. 1 are numberedfrom 1 to 13. In a parallel environment, the creation of the co-kernel cube matrix is a more involved process.We want to createthe co-kernel cube matrix in all of the processors, and they have to be the same in all processors.In the sequentialalgorithm, when a new kernel (co-kernel)is generated,a new row is assignedfor that kernel, and the row number correspondingto that kernel is noted in a table. Similarly, when a unique kernel-cubeis generated,a new column is assignedfor that cube and the column number correspondingto that cube is noted in a table. Sincekernelsare generated in parallel and then broadcastto all of the processors, the order in which the kernels are receivedmay vary in different processors.For example,one processormay receive the kernels in the order F, G and H. In that case, the cokernel cube matrix will be the sameas that given in Figure 2(a). However,if anotherprocessorreceivesthe kernels in the order G, II and F, the co-kernel cube matrix in that processorwill resemble the one given in Figure 2(b), which has different labeling for the rows and the columns. To keep the row and column labeling consistentacrossall processors,each node is assigneda distinct interval of labels. For example,we can assignthe intervals [l-1000], [lOOl-2ooO] and [X01-30001 to the nodesF, G and H respectively.So any kernel
581
aFa Fb2 F de F f F c F
G G G :
I 3 4 5
6 1001 1002 1003 I004
a 1 .-. . 5
1 i
b 2
de
s 4
f
5
g
6 *&
513 ..62.. 6 7
2
;:4:
4 8 9 I.0 II
Rectangle
b
ce
10
8
11
9
G u- H
f de
2001
12--.l-L-.-L1;
Figure 3: Consistent co-kernel cube matrix for Eq. 1
y-G--J Figure 5: Kernel-extract

matrices
actor search tree and associated
Figure 4: Conceptual search space for finding the maximum-valued rectangle
(LO-kernel) generatedfor thenode G will be labeledas a row in the co-kernel cube matrix with a number starting from 1001. Hence, the labeling of the rows will be consistentin all of the processors irrespectiveof the order in which the kernelsare received. IJsing a similar strategyfor column labeling will producea co-kcrel cube matrix as in Figure 3. A rectangleof the co-kernel cube matrix identifies an intersectton of kernels, that is. a common subexpression in the network. Thecolumns of the rectangleidentify the cubesin the subexpresston, and the rows of the rectangleidentify the particular functions that the subexpression divides. For example, the prime rectangle ({3,4, 1003, 1004}, {I, 2) Jin Figure 3 (shown in bold) identifies the subexpression a + h which divides the function F and G as in Eq. 2. The cubes numbered I, 2, $6, 8,9, 10, and 11 from the original set of functions aru coveredby this rectangle. The value of a rectangle measuresthl* reduction of the number of literals in the network if the particular rectangleis selected. Rectangle-coveringis pertormed by generatingall possiblerectangles and then selecting the maximum valued rectangle. The szarch for the maximum-baluedrectang lecan be performed in parallel as conceptually illustrated in Figure 4 and illustrated with the help of a co-kernel LX& mtrlr& in Figure 5. After the generauon of all of the kernels, a designatednmst~~r processorcreatesan Jctor called kernel-mxtruct LOcompute the maximum valued rect;rmgle by generatmgall of the prime rectariglcs. When a new kurnrlzxtruct actor is created,the followmg information is passed: the submatrix of the co-kernel cube matrix (.M), the rectangle generatedthus far (re&) and the index of the column from which it should searchonwards for new prime rect-
angles (index). For the root kernel-extract actor, M is the cokernel cube matrix itself, rect is a empty rectangle and the value of index is 0 as shown in Figure 5(i). The responsibility of this kernel-extract actor is to generateall of the prime rectangleswith fewer rows but more columns than ret?. That is performed by adding to rect columns with labels more than or equal to in&~. Conceptually. it is equivalent to generating all of the possible subexpressions which can be multiple cube common divisors to the given set of equationsby adding more kernel cubes (columns of M) to the given subexpression (represented by rect). The kernel-extract actor also computesthe value of red and records the information if it is the best rectangleseenby this actor. If the number of columns in the submatrix A4 is less than a user-definedthreshold, all of the possible prime rectangles are generatedby applying the sequentialalgorithm describedin [ 171. Otherwise,new child actorsare createdand the searchspaceis divided among those children. Each column c with d label greater than index is examined as a column to include in the rectangle. The submatrix Ml of the original matrix M is createdby selecting only the rows in which column c has a nonzero value, and a new rectanglerectl is formed from the columns of the old rectangle rect and the rows for which c has a nonzero value. At this point, a new child kernel-extract actor is createdwtth the following arguments:the submatrix h41, the new rectanglerectl and the index c. For example,by adding column 1 to the empty rectangle in the root object, we createa new rectanglerectl with one column and six rows (:3,4,6,9,10,11) as shown in Figure 5(ii). This actor receivesthe submatrix Ml, the rectangle rectl as shown in Figure 5(ii) and the value of the parameterindex u ill be 1. Conceptually, it is equivalent to adding the kernel cube a to the null subexpression in the root object as shown in Figure 4(ii). The leaf nodes of the searchtree generatedprime rectangles by using the sequentialalgorithm and then reportsthe best rectangle seento its parentafter the computationis performed. Non leaf nodes,which createdchild actors,must wait for the best rectangles seenby its children before it can then forward the best rectangle
582
Processor
Procwsor
Pmcessor
hocessor
Figure 6: Creation of objects for resubstitution to its parent. The root kernel-extract actor finally reports the best rectangle information to the ma~rer processor. When the muster processor receives the information regarding the maximum-valued rectangle, it computesthe subexpression corresponding to that rectangle from the co-kernel cube matrix, and broadcasts that information to the other processors,which then remove the kernels of the affected expressionsfrom the cokernel cube matrix. The version numbers of the affected nodesare incremented by 1. A new root kernel-extract actor is created to processthe new co-kernel cube matrix, and the processcontinues until no more positive valued rectanglescan be found. The parallel algorithm for cube extraction is similar to that for kerrteE extraction, and is uot described further for lack of space. More details are available in [ 18.1. 4.2 Parallel Algorithm for Resubstitution
As described in Section 3.2, the resubstitution transformation is used to check if an existing expression itself is a divisor of other expressions. The resubstitution operation can be performed in parallel by creating for each node Q resubiqi actorswhich are then distributed acrossall of the processorsby the ProperCAD U runtime system as shown in Figure (5. The responsibility of resrcb(q) is to search for possible resubstitution by other nodes in the node n. The search is restricted to algebraic divisors as the search for Boolean division is very expensive. A node v can be an algebraic divisor of the node 7~ if the support of v is a subset of the support of 71. Hence, resuh(t/) restricts its searchfor divisors to the nodes whose supportsare subsetsof the support of q, Anytime it finds a divrsor u and the divisiou is going to reducethe literal count of the network, permission for the transformation must be requestedfrom the master processor. The muster processorverifies the currentnessof the version number and, if granted, increments the version number. Since the functionality of the node 9 has changed,the master processorcreates a new resub actor for the node n. 4.3 Parallel Algorithm for Simplification
Two-level minimization is a much more developed science than multilevel minimization, and very efficient algorithms exist for finding minimal two-level representationsof Boolean functions. In Ihe context of a multilevel network, two-level minimization can bt: made more powerful by providing the minimizer with a set of functions known as don t-care sets. These sets are derived from the:structureof the network as well, and its size can determinehow thorough and how fast the minimization process will be. In a sequcntial algorithm for simplification, since only one node is simplified at a time. the don t-care set of any other node can be used 583
in the simplification processwithout error. Since we are performing node simplifications in parallel, there must be constraintson which don t-cares can be used to ensurethe results are correct. For example, consider the set of expressions X = ag+Eb and Y = ab+z6 If both of the nodes are simplified concurrently and they use each other s satisfiability don t-care sets, then the result could be X=YandY=F which is wrong. But if they are simplified one at a time, that will never happen. Then we would have X = a&+Til, and Y = XorX = P and Y = ab+Z$ This simple example shows that two nodes being simplified concurrently should not use each other s satisfiability don t-care sets. To make concurrentnode simplification possible, we thus have to introduce the conceptof locking a node. A node is said to be algebraically locked if no change in functionality is allowed to that node. A node is said to be Boolean locked if simplification of that node is not allowed, but all algebraic operationssuch as kernel extraction, cube extraction and resubstitution are allowed. Conceptually, the satisfiability don t care of any node can be used for simplification of another node as long as it does not form a cycle in the network. To perform the simplification of nodes in parallel, we must find the largest set of nodes such that simplification of those nodes can be performed concurrently. Since this problem is NP-hard, we use a heuristic as described below. Each partition actor creates for each node 7 in its partition a simp(v) actor that is responsiblefor simplification of that node. As with the resub actors,the simp actors are distributed acrossdifferent processorsby the ProperCAD II run-time system. Upon creation, each simp(q) actor asks for permission from the master processorfor the simplification of n. Initially all of the nodes are unlocked. When the master processorpicks up the permission request, it checks if n is Boolean locked. If it is, the nzasrerprocessordenies permission and asks simp(q) to try later. Otherwise, the don t-care set is checked to determine if any of the nodes are are algebraically locked. If so, depending on the type of don t-care set used, the master processor will deny permission or simply supply a reduceddon t-care set. If permissionis granted,again dependingon the don t-care set, the nodes in the set are Boolean or algebraically locked. After simplification is complete, the other processorsare notified and the master processor will unlock the appropriate locks. It is very apparent from the description of the simplification procedurethat the first node to receive permission can use all of the don t-care sets it asks for. Therefore, for better simplification, we assigneddifferent priorities proportional to the size of the nodes(in terms of the literal count) to different simp actors. If two simp actors ask permission from the master processor simultaneously, the larger node will be processedfirst. With such a procedure, the don t-care sets of those nodes will be available for use, potentially improving the quality of node simplification.
Results and Discussion
In this section, we will discussresults obtained by applying Propox-MIS on various MCNC and ISCAS benchmark circuits. In Table 1, we compare the quality of circuits obtained by MIS- II with that of the ProperMIS algorithm in a uniproces-
Table 1: Comparison between MIS - I I and ProperMI

Circuit Initial bw 424 Literal Count Run Time (SW)
Table 2: Results for the Intel Paragon
MIS
Proper MIS
186
MIS
30.2
Propez MIS
32.4
i78
Table 3: Results for the Encore Multimax sor environment. The parametersand don t-care sets supplied to MIS-II and ProperMIS are identical for any circuit. The first 3 columns comparethe literal countsof the initial circuits, the circuits produced by MIS-II and the circuits produced by ProperMIS, respectively. The quality of the circuits produced by ProperMIS is almost the same(sometimesbetter) as that ofcircuits produced by MIS-I I. The last two columns compare the nmtimes for MIS- I I and ProperMIS for a Sun 4/69OMP[runtimes are given in seconds).Again, in most casesthe runtimesare comparable.The runtimesand qualitiesare not identicaldue to the nondeterminismin the order of applying various transformations. Tables 2, 3 and 5 present the results obtained on a distributed memory MIMD machme,the Intel Paragon,as well as two shared memory machines,an 8-processorEncore Multimax and a 4-procfL.ssor Sun 4/69OMPserver. Table 4 shows the results for a network of SPARCstation5 workstations. For different numbersof processors, the quality of the synthesiz.ed circuits in terms of the literal count and the run times in secondsand speedups are shown for some benchmarkcircuits. From the tables it is clear that the ProperMIS algorithm produces good speedups,and maintains quality comparableto that of the MIS- II approach. Some of the superlinearspeedupsare daleto anomaliesin parallel searchtechniques. One can observe that the maximum degradationof quality with multiple processors IS generally less than 10% The quality is sometimesimproved with a larger numberof processors, again, due to nondeterminism. Measurements of the parallielcharacteristicsof ProperME? are shown in Table 6 for C49Y on the Sun 4/69(fMP.While the user time to systemtime ratio is good. load balancing,however.ih poor, and we will be addressingthat issue in future research.
and ISCAS synthesisbenchmarkcircuits. The results, reportedon a variety of parallel architectures,show good speedupswith almost no degradationin the quality of the synthesized network over the uniprocessoralgorithm. By retaining about 80% of the sequentialcode from MIS-II, we were easily able lo update ProperMIS to the latest version of MIS, now part of SIS 1.2 [ 191.Preliminary results using the new version are shown in Table 7. We are in the processof evaluating the reasonsfor the difference in results between ProperMIS and ProperSIS. We are also currently investigating paral-. lelizing other aspectsof logic synthesispresentin s IS 1.2. Overly large circuits can not be handled by ProperMIS due to excessivememory requirements.In light of this, we have conductedresearch to partition thosecircuits and then synthesizeeach partition separatelyin parallel. In doing so, we have designedinnovative partitioning and redistribution methods that do not sacrifice the quality of the synthesizedlogic [16].
Conclusions
Acknowledgement
In this paper, we have presentedan asynchronousportable parallel algorithm for logic synthesis, called ProperMIS. implemented as part of the ProperCAD project As part of this work, vvc have developed novel parallel algorithms for the kernel extraction, cube extruction, rttsubstitution. and node simphjicotiorz proceduresof the MIS- II system. ProperMIS uses an asyn-. c hronousmessage-driven actor model of computation; no explicit synchronizingharriers are allowed in the algorithm. We have implementedthe algorithm and have eva1uatc.d it on various M( NC
We are grateful to Dr. Balkrishna Ramkumar for various discussions on the developmentof parallel algorithms, and to John G. Holm for his assistance in characterizingthe parallel behavior of ProperMIS. We are also thankful to the San Diego Supercomputing Center for granting us accessto their Intel Paragon.
References
[l] R. K. Brayton, G. D. Hachtel, C. McMullen, and A. SangiovanniVincentelli, ESPRESSO-II: A new logic minimizer for program-
Table 4: Results for a cluster of SPARCstation 5s
Table 6: Characteristics of ProperMIS.
Table 5: Results for a Sun 4/690MP
Table 7: ProperSIS
Results for a Sun 4/690MP
mable logic arrays, in Proceedings of tht Custom hltegrured Circuits Cortference, pp 370-376, June 1984 121 A. J. de Geus and W. Cohen, A rule-based system for oprimizmg combinational logic, IEEE Design & Test !!fCompufer~, pp 22.- 32, Aug. 1985. [:im] R. K. Brayton, R. Rudell, A Sangiovanmi-Vincentelli. and A R. Wang, MIS: A multiple-level logic optimization system, IEEE Trunx Computer-Aided Drvign, vol. CAD-6, pp. 1062-1081 Nov. 1987. 141 X.Xiang, Multilevel logic network synthesis systems, SYLONXTRANS. Ph.D. Dissertation, University of Illinois at UrhanaChampaign, 1990. [i] D. Bostick, G. D. Hachtel. R. Jacoby, M. R Lightner, l? Moceyunas, C. R. Morrison, and I). Ravenscroft, Thtk Boulder Optimal Logic Design system, in Dtgest of Pupers, Internutionul Conferencc~ on Computer-Aided De.qw. i,Smta Clara, CA) pp. 62-65. Nov. 19X7.
[IO] K. De, B. Ramkumar, and P. Banejee, A portable parallel algorithm for logic synthesis using transduction, IEEE Trans. Computer-Aided Design, vol. 13, pp. 566-580, May 1994. [I I] B. Ramkumar and P. Banerjee, ProperCAD: A portable object-oriented parallel environment for VLSl CAD, IEEE Truns. ComputerAided Design, vol. 13, pp. 839-842, July 1994. [ 121 L. V. Kale, The Chare Kernel parallel programming system, in Proceedings of the lnternatianul Conference on Purullel Processing, (St. Charles, IL), pp. 17-25, Aug. 1990. [I31 S. Parkes, J. A. Chandy, and P. Bane+, A library-based approach to portable,parallel,object-oriented programming: Interface, implementation, and application, in Supercomputing 94, (Washington, DC), pp. 69--7X, Nov 1994. [ 141 S. M. Parkes, A class library approach to concurrent object-oriented programming with applications to VLSI CAD. Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Sept. 1994. Tech. Rep. CRHC-94-20AJtLLJ-ENC-94-2235.
1151 G. A. Agha. Actors: A Model of Concurrent Computation in Distributed Sysrems. Cambridge, MA: The MIT Press, 1986. [ 161 K. De and P. BanerJ:rJee, Parallel logic synthesis using partitioning, in Proceedqs of the Intern&ma1 Conference on Purullel Processmg, (St. Charles, IL), pp. III: 135-142. Aug. 1994. [ 171 R. L. Rudell, Logic synthesis for VLSI design. Ph.D. Dissertation, University of California, Berkeley, 1989. [ 181 K. De, Parallel algorithms for logic synthesis. Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Sept. 1993. Tech. Rep. CRHC-93-20/UILU-ENG-93-2235. [19] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. SavoJ, P. R. Stephan. R. K Brayton, and A. Sangiovanni-Vincentelli. SIS: A system for seyuential circuit synthesis, Tech. Rep. UCB/ERL M9u41, Department of Electrical Engineering and Computer Science, University of Catifomia. Berkeley, CA, May 1992
IO] R. Cahvanche and S. M. Reddy, A parallel PLA mimmization program, in Proc~eedings of f/w Design Automation Confkrence, (Miami Beach, FL), pp 600-607, June 1987. [ T] G. G. Zipfel, A parallel algorithm for algebraic factorization with application to multi-level logic synthesis. M.S. thesis, University of Illinois at Urbana-Champaign, June 1991 Tech. Rep CRHC-9I24/UILU-ENG-91-2134 [8] G. D. Hachtel and P. H, Moceyunas, Paralllel algorithms for boolean tautology checking, in Digest of Puper,s. Intwnutronul Co@rencc on Compufer-Aided Design. (Santa Clam CA). pp. 422-425. Nov. 1987. [<I] H.-K. T. Ma, S. Devadas. and A. Sangiovanni-Vincentelh, Logic verification algorithms and their parallel implementations. in Pmceedings ofthe Design Auromution Co@r~ncc~. (Miami Beach, FL), pp. 283-290. June 10X7.
585

Parallel Algorithms For Logic Synthesis

Uploaded by

Copyright:

Available Formats

Parallel Algorithms For Logic Synthesis

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Parallel Algorithms For Logic Synthesis

Uploaded by

Copyright:

Available Formats

Parallel Algorithms for Logic Synthesis using the MIS Approach

$ University of Illinois IJrbana, IL 61801 {jchandy,sroy,banerjee}@crhc.uiuc.edu

579 1063.7133/95 $4.00 0 1995 IEEE

Types of Logic hansformations

F = deX + fX + ag + cg + cde, G=ceX+fX,

a I Fa Fb2, F de Ff FcS.. Fg6 Ga

Figure 3: Consistent co-kernel cube matrix for Eq. 1

y-G--J Figure 5: Kernel-extract

actor search tree and associated

Figure 4: Conceptual search space for finding the maximum-valued rectangle

Results and Discussion

Table 1: Comparison between MIS - I I and ProperMI

Table 2: Results for the Intel Paragon

Table 4: Results for a cluster of SPARCstation 5s

Table 6: Characteristics of ProperMIS.

Table 5: Results for a Sun 4/690MP

Results for a Sun 4/690MP

You might also like