AiMap+: Guiding Technology Mapping for ASICs via Learning Delay Prediction †
Figure 1. An example of an AIG and its mapped circuit with the Boolean function f = a ⊕ b ∧ c ∧ d. The dashed lines on the edges indicate wires with inverters.
Figure 2. Framework overview of our proposed AiMap+. The red arrows denote the data flow in both the training and testing phases; the green arrows denote the data flow in the testing phase only. “Feature Emb.” refers to “Feature Embedder”.
Figure 3. Training loss of the regression problem in AiMap+.
Figure 4. QoR distribution of different circuits under the three training strategies.
Figure 5. Feature importance analysis.
Abstract
1. Introduction
- Rather than using a heuristic strategy that models only the fanout feature for supergate delay estimation, we first formulate supergate delay estimation as a regression learning task. This approach integrates the load-independent gate delay and the learned supergate delay to jointly guide the mapper’s search.
- To strengthen the learning model, we further propose three parameterizable strategies, covering load-independent delay estimation, load-dependent delay estimation, and cut sampling, to generate supergate-delay labels that yield better overall circuit delay.
- Building on these labels, we design a learning model to predict the supergate delay. It captures pin delay through a customized attention mechanism and jointly learns cut–node and cut–supergate features through a Neural Tensor Network.
- Experimental results on a wide range of benchmarks show that (i) compared with ABC [14], AiMap+ improves area by 9.3% and delay by 1.5%; compared with SLAP [20], it improves area by 12.3% at the cost of a 9.7% delay penalty; and compared with AiMap [4], it improves delay by 2.6% on average without any area penalty; (ii) in delay-oriented mapping, AiMap+ performs considerably better than ABC, improving area by 12%; (iii) our training strategies for label generation achieve an average delay improvement of 26.6%.
2. Preliminary and Related Works
2.1. Boolean Networks
- k-feasible Cut. A feasible cut C of a node v of the AIG is a set of nodes in the transitive fanin of v such that every path from the PIs to v passes through at least one node in C. Each node of C is called a leaf of the cut, and a cut is k-feasible if it has at most k leaves. The trivial cut of node v consists solely of v itself. Thus, for a node n with two fanins u and v, the set Φ(n) of k-feasible cuts can be computed by combining the sets of cuts of u and v [32]:

$$\Phi(n) = \{\{n\}\} \cup \{\, c_u \cup c_v \mid c_u \in \Phi(u),\ c_v \in \Phi(v),\ |c_u \cup c_v| \le k \,\}$$
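The cut-merging rule above can be sketched in a few lines of Python. This is an illustrative sketch that assumes a simple dictionary encoding of the AIG (each node maps to a tuple of its two fanins, an empty tuple for PIs, with nodes listed in topological order); the function name is ours. Edge inverters do not affect cut enumeration and are omitted, and practical mappers such as ABC additionally prune dominated cuts and keep only a bounded number of priority cuts per node.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Compute all k-feasible cuts of every node of an AIG.

    fanins: dict mapping each node to its two fanins (empty tuple for PIs),
            listed in topological order (dicts keep insertion order).
    Returns a dict mapping each node to a set of frozenset cuts.
    """
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}          # the trivial cut {n}
        if ins:                                  # AND node with fanins u, v
            u, v = ins
            for cu, cv in product(cuts[u], cuts[v]):
                merged = cu | cv
                if len(merged) <= k:             # keep only k-feasible merges
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts
```

For example, with `aig = {"a": (), "b": (), "c": (), "n1": ("a", "b"), "n2": ("n1", "c")}`, `enumerate_cuts(aig, 3)["n2"]` contains the trivial cut `{n2}` plus `{n1, c}` and `{a, b, c}`; with `k=2` the three-leaf cut is dropped.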
2.2. Technology Mapping for ASICs
- Supergate Generation. Before mapping is invoked, supergates are typically generated by combining several standard cells into a single-output gate [14,27]. Supergates were proposed to mitigate the structural bias problem [26]: each supergate is a composition of standard cells, so the generated set contains many gates with diverse functionality. Consequently, after supergate generation there are many supergates with the same functionality (usually more than 10), and the supergates sharing a functionality are sorted by their maximum pin-to-pin delay. These supergates are used to match cuts with different graph structures during mapping. Note that the term supergate delay also covers cell delay, because the set of standard cells is a subset of the generated supergates.
- Mapping Flow in ABC. We next review the four main steps of ABC [14], a well-known technology mapper for ASICs that follows a dynamic programming paradigm based on precomputed k-feasible cuts and supergates. (1) ABC first computes all k-feasible cuts for each node (typically k = 5). (2) For each cut, it computes the truth table and uses Boolean matching to check whether the cut can be implemented by a precomputed supergate. (3) It then computes the best arrival time of each node from PIs to POs, where the delay of each cut implemented by a supergate is estimated from the standard cell library (i.e., using an estimated gate delay without knowing the actual arrival times and output loads). (4) Finally, the best cover is chosen from POs to PIs. Area recovery is subsequently performed on non-critical paths using the area-flow and exact-area heuristics. Figure 1b depicts a mapped circuit of the AIG with Boolean function f.
- Summary of related work. Having introduced the preliminaries of ASIC mapping, we now review related work on ASIC technology mapping. Table 2 summarizes and compares these methods in terms of gate delay estimation (LD-Delay: load-dependent; LI-Delay: load-independent), method keywords, optimization objective, and main contributions.
Paper | Gate Delay Estimation | Method Keyword | Optimization Objective | Main Contribution |
---|---|---|---|---|
Rudell [36] | LD-Delay | dynamic programming | delay minimization | first linear-time solution for tree mapping |
Touati [35] | LD-Delay | load estimation; dynamic programming | delay minimization | piecewise linear functions to handle arbitrary load values |
Huang [29] | LD-Delay | adaptive load estimation | delay minimization | more accurate load estimation using piecewise linear functions |
DAGON [15] | LI-Delay | tree covering problem; dynamic programming | delay minimization | partitions the DAG into a forest of trees and uses tree pattern matching to map each tree |
ABC [14] | LI-Delay | iterative gate selection; priority cuts | delay minimization | delay/area-oriented flow that addresses the structural bias problem on AIGs |
SLAP [20] | LI-Delay | supervised learning; priority cut classification | QoR Improvement | uses a CNN to classify priority cuts |
MapTune [23] | LI-Delay | reinforcement learning | QoR Improvement | uses reinforcement learning to guide library tuning for mapping |
E-Syn [24] | LI-Delay | equivalence graph; regression for cost model | QoR Improvement | uses E-Graphs to rewrite AIGs with a technology-aware cost |
AiMap [4] | LI-Delay | regression learning; gate delay prediction | QoR Improvement | learn to improve mapping via delay prediction |
AiMap+ (Ours) | LI-Delay | regression learning; gate delay prediction | QoR Improvement | guide gate selection via learning delay prediction |
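The dynamic-programming core of steps (3) and (4) of the ABC flow reviewed above can be sketched as follows. This is a minimal illustration, not ABC's implementation: Boolean matching (step 2) is assumed to have already produced, for each node, a list of candidate matches `(leaves, gate_name, gate_delay)`, where `gate_delay` is a fixed per-match estimate (a load-independent value in ABC; a learned prediction is what AiMap+ would substitute). The function name, the data layout, and the omission of area recovery and inverter handling are our simplifications.

```python
def map_min_delay(order, matches, outputs):
    """Delay-oriented DP mapping over precomputed cut/supergate matches.

    order:   nodes in topological order (PIs first).
    matches: node -> list of (leaves, gate_name, gate_delay); PIs have none.
    outputs: primary-output nodes.
    Returns (arrival time per node, chosen cover as node -> gate_name).
    """
    arrival, best = {}, {}
    # Forward pass (PIs -> POs): best arrival time of every node.
    for node in order:
        cands = matches.get(node)
        if not cands:
            arrival[node] = 0.0                  # primary input
            continue

        def cut_arrival(m):
            leaves, _gate, delay = m
            return delay + max(arrival[leaf] for leaf in leaves)

        best[node] = min(cands, key=cut_arrival)
        arrival[node] = cut_arrival(best[node])
    # Backward pass (POs -> PIs): collect the gates of the selected cover.
    cover, stack = {}, list(outputs)
    while stack:
        node = stack.pop()
        if node in cover or node not in best:    # already covered, or a PI
            continue
        cover[node] = best[node][1]
        stack.extend(best[node][0])              # visit the leaves of the chosen cut
    return arrival, cover
```

Because each match carries a single precomputed delay, the quality of the cover depends directly on how well `gate_delay` reflects the delay the gate will actually have after mapping, which is the quantity AiMap+ learns to predict.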
2.3. Timing Analysis
2.4. Problem Formulation
3. Learning Label Generation
- Strategy 1: Load-Independent Supergate Delay Estimation. We extend load-independent supergate delay estimation by considering the impact of input slew. As illustrated in Equation (2), ABC provides a load-independent linear approximation of the non-linear delay behavior by using a parameter Ga that measures the gain per unit load. However, Equation (2) implicitly assumes that the input slew of the gate is already known, and therefore fails to model the non-linear behavior of the input slew.
- Strategy 2: Load-Dependent Supergate Delay Estimation. To make delay estimation more accurate by incorporating the impact of load, we propose a parameterizable strategy that takes into account the number of fanouts of the supergate w.r.t. the cut.
- Strategy 3: Cut Sample. Because supergate delay is hard to estimate accurately, increasing the number of cuts per node does not directly improve area or delay, and often leads to a suboptimal solution [20]. This insight suggests that filtering the cuts of each node can increase the sensitivity of the quality of results to the number of cuts. Specifically, this strategy randomly samples a relatively small number of cuts (with parameter r) as the priority cuts of each node. In practice, to make the best QoR reproducible, we replace the random shuffle with five ranking criteria (with parameter t) that sort the candidate cuts.
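Strategy 3 can be sketched as a per-node filter on the candidate cuts. Only the roles of r (how many cuts are kept) and t (which ranking criterion replaces the random shuffle) come from the text above; the five actual criteria are not listed in this section, so the three ranking keys and the cut fields (`leaves`, `depth`, `leaf_fanout`) below are hypothetical stand-ins.

```python
import random

# Hypothetical ranking keys standing in for the paper's (unlisted) criteria.
# Each cut is a dict with "leaves" (leaf ids), "depth", and "leaf_fanout".
RANKINGS = [
    lambda c: (len(c["leaves"]), c["depth"]),    # fewer leaves, then shallower
    lambda c: (c["depth"], len(c["leaves"])),    # shallower, then fewer leaves
    lambda c: (-c["leaf_fanout"], c["depth"]),   # more shared leaves first
]


def select_priority_cuts(cuts, r, t=None, seed=0):
    """Keep at most r cuts of one node as its priority cuts.

    t=None: random sample, seeded so that a run can be repeated.
    t=i:    deterministic ordering by RANKINGS[i], keeping the first r cuts.
    """
    if t is None:
        rng = random.Random(seed)
        return rng.sample(cuts, min(r, len(cuts)))
    return sorted(cuts, key=RANKINGS[t])[:r]
```

Seeding makes the random variant repeatable for a fixed cut list, but its outcome still shifts whenever the candidate set changes; a fixed sort key avoids that, which matches the stated motivation for replacing the shuffle with ranking criteria.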
4. Approach–AiMap+
4.1. Framework Overview
4.2. Feature Embedder
- Node Embedding. Node feature extraction in AiMap+ largely follows SLAP [20], which captures the structural information of an AIG node from the node itself and from its two children. For model efficiency, we opted not to use graph representation techniques, e.g., Graph Convolutional Networks [37,38], and instead concentrated on the inherent AIG features related to supergate delay. Specifically, node u has four features: the level of u, the fanout of u, whether an outgoing edge of u has an inverter, and the relative level of u. Each of the two children contributes only the first three of these features. The node features are summarized in Table 4.
- Cut Embedding. The supergate-delay-related features of k-feasible cuts are also extracted at a fine-grained level, where k is typically 5 in ABC's mapper. We collect 19 features for each cut from two aspects: cut–node and cut-structure. The cut–node aspect comprises six nodes, i.e., the root node of the cut and its five leaf nodes, whose node embeddings are stacked to form the cut–node representation. The cut-structure aspect contains 13 features: (i) the root fanout; (ii) the number of leaves and the cut volume for the cut itself and for its two parents; and (iii) the max, min, and gap of the cut levels and of the cut fanouts. As in node embedding, the root fanout in the cut-structure view is encoded into a four-dimensional learnable embedding. The remaining 12 cut-structure features are normalized to form the cut-structure representation. In summary, the cut embedding combines the cut–node and cut-structure representations.
- Supergate Embedding. For each supergate, we extract 60 features associated with the mapped supergate delay from the open-source ASAP 7nm PDK standard cell library [39] and ABC. These features comprise (i) basic descriptions of the standard cell, e.g., the area, leakage power, and number of inputs/outputs; (ii) per-pin delay features, e.g., the load-independent delay estimated with Equation (2) and our estimates with Equation (3) under different pre-defined gain factors; and (iii) overall features of the supergate, e.g., the max and sum of the delays over all pins. The supergate embedding is then obtained by normalizing these features.
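A minimal sketch of the raw node and cut-structure features described above, in plain Python. The `AigNode` type and all function names are hypothetical; the relative level is assumed to be the node level divided by the maximum level of the AIG, the "cut levels and fanouts" of item (iii) are assumed to be those of the cut's leaves, and min–max scaling is assumed for normalization, since none of these formulas is given here. The learnable four-dimensional fanout embedding is omitted.

```python
from dataclasses import dataclass


@dataclass
class AigNode:
    level: int        # logic level of the node
    fanout: int       # number of fanouts
    inverted: bool    # does an outgoing edge carry an inverter?


def node_features(u, child0, child1, max_level):
    """4 features for u plus 3 per child (Table 4); 10 values in total.

    Missing children (e.g., for PIs) are zero-padded. The relative level
    is ASSUMED to be level / max_level.
    """
    rel = u.level / max_level if max_level else 0.0
    feats = [u.level, u.fanout, int(u.inverted), rel]
    for c in (child0, child1):
        feats += [c.level, c.fanout, int(c.inverted)] if c else [0, 0, 0]
    return feats


def leaf_range_features(leaves):
    """Max, min, and gap of the levels, then of the fanouts, of the leaves."""
    out = []
    for values in ([n.level for n in leaves], [n.fanout for n in leaves]):
        hi, lo = max(values), min(values)
        out += [hi, lo, hi - lo]
    return out


def min_max_normalize(column):
    """ASSUMED normalization: scale one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in column]
```

Normalization is applied per feature column across the dataset, so that features with large ranges (such as levels in deep arithmetic circuits) do not dominate the small-range ones.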
4.3. Learning Supergate Delay
- Pin-delay-aware Attention based on Cut–Node. Because of factors such as the fanout and load capacitance of the supergate, different pins should receive different attention when estimating the supergate delay. The six node embeddings of the cut–node representation correspond to one output pin and five input pins and carry features such as fanout and other graph-structure information. We therefore propose an attention mechanism that learns pin weights reflecting the varying contributions of different pins to the supergate delay, guided by a better overall circuit delay, as shown in Figure 2.
- Cut–Supergate Fusion. The cut-structure embedding provides structural features potentially related to the supergate delay, e.g., the root fanout of the cut, whereas the supergate embedding provides the physical characteristics of the standard cell that are directly related to the supergate delay; the two are fused into a cut–supergate embedding.
- Delay Prediction. Finally, the embedding for supergate delay estimation is obtained by concatenating the cut–node and cut–supergate embeddings. A multi-layer perceptron (MLP) regression model progressively reduces this concatenated embedding to predict the supergate delay w.r.t. the cut. In other words, AiMap+ learns the delays of node–cut–supergate tuples from two perspectives, cut–node and cut–supergate, using labels that correspond to an improved circuit delay (Section 3). These predictions then guide the mapper’s search, which combines load-independent and learned delay estimates with the aim of a more accurate mapping.
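The model's forward pass can be sketched with one plausible instantiation; the paper's exact equations are not reproduced in this section, so everything below is an assumption except the overall structure (pin attention over the six cut–node embeddings, a Neural Tensor Network fusing the cut-structure and supergate embeddings, then a regression head). Attention is modeled as softmax over scores against a learnable query vector; the NTN follows the standard form of Socher et al. (2013), s_k = tanh(e1ᵀ W_k e2 + V_k [e1; e2] + b_k); a single linear head stands in for the MLP. Weights are passed as plain lists, and nothing is trained here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(pins, query):
    """Softmax attention over pin embeddings (cut root + leaves).

    Returns the weighted sum of the pin embeddings and the pin weights.
    In a real model `query` is learnable; here it is simply an input.
    """
    scores = [dot(h, query) for h in pins]
    top = max(scores)                             # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * h[j] for w, h in zip(weights, pins))
              for j in range(len(pins[0]))]
    return pooled, weights


def ntn(e1, e2, W, V, b):
    """Neural Tensor Network layer: one output per slice k.

    W: K matrices of shape len(e1) x len(e2); V: K vectors of length
    len(e1) + len(e2); b: K biases.
    """
    concat = e1 + e2                              # [e1; e2]
    out = []
    for Wk, Vk, bk in zip(W, V, b):
        bilinear = sum(e1[i] * Wk[i][j] * e2[j]
                       for i in range(len(e1)) for j in range(len(e2)))
        out.append(math.tanh(bilinear + dot(Vk, concat) + bk))
    return out


def predict_delay(pins, query, e_cut, e_gate, W, V, b, head_w, head_b):
    """Concatenate the attended cut-node vector with the NTN-fused
    cut/supergate vector and apply a linear head (stand-in for the MLP)."""
    pooled, _ = attention_pool(pins, query)
    fused = ntn(e_cut, e_gate, W, V, b)
    return dot(head_w, pooled + fused) + head_b
```

The bilinear term is what distinguishes an NTN from plain concatenation: each slice W_k can model multiplicative interactions between a cut-structure feature and a supergate feature (for example, root fanout times a per-pin delay slope), which a linear layer on [e1; e2] alone cannot express.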
5. Experiments
5.1. Experimental Settings
- Datasets. We evaluate on the EPFL benchmark suite [31] and the ISCAS’85 benchmarks [42]. To evaluate AiMap+, we selected mostly arithmetic designs and some control-logic designs, 20 circuits in total (17 arithmetic circuits and three control circuits), as listed in Table 5.
- Implementation. We implement AiMap+ on top of the open-source logic synthesis tool ABC [14], with the learning model built in PyTorch [43]. In ABC, we implement two commands, “gen_train” and “gen_inf”, to generate the data for the training and testing phases, respectively. We also extend the “map” command so that it can update the supergate delay with the predicted value and search iteratively with the predicted delay. For training, we use the Adam optimizer with a learning rate of 0.001 and a weight decay of 0.005, which regularizes the model and mitigates the risk of overfitting by penalizing large weights.
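In PyTorch, the optimizer settings stated above would typically be written as `torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.005)`. To make explicit what that configuration does at each step, the following dependency-free sketch performs Adam updates with weight decay applied as an L2 term added to the gradient, which is how `torch.optim.Adam` treats `weight_decay` (as opposed to the decoupled decay of `AdamW`). The betas and epsilon are PyTorch's defaults; the paper does not state them, so they are assumptions here, as is the function name.

```python
import math


def adam_step(params, grads, state, lr=1e-3, weight_decay=5e-3,
              betas=(0.9, 0.999), eps=1e-8):
    """One Adam update on a flat list of parameters.

    `state` is a dict carrying the step count and moment estimates
    between calls; it is updated in place. Returns the new parameters.
    """
    b1, b2 = betas
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    m = state.setdefault("m", [0.0] * len(params))
    v = state.setdefault("v", [0.0] * len(params))
    new = []
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p                 # coupled L2 penalty
        m[i] = b1 * m[i] + (1 - b1) * g          # first-moment estimate
        v[i] = b2 * v[i] + (1 - b2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - b1 ** t)             # bias corrections
        v_hat = v[i] / (1 - b2 ** t)
        new.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new
```

On the first step the bias-corrected update is roughly `lr` times the sign of the (decayed) gradient, so even a parameter with zero loss gradient is pulled toward zero by about 0.001 through the weight-decay term.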
5.2. The Results of Training Model
5.3. QoR of Technology Mapping
- Comparison against ABC. On average over the 20 evaluation circuits, AiMap+ improves area by 9.3% and delay by 1.5%. For area, AiMap+ improves 16 of the 20 circuits, by up to 29% on 64b_mult and by about 20% on most multiplier circuits. Its gains on arithmetic circuits are markedly larger than on control-logic circuits, mainly because most (more than 70%) of the node–cut–supergate tuples in the training dataset come from arithmetic circuits. For delay, AiMap+ improves 9 of the 20 circuits, by up to 24% on ctrl. The delay improvement is relatively small mainly because both ABC and AiMap+ perform area recovery in this test, and area recovery can affect delay unpredictably in ASIC mappers: it changes the graph structure, which in turn changes the delays of the supergates.
- Comparison against SLAP. On average over the 20 evaluation circuits, AiMap+ improves area by 12.3% with a 9.7% delay penalty. For the circuits evaluated in both works, SLAP's results are taken from its original article; we were unable to reproduce SLAP's results on the other circuits, which are marked “–”. For area, AiMap+ outperforms SLAP on almost all circuits, by up to 33% on bar; the exception is adder, where AiMap+ is slightly worse (by about 3%). For delay, SLAP outperforms AiMap+ because it focuses exclusively on delay optimization. Note that although both AiMap+ and SLAP learn from historical decision data to guide the mapper’s search, their approaches differ significantly. SLAP learns to classify priority cuts during mapping, whereas the key issue in ASIC technology mapping is how to estimate the gate delay of a cut using structural information from the cut and process information from the library, a consideration absent from SLAP [4].
- Comparison against AiMap. Compared with AiMap, AiMap+ improves delay by 2.6% on average across the 20 evaluation circuits without any area penalty. This supports the rationale behind our revision of AiMap, namely (i) designing a more refined network to capture the latent graph structure and physical information of supergates and (ii) integrating heuristic supergate delay estimation with regression-based delay estimation to jointly guide the mapper’s search.
5.4. QoR of Delay-Oriented Technology Mapping
5.5. QoR of Training Strategies
- Training strategies for delay optimization. We next present the QoR of the three strategies proposed in Section 3 for producing better overall circuit delay. Note that the three strategies are applied sequentially; e.g., the optimal parameters found in Strategy 1 are carried forward to the search in Strategy 2. We record only the delay improvement, denoted ΔDelay, and disregard changes in area. The results are presented in Table 7, from which we draw the following findings.
- Training strategies for ADP optimization. Our strategies can also be extended to improve the mapped network in both area and delay (i.e., the area–delay product, ADP). Figure 4 presents the ADP distribution of different circuits under the three training strategies, from which we derive the following insights.
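The averages reported in this section are geometric means (the Geomean rows of the QoR tables). A short sketch, using the ABC delays of the seven circuits in Table 7 as input: their geometric mean reproduces the table's 1813.76 ps, and the 26.6% ΔDelay of Strategy 3 in the Geomean row matches 1 − 1331.07/1813.76. Because the geometric mean of per-circuit ratios equals the ratio of the geometric means, normalized Geomean entries such as 0.907 for AiMap+/ABC area (a 9.3% improvement) can be read the same way. The function names are ours.

```python
import math


def geomean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def delta_delay(baseline, improved):
    """Relative delay improvement in percent (positive means faster)."""
    return 100.0 * (1.0 - improved / baseline)


# ABC delays (ps) of adder, bar, max, sin, i2c, priority, router (Table 7).
ABC_DELAY = [2618.21, 1936.90, 4640.24, 6456.01, 244.35, 2856.16, 609.04]
```

For example, `delta_delay(1936.90, 934.51)` gives about 51.75, the 51.8% that Table 7 lists for bar under Strategy 1.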
5.6. Feature Importance Evaluation
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, L.; Chen, Y.; Chu, Z.; Fang, W.; Ho, T.Y.; Huang, Y.; Khan, S.; Li, M.; Li, X.; Liang, Y.; et al. The dawn of AI-native EDA: Promises and challenges of large circuit models. arXiv 2024, arXiv:2403.07257.
- Li, X.; Huang, Z.; Tao, S.; Huang, Z.; Zhuang, C.; Wang, H.; Li, Y.; Qiu, Y.; Luo, G.; Li, H.; et al. iEDA: An Open-source infrastructure of EDA. In Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 77–82.
- Li, X.; Tao, S.; Chen, S.; Zeng, Z.; Huang, Z.; Wu, H.; Li, W.; Huang, Z.; Ni, L.N.; Zhao, X.; et al. iPD: An Open-source intelligent Physical Design Toolchain. In Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 83–88.
- Liu, J.; Ni, L.; Li, X.; Zhou, M.; Chen, L.; Li, X.; Zhao, Q.; Ma, S. AiMap: Learning to Improve Technology Mapping for ASICs via Delay Prediction. In Proceedings of the International Conference on Computer Design (ICCD), Washington, DC, USA, 6–8 November 2023; pp. 344–347.
- Pei, Z.; Liu, F.; He, Z.; Chen, G.; Zheng, H.; Zhu, K.; Yu, B. AlphaSyn: Logic synthesis optimization with efficient Monte Carlo tree search. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9.
- Wang, P.; Lu, A.; Li, X.; Ye, J.; Chen, L.; Yuan, M.; Hao, J.; Yan, J. Easymap: Improving technology mapping via exploration-enhanced heuristics and adaptive sequencing. In Proceedings of the ICCAD, San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9.
- Grosnit, A.; Zimmer, M.; Tutunov, R.; Li, X.; Chen, L.; Yang, F.; Yuan, M.; Bou-Ammar, H. Lightweight Structural Choices Operator for Technology Mapping. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 9–13 July 2023; pp. 1–6.
- Yu, L.; Guo, B. Timing-Driven Simulated Annealing for FPGA Placement in Neural Network Realization. Electronics 2023, 12, 3562.
- Wang, Z.; Chen, L.; Wang, J.; Bai, Y.; Li, X.; Li, X.; Yuan, M.; Hao, J.; Zhang, Y.; Wu, F. A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024.
- Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping arbitrary logic functions onto carry chains in FPGAs. Electronics 2021, 11, 27.
- Chowdhury, A.B.; Tan, B.; Carey, R.; Jain, T.; Karri, R.; Garg, S. Bulls-Eye: Active few-shot learning guided logic synthesis. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2022, 42, 2580–2590.
- Kok, C.L.; Li, X.; Siek, L.; Zhu, D.; Kong, J.J. A switched capacitor deadtime controller for DC-DC buck converter. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 217–220.
- Liu, D.; Svensson, C. Power consumption estimation in CMOS VLSI chips. IEEE J. Solid-State Circuits 1994, 29, 663–670.
- Brayton, R.; Mishchenko, A. ABC: An academic industrial-strength verification tool. In Proceedings of the International Conference on Computer-Aided Verification (CAV), Edinburgh, UK, 15–19 July 2010; pp. 24–40.
- Keutzer, K. DAGON: Technology binding and local optimization by DAG matching. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), Miami Beach, FL, USA, 20 June–1 July 1987; pp. 341–347.
- Cong, J.; Wu, C.; Ding, Y. Cut Ranking and Pruning: Enabling a General and Efficient FPGA Mapping Solution. In Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 21–23 February 1999; pp. 29–35.
- Mishchenko, A.; Cho, S.; Chatterjee, S.; Brayton, R. Combinational and sequential mapping with priority cuts. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 4–8 November 2007; pp. 354–361.
- Cong, J.; Ding, Y. FlowMap: An optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 1994, 13, 1–12.
- Cong, J.; Ding, Y. On area/depth trade-off in LUT-based FPGA technology mapping. IEEE Trans. Very Large Scale Integr. Syst. 1994, 2, 137–148.
- Neto, W.L.; Moreira, M.T.; Li, Y.; Amarù, L.; Yu, C.; Gaillardon, P.E. SLAP: A supervised learning approach for priority cuts technology mapping. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 5–9 December 2021; pp. 859–864.
- Chen, D.; Cong, J. DAOmap: A depth-optimal area optimization mapping algorithm for FPGA designs. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 7–11 November 2004; pp. 752–759.
- Calvino, A.T.; De Micheli, G. Technology Mapping Using Multi-output Library Cells. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Francisco, CA, USA, 29 October–2 November 2023; pp. 1–9.
- Liu, M.; Robinson, D.; Li, Y.; Yu, C. MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning. arXiv 2024, arXiv:2407.18110.
- Chen, C.; Hu, G.; Zuo, D.; Yu, C.; Ma, Y.; Zhang, H. E-Syn: E-Graph Rewriting with Technology-Aware Cost Functions for Logic Synthesis. arXiv 2024, arXiv:2403.14242.
- Calvino, A.T.; Riener, H.; Rai, S.; Kumar, A.; De Micheli, G. A versatile mapping approach for technology mapping and graph optimization. In Proceedings of the IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, Taiwan, 17–20 January 2022; pp. 410–416.
- Chatterjee, S.; Mishchenko, A.; Brayton, R.K.; Wang, X.; Kam, T. Reducing structural bias in technology mapping. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2006, 25, 2894–2903.
- Wu, M.C.; Dao, A.Q.; Lin, M.P.H. A Novel Technology Mapper for Complex Universal Gates. In Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Tokyo, Japan, 18–21 January 2021; pp. 475–480.
- Cai, Y.; Yang, Z.; Ni, L.; Xie, B.; Li, X. Enhancing ASIC Technology Mapping via Parallel Supergate Computing. arXiv 2024, arXiv:2404.13614.
- Huang, S.C.; Jiang, J.H.R. A dynamic accuracy-refinement approach to timing-driven technology mapping. In Proceedings of the IEEE International Conference on Computer Design (ICCD), Lake Tahoe, CA, USA, 12–15 October 2008; pp. 538–543.
- Hu, B.; Watanabe, Y.; Kondratyev, A.; Marek-Sadowska, M. Gain-based technology mapping for discrete-size cell libraries. In Proceedings of the DAC, Anaheim, CA, USA, 2–6 June 2003; pp. 574–579.
- Amarú, L.; Gaillardon, P.E.; De Micheli, G. The EPFL combinational benchmark suite. In Proceedings of the IEEE/ACM International Workshop on Logic Synthesis, Washington, DC, USA, 1–23 September 2015.
- Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. In Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 22–24 February 2006; pp. 41–49.
- Murgai, R. Technology-dependent logic optimization. Proc. IEEE 2015, 103, 2004–2020.
- Chatterjee, S. On Algorithms for Technology Mapping; University of California: Berkeley, CA, USA, 2007.
- Touati, H.J. Performance-Oriented Technology Mapping; University of California: Berkeley, CA, USA, 1990.
- Rudell, R.L. Logic Synthesis for VLSI Design; University of California: Berkeley, CA, USA, 1989.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
- Liu, J.; Zhou, M.; Ma, S.; Pan, L. MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), Birmingham, UK, 21–25 October 2023; pp. 1503–1512.
- Clark, L.T.; Vashishtha, V.; Shifren, L.; Gujja, A.; Sinha, S.; Cline, B.; Ramamurthy, C.; Yeric, G. ASAP7: A 7-nm finFET predictive process design kit. Microelectron. J. 2016, 53, 105–115.
- Socher, R.; Chen, D.; Manning, C.D.; Ng, A. Reasoning with neural tensor networks for knowledge base completion. Adv. Neural Inf. Process. Syst. 2013, 26, 926–934.
- Bai, Y.; Ding, H.; Bian, S.; Chen, T.; Sun, Y.; Wang, W. SimGNN: A neural network approach to fast graph similarity computation. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), Melbourne, Australia, 11–15 February 2019; pp. 384–392.
- Hansen, M.C.; Yalcin, H.; Hayes, J.P. Unveiling the ISCAS-85 benchmarks: A case study in reverse engineering. IEEE Des. Test Comput. 1999, 16, 72–80.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037.
- Berman, C.L.; Carter, J.L.; Day, K.F. The fanout problem: From theory to practice. In Proceedings of the Decennial Caltech Conference on VLSI on Advanced research in VLSI, Cambridge, MA, USA, 1 June 1989; pp. 69–99.
- Singh, K.J.; Sangiovanni-Vincentelli, A. A heuristic algorithm for the fanout problem. In Proceedings of the ACM/IEEE Design Automation Conference, San Francisco, CA, USA, 17–22 June 1991; pp. 357–360.
- Van Ginneken, L.P. Buffer placement in distributed RC-tree networks for minimal Elmore delay. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), New Orleans, LA, USA, 1–3 May 1990; pp. 865–868.
- Fishburn, J.P. LATTIS: An iterative speedup heuristic for mapped logic. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), Anaheim, CA, USA, 8–12 June 1992; pp. 488–491.
- Coudert, O.; Haddad, R.; Manne, S. New algorithms for gate sizing: A comparative study. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA, 3–7 June 1996; pp. 734–739.
- Savoj, H. Technology dependent timing optimization. In Proceedings of the Workshop Note of IWLS, Tahoe City, CA, USA, 19–21 May 1997.
- Srivastava, A.; Kastner, R.; Sarrafzadeh, M. Timing driven gate duplication: Complexity issues and algorithms. In Proceedings of the IEEE/ACM International Conference on Computer Aided Design, San Jose, CA, USA, 5–9 November 2000; pp. 447–450.
Circuit | Estimated Area (μm²) | Estimated LI-Delay (ps) | Actual Area (μm²) | Actual Delay (ps) | ΔDelay |
---|---|---|---|---|---|
adder | 898.31 | 2613.78 | 898.13 | 3770.65 | 44% |
bar | 2681.62 | 152.96 | 2680.39 | 1114.9 | 629% |
log2 | 26,556.98 | 3891.66 | 26,561.26 | 6797.77 | 75% |
cavlc | 463.27 | 185.07 | 463.29 | 93.2 | 50% |
int2float | 158.61 | 174.27 | 158.63 | 91.7 | 47% |
ctrl | 106.92 | 98.53 | 106.84 | 89.9 | 9% |
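The ΔDelay column above compares the load-independent delay estimated during mapping with the delay measured on the mapped circuit. Judging from the listed values, it appears to be the absolute gap relative to the estimate, |actual − estimated| / estimated; the sketch below recomputes the column under that inferred definition (the function and dictionary names are ours).

```python
def estimation_gap(estimated_ps, actual_ps):
    """Gap between the estimated and actual delay, in percent of the estimate."""
    return 100.0 * abs(actual_ps - estimated_ps) / estimated_ps


# (estimated LI-Delay, actual delay) in ps, copied from the table above.
ESTIMATES = {
    "adder": (2613.78, 3770.65),
    "bar": (152.96, 1114.9),
    "log2": (3891.66, 6797.77),
    "cavlc": (185.07, 93.2),
    "int2float": (174.27, 91.7),
    "ctrl": (98.53, 89.9),
}

for name, (est, act) in ESTIMATES.items():
    print(f"{name}: {estimation_gap(est, act):.0f}%")
```

Rounded to whole percents, this reproduces the column (44%, 629%, 75%, 50%, 47%, 9%). The estimate errs in both directions: it is too optimistic for adder, bar, and log2 and too pessimistic for cavlc, int2float, and ctrl.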
Notations | Descriptions |
---|---|
() | load-independent gate delay estimation (from the view of slew) |
() | induced delay per unit load (slew) |
() | pre-defined gain factor for load (slew) |
() | load-independent parasitic delay (from the view of slew) |
| | our designed load-independent gate delay estimation |
| | delay of supergate g w.r.t. the cut under output load l |
| | fanouts of the root nodes of the cut |
| | fanouts of the leaf nodes of the cut |
| | arrival time of the node v |
Node Features | Child 1 Features | Child 2 Features |
---|---|---|
level (u) | level (child 1) | level (child 2) |
fanout (u) | fanout (child 1) | fanout (child 2) |
inverter (u) | inverter (child 1) | inverter (child 2) |
re-level (u) | – | – |
Circuit | # Nodes | ABC Area (μm²) | ABC Delay (ps) | SLAP Area (μm²) | SLAP Delay (ps) | AiMap Area (μm²) | AiMap Delay (ps) | AiMap+ Area (μm²) | AiMap+ Delay (ps) | AiMap+/ABC Area | AiMap+/ABC Delay | AiMap+/AiMap Area | AiMap+/AiMap Delay |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
adder | 1020 | 898.1 | 3770.7 | 1031.3 | 3268.7 | 1061.0 | 3486.1 | 1066.6 | 3482.4 | 1.19 | 0.92 | 1.01 | 1.00 |
bar | 3336 | 2680.4 | 1114.9 | 3083.2 | 923.8 | 2059.4 | 1059.0 | 2072.7 | 1140.6 | 0.77 | 1.02 | 1.01 | 1.08 |
log2 | 32,060 | 26,561.3 | 6797.8 | – | – | 23,330.1 | 6855.8 | 23,413.9 | 6963.5 | 0.88 | 1.02 | 1.00 | 1.02 |
multiplier | 27,062 | 25,458.3 | 4649.1 | – | – | 20,106.4 | 4512.4 | 20,202.3 | 4456.0 | 0.79 | 0.96 | 1.00 | 0.99 |
sin | 5416 | 5207.0 | 3955.6 | 5087.6 | 3584.8 | 4503.0 | 3599.2 | 4451.0 | 3435.0 | 0.85 | 0.87 | 0.99 | 0.95 |
sqrt | 24,618 | 20,252.2 | 180,518.1 | – | – | 19,918.6 | 185,281.0 | 19,720.6 | 187,546.9 | 0.97 | 1.04 | 0.99 | 1.01 |
C6288 | 2335 | 2991.8 | 1248.6 | 3023.5 | 1236.6 | 2479.5 | 1385.4 | 2430.8 | 1314.9 | 0.81 | 1.05 | 0.98 | 0.95 |
C7552 | 2835 | 1978.5 | 797.5 | 2002.0 | 800.1 | 1758.9 | 913.8 | 1796.7 | 948.1 | 0.91 | 1.19 | 1.02 | 1.04 |
mul32-b | 13,711 | 9889.4 | 3229.5 | – | – | 8041.2 | 3410.5 | 7873.0 | 3386.5 | 0.80 | 1.05 | 0.98 | 0.99 |
mul64-b | 53,023 | 39,736.5 | 6715.1 | – | – | 31,087.4 | 7058.3 | 30,838.2 | 6371.0 | 0.78 | 0.95 | 0.99 | 0.90 |
64b_mult | 65,856 | 52,318.6 | 7922.6 | – | – | 37,275.1 | 9343.5 | 37,155.0 | 9527.1 | 0.71 | 1.20 | 1.00 | 1.02 |
aes | 21,119 | 18,190.0 | 656.6 | 16,489.6 | 594.6 | 15,792.1 | 625.6 | 15,938.6 | 643.9 | 0.88 | 0.98 | 1.01 | 1.03 |
max | 2865 | 2312.3 | 3809.0 | 2292.4 | 3710.5 | 2128.2 | 5035.8 | 2154.1 | 4120.7 | 0.93 | 1.08 | 1.01 | 0.82 |
s9234_1 | 776 | 670.9 | 310.8 | – | – | 652.7 | 341.3 | 669.1 | 318.6 | 1.00 | 1.03 | 1.03 | 0.93 |
s5378 | 1186 | 903.3 | 387.4 | – | – | 943.9 | 365.9 | 960.9 | 327.1 | 1.06 | 0.84 | 1.02 | 0.89 |
C5315 | 2004 | 1342.3 | 591.0 | – | – | 1279.3 | 617.8 | 1242.5 | 609.9 | 0.93 | 1.03 | 0.97 | 0.99 |
i2c | 1342 | 981.6 | 300.2 | – | – | 981.6 | 300.2 | 993.1 | 269.8 | 1.01 | 0.90 | 1.01 | 0.90 |
cavlc | 693 | 471.2 | 294.1 | – | – | 480.3 | 259.8 | 474.7 | 249.0 | 1.01 | 0.85 | 0.99 | 0.96 |
int2float | 260 | 159.6 | 160.0 | – | – | 162.1 | 163.3 | 162.1 | 172.9 | 1.02 | 1.08 | 1.00 | 1.06 |
ctrl | 174 | 107.5 | 134.0 | – | – | 107.3 | 102.0 | 107.3 | 102.0 | 1.00 | 0.76 | 1.00 | 1.00 |
Geomean | 3876 | 3073.9 | 1537.8 | – | – | 2789.2 | 1555.1 | 2789.4 | 1515.1 | 0.907 | 0.985 | 1.000 | 0.974 |
Circuit | ABC Area (μm²) | ABC Delay (ps) | AiMap Area (μm²) | AiMap Delay (ps) | AiMap+ Area (μm²) | AiMap+ Delay (ps) | AiMap+/ABC Area | AiMap+/ABC Delay | AiMap+/AiMap Area | AiMap+/AiMap Delay |
---|---|---|---|---|---|---|---|---|---|---|
adder | 2369.43 | 2618.21 | 1398.51 | 2225.75 | 1417.64 | 2540.38 | 0.60 | 0.97 | 1.01 | 1.14 |
bar | 4198.34 | 1936.9 | 2779.53 | 1428.83 | 3196.17 | 1915.18 | 0.76 | 0.99 | 1.15 | 1.34 |
sin | 9784 | 6456.01 | 8691.31 | 6996.77 | 8412.31 | 6942.23 | 0.86 | 1.08 | 0.97 | 0.99 |
sqrt | 48,105.13 | 303,498.88 | 37,596.11 | 301,290.41 | 40,166.38 | 296,842.06 | 0.83 | 0.98 | 1.07 | 0.99 |
C6288 | 3822.76 | 1191.86 | 2479.53 | 1385.40 | 3977.19 | 1371.81 | 1.04 | 1.15 | 1.60 | 0.99 |
C7552 | 3748.58 | 1605.98 | 3733.41 | 1163.79 | 3608.38 | 1221.21 | 0.96 | 0.76 | 0.97 | 1.05 |
aes | 29,727.57 | 939.01 | 25,064.54 | 791.39 | 23,618.2 | 793.37 | 0.79 | 0.84 | 0.94 | 1.00 |
max | 4079.6 | 4640.24 | 4250.59 | 5667.84 | 4333.64 | 5555.36 | 1.06 | 1.20 | 1.02 | 0.98 |
C5315 | 2605.5 | 721.18 | 2522.92 | 778.18 | 2390.19 | 758.59 | 0.92 | 1.05 | 0.95 | 0.97 |
router | 329.16 | 617.27 | 452.56 | 609.04 | 345.49 | 630.81 | 1.05 | 1.02 | 0.76 | 1.04 |
Geomean | 4834.76 | 2862.15 | 4126.13 | 2729.81 | 4235.29 | 2850.57 | 0.88 | 1.00 | 1.03 | 1.04 |
Circuit | ABC Area (μm²) | ABC Delay (ps) | Strategy 1 Area (μm²) | Strategy 1 Delay (ps) | Strategy 1 ΔDelay | Strategy 2 Area (μm²) | Strategy 2 Delay (ps) | Strategy 2 ΔDelay | Strategy 3 Area (μm²) | Strategy 3 Delay (ps) | Strategy 3 ΔDelay |
---|---|---|---|---|---|---|---|---|---|---|---|
adder | 2371.12 | 2618.21 | 2325.37 | 2394.06 | 8.6% | 2325.37 | 2394.06 | 8.6% | 2325.37 | 2394.06 | 8.6% |
bar | 4198.34 | 1936.90 | 2389.64 | 934.51 | 51.8% | 3169.06 | 876.30 | 54.8% | 3275.38 | 872.59 | 54.9% |
max | 4092.32 | 4640.24 | 3448.38 | 4372.64 | 5.8% | 2724.51 | 2948.38 | 36.5% | 2731.99 | 2939.40 | 36.7% |
sin | 9785.40 | 6456.01 | 10,840.20 | 4498.79 | 30.3% | 8465.33 | 4438.97 | 31.2% | 8467.88 | 4434.73 | 31.3% |
i2c | 1167.24 | 244.35 | 1209.60 | 222.21 | 9.1% | 1241.32 | 208.55 | 14.7% | 1241.32 | 208.55 | 14.7% |
priority | 1736.08 | 2856.16 | 1697.73 | 2669.83 | 6.5% | 1697.73 | 2669.83 | 6.5% | 1697.73 | 2669.83 | 6.5% |
router | 452.56 | 609.04 | 441.56 | 497.60 | 18.3% | 441.56 | 497.60 | 18.3% | 464.73 | 488.26 | 19.8% |
Geomean | 2323.49 | 1813.76 | 2113.47 | 1442.44 | 20.5% | 2061.37 | 1336.26 | 26.3% | 2087.21 | 1331.07 | 26.6% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, J.; Zhao, Q. AiMap+: Guiding Technology Mapping for ASICs via Learning Delay Prediction. Electronics 2024, 13, 3614. https://doi.org/10.3390/electronics13183614