MIXED-SIGNAL LAYOUT GENERATION CONCEPTS
THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE
ANALOG CIRCUITS AND SIGNAL PROCESSING
Consulting Editor: Mohammed Ismail, Ohio State University

Related Titles:
HIGH-FREQUENCY OSCILLATOR DESIGN FOR INTEGRATED TRANSCEIVERS, Van der Tang, Kasperkovitz and van Roermund, ISBN: 1-4020-7564-2
CMOS INTEGRATION OF ANALOG CIRCUITS FOR HIGH DATA RATE TRANSMITTERS, DeRanter and Steyaert, ISBN: 1-4020-7545-6
SYSTEMATIC DESIGN OF ANALOG IP BLOCKS, Vandenbussche and Gielen, ISBN: 1-4020-7471-9
SYSTEMATIC DESIGN OF ANALOG IP BLOCKS, Cheung & Luong, ISBN: 1-4020-7466-2
LOW-VOLTAGE CMOS LOG COMPANDING ANALOG DESIGN, Serra-Graells, Rueda & Huertas, ISBN: 1-4020-7445-X
CIRCUIT DESIGN FOR WIRELESS COMMUNICATIONS, Pun, Franca & Leme, ISBN: 1-4020-7415-8
DESIGN OF LOW-PHASE CMOS FRACTIONAL-N SYNTHESIZERS, DeMuer & Steyaert, ISBN: 1-4020-7387-9
MODULAR LOW-POWER, HIGH SPEED CMOS ANALOG-TO-DIGITAL CONVERTER FOR EMBEDDED SYSTEMS, Lin, Kemna & Hosticka, ISBN: 1-4020-7380-1
DESIGN CRITERIA FOR LOW DISTORTION IN FEEDBACK OPAMP CIRCUITS, Hernes & Saether, ISBN: 1-4020-7356-9
CIRCUIT TECHNIQUES FOR LOW-VOLTAGE AND HIGH-SPEED A/D CONVERTERS, Walteri, ISBN: 1-4020-7244-9
DESIGN OF HIGH-PERFORMANCE CMOS VOLTAGE CONTROLLED OSCILLATORS, Dai and Harjani, ISBN: 1-4020-7238-4
CMOS CIRCUIT DESIGN FOR RF SENSORS, Gudnason and Bruun, ISBN: 1-4020-7127-2
ARCHITECTURES FOR RF FREQUENCY SYNTHESIZERS, Vaucher, ISBN: 1-4020-7120-5
THE PIEZOJUNCTION EFFECT IN SILICON INTEGRATED CIRCUITS AND SENSORS, Fruett and Meijer, ISBN: 1-4020-7053-5
CMOS CURRENT AMPLIFIERS; SPEED VERSUS NONLINEARITY, Koli and Halonen, ISBN: 1-4020-7045-4
MULTI-STANDARD CMOS WIRELESS RECEIVERS, Li and Ismail, ISBN: 1-4020-7032-2
A DESIGN AND SYNTHESIS ENVIRONMENT FOR ANALOG INTEGRATED CIRCUITS, Van der Plas, Gielen and Sansen, ISBN: 0-7923-7697-8
RF CMOS POWER AMPLIFIERS: THEORY, DESIGN AND IMPLEMENTATION, Hella and Ismail, ISBN: 0-7923-7628-5
DATA CONVERTERS FOR WIRELESS STANDARDS, C. Shi and M. Ismail, ISBN: 0-7923-7623-4
DIRECT CONVERSION RECEIVERS IN WIDE-BAND SYSTEMS, A. Parssinen, ISBN: 0-7923-7607-2
AUTOMATIC CALIBRATION OF MODULATED FREQUENCY SYNTHESIZERS, D. McMahill, ISBN: 0-7923-7589-0
MIXED-SIGNAL LAYOUT GENERATION CONCEPTS by
Chieh Lin Philips Research Laboratories
Arthur H.M. van Roermund Eindhoven University of Technology and
Domine M.W. Leenaerts Philips Research Laboratories
KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48725-X
Print ISBN: 1-4020-7598-7

©2005 Springer Science + Business Media, Inc.
Print ©2003 Kluwer Academic Publishers, Dordrecht
All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.
Created in the United States of America
Visit Springer's eBookstore at: http://ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
Preface

Layout generation is an important topic in IC design. For digital circuits a lot of research has been conducted in this area, resulting in a large variety of books and layout generation tools. However, with ever-increasing frequencies, we now face significant analog artifacts in the IC, introduced during the physical design phase, when schematics are translated to physical ICs via a layout. Moreover, with the trend towards systems on chip, analog circuitry and massive amounts of digital circuitry are going to be integrated on the same chip, now called a mixed-signal chip. Compatibility between the high-resolution but low-level analog signals and the relatively large-swing digital signals with their fast transition edges is becoming a severe problem, which makes layout generation a tedious and complex job.

In this book we focus on two strongly coupled aspects of automatic layout generation: placement and routing. We will discuss the problem in detail in the context of mixed-signal designs. Apart from the physical artifacts and their parasitic influence on the electrical behaviour of the circuit, we will address aspects related to the optimization problem associated with automatic layout generation. These are the optimization methods, with special emphasis on simulated annealing; adequate data structures; appropriate models and representations; and efficient algorithms. As optimization is an iterative process, incremental algorithms that only generate strictly necessary new information are especially interesting to speed up the process. These algorithms get special attention.

The book can be seen as a combination of introductory texts and results of new research. It will therefore be of interest to designers who would like to get an overall picture, and to experts in the field who would like to see the state of the art and the new topics discussed in this book. Moreover, it is of interest both to designers and specialists in the area of circuit design and to those working in the area of electronic design automation (EDA).

From the new contributions we will mention only a few selected issues here. A novel incremental approach to placement optimization is presented, featuring significantly improved asymptotic computational complexity results for a single placement computation within a simulated-annealing environment. A new consistent linear-time algorithm is described that maps a given placement of modules in a user-specified region to an efficient formal representation, such that the information can be further processed by means of efficient algorithms. Efficiency is an important issue for CAD tools. An improved robust placement algorithm is presented that can incorporate range and boundary constraints imposed on specific modules in a very efficient manner. Further, a framework is given that incorporates placement and routing, making it easy to take into account physical problems that are related to the spatial distribution of objects in a plane, which in this case is the plane of the IC. New results are shown on very fast Steiner-minimal-tree approximation algorithms in combination with efficient dynamic routing graph models. Extensive experimental evaluations of the proposed algorithms show that our algorithms compare favorably with the state of the art. Moreover, discrepancies are exposed in current packing-centric works that use inadequate routing schemes.

This book is the result of research conducted in the Mixed-signal Microelectronics Group at Eindhoven University. At the time of conducting this research, all authors were members of this group. Chieh Lin was a Ph.D. student of mine, Domine Leenaerts staff member and coach. Both of them are now with Philips Research Laboratories in Eindhoven. It really is a pleasure for me to write the preface of this book, and I hope many designers will benefit from it.

Prof.dr.ir. A.H.M. van Roermund
Chairman Mixed-signal Microelectronics Group
Eindhoven University of Technology
Contents

Preface
List of Abbreviations

1 Introduction
  1.1 Outline of the Book

2 Mapping Problems in the Design Flow
  2.1 Top-Down Flow and Bottom-Up Approach
    2.1.1 A VLSI Design Cycle
    2.1.2 Physical Design
    2.1.3 Mixed-Signal Layout Styles
    2.1.4 From Circuit to Layout
    2.1.5 Layout System Requirements
  2.2 The Mapping Problem
    2.2.1 High-Level Specifications
    2.2.2 Layout System Specifications
    2.2.3 Constraint Mapping Problem
    2.2.4 High-Level Sensitivities
    2.2.5 Lower Level Sensitivities
    2.2.6 Sensitivity Computation Problem
  2.3 Placement and Routing Constraints

3 Optimization Methods
  3.1 VLSI Optimization Methods
    3.1.1 Deterministic Algorithms
    3.1.2 Stochastic Algorithms
    3.1.3 Heuristic Algorithms
  3.2 Simulated Annealing
    3.2.1 Basic SA Algorithm
    3.2.2 Problem Representation
    3.2.3 Perturbation Operators
    3.2.4 Acceptance and Generation Functions
    3.2.5 Temperature Schedule
    3.2.6 Stop Criterion
    3.2.7 Cost Function
  3.3 Concluding Remarks

4 Optimization Approach Based on Simulated Annealing
  4.1 Optimization Flow
  4.2 Problem Representation
    4.2.1 Placement
    4.2.2 Routing
    4.2.3 Substrate Coupling
  4.3 Perturbation Operators
  4.4 Acceptance and Generation Functions
  4.5 Temperature Schedule
  4.6 Stop Criterion
  4.7 Cost Function
    4.7.1 Implicit Cost Evaluation
  4.8 Concluding Remarks

5 Efficient Algorithms and Data Structures
  5.1 Computational Model
  5.2 Asymptotic Analysis
  5.3 Computational Complexity
  5.4 Data Structures for CAD
    5.4.1 Corner Stitching
    5.4.2 Linked List
    5.4.3 Splay Tree
    5.4.4 Hash Table
    5.4.5 Priority Queue
    5.4.6 Other Advanced Data Structures
  5.5 Concluding Remarks

6 Placement
  6.1 Previous Work
  6.2 Effective and Efficient Placement
  6.3 Representation Generality, Flexibility and Sensitivity
  6.4 Sequence Pair Representation
  6.5 Graph-Based Packing Computation
    6.5.1 Relative Placement Computation
    6.5.2 An Efficient Relative Placement Algorithm
    6.5.3 Absolute Placement Computation
  6.6 Non-Graph-Based Packing Computation
    6.6.1 Maximum-Weight Common Subsequence (MWCS) Problem
    6.6.2 Maximum-Weight Monotone Subsequence (MWMS) Problem
  6.7 Graph-Based Incremental Placement Computation
    6.7.1 Incremental Relative Placement Computation
    6.7.2 Incremental Relative Placement Computational Complexity
    6.7.3 Incremental Absolute Placement Computation
    6.7.4 Incremental Absolute Placement Computational Complexity
    6.7.5 Average Incremental Computational Complexity
  6.8 Implementation Considerations
  6.9 Experimental Results
    6.9.1 A Single Iteration
    6.9.2 Packing Optimization
    6.9.3 Conclusions
  6.10 Placement-to-Sequence-Pair Mapping
  6.11 Constrained Block Placement
    6.11.1 Non-Graph-Based Constrained Placement
    6.11.2 Implementation Considerations
    6.11.3 Experimental Results on Non-Graph-Based Constrained Block Placement
    6.11.4 Incremental Graph-Based Constrained Placement
  6.12 Concluding Remarks

7 Routing
  7.1 The Routing Problem
  7.2 Classification of Routing Approaches
    7.2.1 Routing Hierarchy
    7.2.2 Routing Model
  7.3 Previous Work
  7.4 Computational Complexity
  7.5 Global Routing Model
    7.5.1 Model Efficiency
    7.5.2 Global Routing Graph Computation
    7.5.3 Supporting Dynamic Changes
  7.6 Global Routing Algorithms
    7.6.1 Two-pin Routing Algorithms
    7.6.2 Minimal Bounding Box (MBB) Routing
    7.6.3 Minimum Spanning Tree (MST) Routing
    7.6.4 Path-Based Routing
    7.6.5 Node-Based Routing
  7.7 Benchmarking of Heuristics in Our Routing Model
    7.7.1 Benchmark Problem Instances
    7.7.2 Experimental Results
    7.7.3 Concluding Remarks
  7.8 Incremental Routing
    7.8.1 Re-routing Nets Connected to Moved Modules
    7.8.2 Re-routing Affected Nets Not Connected to Moved Modules
  7.9 Impact of Routing on Placement Quality
    7.9.1 Integrated Placement and Routing
    7.9.2 Experimental Results
    7.9.3 Conclusions
  7.10 Concluding Remarks

8 Dealing with Physical Phenomena: Parasitics, Crosstalk and Process Variations
  8.1 Previous Work
  8.2 Efficiency and Accuracy Requirements
  8.3 Self-Parasitics
    8.3.1 Wire Resistance, Capacitance and Inductance
    8.3.2 Via Resistance and Area
  8.4 Crosstalk
    8.4.1 Substrate Coupling
    8.4.2 Parasitic Coupling Capacitance
  8.5 Process Variations
  8.6 Incorporating Crosstalk and Parasitics into Routing
  8.7 Incorporating Substrate Coupling into Placement
    8.7.1 A Basic Module
    8.7.2 Generalized 2-Dimensional Substrate Coupling Model
    8.7.3 Substrate Coupling Impact Minimization
    8.7.4 An Efficient Substrate Coupling Impact Minimization Algorithm
    8.7.5 Implementation Considerations
    8.7.6 Experimental Results
    8.7.7 Conclusions
  8.8 Incremental Substrate Coupling Impact Minimization
  8.9 Concluding Remarks

9 Conclusions

Bibliography
About the Authors
Index
List of Abbreviations
ADBH    Average-Distance-Based Heuristic
ADH     Average-Distance Heuristic
BSG     Bounded Sliceline Grid
CA      Chip Area
CAD     Computer-Aided Design
CI      Coupling Impact
CMWCS   Constrained Maximum-Weight Common Subsequence
DAG     Directed Acyclic Graph
DV      Direct View
EDA     Electronic Design Automation
ESMT    Euclidean Steiner Minimal Tree
FPGA    Field-Programmable Gate Array
GSMT    Graph Steiner Minimal Tree
IC      Integrated Circuit
ILP     Incremental Longest Paths
IP      Intellectual Property
ISPBH   Iterated Shortest-Paths-Based Heuristic
LCS     Longest Common Subsequence
LD      Left-Down
LOT     Labeled Ordered Tree
MBB     Minimal Bounding Box
MST     Minimum Spanning Tree
MWCS    Maximum-Weight Common Subsequence
MWMS    Maximum-Weight Monotone Subsequence
NS      Non-Slicing
PE      Placement Evaluation
RAM     Random-Access Machine
RSMT    Rectilinear Steiner Minimal Tree
SA      Simulated Annealing
SCIM    Substrate Coupling Impact Minimization
SMT     Steiner Minimal Tree
SP      Sequence Pair
SPBH    Shortest-Paths-Based Heuristic
SPH     Shortest-Paths Heuristic
SSSP    Single-Source Shortest Paths
VLSI    Very Large Scale Integration
WL      Wire Length
Chapter 1
Introduction

The design of integrated circuits has been an actively explored area for almost half a century. The possibility of integrating a plethora of functions onto a small piece of semiconductor material has enabled the development of many high-tech systems, e.g. the modern personal computer. Without exaggeration, one can state that without the invention of integrated circuits, the world would not be as it is today.

With improvements in manufacturing technologies, the integration density of components within a single integrated circuit (IC) has also increased dramatically. The exponential growth of the number of components in an IC as a function of time still seems to hold and is expected to hold for at least another decade [Semiconductor Industry Association, 1998]. Figure 1.1 depicts this trend graphically. This trend is better known as Moore's Law. Within this figure, a few keywords clarify some important trends. A very noticeable effect is that with increasingly smaller feature sizes and larger designs, the intrinsic speed of transistors increases, but the (global) wire delays also increase. With this trend, a vast area arose dedicated to the integration of circuits, which is called the field of Very Large Scale Integration (VLSI).
Increasing the number of components in a given area has an obvious cost benefit, because the number of produced ICs per time unit increases when other factors are kept constant. However, higher integration also has a problematic side. The problems are even more pronounced due to the higher operating frequencies of current systems.

Roughly speaking, one part of the problems is related to the percentage of working ICs, which is called yield. Yield is a complicated factor that has links with many aspects of VLSI technology, from system design to circuit design, to layout design, to technology. Moreover, due to smaller sizes, accuracy and power dissipation problems emerge. Typically, most of these performance factors can be traded off against each other.

The other part consists of the increasing influence of unavoidable parasitic elements such as parasitic resistances, capacitances, and inductances. Parasitic substrate coupling and, to a lesser extent, electromagnetic coupling can no longer be neglected either. Simply stated, the non-ideal behavior of semiconductor material starts affecting the functionality of the IC in such a way that measures have to be taken to ensure good functioning. The correct functioning of a system is especially susceptible to these parasitic phenomena in the case of mixed-signal designs, where both (insensitive and noisy) digital and (sensitive) analog building blocks are present on the same chip.

Practical experience has been used and is still used to limit the adverse effects of non-idealities. However, due to the high complexity of VLSI systems it is an immensely difficult task to handle all problems adequately, even for an expert designer. This is where the computer comes into play. When a computer is used properly, it is able to handle large amounts of data and process it in such a way that the generated output satisfies certain given specifications.
The use of computers in a design task is called Computer-Aided Design (CAD). A more appropriate term in connection with computer-aided design of ICs is Electronic Design Automation (EDA), which includes electronic CAD tools but is more general. The purpose of a CAD tool is to support the designer during the process of realizing an IC.

The final physical outcome of the design process is a disc of silicon, called a wafer, which consists of a number of more or less identical copies of the same integrated circuit. This wafer is then cut, resulting in a set of dies, each one of them containing the same integrated circuit. The creation of such a wafer is accomplished using a set of masks which are used to deposit several layers of different materials onto the wafer. The task of organizing geometric information in this context, resulting in an answer to the question of which materials have to be put where on the wafer, is called physical design. The end result of a physical design step is called a layout, which is essentially a set of masks that comply with given design rules. The terms layout synthesis and layout generation are also used frequently in the same context.

Although layout generation is a very important step in the VLSI process, it is only one of many steps. Some other important steps are circuit design, simulation and verification. Our primary concern in this work is the layout generation step of VLSI design. More specifically, we deal with a remarkably interesting and challenging subclass consisting of both analog and digital ingredients. In essence, layout generation is accomplished by solving two strongly coupled problems under a set of constraints. These two problems are known as the placement and the routing problem.

The design of integrated circuits is not new. Therefore, most problems associated with layout generation are not new.
However, most problems became real problems when feature sizes approached submicron and deep-submicron dimensions and operating frequencies made a leap forward. Historically, the roots of algorithmic approaches to designing layouts lie in the digital area. It is there that circuits grew extraordinarily fast to incredibly large sizes. Therefore, layout design automation was first investigated and employed in that area. However, contemporary mixed-signal designs also require the use of computers due to the large number of phenomena that have to be taken into account in order to comply with the specifications. Larger designs, tighter specifications and more interdependencies have led to increased complexity. The layout problem received renewed attention recently with the observation that the layout problem for digital designs should be looked at through analog glasses. For, essentially, the phenomena that form bottlenecks in digital layout design are analog in nature.

An illustrative overview is given here which should give a good impression of the work that has been and is being carried out in the VLSI layout generation domain, especially in the mixed-signal and analog fields. By no means does this overview intend to be complete. It only provides a representative sampling of existing mixed-signal and analog layout generation systems, in order to give an impression of the state of the art in this field, and to illustrate the variety of ways in which typical constraints are handled and problems are approached.

In the works [Malavasi, 1993, Charbon, 1995], all conducted at the University of California at Berkeley, several techniques are introduced for performance-driven layout of analog integrated circuits. The basic concepts are sensitivity computation, modeling of performance constraints and performance-driven placement, routing and compaction. The approach is only suitable for small analog integrated circuits, because the system behavior scales very badly with increasing problem size. Furthermore, the circuits that can be handled are assumed to be linear. Examples of such circuits are operational amplifiers and filters.
In [Chang, 1994] the above approach is extended and incorporated into a top-down constraint-driven design methodology for analog integrated circuits.

Researchers from Carnegie Mellon University have been quite successful with their tools ASTRX/OBLX [Ochotta et al., 1996] and KOAN/ANAGRAM [Cohn et al., 1991, Cohn et al., 1994]. The tools are known to be successful when applied to linear analog circuits such as filters and operational amplifiers.

Also, the Catholic University of Leuven has contributed significantly to the development of analog CAD tools. In [Lampaert, 1998] analog layout generation for performance and manufacturability is described in detail. The system employs a top-down hierarchical design methodology in which the explicit generation of a specific set of low-level constraints is avoided. Instead, the layout tools are driven more or less directly by higher-level performance constraints via pre-determined performance sensitivity values. An implicit assumption of this approach is that the circuit is sufficiently linear in the region in which the layout parameters under consideration have influence. The ultimate goal is a layout that satisfies all performance constraints by construction.

Several other groups and researchers have attempted to transfer the methodologies which are used in the digital VLSI domain to the analog and mixed-signal VLSI domain, but most of these approaches have not been very successful as yet. The main reason is that digital approaches rely on certain assumptions (to reduce complexity) that are simply unacceptable in the analog domain. A good example of this is the consideration of only a critical path to determine the quality of wiring.

From the previous discussion it should be clear that VLSI design consists of several tasks
which are very hard to solve in a proper way. The design step that considers layout generation is also extremely difficult, and the quality of a layout is of utmost importance in any IC design. Thus far, only a few researchers have concentrated on layout generation for mixed-signal designs. Although layout generators are known for analog designs, those systems are usually not suitable for general application to mixed-signal designs, where the layout problems are worst. No single layout generator is better than all others. All existing approaches and systems have fundamental limitations and weaknesses. Current layout generation approaches suffer from at least one of the following problems.

- Properly placing objects (sub-circuit modules) in a two-dimensional plane is performed poorly with respect to wiring quality.
- Only a subset of mixed-signal design constraints is (or can be) taken into account during placement and routing.
- Ad hoc solutions are used which are not very robust and require a significant amount of (problem-dependent) tuning effort.
- Scalability properties are poor due to inefficient modeling and/or implementation.

Thus, the necessity of improved layout generation concepts and systems is clear. The goal of this book is to establish methodologies and concepts to automate the layout generation step within the framework of mixed-signal VLSI design.

In principle there are two ways to approach the problem of generating high-quality mixed-signal layouts. One approach is to keep refining well-known techniques from the digital design field, taking into account increasingly important second-order effects, so as to satisfy mixed-signal and analog requirements. The other approach is to review the physical design problem anew in order to find fundamentally more efficient means to tackle it.
For instance, a more efficient formal representation could contribute to this, eventually leading to better solutions in less time, combined with better scalability performance. The term scalability is used here to denote the behavior of a computing system when the problem instance size under consideration increases. Scalability and generality of the approach are regarded as major concerns.
1.1 Outline of the Book
Hereafter, the main topics which are covered in this book are described briefly. First the problem we want to solve is described in detail in Chapter 2, from a system view to assessable subproblems. Also the inputs and outputs of the layout system are defined explicitly. It is made clear that various mapping problems need to be solved to establish a transparent interface between real-world specifications and desired specifications. Then, in Chapter 3 an overview of existing optimization methods is given which are possible candidates for VLSI-related problems. Based on our requirements and previous results on similar problems, we indicate which optimization method has preference.
Consequently, in Chapter 4 we focus on the chosen optimization method and explore it in more detail, describing its properties and its application to our problem representation. Thus, the most important aspects of our optimization approach are discussed.

We then clarify the impact of efficient algorithms and data structures on the performance of the overall system in Chapter 5. Moreover, scalability issues in connection with the efficiency of algorithms and their data structures are discussed.

One of the main topics of this book is the problem of optimally placing a set of objects in a two-dimensional plane under certain constraints. Chapter 6 discusses this problem in depth and an efficient solution is proposed. Theoretical analyses are carried out and compared with existing results. The chapter also gives an overview of all currently known approaches in the proposed placement context, linking several related but strictly mathematically oriented fields of research to the placement problem. Experimental results are obtained, which confirm the theory.

As will be made clear at a later stage, placement of objects does not have any practical use if routing is ignored. Therefore, in Chapter 7 we explore routing issues within the same optimization framework, and put a significant amount of effort into establishing a routing methodology. Apart from this, existing results are compared with new experimental results and discrepancies in previous approaches are exposed.

Chapter 8 covers the problems of non-ideal effects in a layout. Both crosstalk and parasitics are well-known culprits of performance degradation in high-frequency designs. In mixed-signal circuits these phenomena manifest themselves quite rapidly, as compared to fully digital designs with large noise margins or fully analog designs with less "noisy" components. An overview is given of existing methods to tackle problems due to crosstalk and parasitics.
Furthermore, a method is proposed to incorporate substrate coupling into the optimization framework in a most efficient manner. Since algorithmic and representation efficiency are serious concerns throughout this research work, a major part of this book discusses fundamental concepts to improve efficiency. As these concepts are tightly coupled with a certain problem, it is more convenient to introduce and elaborate on them while discussing the underlying problem. A fundamental concept to improve efficiency is incremental computation. In essence, the idea is to compute only new information when it is strictly necessary. We show that this approach leads to fundamental improvements in placement and global routing efficiency in the adopted stochastic (simulated annealing) optimization framework. Note that higher computational efficiency automatically implies better scalability properties. In each chapter, where appropriate, experimental results are given after the discussion of the respective algorithms. Furthermore, we have attempted to describe and present the experiments (and their results) in such a way that comparison with existing works is not hampered. We end this book with main conclusions in Chapter 9.
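The idea of incremental computation can be illustrated with a small sketch. It is not one of the algorithms developed in this book: it only assumes a simple bounding-box wire length per net as the cost, and the names `net_length` and `IncrementalWireLength` are hypothetical. When a single module moves, only the nets attached to that module are re-evaluated, instead of the whole netlist.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def net_length(net: List[str], placement: Dict[str, Point]) -> float:
    """Bounding-box (half-perimeter) length of one net; an assumed cost model."""
    xs = [placement[m][0] for m in net]
    ys = [placement[m][1] for m in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


class IncrementalWireLength:
    """Keeps the total wire length up to date while modules are moved.

    Assumes each module is listed at most once per net.
    """

    def __init__(self, placement: Dict[str, Point], nets: List[List[str]]):
        self.placement = dict(placement)
        self.nets = nets
        # Index: module name -> indices of the nets it belongs to.
        self.nets_of: Dict[str, List[int]] = {}
        for i, net in enumerate(nets):
            for module in net:
                self.nets_of.setdefault(module, []).append(i)
        self.lengths = [net_length(n, self.placement) for n in nets]
        self.total = sum(self.lengths)

    def move(self, module: str, new_pos: Point) -> int:
        """Move one module; return how many nets had to be re-evaluated."""
        self.placement[module] = new_pos
        touched = self.nets_of.get(module, [])
        for i in touched:
            new_len = net_length(self.nets[i], self.placement)
            self.total += new_len - self.lengths[i]
            self.lengths[i] = new_len
        return len(touched)


placement = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (3.0, 0.0)}
nets = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
tracker = IncrementalWireLength(placement, nets)
recomputed = tracker.move("D", (5.0, 0.0))  # only the net containing D is touched
```

In an iterative optimizer such as simulated annealing, each perturbation typically changes only a small part of the solution, so the work per iteration in this sketch depends on the number of affected nets rather than on the size of the whole design.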
Chapter 2
Mapping Problems in the Design Flow

This chapter describes in more detail the problem which is attacked in this book. We show explicitly where the problem of layout generation is located within the overall VLSI design cycle. Then we zoom in on the layout problem and show that it is a non-trivial problem to solve. In order to solve the problem adequately, it first has to be defined in an accurate way. One part of problem definition entails proper modeling of physical entities. The other part is the translation of given "real-life" specifications into simpler specifications that can be handled properly at an algorithmic level.

In principle, the layout problem can be split into two strongly coupled parts. One part is the placement or floorplanning problem, the other part is the routing problem. A typical layout problem could be stated as follows. Given a set of geometric objects to be placed in a two-dimensional plane, place these objects in such a way that a certain cost function is minimal. A standard textbook on physical design automation will take for the cost function the total length of all interconnecting wires. The catch is that in order to compute or estimate the length of a wire, placement information is needed. But routing information is needed to compute a placement! This loop can, for instance, be broken by computing a placement based on the number of interconnections between the blocks; blocks with many interconnections should be placed closer together than blocks with fewer interconnections. In textbooks, this approach is called the min-cut problem. It is an approximation to the layout problem, in that it does not deal with wire length but with the number of interconnections. For more information on the layout problem the reader is referred to, e.g., [Lengauer, 1990, Sherwani, 1993].
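The two cost notions mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from this book: wire length is estimated by the half-perimeter of each net's bounding box (a common stand-in when no routing is available), modules are treated as points, and the function names `wire_length_estimate` and `cut_size` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def wire_length_estimate(placement: Dict[str, Point], nets: List[List[str]]) -> float:
    """Sum of bounding-box half-perimeters over all nets (assumed estimate)."""
    total = 0.0
    for net in nets:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def cut_size(partition: Dict[str, int], nets: List[List[str]]) -> int:
    """Number of nets whose modules end up in more than one partition."""
    return sum(1 for net in nets if len({partition[m] for m in net}) > 1)


# Three nets over four modules; A and B share two nets, so they are strongly connected.
nets = [["A", "B"], ["A", "B", "C"], ["C", "D"]]

# Two candidate placements on a line: one keeps connected modules adjacent, one does not.
near = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (3.0, 0.0)}
far = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (1.0, 0.0), "D": (2.0, 0.0)}

# Two candidate bipartitions: one groups A with B, the other separates them.
grouped = {"A": 0, "B": 0, "C": 1, "D": 1}
split = {"A": 0, "B": 1, "C": 0, "D": 1}

print(wire_length_estimate(near, nets), wire_length_estimate(far, nets))
print(cut_size(grouped, nets), cut_size(split, nets))
```

Note that `cut_size` needs only the netlist and a partition, not coordinates, which is why an interconnection count can break the circular dependency between placement and wire length described above.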
2.1 Top-Down Flow and Bottom-Up Approach

In order to adequately solve the layout problem, it is necessary to divide this complex problem into less complex and conceptually easier to handle subproblems such as the placement problem and the routing problem. Moreover, we have to define more precisely the information required to solve the layout problem. Hence, the path from electrical circuit description, process technology data, and system specifications to the layout system is made explicit. Generally, we call the flow of operations from a high level to lower levels, leading to refinements at each step, a top-down flow. The information is essentially pushed in one direction, with or without some feedback path, to arrive at a desired target. The need for this essentially hierarchical approach is obvious when the initial problem is too difficult to assess at once.
In cases where we precisely know the impact of a certain higher-level decision on a lower level, a top-down approach is very convenient. However, when problems become more complicated and interdependencies start playing an important role, it is almost impossible to accomplish the task in an adequate way with solely top-down information. When information from a lower level becomes increasingly important at a higher level, we speak of a bottom-up flow. We argue that a bottom-up approach indeed applies naturally to layout generation.

During the top-down flow of the design cycle, we arrive at the point where information needs to be supplied to the layout generation system. Hence, the interface of the layout system must be defined explicitly, facilitating communication of relevant information to the layout system. As a consequence, the layout system itself can operate more efficiently and consistently generate predictable results as a function of several input parameters which will be described shortly.
2.1.1 A VLSI Design Cycle
The fundamental process steps in a VLSI design cycle are shown in Figure 2.1. The four blocks on the left-hand side represent processes that start from a very high level of an idea. Then specifications are defined in the next block. The architectural design process deals with functional blocks at a high level for which (conceptual) realizations exist, and their intercommunication. Finally at the lowest conceptual level, the functional blocks must be transformed to building blocks which are used in circuit design. This is the top-down part of the cycle. The result of the top-down flow is an electrical circuit. This is the basis of the bottom-up flow. The last four blocks of the cycle consist of the generation of building block layout modules at the lowest level, going up to placement and routing of these layout
modules. After placement and routing, the layout can be manufactured and finally tested to see if its functionality and performance comply with the original idea and its specifications. Although the overall flow of information is top-down, the last four blocks in the diagram are drawn “bottom-up”. The intention is to make clear that the physical design part requires an essentially bottom-up approach. The reason for this is that a mixed-signal design has both digital and analog components, and typically these components are highly interconnected. It is this class of designs that is impacted most severely by parasitic effects such as substrate coupling, delay, mismatch, etc. As such, it is virtually impossible to decouple placement and routing while targeting high quality. Thus, predicting the result of placement and routing is at least as difficult. The outer flow in the figure states what type of information is exhibited at a certain stage in the design cycle. At the highest level the behavioral representation is prominent. After that, the structural representation becomes important, in which more precise information is given on what functions are performed where. At the technological representation level, the implementation aspects come into play. It specifies what type of circuit elements are used, their properties, and so on. The physical representation level comprises everything that is directly related to the layout of the circuit on the wafer. Finally, a prototype IC is available. Note that the direction of the arrows only indicates the flow of the processes in time for each part of the overall design, not the interdependency of the processes. For example, in order to perform adequate placement and routing, information is needed on certain specifications. Furthermore, in the architectural design, testing facilities should be taken into account. In short, strong interrelationships exist between almost all of the VLSI process steps.
Hence, it is impossible to regard a specific process step without taking notice of the other steps. On the other hand, including many process steps in an attempt to find a universal layout methodology would be too idealistic because of the intrinsic problem complexities involved. A way to solve this dilemma is to define an interface from each block to the other blocks, specify exactly what is input and what is output, and find a methodology that will provide high-quality layouts within the confined framework. This is the well-known top-down approach.
2.1.2 Physical Design
The focus of this book is on several aspects of the physical design step, more specifically the placement and routing phases. Physical design is the last step in the design cycle where a designer can exercise his or her influence on the final performance of an integrated circuit before it is fixed onto silicon. The part of the VLSI design cycle that is focused on in this work is shown in the shaded area of Figure 2.1: the circuit, module generation, and placement/routing. The physical design step in itself can also be seen as an iteration loop. In order to limit the complexity due to interdependencies, we presume that the given circuit, which is one of the inputs of the physical design step, is our nominal reference. As a consequence, we do not attempt to improve or alter the behavior of our reference circuit; our goal is to prevent, as much as possible, deterioration of system performance due to undesired but unavoidable implementation phenomena such as crosstalk, wire delays, surface gradients, etc. Figure 2.2 gives a classical flow diagram which represents the physical design. As can be seen from the diagram, the input of the physical design phase consists of the circuit netlist, circuit specifications, and technology data. By means of module generation, the basic objects
are conceived for the placement and routing phase, which are the core problems of physical design. The initial layout needs to be checked for design-rule compliance. After that, an extraction of the layout needs to be performed. The extracted information is an annotated netlist including all parasitic elements which are not, or only partially, accounted for in the circuit netlist (schematic). This annotated netlist is compared with the original netlist to see if any discrepancies have been introduced, apart from the parasitics. Using the annotated netlist, circuit simulations are performed, typically with a Spice-like simulation tool. If all is well and the specifications are complied with, the final layout is ready to be fabricated. If something is wrong, a change in the placement/routing is required and the loop is repeated until the layout is acceptable. However, it may turn out that the layout system cannot find a satisfactory solution (even if the system were ideal). In such cases there is an escape route via the dotted arrows to adjust, for example, the specifications or the transistor models which are used by the simulator.
2.1.3 Mixed-Signal Layout Styles
Several layout styles are available for implementing mixed-signal and analog integrated circuits. These styles typically differ in density, performance, flexibility, and time-to-market, where one characteristic is usually traded off against another. There is a broad variety of layout styles. It is difficult to classify them from the point of view of layout flexibility, i.e. the degrees of freedom a designer has to make layout decisions. Hereafter follows a brief overview of a few common layout styles. For more information, the interested reader is referred to [Baker et al., 1998].
Full Custom
In a full-custom layout every component in the design is hand-crafted to achieve the ultimate tradeoff between performance, area, and power, which often results in highly irregular placement and routing. Typically, no restrictions are imposed on the width, height, aspect ratio, or terminal positions of the layout blocks. Furthermore, each block may be placed at any location on the chip surface. Of course, design rules have to be taken into account at all times. Obviously, this technique has the largest flexibility, the best performance, and a very high integration density, since the layout can be optimized and tuned for each specific application. A major drawback of the full-custom layout technique is that it is immensely labor-intensive, resulting in long turnaround times and thus a long time-to-market. In addition, the tools that (partially) support the designer in creating a high-quality layout are very complex (although their main task is to limit design complexity) and can only lead to a good layout with the aid of an expert.
Standard Cell
In order to overcome the drawbacks of the full-custom layout style, which are mainly due to complexity, several methods have been proposed to mitigate the overwhelming effect of complexity combined with full design freedom.
This is essentially accomplished by putting restrictions and constraints on the physical design of the circuits. Standard cell layout is a common layout technique which was first introduced in the digital VLSI domain. It is characterized by the use of a standard library of predesigned cells with different functionalities. The standard cell (a layout block) is restricted to a fixed height and has variable width. All cells are placed in a number of rows. A certain amount of space between two rows, also called a channel, is reserved for routing. Thus, placement and routing become (conceptually) simpler.
Field-Programmable Gate Array (FPGA)
An FPGA essentially consists of a fixed number of functional (but primitive) building blocks distributed on a chip, where the actual interconnections are defined via electrically programmable switches. FPGAs are poorly suited to higher frequencies because of inferior routing and additional parasitics.
Sea of Gates
The sea-of-gates design style is comparable with FPGA; all layout blocks are predefined on chip and the designer only has to define the interconnect. Unlike FPGA, no switches are
used to define the routing. Instead, the interconnect is defined by a separate process step. Therefore, unlike an FPGA, a sea-of-gates design cannot be used easily for rapid in-house prototyping. Another noticeable difference between FPGA and sea-of-gates is that the latter has a very fine grain size compared with the former. Typically, an FPGA primitive cell consists of a multiple-transistor circuit, whereas a sea-of-gates primitive cell is a single transistor.
2.1.4 From Circuit to Layout
Figure 2.3 shows the top-down flow of information from a higher circuit-level description (schematic level) to a lower-level description of the same information incorporating implementation details (module level). Along the way, more implementation level details are incorporated using process technology data, with higher-level specifications guiding intermediate decisions that have to be taken. More specifically, at each level the higher-level specifications are translated to specifications which are meaningful for that specific level. In the diagram, we can speak of high-level (overall circuit) specifications, intermediate-level (subcircuit) specifications, and low-level (layout module) specifications. We assume that such a translation is always possible, although it might be difficult. Actually, this translation problem is discussed shortly, under the umbrella of the mapping problem. At the schematic level, the relevant specifications are the resistance value if the module is a resistor, a capacitance value if the module is a capacitor, a width over length value if the module is a CMOS transistor, etc. At the subcircuit level, for instance, matching between circuit elements is taken into consideration. Thus, matched elements must be grouped into circuit modules. At the layout module level, important specifications are drain/source capacitance, gate impedance, module size, etc. As can be seen, the technology data has impact on each level of abstraction. Thick arrows indicate a major translation effort (in order to obtain a high quality layout) as compared to thin arrows. This translation effort falls under the umbrella of the mapping problem. At this moment, viewing the layout system as a black box with certain inputs and outputs is convenient. The exact contents will be specified by decisions which are to be taken later on, based on the information given in this chapter. 
The layout system interface takes as input:
the process technology data: design rules, via resistance and capacitance, metal sheet resistances, substrate resistance, guard ring constructions, etc.;
information on pins: the allowable range of resistance, capacitance, and inductance seen at the output of a pin, the range of current amplitudes;
information on modules: height (H) and width (W) of each module, the exact position of each pin connected to a module, the nets connected to a module, the sensitivity or noisiness of a module, etc.;
a cost function: the parameters that need to be optimized in the layout, importance of certain parameters over others, constraints on specific module positions, constraints on the total layout size or aspect ratio, etc.
A pin is located at the perimeter of a layout module, and forms a gateway to the outside world as seen from the module. Furthermore, a (layout) module is assumed to be rectangular.
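The interface inputs listed above can be pictured as a small set of records. The sketch below is only one possible encoding, not the data model used in this book: every class name, field name, unit, and numeric value is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (minimum, maximum) of an allowable interval


@dataclass
class Pin:
    name: str
    net: str
    position: Tuple[float, float]  # relative to the module's lower-left corner
    resistance_range: Range        # allowable resistance seen at the pin
    capacitance_range: Range       # allowable capacitance seen at the pin


@dataclass
class Module:
    name: str
    width: float                   # W
    height: float                  # H
    pins: List[Pin] = field(default_factory=list)
    noisiness: float = 0.0         # hypothetical scale
    sensitivity: float = 0.0       # hypothetical scale

    def pins_on_perimeter(self) -> bool:
        """Check that every pin lies on the boundary of the rectangle."""
        for pin in self.pins:
            x, y = pin.position
            inside = 0 <= x <= self.width and 0 <= y <= self.height
            on_edge = x in (0, self.width) or y in (0, self.height)
            if not (inside and on_edge):
                return False
        return True


@dataclass
class LayoutProblem:
    technology: Dict[str, float]   # e.g. sheet and via resistances
    modules: List[Module]
    cost_weights: Dict[str, float]  # relative importance of each objective

    def nets(self) -> Dict[str, List[str]]:
        """Group pins by net, each written as 'module.pin'."""
        result: Dict[str, List[str]] = {}
        for module in self.modules:
            for pin in module.pins:
                result.setdefault(pin.net, []).append(f"{module.name}.{pin.name}")
        return result


# Hypothetical two-module instance.
diff_pair = Module("diff_pair", width=12.0, height=8.0, sensitivity=0.9, pins=[
    Pin("out", "n1", (12.0, 4.0), (0.0, 50.0), (0.0, 20e-15)),
])
mirror = Module("mirror", width=10.0, height=6.0, noisiness=0.2, pins=[
    Pin("in", "n1", (0.0, 3.0), (0.0, 50.0), (0.0, 20e-15)),
])
problem = LayoutProblem(
    technology={"metal1_sheet_resistance": 0.08},
    modules=[diff_pair, mirror],
    cost_weights={"area": 1.0, "wire_length": 1.0},
)
print(problem.nets())
```

Keeping the allowable pin ranges on the pin record mirrors the idea that circuit-dependent requirements enter the layout system only through pin and module information.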
The output of the layout system is a layout which is essentially similar to the high-level overall reference circuit. After all parasitic elements have been added to the reference circuit schematic, a simulation should show that the layout complies with all specifications. It is important to note that although issues such as yield and reliability are not taken into account, the layout system should not preclude the integration of these important matters. Therefore, generality of the layout system is a concern throughout this work.
2.1.5 Layout System Requirements
Since a practically feasible layout system is our ultimate goal, some requirements for ensuring this have to be set:
computational efficiency must be high, to allow for scalability;
generality must be high, to allow for easy incorporation of models of performance degradation;
robustness must be high, to produce consistently good and predictable solutions.
2.2 The Mapping Problem
The layout problem has been defined in terms of a magic black box system that, given specific inputs, has to produce a high-quality layout as output. There is a noticeable difference between the input parameters of the layout system interface and the high-level specifications and circuit descriptions. This discrepancy between the real-world high-level specifications and the interface specifications has to be resolved. In other words, we have to solve the mapping problem, which is defined as follows.
Problem: The mapping problem
Instance: A set of high-level overall circuit specifications, and a set of desired layout system interface inputs.
Solutions: All mappings that translate the high-level specifications into the layout system inputs.
Minimize: The sensitivity of high-level specifications to layout system input parameters while adhering to all specifications, so that the best possible layout can be output.
Note that minimizing sensitivity is similar to maximizing flexibility in parameter value range in practical circumstances. This mapping is shown implicitly in Figure 2.3. Generally it is not trivial to perform this mapping. Hereafter, the mapping problem is discussed to make the reader aware of this problem, but no solution is proposed in this book.
2.2.1 High-Level Specifications
High-level specifications define and quantify the functionality and the quality of an overall circuit in typical terminology. Depending on the type of circuit under consideration, different terminology is used. Table 2.1 shows a few examples of circuits and their associated high-level specifications. The relationship between the high-level specifications and the functionality of a circuit is clear. A direct consequence of this is a broad diversity of quality measures.
2.2.2 Layout System Specifications
The most striking difference between the low-level specifications in terms of our layout system interface and the high-level specifications is the apparent independence of the measures from the circuit type. Actually, the dependencies are hidden in the pin info and module info inputs of the interface. Recall that the pin information holds, among others, the allowable range of load capacitance, resistance, etc. Furthermore, the module information holds, among others, the locations of the pins at the perimeter of the module, the sensitivity of the module, etc. A direct consequence of this observation is the problem of how high-level specifications are mapped to layout system specifications.
2.2.3 Constraint Mapping Problem
A specification is also called a constraint. Under the general umbrella of constraint management and transformations, the (constraint) mapping problem has been investigated by numerous researchers. However, all known works with mathematical justifications have been restricted to purely linear analog systems [Malavasi and Charbon, 1999]. The reason for this is quite obvious; computations with linear systems and their properties are much simpler than with nonlinear circuits. Although some authors have claimed that the linear approach can also handle nonlinear systems by using linearization techniques, this only holds true when operating in the close vicinity of a certain static biasing point. For typical mixed-signal designs such as A/D and D/A converters, in which large signal transitions occur and biasing points are definitely not static, linearization is inappropriate. As a consequence, heuristics have been used to transform high-level constraints to low-level constraints [Jusuf et al., 1990]. A major disadvantage of the use of heuristics is that they might ignore or overlook problems which also need to be considered. Also, it is hard to quantify the quality of the obtained results in terms of high-level specifications, even in case of full compliance with heuristic rules.
2.2.4 High-Level Sensitivities
Because a (large) set of tunable parameters is available in the mapping problem, sensitivities are needed to obtain a proper set of parameters. This set of parameters should be representative of the robustness of the circuit. Generally, the high-level sensitivities as a function of high-level parameters are strongly nonlinear with strong interdependencies. Consequently, proper transformation of parameters, for instance to a lower-level parameter, with awareness of sensitivities is a daunting task. When a circuit is designed at a high level of abstraction, a good designer knows that the component values and properties he or she had in mind when conceiving the design are very likely not the exact values achieved in a prototype. Thus, in order to mitigate the effect of deviations from the nominal (desired) values, it is necessary to have an idea of circuit sensitivities. Usually these sensitivities are not specified, but it is a well-known fact that a consumer-ready design must be robust against process deviations. In Table 2.2 a few examples are given of high-level sensitivities for an analog filter, a D/A converter, and a digital decoder. As can be seen, the high-level sensitivities of a circuit are directly related to its functionality, resulting in a diversity of sensitivity measures.
2.2.5 Lower Level Sensitivities
When high-level specifications are transformed to lower-level specifications, a desirable property is that the high-level specifications are adhered to if the lower-level specifications are adhered to. Furthermore, it is undesirable that a small change in a lower-level parameter causes a large change in a higher-level parameter. Therefore, the sensitivity of high-level parameters needs to be minimal with respect to lower-level parameters. We denote this relationship by low(er)-level sensitivity. An example of a low-level sensitivity (at the subcircuit level) is the sensitivity of integral nonlinearity with respect to matching of transistors in a specific differential pair. Generalizing, every sensitivity can be expressed as the level of dependency of a high-level specification on a lower-level parameter. Formally this is written as
$$\frac{\partial P}{\partial x} \qquad (2.1)$$
where $P$ is some kind of performance measure and $x$ is a lower-level parameter. If high-level sensitivities are known, (2.1) can also be computed using
$$\frac{\partial P}{\partial x} = \frac{\partial P}{\partial h} \, \frac{\partial h}{\partial x} \qquad (2.2)$$
where $h$ is a high-level parameter (such as clock jitter). Layout system sensitivities are (implicitly) represented by the range of allowable values for each pin parameter: resistance, capacitance, inductance, etc. Modules also have low-level sensitivity measures associated with them. For instance, module noisiness and module sensitivity are two module parameters that are useful for minimizing the detrimental effect of substrate coupling. The former quantifies the capability of a module to inflict performance degradation on neighboring modules. The latter quantifies the vulnerability of a certain performance measure to substrate noise.
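The sensitivity of a performance measure with respect to a lower-level parameter, and its computation through an intermediate high-level parameter, can be illustrated numerically. The sketch below assumes that both relationships are differentiable, so that the chain rule applies, and it uses two invented toy functions; neither the functions nor the finite-difference scheme are taken from this book.

```python
def central_difference(f, x, step=1e-4):
    """Approximate the derivative df/dx at x with a central difference."""
    return (f(x + step) - f(x - step)) / (2.0 * step)


# Hypothetical toy models, chosen only so the result is easy to check by hand.
def high_level_parameter(x):
    """High-level parameter h as a function of a lower-level parameter x."""
    return 3.0 * x + 1.0


def performance(h):
    """Performance measure P as a function of the high-level parameter h."""
    return h ** 2


x0 = 1.0
h0 = high_level_parameter(x0)

# Direct low-level sensitivity dP/dx of the composed relationship.
direct = central_difference(lambda x: performance(high_level_parameter(x)), x0)

# Same quantity from a known high-level sensitivity dP/dh times dh/dx.
via_chain_rule = (central_difference(performance, h0)
                  * central_difference(high_level_parameter, x0))

print(direct, via_chain_rule)  # both close to 24 for these toy models
```

For these toy models the analytic value is 2h * 3 = 24 at x = 1, which both estimates reproduce to within rounding error.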
2.2.6 Sensitivity Computation Problem
As discussed previously in connection with the mapping problem, sensitivities play an important role; the accuracy of a specification and the ability to adhere to that specification with high probability are significantly influenced by low-level sensitivities as defined by (2.1). The exact computation of (relevant) sensitivities falls outside the scope of this work. Nonetheless, we point out that sensitivity computation is essential to successful practical layout generation. In order to show the effectiveness of certain concepts, we use randomly generated sensitivity values. These values are not related to a physical property, but merely serve to show the strength of a methodology. Consequently, the sensitivity values should be interpreted in a fuzzy sense with a relative character.
2.3 Placement and Routing Constraints
Classical constraints on placement and routing are smallest possible chip area and minimal wire length. In the context of mixed-signal layout generation this is clearly not sufficient. Therefore, additional constraints are needed, such as crosstalk-aware routing constraints, substrate-aware placement constraints, matching-aware placement constraints, and so on. In this book, we attempt to develop a general-purpose framework which allows for the incorporation of these additional constraints in a straightforward way. Although not all concepts have actually been worked out down to the implementation level, the awareness of these constraints is paramount in our approach. From another standpoint, we are forced to put additional constraints on the placement and routing approach. This is mainly a direct consequence of computational efficiency considerations. From this point of view, a very important placement-related constraint is that overlap of modules should be avoided. An argument for allowing overlap is that it could merge source/drain connections of a pair of transistors, leading to a reduction in source/drain capacitances [Lampaert, 1998]. Overlap is, however, detrimental to system performance in two ways. First, overlap is typically undesirable because it generally leads to design rule errors. Hence, the evaluation of overlapping placements is a costly waste of computation time. Second, it is not possible to make accurate estimations with respect to, for example, substrate coupling and wire length for an illegal placement with overlap. We note that allowing general overlap is not a good technique to obtain effective (and efficient) merges, as desired merges can be handled within modules a priori. For instance, candidate transistor pairs for merging can be identified beforehand and put into a single module in advance. This approach not only solves the issue of overlap, but also reduces the size of the placement solution space.
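The no-overlap constraint on rectangular modules is straightforward to state in code. The following is a minimal sketch with invented module names and coordinates; abutting modules are treated as legal here, which is an assumption, since real design rules would normally also require a minimum spacing.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    name: str
    x: float       # lower-left corner
    y: float
    width: float
    height: float


def overlaps(a, b):
    """True if the interiors of two axis-aligned rectangles intersect."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def overlapping_pairs(modules):
    """List the names of all module pairs that violate the no-overlap rule."""
    return [(a.name, b.name)
            for a, b in combinations(modules, 2) if overlaps(a, b)]


# Hypothetical placement: M1 and M2 abut, M3 intrudes into both.
modules = [
    Rect("M1", 0, 0, 4, 2),
    Rect("M2", 4, 0, 3, 2),
    Rect("M3", 3, 1, 2, 2),
]
violations = overlapping_pairs(modules)
print(violations)
```

The pairwise check is quadratic in the number of modules, which is acceptable for a sketch; a placement representation that cannot express overlap at all avoids the check entirely.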
Since we allow a module to consist of sub-circuits, the number and positions of the pins at the circumference of a module should in principle be unrestricted. The constraints we put on routing are as follows. We do not allow over-the-cell routing. Although more than two metal layers are typically available for routing in modern process technologies, the exploitation of only the lowest metal layers for device-level routing is beneficial in many ways: the reduction of routing problems at higher layers during macro-block routing, the reduction of yield-decreasing vias, and the avoidance of unpredictable interaction with intellectual property (IP) blocks. Each pin-interconnecting network should have minimal length. For reasons of simplicity, but without being too restrictive, we assume that this is an optimal way to connect the pins of a net. Unfortunately, this apparently simple problem (at least for a small number of pins) is a very hard problem, better known as the Steiner minimal tree problem [Hwang et al., 1992, Kahng and Robins, 1995]. We justify these restrictions using the following arguments. In mixed-signal layout design the effective use of space around modules in the lowest metal layer considerably decreases the unwanted coupling between, for instance, polysilicon and metal. Moreover, any created coupling can be controlled much more tightly. The coupling from higher metal layers to the bottom layers is significantly smaller, which justifies the use of higher metal layers for over-the-cell routing.
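Because the Steiner minimal tree problem is hard, a minimum spanning tree over the pins is often used as an easily computed upper bound. The sketch below is not the routing method of this book: it applies Prim's algorithm with Manhattan distances to three invented pin positions and compares the result with a tree through one hand-picked extra point.

```python
def manhattan(p, q):
    """Rectilinear distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rectilinear_mst_length(pins):
    """Length of a minimum spanning tree over the pins (Prim's algorithm)."""
    if not pins:
        return 0
    # best[i] is the cheapest known connection from pin i to the growing tree.
    best = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while best:
        nearest = min(best, key=best.get)
        total += best.pop(nearest)
        for i in best:
            best[i] = min(best[i], manhattan(pins[nearest], pins[i]))
    return total


# Hypothetical three-pin net.
pins = [(0, 0), (4, 0), (2, 3)]
mst_length = rectilinear_mst_length(pins)

# Connecting all pins through the extra (Steiner) point (2, 0) is shorter.
steiner_point = (2, 0)
steiner_length = sum(manhattan(steiner_point, p) for p in pins)

print(mst_length, steiner_length)
```

For three pins the rectilinear Steiner minimal tree equals the half-perimeter of the bounding box, so the hand-picked point is optimal in this instance; for larger nets finding such points is exactly the hard part.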
The minimal-length metric is less restrictive than it seems, since it does not imply a geometric metric. In fact, a very broad class of interconnection networks can be covered by defining (sophisticated) weight functions for the branches in the network. These weight functions typically depend on physical properties of each branch, e.g. the voltage/current variation and magnitude, or the physical location of each branch. The latter accommodates (parasitic) interaction of this branch with neighboring obstacles. Although area and wire-length constraints are very important, they are definitely not the sole constraints relevant to placement and routing. Especially in the context of mixed-signal layout generation, we must refine this set of primary constraints and additionally include, or allow for inclusion of, performance-related constraints such as substrate coupling impact minimization, crosstalk minimization, optimal matching, etc. The proposed framework should be able to incorporate the overall set of constraints in an efficient manner.
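A branch weight function of the kind described above might look as follows. This is purely a hypothetical illustration: the functional form, the parameter names `swing_factor` and `proximity_penalty`, and all numbers are assumptions and do not come from this book.

```python
def branch_weight(length, voltage_swing, near_sensitive_module,
                  swing_factor=0.5, proximity_penalty=2.0):
    """Hypothetical weight: geometric length scaled by electrical activity
    and by proximity to a sensitive module."""
    weight = length * (1.0 + swing_factor * voltage_swing)
    if near_sensitive_module:
        weight *= proximity_penalty
    return weight


# Each branch: (geometric length, voltage swing, runs near a sensitive module)
branches = [
    (10.0, 0.0, False),  # quiet bias line
    (4.0, 2.0, False),   # switching signal, far from sensitive modules
    (3.0, 2.0, True),    # switching signal next to a sensitive module
]

geometric_length = sum(branch[0] for branch in branches)
weighted_length = sum(branch_weight(*branch) for branch in branches)
print(geometric_length, weighted_length)
```

Under such a weighting, the shortest branch can become the most expensive one, so a network that is minimal in the weighted sense need not be minimal geometrically.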
Chapter 3
Optimization Methods
In this chapter a variety of well-known VLSI optimization methods is described. As pointed out in Chapter 2, there are many constraints involved in mixed-signal layout generation, which makes this task intrinsically difficult to solve properly. Moreover, due to the many types of constraints that are involved, the type of optimization algorithm which is used to generate a layout can have a significant influence on the final result, both in quality and in computation time. Naturally, each type of optimization framework has its pros and cons. The points that are regarded as important in our task are: easy handling of a heterogeneous mixture of constraints, efficient placement and routing representations, efficient computation of placement and routing solutions, practical achievability of near-optimal results, low implementation complexity. First an overview of existing approaches to successful VLSI optimization is given. Then one of the approaches is selected, based on the previously described criteria, and used for our optimization framework. It should be noted that most of the described optimization methods have been shown to work well on a given set of problems. Conversely, it is a known fact that an optimization method that performs well on a certain class of problems might perform poorly on another class of problems, with or without tuning. Thus, generalizing results to related or modified problems should be done with utmost caution. We attempt to place the briefly presented methods under the same uniform umbrella of placement and routing. However, only some of these methods have properties which are suitable for general placement and routing, taking into account the previously mentioned important points. We elaborate on one of the most promising methods, which is known as the simulated annealing algorithm.
3.1 VLSI Optimization Methods
The layout generation problem is inherently very difficult to solve. Even when split into several sub-problems it remains difficult to solve. Generally, all non-trivial problem instances
are intractable, i.e. an excessive amount of time is required to solve a problem instance to optimality when the instance size is increased. In other words, the problems are NP-hard [Garey and Johnson, 1979]. Nonetheless, in practice the layout generation problem is split into a placement and a routing phase. The latter may again be split into a global routing and a detailed routing phase. As a direct consequence of the NP-hardness of layout generation, we have to resort to heuristic or approximation methods that yield an acceptable solution within reasonable time. The following classification might not be optimal, but it is one that matches well with contemporary ideas. Furthermore, it provides a good impression of the vast body of research activities in this field. An extensive overview can be found in [Lengauer, 1990]. A very recent and more mathematically flavored comprehensive overview is contained in [Bliek et al., 2001].
3.1.1 Deterministic Algorithms
A deterministic algorithm is a recipe that describes which steps have to be taken sequentially in order to transform a set of input values to a set of output values. Such an algorithm needs no random number generator to execute and find a solution. This type of algorithm is typically used in a graph representation of a problem. Typical properties of deterministic algorithms are: sub-optimality of the solution, high execution speed, and the fact that the same solution is found each time the algorithm is run. As deterministic algorithms were the first type of algorithms to see the light, the number of such algorithms is very large. Only a few deterministic algorithms will be mentioned here.
Problem-dependent Methods
Rule-based algorithms
In this approach, expert knowledge is translated into rules which are used by the system to generate a proper layout. Clearly, the quality of the rules is of paramount importance. Furthermore, the set of rules should be adapted to accommodate new types of circuits and layout techniques that are introduced. As a consequence, maintaining a good set of rules is labor intensive. A fundamental problem in connection with a rule-based approach is the difficulty of defining general and context-independent rules.
Template-based algorithms
As the name implies, templates are used as a starting point, guided by specific values of input parameters, to transform a certain template to a proper layout. The creation of the templates is a knowledge-intensive task, which is one of the main bottlenecks of this approach. Moreover, the set of obtainable layouts is limited to the set of available templates and their combinations.
Problem-independent Methods
Linear programming algorithms
A linear programming algorithm describes the problem as an $m \times n$ constraint matrix $A$, an $m$-vector $b$, and a cost vector $c$. A solution of a linear problem is then a vector $x$ that satisfies the linear constraints $Ax \le b$ and $x \ge 0$ while minimizing $c^T x$.
Divide-and-conquer algorithms
Divide-and-conquer algorithms partition the problem into more or less independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem.
Dynamic programming algorithms
Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions of subproblems. Programming in this context refers to a tabular method, not to writing computer code. In contrast to divide-and-conquer algorithms, dynamic programming is applicable when the subproblems are not independent, that is, when subproblems share subsubproblems. In this respect, a divide-and-conquer algorithm does more work than necessary by solving common subsubproblems more than once. The latter is avoided by dynamic programming through the use of a table in which each solution to a solved subsubproblem is stored, saving a significant amount of computation time.
Branch-and-bound algorithms
The branch-and-bound method is an exact method that can be applied to a broad class of problems. All that is required is a tree-structured configuration space and an efficient way of computing tight lower bounds on the cost of all solutions containing a given partial solution. Typically this method can only be applied successfully to small problem instances, but with clever pruning techniques a larger solvable range can be reached.
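The two ingredients of branch-and-bound named above, a tree-structured configuration space and a lower bound for partial solutions, can be shown on a toy placement problem. The sketch below is an assumption-laden illustration, not an algorithm from this book: modules are placed left to right in a single row, the cost is the weighted span of two-pin nets, and the module names and net weights are invented.

```python
def arrangement_cost(order, nets):
    """Weighted sum of net spans for a complete left-to-right ordering."""
    pos = {m: i for i, m in enumerate(order)}
    return sum(w * abs(pos[a] - pos[b]) for (a, b), w in nets.items())


def linear_arrangement_bnb(modules, nets):
    """Exact minimum-cost row ordering by depth-first branch-and-bound."""
    best = {"cost": float("inf"), "order": None}

    def lower_bound(order):
        # Valid bound: an unplaced module lands at slot >= len(order), and
        # two distinct unplaced modules are at least one slot apart.
        pos = {m: i for i, m in enumerate(order)}
        k = len(order)
        bound = 0
        for (a, b), w in nets.items():
            if a in pos and b in pos:
                bound += w * abs(pos[a] - pos[b])
            elif a in pos:
                bound += w * (k - pos[a])
            elif b in pos:
                bound += w * (k - pos[b])
            else:
                bound += w
        return bound

    def branch(order, remaining):
        bound = lower_bound(order)
        if bound >= best["cost"]:
            return  # prune: this subtree cannot beat the incumbent
        if not remaining:
            best["cost"], best["order"] = bound, list(order)
            return
        for m in sorted(remaining):
            order.append(m)
            branch(order, remaining - {m})
            order.pop()

    branch([], frozenset(modules))
    return best["order"], best["cost"]


# Hypothetical instance: four modules, weighted two-pin nets.
modules = ["A", "B", "C", "D"]
nets = {("A", "C"): 3, ("B", "C"): 1, ("A", "D"): 2, ("B", "D"): 1}
order, cost = linear_arrangement_bnb(modules, nets)
print(order, cost)
```

The four nets form a cycle, so at most three of them can connect adjacent modules in a row; the optimum stretches one of the weight-1 nets across the whole row.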
3.1.2
Stochastic Algorithms
Stochastic algorithms have been introduced to circumvent most problems of deterministic algorithms. There is a fundamental difference between inherently stochastic algorithms and stochastic versions of deterministic algorithms. The former type of algorithms is usually inspired by natural phenomena. The latter is a randomized extension of a deterministic algorithm in order to improve worst-case performance and facilitate algorithm analysis. Although most inherently stochastic (or probabilistic) algorithms are based on elegant theories with very desirable properties with regard to their ability to find a globally optimal solution‚ practical constraints limit the usefulness of these theories. The most striking example is a commonly used mathematical operation where the time variable approaches infinity. When convergence is slow‚ the required time for finding an optimal solution may become prohibitively large thus rendering the algorithm practically worthless. Nonetheless‚ an algorithm in this class can be useful when a near-optimal solution is good enough and such a solution can be obtained within a reasonable amount of time. By truncating an unlimited time interval to a limited interval‚ theoretical properties are normally invalidated. Hence‚ no guarantee can be given on how close an obtained solution lies to
an optimal solution. Techniques to improve speed, average solution quality, or any other desirable property which is not supported by mathematical evidence, turn any algorithm into a heuristic algorithm. Two well-known stochastic algorithms are simulated evolution [Darwin, 1859, Rechenberg, 1973] and simulated annealing [Kirkpatrick et al., 1983]. Lately, there has been an increased interest in so-called memetic algorithms, a concept which sprouted from the mind of Dawkins [Dawkins, 1976]. Memetic algorithms are a generalization of genetic algorithms in which the human mind plays a crucial role; cultural influences have a significant influence on the survival capability of a certain species, in conjunction with specific natural genetic properties.
3.1.3
Heuristic Algorithms
Heuristic algorithms, or simply heuristics, belong to a popular class of algorithms in which intuitive ideas or promising tricks are employed to search for good solutions of NP-hard problems. (Partial) randomization might also be used to achieve this. Generally, there is no way of finding out how close we have come, in absolute measures, to a global optimum and with what probability. Even though some solutions that are obtained for given problem instances might be quite good, no guarantee can be given on how well the heuristic will perform on another problem instance. The heuristic approach is by far the most widespread method in practice today. All of the iterative improvement techniques (in the context of NP-hard problems), both deterministic and stochastic, fall in this category. Note that even an algorithm such as simulated annealing, which will find an optimal solution in theory, turns into a heuristic due to practical constraints such as limited time. The last observation implies that optimization results produced by heuristics should be evaluated with statistical means in order to enable fair comparison. A true drawback of most heuristics is the poor reproducibility of solutions, which is needed so that independently obtained results can be verified or compared. Due to the sensitivity of the results of heuristic approaches to the optimization environment, it is difficult in practice to make reliable comparisons (even in a statistical sense).
Most popular stochastic heuristic approaches are based on either the genetic algorithm or the simulated annealing algorithm. The genetic algorithm or simulated evolution algorithm [Rechenberg, 1973, Holland, 1975] is suitable for a wide variety of optimization problems. It represents an artificial simulation of the biological evolution of species, as conceived by Charles Darwin [Darwin, 1859]. The optimization problem is described as a set of candidate solutions (the search space) and an objective function that has to be maximized. In analogy with biological systems, the objective function is called the fitness function. Each solution is associated with a certain fitness value. In order to find a solution with highest fitness, the genetic algorithm produces a sequence of populations of candidate solutions. The generation of each successive population is a random process, guided by the fitness of the members of the previous population. Typical biological phenomena are simulated during population generation. The most important ones are: selection, mutation, and cross-over.
The simulated annealing (SA) algorithm is by far the most used stochastic optimization algorithm in contemporary literature in connection with layout generation. The reasons for
its success are mainly: simplicity of implementation¹, incentives induced by results of previous approaches, its flexibility with respect to the type of problems and constraints it can handle, and last but not least, the conceptual similarity between the problem description and a (straightforward) equivalent formulation in terms of simulated annealing entities. As will be shown in Chapter 6, the SA algorithm is a very promising candidate for the layout generation problem. A separate section is dedicated to discussing its general features.
3.2
Simulated Annealing
Simulated annealing (SA) is an algorithmic approach based on the thermodynamical annealing process. If a hot bath of crystalline material is lowered slowly enough in temperature, i.e. annealed, a perfect crystalline lattice is obtained. A perfect lattice with minimal stress is associated with a global minimal-energy optimum. Conceptually, the computational approach is as follows. The initial temperature is chosen high enough so that the end result will be independent of the initial state. Then the temperature is lowered slowly according to a specific cooling schedule. The cooling must be performed slowly enough that thermal equilibrium can be reached at each temperature. Small perturbations are generated and applied to each state, causing the system to jump to another state with a different associated energy. If the energy of the new state is smaller than the energy of the current state, then the new state is accepted. However, if the new state has a higher associated energy, it is accepted with a certain probability. This acceptance probability depends on the energy difference of the states and the current temperature of the system. This procedure is iterated until the temperature is low enough and the system is said to be frozen.
An extensive body of research on simulated annealing or, more generally, nonlinear stochastic global optimization is active in several scientific fields. To name just a few: statistical mechanics, computational biology, computer science, mathematics, electrical engineering, and operations research.
3.2.1
Basic SA Algorithm
The basic simulated annealing algorithm is shown in Figure 3.1. A quick skim through the algorithm directly reveals its simplicity and its generality. Each problem is represented by a set of states. The number of states can be infinite in theory, but due to the finiteness of computer representations the set is finite in practice. However, this is not a limitation. Essentially, the algorithm starts from a random initial state $S$ and applies a perturbation to this state to find a new state $S'$ on line 6. On line 7 the cost $C'$ associated with state $S'$ is computed. Then, on line 8, the acceptance test is performed. If the cost of the new state $S'$ is lower² than the cost of state $S$ then state $S'$ is unconditionally accepted. On the other hand, if the cost of state $S'$ is higher than the cost of state $S$ then state $S'$ is accepted with probability
$$P = \exp\left(-\frac{C' - C}{T}\right) \qquad (3.1)$$
¹ The concept is very simple to implement, but, admittedly, a robust implementation requires a large amount of effort.
² The lower the cost, the better.
where $C'$ and $C$ are the costs associated with states $S'$ and $S$, respectively, and $T$ is the temperature of the system. The function of the temperature is as follows. When $T$ is large (compared to a typical cost difference), the right-hand side of (3.1) is close to one. This implies that at high temperatures, cost increments are almost always accepted. When the temperature is decreased gradually, the impact of the cost difference becomes more pronounced. Consequently, at low temperatures the right-hand side of (3.1) will be close to zero for a typical cost increment. Thus, the probability of accepting the corresponding state will be very small. On lines 13-17, the best state and its associated cost are stored for later reference. The temperature is lowered on line 18.
Although the basic simulated annealing algorithm is simple in appearance, and has good practical performance, its internals are not well understood. Simulated annealing has attracted much attention because it treats every problem as a black box. Therefore a very large class of problems can be solved using SA. A few examples are: combinatorial problems [Aarts and Korst, 1989], function optimization problems [Ingber, 1993], and neural network optimization and training [Stepniewski and Keane, 1997, Chalup and Maire, 1999].
Many adaptations of and extensions to the classical SA algorithm are known. The existing literature on this topic is too extensive to cover here. We only mention a few interesting concepts and approaches. In [Boese and Kahng, 1994] Boese and Kahng observe that under finite-time conditions, the classical monotonically decreasing temperature schedule is not optimal when the best solution seen so far is the output of the SA algorithm, as opposed to the last solution seen (that is accepted). A recipe is given to derive a (near) optimal best-so-far temperature schedule. In [Cong et al., 2000] Cong et al. propose to use a dynamic weighting Monte Carlo approach for floorplanning; they obtain promising results. The essential difference in their approach is an SA algorithm with a stochastic temperature schedule. In a
general fashion, we can state that function decrease_temperature(T) should be replaced by adjust_temperature(T) in order to maximize the power of SA.
The generality of SA comes at the cost of a large amount of computational resources that are required for practical problems. There are two ways to reduce the amount of computational resources. The first one is to minimize the number of iterations. This can be accomplished in various ways: by choosing a better representation of the problem, by modifying the cooling schedule, by choosing a better generation function, by finding more suitable perturbation operators, etc. A mixture of the aforementioned items is also conceivable. Actually, it is not known how to minimize the number of iterations in an optimal way. Most approaches rely on intuitive notions. Altogether, a practical SA implementation is truly heuristic in nature. The second way is to reduce the computations within a single SA iteration to a minimum. This approach is taken in this book. Stated more abstractly: the computational complexity of a single SA iteration is taken as the performance measure. For more information on computational complexity matters, the reader is referred to [Cormen et al., 1990]. A point worth noting is the fact that for practically all non-trivial problems, computing the cost associated with a new state is the most time-consuming task. Therefore, it is of interest to investigate this part of the algorithm. The key ingredients for an SA algorithm are discussed next.
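The steps described in this section can be collected into a minimal sketch. It is not a transcription of Figure 3.1, and its line numbers do not correspond to those cited in the text. The parameter names, the default geometric cooling, and the toy permutation problem are assumptions made for illustration only; the name decrease_temperature is borrowed from the text.

```python
import math
import random


def basic_simulated_annealing(initial_state, cost, perturb,
                              decrease_temperature=lambda t: 0.95 * t,
                              t_initial=10.0, t_final=1e-3,
                              moves_per_temperature=50, seed=0):
    """Minimal SA loop: perturb, evaluate, accept or reject, cool down."""
    rng = random.Random(seed)
    state, state_cost = initial_state, cost(initial_state)
    best_state, best_cost = state, state_cost
    t = t_initial
    while t > t_final:
        for _ in range(moves_per_temperature):
            new_state = perturb(state, rng)
            new_cost = cost(new_state)
            delta = new_cost - state_cost
            # Cost decreases are always accepted; increases are accepted
            # with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                state, state_cost = new_state, new_cost
                if state_cost < best_cost:
                    best_state, best_cost = state, state_cost
        t = decrease_temperature(t)
    return best_state, best_cost


# Toy problem (hypothetical): order a permutation so that value i sits at index i.
def displacement_cost(perm):
    return sum(abs(value - index) for index, value in enumerate(perm))


def swap_two(perm, rng):
    i, j = rng.sample(range(len(perm)), 2)
    new_perm = list(perm)  # states are never modified in place
    new_perm[i], new_perm[j] = new_perm[j], new_perm[i]
    return new_perm


initial = list(range(7, -1, -1))
best_state, best_cost = basic_simulated_annealing(initial, displacement_cost, swap_two)
```

Note that the cost function is called once per iteration, which is why the cost evaluation dominates the run time for non-trivial problems.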
3.2.2
Problem Representation
In principle, every optimization problem can be formulated in terms of a (discrete or continuous) mathematical function $f(x)$. Solving the problem is then equivalent to minimizing $f$ as a function of its argument $x$. For most real-life problems, $x$ is multidimensional and evaluating $f$ for a given $x$ might not be straightforward. Exactly how difficult it is to evaluate $f$ strongly depends on the problem representation; every representation gives a different $f$. Figure 3.2 gives a visual idea of how different representations give different functions (and solutions). The representation associated with the first function has a most irregular "landscape". The global minimum of this function is equal to the optimal solution of the original problem. A better representation, in terms of smoothness of the landscape, is given by a second function. The global minimum of this second function also coincides with the optimal solution. Another representation
is even smoother, but its global minimum deviates from the optimal solution. This representation might not be a proper candidate. Whether or not this is the case depends on the amplitude of the deviation. Note that easy evaluation of $f$ does not imply that the landscape is smooth (and vice versa). Smoothness of the landscape is, among other things, determined by the ordering defined on the states $x$. Consequently, for different orderings of the states the appearance of the landscape changes, but all global minima remain intact. Furthermore, for non-trivial problem instances we usually do not know an optimal solution. As a consequence, the de-facto standard way of benchmarking is comparing with best known solutions.
Note that the function $f$ is also called the cost function in a simulated annealing context, and $x$ is called a state. The global optima of the cost landscape are solely defined by the cost function. Furthermore, the shape of the cost landscape is to a large extent determined by the set of perturbation operators and the generation function defined within the simulated annealing environment; they define the local optima [Otten and van Ginneken, 1989]. This can be explained by observing that the perturbation operators and generation function determine whether or not a given state is an optimum relative to its neighbors. In other words, the perturbation operators and generation function determine the neighbors of a state, and thus an ordering on the states, and consequently define all non-global local optima. Both of these ingredients will be discussed shortly.
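The observation that the neighborhood, and not the cost function alone, defines the local optima can be checked on a toy example. The cost table and the two neighborhood definitions below are hypothetical and serve only as an illustration: the same eight states with the same costs are connected once as a line and once as a 3-bit hypercube.

```python
# Hypothetical cost of each of the eight states 0..7.
COSTS = [0, 3, 1, 4, 2, 5, 3, 6]


def line_neighbors(state):
    """Neighbors when a perturbation moves one step left or right."""
    return [n for n in (state - 1, state + 1) if 0 <= n < len(COSTS)]


def bitflip_neighbors(state):
    """Neighbors when a perturbation flips one bit of the 3-bit state code."""
    return [state ^ (1 << bit) for bit in range(3)]


def local_minima(neighbors):
    """States that are strictly cheaper than all of their neighbors."""
    return [s for s in range(len(COSTS))
            if all(COSTS[s] < COSTS[n] for n in neighbors(s))]


minima_on_line = local_minima(line_neighbors)
minima_on_hypercube = local_minima(bitflip_neighbors)
```

In this example the global minimum (state 0) is a minimum under both neighborhoods, while the number of non-global local minima differs between the two.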
3.2.3
Perturbation Operators
Simulated annealing is a sampling algorithm. It will (statistically) go into the direction with best samples. Perturbation operators are needed in order to provide a representative set of samples. Essentially‚ a perturbation operator changes the current solution into a new solution; the perturbed solution. A minimum set of perturbation operators is needed to guarantee reachability of every solution. Definition 1 (Complete Perturbation Set) If every state is reachable from an arbitrary initial state using a specific set of perturbation operators‚ then this set of perturbation operators is called complete‚ otherwise it is called incomplete. Thus‚ an important requirement for a perturbation set is completeness. However‚ completeness alone does not mean good convergence behavior of the simulated annealing algorithm. As pointed out by Otten and Van Ginneken [Otten and van Ginneken‚ 1989]‚ the global minima of the state space depend completely on the cost function‚ but the local minima also depend on the perturbation operators. As it is desirable to have as few and as shallow local minima as possible‚ a perturbation set which induces a smooth cost landscape would be ideal. Unfortunately‚ usually there is no way to determine just how smooth the cost landscape is going to be when a certain perturbation set is chosen. As a result‚ intuition and practical experience lead to a choice of perturbation operators in practical circumstances. Another requirement for the perturbation set is its computational efficiency. A complicated perturbation can severely impact the performance of a single iteration of the SA algorithm‚ and this in turn will increase the overall run-time of the optimization process. Thus‚
simple perturbations which can be implemented easily and facilitate evaluation of the cost function from one iteration to another are favored.
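For small state spaces, completeness in the sense of Definition 1 can be verified by exhaustive search. The sketch below assumes that states are permutations and uses two hypothetical operator sets: swapping adjacent elements, and rotating the whole permutation by one position. It is meant as an illustration of the definition, not as a practical test for realistic problem sizes.

```python
from collections import deque
from itertools import permutations


def swap_adjacent(state):
    """Yield every state obtained by swapping two adjacent elements."""
    for i in range(len(state) - 1):
        nxt = list(state)
        nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
        yield tuple(nxt)


def rotate(state):
    """Yield the state obtained by a cyclic shift over one position."""
    yield state[1:] + state[:1]


def reachable(start, operators):
    """Breadth-first search over all states reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for operator in operators:
            for nxt in operator(current):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def is_complete(operators, size):
    """Check reachability of every state from every possible initial state."""
    all_states = set(permutations(range(size)))
    return all(reachable(s, operators) == all_states for s in all_states)


swap_set_complete = is_complete([swap_adjacent], 4)
rotate_set_complete = is_complete([rotate], 4)
```

The adjacent-swap set is complete, whereas rotation alone only visits as many states as there are elements and is therefore incomplete.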
3.2.4
Acceptance and Generation Functions
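The acceptance function assigns a positive probability to a pair of cost values and the temperature value. In a general form it looks like
$$A(C, C', T) \in (0, 1] \qquad (3.2)$$
The acceptance function for standard SA is given by
$$A(C, C', T) = \min\left\{1, \exp\left(-\frac{C' - C}{T}\right)\right\} \qquad (3.3)$$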
which is also called the Metropolis criterion. This has been the standard way of defining the acceptance function since the introduction of simulated annealing for optimization [Kirkpatrick et al.‚1983]. The generation function generates a new state from the current state. In its simplest form‚ the generation of a new state is independent of the current state and the current temperature. In its most sophisticated appearance‚ many optimization parameters can be involved. For instance‚ the current state‚ some kind of estimation of an error gradient‚ the temperature‚ etc. [Ingber‚ 1989]. It should be noted that‚ typically‚ the choice of a certain generation function feature is based on heuristic grounds. Moreover‚ problem-dependent tuning is required in these cases. The generation function has been subject to many modifications. The exact appearance of this function is closely related to the representation (the state space) of the problem and the set of perturbation operators defined on it. It is loosely coupled with the cost function‚ and normally the generation function is not modified when the cost function is. An important requirement for the generation function is that it allows traversal of the entire state space. A good rule of thumb is‚ in a probabilistic sense‚ to allow traversal of the state space in a small number of steps at high temperatures‚ and to lessen reachability of states when the temperature decreases. Ultimately‚ the latter is similar to a local search strategy.
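To give a feeling for the numbers involved, the following sketch evaluates the Metropolis criterion in its usual form, min(1, exp(-(C' - C)/T)), for a fixed cost increment at three temperatures. The function name and the sample values are chosen for illustration only.

```python
import math


def metropolis_acceptance(current_cost, new_cost, temperature):
    """Probability of accepting a move from current_cost to new_cost."""
    if new_cost <= current_cost:
        return 1.0
    return math.exp(-(new_cost - current_cost) / temperature)


# A cost increment of 1.0 evaluated at a high, a medium and a low temperature.
probabilities = {t: metropolis_acceptance(10.0, 11.0, t) for t in (100.0, 1.0, 0.1)}
```

At the highest temperature the increment is accepted almost always, while at the lowest temperature it is practically never accepted, which is the behavior described for (3.1).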
3.2.5
Temperature Schedule
The temperature schedule, which determines the rate at which the system is cooled down, is a very important tuning parameter of the SA algorithm. A commonly used schedule has a decreasing exponential shape, like
$$T_k = T_0\,\alpha^k \qquad (3.4)$$
where $\alpha$ (with $0 < \alpha < 1$) is a cooling constant which determines the rate of cooling, $k$ is an iteration index which can be associated with discrete time, and $T_0$ is an initial constant temperature. The simple temperature schedule of (3.4) ignores in principle all problem-related aspects such as the irregularity of the cost landscape and lacks solution quality awareness. As such, it is not very robust and generally it needs tuning for each problem instance in order to obtain acceptable results.
In standard simulated annealing [Kirkpatrick et al., 1983], the temperature can only be decreased according to a certain schedule. Other, more general, schedules exist but their general applicability is not known. It is worthwhile to note that non-monotone schedules are certainly worth investigating, motivated by promising results on a few (small) problem instances [Boese and Kahng, 1994]. The only provably good temperature schedule, as yet, is due to Hajek [Hajek, 1988]. He proved that algorithm basic_simulated_annealing() is guaranteed (in a statistical sense) to find an optimal state, i.e. one with minimal cost, when the cooling schedule has the following shape
$$T_k = \frac{c}{\ln(k + 1)} \qquad (3.5)$$
where $k$ is the iteration index and $c$ is sufficiently large. Another often used temperature schedule is due to Huang and Sangiovanni-Vincentelli [Huang and Sangiovanni-Vincentelli, 1986]. In this scheme the temperature decrement is calculated such that the slope of the observed annealing curve follows an assumed ideal annealing curve in which the average cost of configurations decreases by an essentially constant amount measured against a ln(T) scale. The derived expression is
$$T_{k+1} = T_k \exp\left(-\frac{\lambda T_k}{\sigma_k}\right) \qquad (3.6)$$
where $\sigma_k$ is the standard deviation of the cost seen at temperature $T_k$ and $\lambda$ is a positive-valued tuning factor which modifies the rate of cooling, with a typical value of 0.7.
3.2.6
Stop Criterion
Each algorithm is supposed to end, and return a final solution upon reaching that end. For the SA algorithm, the end condition or stop criterion has been implemented in numerous ways by various researchers. The most naive method to implement a stop criterion is to take a very small positive value for the final temperature. In most cases this gives an unnecessarily long running time of the SA algorithm. In some cases it might stop the algorithm prematurely while there is still a reasonable probability of finding a better solution. A more sophisticated method is to estimate the probability of solution improvement. This could, for instance, be performed by maintaining statistics on the number of generated states and the number of accepted states [Ingber, 1989]. Also, a threshold value can be incorporated for the maximum number of consecutive iterations in which no improvement is obtained [Cohn et al., 1994].
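The stop criteria listed above can be combined in one small bookkeeping object. The sketch below is an assumption-laden illustration: the class name, the thresholds and the way the three tests are combined are hypothetical and are not taken from the cited works.

```python
class StopCriterion:
    """Combine a temperature floor, acceptance statistics and a stall counter."""

    def __init__(self, t_min=1e-4, max_stall=1000,
                 min_acceptance_ratio=0.01, min_samples=100):
        self.t_min = t_min
        self.max_stall = max_stall
        self.min_acceptance_ratio = min_acceptance_ratio
        self.min_samples = min_samples
        self.generated = 0   # number of generated states
        self.accepted = 0    # number of accepted states
        self.stall = 0       # consecutive iterations without improvement

    def record(self, accepted, improved_best):
        """Update the statistics after one SA iteration."""
        self.generated += 1
        if accepted:
            self.accepted += 1
        self.stall = 0 if improved_best else self.stall + 1

    def should_stop(self, temperature):
        if temperature < self.t_min:
            return True
        if self.stall >= self.max_stall:
            return True
        if self.generated >= self.min_samples:
            return self.accepted / self.generated < self.min_acceptance_ratio
        return False


criterion = StopCriterion(max_stall=3)
before = criterion.should_stop(temperature=1.0)
criterion.record(accepted=True, improved_best=True)
for _ in range(3):
    criterion.record(accepted=True, improved_best=False)
after = criterion.should_stop(temperature=1.0)
```

In an SA loop, record() would be called once per iteration and should_stop() once per temperature step or per iteration, depending on how much overhead is acceptable.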
3.2.7
Cost Function
The cost function should formulate the properties of a problem in such a way that good properties are associated with low cost and bad properties are associated with high cost. Unfortunately‚ it is not trivial to define this cost function for typical VLSI problem instances. Finding a good cost function is an intuitive matter based on the penalty it imposes on the regularity of the cost landscape and the requirement of solving the right problem. The latter is a commonly neglected issue prevalent in many works [Kahng‚ 2000].
3.3
Concluding Remarks
An overview is given of (global) optimization methods which are particularly well suited for a whole range of strongly nonlinear combinatorial problems. Based on the requirements for successful VLSI optimization, and reported experimental results on certain optimization approaches, simulated annealing appears to be a very promising algorithm to proceed with. Its flexibility and generality fit well with the heterogeneous set of constraints that is involved with layout generation. Furthermore, from an algorithmic point of view, SA is in principle easy to implement. In practice, the SA algorithm does not always return a solution that is close to optimal. Almost all implementations suffer from getting trapped in a local optimum. To what extent this unwanted phenomenon occurs seems to depend heavily on implementation quality and the smoothness of the cost landscape; the smoother the better. Since the smoothness is determined by the cost function, the problem representation and the perturbation operators, a great amount of attention is required to choose them well.
Chapter 4
Optimization Approach Based on Simulated Annealing
From Chapter 3 it is clear that exact algorithms are not practical for NP-hard problems in the layout generation problem setting [Onodera et al., 1991]. Thus, heuristics need to be used to tackle problems in this class. Stochastic heuristics which have been successfully applied to a broad range of problems in the VLSI domain are essentially based on the genetic algorithm or the simulated annealing algorithm, or even a combination of them [Sechen, 1988, Kruiskamp, 1996]. In specific cases, a simulated evolution approach might be preferable over simulated annealing, for example, when subcircuits, forming a subset of a pre-defined total set, need to be combined to conform with a given set of input specifications [Kruiskamp, 1996, Francken et al., 2000]. In such a problem setting, small changes in a solution typically induce large changes in the target function.¹ The latter property is a serious concern in a simulated annealing approach because it is well known that an irregular cost landscape deteriorates convergence and the quality of the final solution. The simulated evolution approach does not suffer much from large differences in fitness function values when a certain solution undergoes changes, because a set of solutions is generated during each evolution step. The generation of several candidate solutions has an averaging effect, thus relaxing the requirement for a smooth target function. If, however, a neighborhood relationship that defines a reasonably smooth cost landscape is readily available, the simulated annealing approach is a most promising candidate for tackling layout generation problems. Also, no evidence is available that a simulated annealing approach is consistently outperformed by a simulated evolution approach [Ingber and Rosen, 1992]. We choose to use simulated annealing (SA) as the basis of our optimization framework.
¹ In connection with simulated evolution one should read fitness function, and in connection with simulated annealing one should read cost function. Moreover, maximizing fitness is equivalent to minimizing cost.
The underlying reasons for this choice are: the conceptual simplicity of the simulated annealing algorithm, the robustness of the simulated annealing algorithm, the versatility of simulated annealing with respect to the type and extensiveness of problems and their constraints that can be handled,
the ease of formulating a layout problem in terms of simulated annealing ingredients, and the reported effectiveness of simulated annealing with respect to layout problems in current literature.
In the next sections we explain how the layout generation concepts are integrated into the overall simulated annealing optimization framework. We present a flow in which concepts that will be clarified in later chapters are briefly touched upon, in order to facilitate explanation of the integration of these concepts within the global framework. The main concepts are placement (Chapter 6), routing (Chapter 7), and physical phenomena such as crosstalk, parasitics and process variations and their impact on layout generation (Chapter 8). The aforementioned concepts are formulated in a novel incremental approach which is one of the main contributions of this work.
4.1
Optimization Flow
The overall flow of information within the SA-based optimization framework is shown in Figure 4.1. For the sake of presentation clarity‚ we start with a global description of the items in the flow diagram. Then‚ in the succeeding sections the diagram items are explained in more detail‚ together with a justification of the choices made. The ellipses denote given or produced information‚ see also Figure 2.2. The rectangles denote actions that are taken. Furthermore‚ the diamonds indicate when a decision is taken. The shaded ellipses indicate that the information is (dynamically) computed inside the system and need not be supplied by an external source. The information in the shaded ellipses is very useful for monitoring optimization progress. It should be noted that the arrows in the flow diagram express a necessary flow of information. This does not mean that at certain stages in the flow no information from elsewhere is needed. To prevent cluttering the diagram with too much information‚ some of these arrows have been left out. The first concept which is clear from the diagram relates to the representation of a placement and how it is modified in order to find a good one eventually. What we mean by “good” is determined by our cost function. Obviously‚ a placement is computed by first generating a relative placement and then computing the actual absolute positions of all blocks. Given a placement‚ a global routing is computed during the next step. The global routing determines where wires run over the chip area in a global fashion. The main purpose of global routing is to aid in estimation of necessary routing space at an early stage in the process and to facilitate finding better detailed routing solutions eventually. Due to the fact that a typical placement of blocks contains unoccupied space‚ there is some margin left to shift blocks around in this so-called slack space. The definition of the slack space on a block-by-block basis‚ is called module expansion. 
We could see this as a virtual enlargement of the real block so that the amount of slack space is virtually minimized. It is intuitively clear that a local improvement is always possible depending on the amount of slack space that is available around a block. Module expansion is also necessary to allow for enough routing space around a block. In cases where the available slack space is not sufficient for routing purposes, a module (which contains the block) needs to be expanded in a clever way such that routing requirements will be satisfied. The next action to take is substrate coupling effect minimization. Depending on specific substrate coupling sensitivities and module noisiness properties, a local improvement is computed so that the local impact of substrate coupling is decreased. After these steps, a detailed placement is obtained. At this point the annealing schedule comes into play. Depending on the current system temperature T and a certain stop criterion on the temperature, it is determined whether to continue with the optimization loop or to proceed to the next step in the sequence, which is the detailed routing step. If the choice is made to continue placement optimization, then the temperature T is adjusted according to a predefined temperature cooling schedule. Next, the cost function is evaluated and, depending on the outcome, the current placement is accepted or rejected.² If it is rejected then the previous placement becomes the current placement and we proceed from this point. If the current placement is accepted then we perturb the placement (by perturbing the sequence pair placement representation) and compute a new placement. This loop is iterated until the continuation test on the temperature evaluates to false. Detailed routing is then performed, which is assumed to be possible by virtue of proper previous optimization steps, and a final layout is generated.
² We assume that the optimization process yields only “yes” evaluations during the first iteration loop.
4.2
Problem Representation
We already pointed out the relevance of adopting a computationally efficient means to describe a placement of blocks. Moreover, an efficient (global) routing representation is also important since placement and routing go hand-in-hand. These representations were implicitly used in the previous discussion of the simulated-annealing-based optimization framework. Consequently, lack of efficiency in either of the representations will have a severe detrimental impact on overall performance. In other words, the efficiency of problem representation has a significant impact on finding high-quality solutions within a minimal amount of computing time. Hereafter, a global idea is given on how the problem representation is fitted and integrated into the optimization framework.
4.2.1
Placement
Based on reasons which are given in Chapter 6 we use the sequence-pair placement representation. Basically, we can state that the sequence pair (SP) representation fits well into an iterative optimization framework where (small) changes are applied to the placement during each iteration. Furthermore, the SP structure has advantageous properties in the context of mixed-signal layouts, such as a general non-slicing structure and low global sensitivity to small local changes. Also, important issues such as matching constraints [Balasa and Lampaert, 1999], range constraints [Murata et al., 1998], boundary constraints [Tang and Wong, 2001], and interconnect constraints can be incorporated into a sequence pair formulation. Formally, an SP consists of a pair of sequences [Murata et al., 1996]:
$$SP = (\Gamma_+, \Gamma_-)$$
Every sequence is a permutation of the set of integers {0, 1, 2, ..., M - 1}, where M is the number of modules to be placed. Consequently, the sequence pair solution space contains $(M!)^2$ elements. As a result, changing a placement comes down to changing the associated permutations. In the optimization framework, the placement of blocks is split into three parts: a relative placement part, an absolute placement part, and a detailed placement part. This way, the placement problem can be handled more efficiently at an abstract level while at the same time allowing a clear graphical interpretation of what is going on during optimization. The latter property leaves a window open for the designer to obtain insight into the procedure and tune the algorithms and (intermediate) results.
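How a sequence pair (a relative placement) is turned into absolute coordinates can be sketched as follows. The sketch assumes the standard interpretation of a sequence pair known from the literature: a module a lies to the left of b if a precedes b in both sequences, and a lies below b if a follows b in the first sequence and precedes b in the second. It uses a straightforward quadratic-time evaluation; it is not the incremental method developed later in this book, and all names and dimensions are illustrative.

```python
def sequence_pair_to_placement(gamma_plus, gamma_minus, widths, heights):
    """Compute lower-left coordinates of all modules from a sequence pair.

    gamma_plus, gamma_minus: permutations of the module indices 0..M-1.
    widths, heights: module dimensions, indexed by module.
    """
    pos_plus = {module: i for i, module in enumerate(gamma_plus)}
    x, y = {}, {}
    placed = []
    # Every module that must lie left of or below b precedes b in gamma_minus,
    # so processing in gamma_minus order visits constraints in a valid order.
    for b in gamma_minus:
        x[b], y[b] = 0, 0
        for a in placed:
            if pos_plus[a] < pos_plus[b]:
                # a precedes b in both sequences: a is to the left of b
                x[b] = max(x[b], x[a] + widths[a])
            else:
                # a follows b in gamma_plus but precedes it in gamma_minus: a is below b
                y[b] = max(y[b], y[a] + heights[a])
        placed.append(b)
    return {module: (x[module], y[module]) for module in gamma_minus}


# Three modules: 0 is 4x1, 1 is 2x2, 2 is 3x5 (width x height).
widths = [4, 2, 3]
heights = [1, 2, 5]
placement = sequence_pair_to_placement((1, 0, 2), (0, 1, 2), widths, heights)
chip_width = max(placement[m][0] + widths[m] for m in placement)
chip_height = max(placement[m][1] + heights[m] for m in placement)
```

Swapping two modules in one or both sequences changes the relative placement, after which the coordinates can be recomputed in the same way.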
4.2.2
Routing
In order to handle complexity‚ the routing approach is split into separate steps. We adopt a two-step approach: a global routing step followed by a detailed routing step. The reasons for
this choice are twofold.
1. It is too costly to compute the entire detailed routing information during each optimization iteration. Furthermore, it is intuitively clear that computing very detailed routing information is a waste of resources when the placement is not even close to being final.
2. It is doubtful whether a single-step approach can yield good solutions in a reasonable amount of time for larger problem instances, in contrast to a two-step approach. Furthermore, a coarse first step can yield enough information to guide both the second refinement step and a possible local adjustment in placement if necessary.
Global Routing
From a placement, a set of rectilinear wires connecting all modules in a net can be computed for all nets. The accuracy and associated computational effort can be traded off against each other. Global routing serves two main purposes in layout generation, both of which are especially meaningful in the context of mixed-signal designs:
1. All modules should be connected in such a manner that all constraints on the pins in a net are met; otherwise the placement is inadequate. For the sake of simplicity, we assume that a Steiner minimal tree connecting all pins in a net implies adherence to this condition.
2. Enough routing space should be reserved for detailed routing along the sides of the modules. At least the minimum amount of space can be computed using global wiring information in addition to pin, net, and design rule information. Furthermore, the requirement to minimize performance degradation due to crosstalk between adjacent wires belonging to different nets increases the minimum spacing.
In almost all integrated placement and routing approaches, only the first item is considered. In virtually all cases a very crude global routing approach is taken. A de-facto standard routing estimation methodology is minimal bounding box (MBB), or half-perimeter, routing. The main reason for doing so is ease of implementation. However, apart from the fact that a coarse routing yields, by definition, routing estimations with large deviations from an optimal routing solution, we also observe the following fundamental disadvantages:
No (performance-driven) wire spacing and routing space estimation can be employed due to lack of spatial information.
The coarse routing values might actually conflict with (near) optimal routing values in the sense that the former might indicate that a certain placement induces a better routing while it is actually worse.
An important consequence is a dramatic deterioration of optimization results and, likely, of optimization convergence. As a result, an accurate global routing methodology is proposed here, based on sparse routing graphs and fast and efficient Steiner minimal tree approximation heuristics. Chapter 7 elaborates extensively on global routing.
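The conflict between a bounding-box estimate and a tree-based estimate can be made concrete with a small sketch. The code below compares the half-perimeter estimate with the length of a rectilinear minimum spanning tree, which is used here only as an easily computed stand-in for the Steiner tree heuristics of Chapter 7. The function names and the two pin configurations are hypothetical.

```python
def half_perimeter(pins):
    """Minimal-bounding-box (half-perimeter) wire length estimate."""
    xs = [px for px, _ in pins]
    ys = [py for _, py in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def rectilinear_mst_length(pins):
    """Prim's algorithm on the complete graph with Manhattan distances."""
    best = {i: float("inf") for i in range(len(pins))}
    best[0] = 0
    total = 0
    while best:
        u = min(best, key=best.get)      # closest pin not yet in the tree
        total += best.pop(u)
        ux, uy = pins[u]
        for v in best:                   # relax distances of remaining pins
            vx, vy = pins[v]
            best[v] = min(best[v], abs(ux - vx) + abs(uy - vy))
    return total


# The same 4-pin net in two hypothetical candidate placements.
placement_a = [(0, 0), (4, 0), (0, 4), (4, 4)]   # pins on the corners of a square
placement_b = [(0, 0), (3, 0), (6, 0), (9, 0)]   # pins on a straight line

hpwl_a, hpwl_b = half_perimeter(placement_a), half_perimeter(placement_b)
tree_a, tree_b = rectilinear_mst_length(placement_a), rectilinear_mst_length(placement_b)
```

In this example the half-perimeter estimate prefers placement A (8 versus 9), while the tree length prefers placement B (9 versus 12), so an optimizer guided by the coarse estimate would be steered toward the placement with the longer wiring.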
Detailed Routing
When problem instances become large, it is infeasible for a single-step routing approach such as classical area routing to determine exactly the spatial properties of each wire for all the nets in one sweep. The problem needs to be made manageable somehow. A hierarchical or multi-step approach is a common way to manage complexity. Within an iterative framework, a multi-step approach is particularly advantageous because computation time can be reduced by early detection of low-quality placements for which no adequate routing can be found. Another advantage of a multi-step approach, which in our case amounts to a two-step approach consisting of global routing followed by detailed routing, is that detailed routing in itself is a very hard problem which can be mitigated by a priori obtained global routing information. Although a solution for detailed routing is not proposed in this book, its relevance to high-quality mixed-signal layout generation should be clear.
4.2.3
Substrate Coupling
Parasitic coupling is a well-known phenomenon in circuit layouts, which is especially detrimental in mixed-signal designs. By virtue of the accurate placement information inherent to the adopted representation, it is possible to reduce the effect of substrate coupling within a given placement. For this, we exploit the slack space around a module to shift the module in such a way that the negative coupling effect is reduced. This step can be performed without overhead in terms of computational complexity. Chapter 8 discusses this phenomenon and our approach in depth.
4.3
Perturbation Operators
In order to search the solution space for high-quality members, so-called perturbation operators must be defined to change a given solution into another solution. The set of perturbation operators, which can in turn be built from primitive perturbation operators, together with a generation function determine the neighborhood of each solution. In mathematical terminology one can speak of a “state” to denote a solution. The neighborhood of each state is determined by the set of perturbation operators and the generation function. Herewith the state space is constructed. Furthermore, the cost function assigns a cost value to each state in the state space. Consequently, we can say that non-global local minima in the cost landscape are determined by the set of perturbation operators and the generation function. The latter decides on the magnitude and/or type of a perturbation [Otten and van Ginneken, 1989, Chapter 10]. The global minima are, of course, solely set by the cost function. In order to allow stochastic search algorithms to sample the solution space efficiently and effectively, we should take care to choose appropriate perturbations. As Otten and Van Ginneken [Otten and van Ginneken, 1989] already noted, the diameter of the search space should be made sufficiently small in order to reach every solution quickly at high temperatures. In practice this means that (initial) perturbation amplitudes should be sufficiently large. However, large perturbations imply large deviations in the cost function value and thus a very irregular cost “landscape”. In order to decrease the cost deviations, at appropriate times the solution space should be sampled smoothly, too. As a consequence, at lower temperatures the
perturbation amplitudes and possibly the type of perturbations should be adapted to comply with this requirement. We define the following primitive perturbation operators.
P1. interchange two elements in one sequence of the pair
P2. interchange two elements in the other sequence of the pair
P3. interchange two elements in both sequences
P4. rotate an element over a given angle in clockwise direction
P5. mirror an element with respect to one of two axes
Perturbation operators P1, P2, and P3 form a complete set in the sense that, within a finite number of steps, any sequence-pair configuration can be obtained from an arbitrary starting solution. We state this more precisely. From permutation theory we know that every permutation of M elements can be written as a product of disjoint cycles, which is unique except for the order of the cycles, and that every cycle can in turn be written as a product of 2-cycles. We call a 2-cycle a swap.
Lemma 1 Given two arbitrary permutations of the same M elements, exactly M – 1 swaps are needed to go from one configuration to the other (and vice versa), in the worst case.
Proof The sufficiency condition follows from the fact that there is a swap that will put at least one element in the right place. That element is left untouched afterwards. Furthermore‚ the last swap will necessarily put the last two elements in place. Thus‚ we never need more than M – 1 swaps. The necessity condition follows from inductive reasoning. It is easy to see that
holds and is minimal, i.e. the left-hand side cannot be represented with fewer than two swaps in the worst case, and these swaps are unique. Here represents concatenation, and is a swap of elements and Let
be a minimum-swaps representation of a permutation with elements. For this obviously holds. For a permutation with elements we can write
Substituting (4.1) in (4.2) we get
Thus‚ for a (worst-case) permutation of elements we need at least swaps. Therefore‚ for a permutation of M elements we need M – 1 swaps‚ in the worst case.
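As background, and not taken from the proof above, a standard result from permutation theory makes the worst case of Lemma 1 explicit. The symbols $\sigma$ and $c(\sigma)$ are introduced here for illustration only: $\sigma$ is the permutation that maps one sequence onto the other, and $c(\sigma)$ is its number of cycles, fixed points included.

```latex
\[
  \text{minimum number of swaps} \;=\; M - c(\sigma) \;\le\; M - 1 ,
  \qquad \text{with equality when } \sigma \text{ is a single cycle of length } M .
\]
% Example for M = 4, a cyclic shift (one cycle, so three swaps are needed):
\[
  (0,1,2,3) \;\to\; (1,0,2,3) \;\to\; (1,2,0,3) \;\to\; (1,2,3,0) .
\]
```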
Theorem 1 From a given arbitrary sequence pair we can create any other sequence pair using at most 2M – 2 perturbations from the perturbation set {P1,P2, P3}.
Proof Applying perturbation P1 (P2) repeatedly to the sequence it acts on reaches the corresponding target sequence within M – 1 swaps, by Lemma 1. Both sequences together therefore require at most 2M – 2 perturbations. In principle, perturbation operator P3 is redundant since it is a concatenation of P1 and P2. However, P3 is an intuitively attractive perturbation operator which is fully symmetrical. Moreover, it helps to reduce the diameter of the search space because the number of required swaps typically decreases with the addition of P3. Perturbations P4 and P5 do not change the sequence pair. Perturbation P4, however, influences the absolute location of modules, while P5 serves only to minimize wire length.
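The bound of Theorem 1 can be checked constructively with a short sketch. The greedy repair below fixes one position per swap from left to right, so it never needs more than M – 1 swaps per sequence. The function names and the example sequence pairs are hypothetical, and the greedy strategy is only one possible way to realize the swaps.

```python
def swaps_to_reach(start, target):
    """Greedy left-to-right repair; returns the list of swapped element pairs."""
    seq = list(start)
    index_of = {value: i for i, value in enumerate(seq)}
    swaps = []
    for i, wanted in enumerate(target):
        if seq[i] != wanted:
            j = index_of[wanted]
            swaps.append((seq[i], wanted))
            seq[i], seq[j] = seq[j], seq[i]
            index_of[seq[i]], index_of[seq[j]] = i, j
    assert seq == list(target)
    return swaps


def perturbations_between(pair_a, pair_b):
    """P1-style swaps on the first sequence, then P2-style swaps on the second."""
    first = swaps_to_reach(pair_a[0], pair_b[0])
    second = swaps_to_reach(pair_a[1], pair_b[1])
    return first, second


# Hypothetical example with M = 4 modules.
p1_moves, p2_moves = perturbations_between(
    ([0, 1, 2, 3], [3, 2, 1, 0]),
    ([1, 2, 3, 0], [0, 1, 2, 3]),
)
total_moves = len(p1_moves) + len(p2_moves)
```

Here the first sequence is a cyclic shift and needs the full M – 1 = 3 swaps, the second needs 2, and the total of 5 stays within the bound 2M – 2 = 6.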
4.4
Acceptance and Generation Functions
The acceptance function that is used within the proposed simulated annealing framework is the standard Metropolis criterion given by (3.3), where the probability of accepting a new solution is based on the difference in cost with the previous solution and the annealing temperature. More exactly, when the new cost is lower than the previous cost, the new solution is unconditionally accepted. In the case where the new cost is higher than the previous cost, the probability of acceptance becomes increasingly smaller with higher cost difference and lower temperature. A typical realization of this would look like

$$\text{accept the new solution if}\quad \mathrm{random}(0,1) < e^{-\Delta C / T},$$

where $\Delta C$ is the cost difference between the new and the previous solution, $T$ is the annealing temperature, and random(0,1) is a random number generator which generates a real value between 0 and 1 with a uniform distribution. As a consequence, at high temperatures almost all cost increases are accepted, effectively turning the algorithm into a random walk. At low temperatures mostly only cost decreases are accepted. The generation function (also called the selection function) is taken to be the identity function in the proposed framework, adopting the standard approach that is suggested in [Otten and van Ginneken, 1989]. Although no attention is given to fine-tuning this parameter, it should be noted that its impact may be significant as it embodies the effective solution space sampling behavior. In other words, generating moves which are going to be rejected with high probability is inefficient, so avoiding them improves efficiency. Of course, this is only practically effective when such moves can be identified relatively quickly, for instance by means of a distance association.
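The acceptance rule described in this section can be sketched in a few lines. The function name `metropolis_accept` and its parameters are hypothetical; the rule itself is the standard Metropolis criterion, with Python's `random.random()` playing the role of random(0,1).

```python
import math
import random


def metropolis_accept(delta_cost, temperature, rng=random):
    """Return True if a move with cost change delta_cost is accepted.

    Cost decreases are always accepted; cost increases are accepted with
    probability exp(-delta_cost / temperature). temperature must be > 0.
    """
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


rng = random.Random(1)   # seeded so that runs are reproducible
accepted_hot = sum(metropolis_accept(1.0, 1e6, rng) for _ in range(1000))
accepted_cold = sum(metropolis_accept(1.0, 1e-3, rng) for _ in range(1000))
```

At the high temperature virtually every uphill move is accepted, which corresponds to the random-walk regime, while at the low temperature none of them is.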
4.5
Temperature Schedule
An adaptive temperature schedule is used which is controlled in such a way that the annealing process stays in quasi-equilibrium and yet converges as quickly as possible to a global optimum. This approach is entirely adopted from Otten and Van Ginneken [Otten and van
Ginneken, 1989]. The decrements of the temperature are chosen in such a way that the steps do not disturb the equilibrium density too much. Although the approach taken is justifiable and gives good results in practice, estimating the equilibrium density of an inhomogeneous Markov chain, in order to determine how close we have come to an equilibrium distribution, is still an active field of research in statistics [Diaconis and Stroock, 1991]. The interested reader is referred to [Liu, 2001] for an excellent recent overview, providing more insight into this field. The essential points in a temperature schedule are:
initial temperature: The initial temperature should be high enough to guarantee independence of the final solution with respect to the initial solution.
temperature decrement: The temperature decrement can be deterministic or stochastic. As yet it is unknown what type of temperature change yields the best results.
final temperature: The final temperature can be fixed or dynamically computed as a function of several optimization parameters such as the estimated standard deviation of the cost.
In our optimization framework we adopted the strategy of Otten and Van Ginneken [Otten and van Ginneken, 1989, Chapters 8 and 11].
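The three ingredients of a schedule can be illustrated with a deliberately simple sketch. It is not the adaptive schedule of Otten and Van Ginneken that is adopted in the framework: it assumes a common textbook heuristic for the initial temperature (choose it so that an average uphill move is accepted with a given probability), a fixed geometric decrement, and a fixed final temperature. All names and default values are hypothetical.

```python
import math


def initial_temperature(uphill_deltas, accept_prob=0.9):
    """Temperature at which an average uphill move is accepted with accept_prob."""
    mean_delta = sum(uphill_deltas) / len(uphill_deltas)
    return -mean_delta / math.log(accept_prob)


def geometric_schedule(t_initial, t_final, alpha=0.95):
    """Yield temperatures t_initial, alpha * t_initial, ... while above t_final."""
    t = t_initial
    while t > t_final:
        yield t
        t *= alpha


# Cost increases observed during a few hypothetical trial moves.
t0 = initial_temperature([1.0, 3.0], accept_prob=0.5)
temperatures = list(geometric_schedule(t0, t_final=0.01))
```

A deterministic decrement like this one is easy to reproduce but does not monitor how close the process is to equilibrium, which is exactly what an adaptive schedule tries to do.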
4.6
Stop Criterion
The stop criterion which is adopted in our framework is tacitly taken from [Otten and van Ginneken‚ 1989]. Although it is noted in [Otten and van Ginneken‚ 1989] that the observations should be applied with care in a general fashion‚ i.e. to arbitrary problem instances‚ we did not spend any effort on verification. In order to expect good and robust performance we should assert all assumptions and observations in our problem setting. On the other hand‚ it cannot be denied that building a robust simulated annealing implementation appears to be an art [Ingber‚ 1993]. Therefore‚ we accept some degradation in performance by not complying to all requirements. This is justified in two respects: in the light of the knowledge that it is always possible to improve performance by tuning‚ and by virtue of the fact that our goal is to demonstrate feasibility of concepts.
4.7
Cost Function
The cost function that is used in our simulated annealing optimization framework is
where and are user-specified weight factors between 0 and 1 that determine the relative importance of each term in the cost function. The normalization constants and are
determined in such a way that the weight factors have equal importance. Furthermore‚ CA stands for chip area‚ WL stands for wire length‚ and CI stands for coupling impact. The cost function given by (4.3) is a generalization of the de-facto standard cost function used in literature‚ where all normalization constants are typically set to unity.
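The role of the weight factors and normalization constants can be sketched as follows. The sketch assumes that the cost is a weighted sum of normalized terms and that each normalization constant is the average of its term over a set of sample placements; both are assumptions made for illustration and are not a reproduction of (4.3). The function names and numbers are hypothetical.

```python
def estimate_norms(samples):
    """Average each cost term over sample placements (hypothetical choice)."""
    return {key: sum(s[key] for s in samples) / len(samples)
            for key in samples[0]}


def weighted_cost(terms, weights, norms):
    """Weighted sum of cost terms, each divided by its normalization constant."""
    return sum(weights[key] * terms[key] / norms[key] for key in weights)


# Chip area (CA), wire length (WL), coupling impact (CI) of two sample placements.
samples = [{"CA": 100.0, "WL": 10.0, "CI": 1.0},
           {"CA": 300.0, "WL": 30.0, "CI": 3.0}]
norms = estimate_norms(samples)
weights = {"CA": 1.0, "WL": 1.0, "CI": 1.0}
cost = weighted_cost({"CA": 200.0, "WL": 20.0, "CI": 2.0}, weights, norms)
```

With normalization, a placement whose terms all equal their averages contributes exactly 1 per term, so equal weights express equal importance even though the raw terms differ by orders of magnitude. Setting every normalization constant to 1 gives the unnormalized cost function commonly found in literature.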
4.7.1
Implicit Cost Evaluation
It is well known that the terms in the aforementioned cost function typically conflict. Consequently, the optimization task is hampered by repelling forces, eventually leading to longer computation times and worse solutions. For this reason, it would be intuitively better to use a single cost term in which no inherent conflict is apparent, and which captures the essence of the designer’s specifications. For instance, this could be accomplished by taking only the total wire length into account, since short wire length normally means that blocks are placed close together. In cases where various placements of blocks exist with the same total wire length, the placement with the smallest chip area should be taken. This could be accomplished by using an additional CA term with a small weight value. Another way is to translate wire length into wire area and implicitly incorporate this into the total chip area by expanding the modules in the placement. This approach is already quite sophisticated. For the sake of comparability with other published results, we will adhere to the general approach in which essentially both chip area and wire length are taken into account. However, experimental results with a single cost term are also given in Chapter 7.
4.8
Concluding Remarks
We gave an overview of the components that are used in our optimization framework for generating a mixed-signal layout. The integration of three main‚ strongly coupled‚ aspects was discussed: efficient placement representation‚ efficient global routing computation‚ and efficient substrate coupling impact minimization. However‚ we note that these ingredients are not sufficient to generate a complete mixed-signal layout. One of the missing components is detailed routing. Nevertheless‚ it should be clear from our setup that all requirements for proper mixed-signal layout generation can be complied with in our approach without encountering fundamental obstacles.
Chapter 5
Efficient Algorithms and Data Structures
Automation of processes cannot be separated from computer algorithms. Indeed, algorithms form an essential element of any CAD tool. The efficiency of an algorithm has a direct impact on the performance of a CAD tool. Although it is not always clear from higher-level algorithmic descriptions, data structures are essential for manipulating data in an algorithmic environment. Data structures become especially important during the implementation phase of an algorithm; they can make or break the practical usefulness of an algorithm. In this chapter we discuss data structures which are relevant in the context of mixed-signal layout generation. The intention is to present a non-exhaustive but representative set of tools which can be used to design efficient algorithms, and consequently an efficient overall system. The emphasis will be put on dynamic algorithms, that is, algorithms that can efficiently deal with continuously changing information over time, both in terms of required memory space and required computation time. It is clear that dynamic algorithms play a central role in CAD tools in general, with mixed-signal layout generation as a special case. Before going into detail on efficient algorithms and data structures, a few definitions are in order.
Definition 2 (Algorithm) An algorithm is any well-defined computational procedure that takes some input, or a set of inputs, and produces some output, or a set of outputs.
The mapping task of an algorithm is performed using sets of elements in which the data is represented and operations that are defined on these sets. Unlike static sets which are used in mathematics, the sets which are used in and are fundamental to computer science are highly dynamic. That is, the sets can grow, shrink or otherwise change over time.
Definition 3 (Data Structure) The representation of a finite dynamic set of elements, in combination with the operations defined on this set, is called a data structure.
An implementation of an algorithm with a certain data structure is called a program. Important features of an efficient algorithm are:
finiteness: the algorithm stops after a finite number of steps;
correctness: the output of the algorithm complies with the pre-specified post-condition;
efficiency: the number of primitive computer operations used to accomplish the desired mapping is as small as possible (within the limitations of the employed data structures).
5.1
Computational Model
We shall assume a generic one-processor random-access machine (RAM) model of computation as our implementation technology and understand that our algorithms will be implemented as computer programs. In the RAM model, instructions are executed one after another, with no concurrent operations. The main performance measures we will be concerned with are run time and storage space, with unit-cost measure. That is, every operation and every storage element has unit cost. Time is measured by counting the number of executed instructions, and space is measured by counting the number of used memory cells. Each memory cell can hold an arbitrarily large number. Usually, it is very hard or impossible to compute the exact amount of run time that an algorithm needs for a given input. This also applies to computing the exact storage space. Therefore, the performance measures are relaxed to make them easier to estimate. Instead of determining the exact performance of an algorithm, the worst-case performance of an algorithm is determined.
Definition 4 (Worst-case Performance) The worst-case run time (space) performance of an algorithm is the maximum number of time (memory) units the algorithm needs to process an input I, relative to the input size |I|, for all possible inputs of size |I|.
The input represents the problem instance. Hence, the input size is synonymous with the problem instance size.
Definition 5 (Problem Instance Size) The size of a problem instance is equal to the minimum number of information elements needed to describe that problem in a specific representation.
5.2
Asymptotic Analysis
Although worst-case performance evaluation might be easier than determining the exact performance of an algorithm, in most cases the former is still non-trivial. The introduction of a so-called “Big Oh” operator $O$ is very convenient here. It was originally introduced by Paul Bachmann in 1894 for asymptotic analysis, but it is a de-facto standard in computer science nowadays [Graham et al., 1989]. Dealing with the $O$ operator is especially interesting when the problem instance size becomes large. In those cases the approximation error is negligible. On the other hand, when the size of the problem instance at hand is relatively small, the error may become unacceptably large. In cases where we may apply $O$, its beauty becomes apparent in the fact that it suppresses unimportant detail and emphasizes salient features. Essentially, $O$ denotes a set of functions. Formally, for a given function $g(n)$ we denote by $O(g(n))$ the set of functions

$$O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le f(n) \le c\, g(n) \text{ for all } n \ge n_0 \,\}.$$

Similarly, $\Omega$ (“Big Omega”) and $\Theta$ (“Big Theta”) are defined:

$$\Omega(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le c\, g(n) \le f(n) \text{ for all } n \ge n_0 \,\},$$

$$\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0 \text{ such that } 0 \le c_1\, g(n) \le f(n) \le c_2\, g(n) \text{ for all } n \ge n_0 \,\}.$$

In words, the above relations indicate that $f(n) = O(g(n))$ if $f(n)$ is bounded from above by $g(n)$ multiplied by a suitable constant, when $n$ is sufficiently large. Furthermore, $f(n) = \Omega(g(n))$ if $f(n)$ is bounded from below by $g(n)$ multiplied by another suitably chosen constant, for sufficiently large $n$. It is not difficult to see that for any two functions $f(n)$ and $g(n)$, $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. Graphic examples are shown in Figure 5.1. Note the abuse of the equality sign to denote membership of a set. It is a standard convention in asymptotic analysis.
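A small worked example, using a function chosen here purely for illustration, shows how the constants in an asymptotic bound are found. Take $f(n) = 3n^2 + 10n$ and $g(n) = n^2$, and read "bounded from above (below)" in the usual sense of a constant multiple of $g(n)$ for all sufficiently large $n$.

```latex
\[
  n \ge 10 \;\Longrightarrow\; 10n \le n^2
  \;\Longrightarrow\; 3n^2 \;\le\; 3n^2 + 10n \;\le\; 4n^2 .
\]
% Hence f(n) = O(n^2) with c = 4, f(n) = \Omega(n^2) with c = 3,
% and therefore f(n) = \Theta(n^2) with c_1 = 3, c_2 = 4, n_0 = 10.
```

For small $n$ the linear term is not negligible (at $n = 1$ it accounts for most of $f$), which illustrates why the approximation may be poor for small problem instances.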
5.3
Computational Complexity
For the sake of comparison of algorithmic performance we want to know how well (or badly) a certain type of algorithm using specific data structures will perform. Asymptotic analysis is an excellent mathematical tool to measure algorithmic performance. This type of analysis in an algorithmic context is also called computational complexity analysis. We already mentioned worst-case analysis. In practice we can distinguish three types of analyses:
worst-case analysis
average-case analysis
amortized analysis
Worst-case analysis is by far the most used analysis approach. An important reason for using worst-case analysis is that the occurrence of a problem instance that induces worst-case behavior might be disastrous. Maybe an even more important reason is the fact that worst-case analysis is typically far easier to perform than other types of analyses. However, if the worst-case situation does not occur often, the analysis results might deviate severely from those of a more elaborate type of analysis. Average-case analysis is concerned with the average computational complexity of an algorithm for a specific set of inputs. This type of analysis is most accurate, but also very difficult to perform in practice. Moreover, the analysis results depend on the assumed distribution of input problem instances. To simplify the analysis, often a uniform input distribution is chosen. Unfortunately, this may not always be a good assumption. In amortized analysis, the time required to perform a sequence of data structure operations is averaged over all the operations performed. This type of analysis can be used to show that
the average cost of an operation is small if one averages over a sequence of operations, even though a single operation might be expensive. Amortized analysis differs from average-case analysis in that probability is not involved: no assumptions about the input distribution are made; there is only averaging over time. The averaging occurs over a worst-case sequence of operations. For more information, we refer the reader to [Cormen et al., 1990].
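A classical illustration of amortized analysis, which is not specific to this book, is an array that doubles its capacity when it is full. A single append may copy every stored element, yet the total number of copies over any sequence of n appends stays below 2n, so the amortized cost per append is constant. The sketch below only counts copies; the function name is hypothetical.

```python
def copies_for_appends(n):
    """Count element copies caused by n appends to a capacity-doubling array."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:      # array is full: move everything to a new array
            copies += size
            capacity *= 2
        size += 1
    return copies


total = copies_for_appends(1000)
```

For 1000 appends the expensive resize steps copy 1 + 2 + 4 + ... + 512 = 1023 elements in total, about one copy per append on average, even though the last resize alone copies 512 elements.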
5.4
Data Structures for CAD
The performance of an algorithm strongly correlates with the performance of the data structures that it uses to perform its task. It is therefore necessary to choose the proper data structures for a given algorithm with a specific functionality. Requirements for advanced data structures in the context of complex algorithms, which in turn consist of smaller algorithms, are flexibility and efficiency. Flexibility is needed to make the data structure usable for a variety of operations. Efficiency is needed to guarantee low computational complexity, implying that the algorithm allows for substantial scaling. Moreover, high-performance data structures should also feature small constant factors, which are hidden in their complexity measure. For example, an algorithm requiring 100 · M time units and an algorithm requiring 2 · M time units to perform the same task would look identical in terms of the O operator: both are O(M). However, in practice the latter algorithm will be substantially faster than the former. On a more practical level, a data structure should also not be so complex that it becomes overly difficult to implement correctly. Of course, this depends on the gain in computational complexity and the range of problem instance sizes that will be used. In this section we will present a few data structures which are especially interesting for CAD applications because of their inherent manner of representing data.
5.4.1
Corner Stitching
Corner stitching is a data structure for representing rectangles, or objects that can be segmented into rectangles, in a two-dimensional plane. It was introduced by Ousterhout [Ousterhout, 1984] for the purpose of visually representing layouts that can be modified interactively. The strength of the corner stitching data structure lies in the facts that it is conceptually simple and that a large number of operations can be performed efficiently on it. Each rectangle is represented by its lower-left corner coordinate and its width and height. Additionally, there are four so-called stitches to neighboring rectangles, one for each direction (up, right, left, and down). Figure 5.2 shows a basic corner stitching rectangle including all four corner stitches. The name is due to the resemblance with a patched cloth. The following operations are defined on the corner stitching data structure [Ousterhout, 1984]:
point finding: return the rectangle in which a given point is located;
neighbor finding: return all rectangles that touch a given side of a given rectangle;
insert rectangle: insert a rectangle of given width and height at a given location;
delete rectangle: delete a rectangle from a given position;
area search: check if there are any rectangles of a certain type in a given area;
area enumerate: enumerate all rectangles of a certain type in a given area.
A striking feature of the corner stitching data structure is its ability to represent both empty and non-empty regions in the plane. This notion can be generalized to more than two types of rectangles if necessary, without incurring any performance loss in terms of computational complexity. In fact, the corner stitching data structure is a generalization of the doubly-linked list data structure to two dimensions, where each list item covers a part of the plane. Figure 5.3 shows an example of a set of rectangles in the plane represented by the corner stitching data structure. Notice the white area which represents unoccupied space, whereas the shaded area represents occupied area. The corner stitches are shown explicitly in the rectangular dashed-outline region. When browsing through the data structure, these corner stitches are used to go to a neighboring rectangle. From this figure it is also clear that examining physically close rectangles is a local operation which can be performed very fast. Another feature of the corner stitching data structure is a property called “maximally horizontal empty tile”. This means that an empty tile is always maximally extended in horizontal direction, which is also shown in Figure 5.3 where the white unoccupied area is split into maximally horizontal (empty) rectangles.
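Point finding, the most basic corner stitching operation, can be sketched as follows. The sketch follows the stitch convention of [Ousterhout, 1984] as understood here: the up and right stitches leave from the top-right corner of a tile, and the left and down stitches leave from its lower-left corner. The class and function names, the hand-built five-tile plane, and the assumption that the query point lies inside the plane are all illustrative; insertion and deletion, which must maintain the stitches and the maximal horizontal empty tiles, are not shown.

```python
class Tile:
    """A rectangle with its lower-left corner, size, and four corner stitches."""

    def __init__(self, name, x, y, w, h, solid=False):
        self.name, self.x, self.y, self.w, self.h = name, x, y, w, h
        self.solid = solid
        self.up = self.right = self.left = self.down = None


def find_tile(start, px, py):
    """Walk the stitches from `start` to the tile that contains (px, py).

    A tile is taken to contain its bottom and left edges. The point is
    assumed to lie inside the tiled plane, so no stitch followed is None.
    """
    tile = start
    while True:
        while py < tile.y:                 # move down until y fits
            tile = tile.down
        while py >= tile.y + tile.h:       # or up
            tile = tile.up
        while px < tile.x:                 # move left until x fits
            tile = tile.left
        while px >= tile.x + tile.w:       # or right
            tile = tile.right
        if tile.y <= py < tile.y + tile.h: # horizontal moves may misalign y
            return tile


# A 10 x 10 plane with one solid tile; empty space is split into
# maximal horizontal strips, as required by the data structure.
bottom = Tile("bottom", 0, 0, 10, 4)
left = Tile("left", 0, 4, 3, 3)
solid = Tile("solid", 3, 4, 3, 3, solid=True)
right = Tile("right", 6, 4, 4, 3)
top = Tile("top", 0, 7, 10, 3)

bottom.up = right
left.down, left.right, left.up = bottom, solid, top
solid.left, solid.down, solid.right, solid.up = left, bottom, right, top
right.left, right.down, right.up = solid, bottom, top
top.down = left

hit = find_tile(top, 4, 5)
```

Starting from the top tile, the search for the point (4, 5) visits only the left empty tile before reaching the solid tile, which shows why a starting tile close to the target makes the operation local.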
The corner stitching data structure performs well in practice due to its relatively simple structure. However, its implementation requires a lot of care to avoid some tricky pitfalls. Its actual performance can vary quite a lot. For example, inserting a set of rectangles in the plane will be performed faster if less segmentation of the plane is induced around a rectangle during insertion. Thus, the order of insertion plays a role. Typically it is more advantageous to place the larger rectangles first and then the smaller ones. The rationale behind this is that large rectangles can shield a larger portion of the plane from other parts so that less interaction is required. The reader is referred to [Ousterhout, 1984, Sherwani, 1993] for more information. We conclude by giving Table 5.1 of relevant corner stitching operations. From this table we can see that a few operations can typically be performed in constant time, independent of the number of items already inserted into the data structure. Although most operations have a worst-case complexity that grows with the total number of rectangles currently in the plane, this occurs seldom in practice. Normally, searching for a certain module in the data structure requires a considerable amount of effort on average. Typically, the actual number of relevant rectangles is equal to
It is also shown in the table that a hint, which is an auxiliary pointer to some object (empty or non-empty) in the data structure, can significantly improve average computational complexity. Of course, the strength of a hint is actually unleashed when it is chosen in such a way that it provides maximum gain. In practice this means that computing a hint should be performed much more efficiently than the average complexity of the operation without a hint. A worst-case sequence of operations will provide more insight on the issue where the break-even point lies. If we know in advance the (approximate) maximum number of objects which are going to be stored within the corner stitching data structure, a hash table can be used to improve some of the average complexities without a hint. The approach is as follows. Create a hash table which can store the objects indexed by their, say, bottom-left coordinates. As this implies each existing object can be found in constant time, the operations that involve a specific object to be found prior to performing the actual operation, can be decreased in complexity.
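The hash table idea can be sketched with a Python dictionary keyed by the bottom-left coordinate. The class `RectangleIndex` and its methods are hypothetical, and a rectangle is stored as a plain tuple; in a real implementation the stored value would be the corner-stitched tile itself, so that it can serve directly as a hint for subsequent operations.

```python
class RectangleIndex:
    """Average constant-time lookup of rectangles by bottom-left corner."""

    def __init__(self):
        self._by_corner = {}

    def insert(self, x, y, w, h):
        self._by_corner[(x, y)] = (x, y, w, h)

    def find(self, x, y):
        """Return the rectangle whose bottom-left corner is (x, y), or None."""
        return self._by_corner.get((x, y))

    def delete(self, x, y):
        """Remove and return the rectangle at (x, y), or None if absent."""
        return self._by_corner.pop((x, y), None)


index = RectangleIndex()
index.insert(3, 4, 3, 3)
index.insert(0, 0, 10, 4)
found = index.find(3, 4)
removed = index.delete(0, 0)
```

A Python dictionary grows automatically, so the maximum number of objects does not have to be known here; with a fixed-size hash table that estimate determines the table size.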
5.4.2
Linked List
The list-based data structure is well-known and covered in every textbook on data structures. However, for completeness we will discuss this common type of data structure briefly. The simplest list-based data structure is the singly-linked list, where data items are linked to each other in a sequential one-directional fashion. A somewhat more sophisticated list is the doubly-linked list, where the data items are connected to each other in a bi-directional way. Conventional lists are useful when dynamic sets need to be maintained and the primary operations are insertion and deletion of an element. Also, enumeration of all elements in the set can be performed efficiently. Lists are not efficient when a specific element needs to be looked up in the set, because every element before that element in the list has to be inspected. The lookup operation needs time linear in the number of list elements in the worst case. Unfortunately, this is also the average-case computational complexity. Table 5.2 contains computational complexities of list operations. Note that deleting an arbitrary item requires a lookup operation before the actual deletion. Only deleting an item with a known location, for instance at the head or tail of a list, can be done in constant time. The same holds for inserting an arbitrary item at the head or tail of the list. Note that operations on items with a known location in the list can be performed in constant time with the aid of a hash table. Of course, this approach is only useful when the maximum number of items in the list can be estimated beforehand and this number is much smaller than the universe of storable items. Recently, a more powerful variant of the list-based data structure has been introduced by Pugh [Pugh, 1990]. It is called the skip list and it was proposed as an efficient alternative to balanced trees. The key ingredients in a skip list are a logarithmic number of levels containing data items and a probabilistic approach to skip pointers. Skip lists appear to have very good performance in practice and can do whatever a balanced tree can do, at least as fast. Where balanced trees become inefficient when objects are frequently inserted into and deleted from the set, skip lists take over by avoiding expensive re-balancing operations after each modification of the set. Last but not least, skip lists are easy to implement. Figure 5.4 shows the basic notion behind the skip list data structure. In Figure 5.4(a) a conventional linked list is shown. In order to reduce searching time, an additional pointer is introduced with every other object. Each such pointer skips one object. The result is that searching time is reduced by half. This idea can be applied to every fourth, every eighth, every sixteenth pointer, and so on. Generally, the maximum number of pointer levels is chosen logarithmic in the number of elements. It is now clear that each element can be found in logarithmic time using classical binary search principles. However, inserting or deleting an item while maintaining the skip list properties can be very awkward.
This problem is solved by Pugh using a probabilistic
approach. The skip list data structure can still degenerate into a linked list, but the probability of this happening is extremely small for any reasonable size of $n$.
Table 5.3 shows the computational complexities of skip list operations. In a development environment, however, it may be desirable to exactly reproduce results or to compare results after only one specific setting has been changed. With a probabilistic data structure it might be troublesome to judge the impact of a change when also the data structure performance changes. Therefore, a deterministic algorithm might be preferable under these circumstances.
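The skip list idea described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work; the class names, the default level cap of 16, the promotion probability p = 0.5 and the `seed` parameter are assumptions made for illustration. Passing a fixed seed makes the random level choices repeatable, which is one possible way to address the reproducibility concern raised above.

```python
import random


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    """Minimal skip list sketch for unique, comparable keys."""

    def __init__(self, max_level=16, p=0.5, seed=None):
        self.max_level = max_level          # assumed cap on the number of levels
        self.p = p                          # assumed promotion probability
        self.level = 1                      # number of levels currently in use
        self.head = SkipNode(None, max_level)
        self.rng = random.Random(seed)      # a fixed seed gives repeatable runs

    def _random_level(self):
        level = 1
        while level < self.max_level and self.rng.random() < self.p:
            level += 1
        return level

    def _predecessors(self, key):
        # For every level, find the last node whose key is smaller than key.
        update = [self.head] * self.max_level
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return                          # key already present
        level = self._random_level()
        self.level = max(self.level, level)
        node = SkipNode(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):
            if update[i].forward[i] is node:
                update[i].forward[i] = node.forward[i]
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1
        return True

    def keys(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Enumeration walks level 0 only, so it visits the keys in sorted order regardless of how the random levels turned out.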
5.4.3 Splay Tree
Binary search trees are well-known representations for information. An important feature of a binary search tree is that an order relationship holds for all nodes in each subtree of the tree. More specifically, all nodes in the left subtree of a node $x$ have a value smaller than the value associated with $x$, and all nodes in the right subtree of $x$ have a value larger than the value of $x$. An example is shown in Figure 5.5(a). Note that A < B < C < . . . < Z. In order to facilitate presentation, we will adopt the examples and tree structure used by Sleator and Tarjan, the inventors of the splay tree data structure [Sleator and Tarjan, 1985], which
is a special type of binary tree data structure. An equivalent full tree representation of the binary search tree in Figure 5.5(a) is shown in Figure 5.5(b), in which each internal node has exactly two child nodes. The leaf nodes are drawn as triangles in this figure. The tree at the left side is obtained directly by contracting the leaf nodes and internal nodes as shown at the right side of the figure.
When objects are inserted and deleted randomly, binary search tree performance is unmatched. However, if objects are inserted in order, the binary tree structure degenerates into a linked list, and performance plummets. A great amount of work has been spent on finding tree-balancing algorithms and techniques to overcome the effect of degeneration. The result is a colorful set of balanced tree algorithms: B-trees, AVL trees, red-black trees, randomized binary trees, splay trees, and many more.
The splay tree data structure is a very efficient data structure in that it has amortized computational complexity $O(\log n)$ per operation, where the time per operation is averaged over a worst-case sequence of operations. Essentially, each splaying operation, which is a simple restructuring heuristic, resembles a move-to-front technique of the splayed item plus a shortening of the height of the current tree. Exactly three different splaying cases can occur. These cases are shown in Figure 5.6. To splay a tree at a node $x$ we repeat the aforementioned primitive splaying operations until $x$ is the root of the tree. Splaying a node $x$ at depth $d$ takes $\Theta(d)$ time [Sleator and Tarjan, 1985], that is, time proportional to the time to access node $x$. Splaying not only moves $x$ to the root, but roughly halves the depth of every node along the access path. This halving effect makes splaying efficient. Note that splaying, and consequently a splay tree, is fully deterministic.
It is clear that under some conditions of access, insertion, deletion probabilities over the universe of elements, the splay tree data structure can perform substantially better than in the worst case. From experiments [Rönngren and Ayani, 1997] and experience in the field, especially with respect to randomized binary search trees [Martínez and Roura, 1998], splay trees typically outperform other balanced tree implementations. Therefore, we have chosen splay trees as our primary balanced tree data structure for implementation. Table 5.4 shows the computational complexities of splay tree operations.
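To make the splaying cases more tangible, the following Python sketch shows one possible formulation. It is a compact recursive variant that pairs rotations from the root downwards rather than the bottom-up procedure of the original paper, so it should be read as an illustration under that assumption, not as the implementation used in this work. All function names are hypothetical. Because it recurses along the access path, very deep degenerate trees could exceed Python's recursion limit.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    return y


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y


def splay(root, key):
    """Move the node with `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:              # zig-zig: rotate grandparent first
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:            # zig-zag: rotate parent, then grandparent
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)   # final zig
    else:
        if root.right is None:
            return root
        if key > root.right.key:             # zig-zig (mirror image)
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:           # zig-zag (mirror image)
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)   # final zig


def insert(root, key):
    """Insert `key` and return the new root (the inserted or existing node)."""
    if root is None:
        return Node(key)
    root = splay(root, key)
    if root.key == key:
        return root
    node = Node(key)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)
```

Inserting keys in increasing order first produces a degenerate left chain; a single access to the smallest key then restructures the tree while keeping the in-order sequence intact.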
5.4.4 Hash Table
Applications that require a dynamic set that supports only the dictionary operations insert, delete, find, and enumerate, could employ a data structure such as the hash table. A hash
table is an effective data structure for implementing dictionaries. Although searching for an element in a hash table can take as long as searching for an element in a linked list, i.e. $O(n)$ time in the worst case, with $n$ the number of elements in the table, in practice hashing performs extremely well. Under reasonable assumptions, the expected time to search for an element in a hash table is $O(1)$. In fact, a hash table is a generalization of an ordinary array in which direct addressing is performed in a clever way. A hash table becomes especially interesting if the number of keys to be stored at any moment in time is small compared to the size of the key space. Instead of using the key directly to access a position in the array, the array index is computed from the key, which is called hashing. This way, the size of the array can be kept proportional to the number of keys instead of the size of the key space, as is the case for ordinary array storage.
Figure 5.7 graphically shows the principle of hashing. Keys from the universe of keys $U$ are mapped to the array $T$ using a hash function $h$ that maps each key in $U$ to a slot of $T$. Because the size of $T$ is much smaller than the size of $U$, and the hash function is not perfect in the sense that it does not know in advance which keys from $U$ are going to be stored in $T$, collisions can occur. That is, some keys will be mapped to the same slot position in $T$. An efficient way to resolve collisions is by means of chaining, i.e. keeping colliding keys in a list. In the example shown, two keys collide and are chained.
The essential elements for an efficient hash table implementation are the hash function and the capacity handling of the hash table. A hash function $h$ is said to hash an element $k \in U$, where $U$ denotes the key space, to slot $h(k)$ in the hash table. Since hashing is performed during every hash table operation, the hash function needs to evaluate quickly and have good distributing properties. Knowledge of the probability distribution of the input elements will facilitate the construction of a good hash function. Typically, heuristic techniques are employed for this purpose. In the case where not much is known about the input distribution, except for the fact that it is quite unpredictable, general approaches can be taken. A common technique is the division method, yielding for instance
$$h(k) = k \bmod m,$$
where $m$ should preferably be a prime number at least as large as the number of slots in the table, and not too close to exact powers of 2. Instead of mapping a single key, it is also easy to map a pair of keys, which can also be interpreted as a point in the plane, into a hash table. The notion of double hashing fits like a glove in this respect. In double hashing two auxiliary hash functions are combined, typically of the form
$$h_1(k) = k \bmod m \quad \text{and} \quad h_2(k) = 1 + (k \bmod m'),$$
where $m$ is prime and $m'$ is a positive integer smaller than $m$, for instance $m' = m - 1$.
Because generally we do not know which elements of $U$ are going to be stored in the hash table, so-called collisions will occur. An effective way to handle collisions is by means of chaining. The chaining principle essentially turns each slot in the hash table into a linked list. If the size of the hash table is well chosen, the expected length of a chain is very small and does not depend on $n$. Regardless of whether or not collision resolution is employed, the number of slots in a hash table needs to be large enough to avoid deterioration of performance. If the set of elements that is going to be stored is known in advance, a so-called perfect hash function can be computed which guarantees a one-to-one mapping in the hash table. Of course this is a trade-off between the performance gained by avoiding collisions and the effort needed to compute a perfect hash function. Table 5.5 shows the computational complexities of hashing operations. Collision resolution by chaining is especially attractive when the number of keys in the hash table approaches the number of slots in the hash table; inserting a new key always takes $O(1)$ time. However, when the number of keys in the hash table grows significantly larger than the hash table size, finding and deleting a key become proportionally slower. If a truly dynamic set has to
be maintained and the number of items in the set is unknown in advance, then a hash table is likely not the best choice.
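As an illustration of hashing with chaining, the following Python sketch uses the division method on integer keys. The class name, the default slot count of 13 and the method names are assumptions chosen for this example. Unlike the constant-time insertion mentioned above, this `insert` first scans the chain so that an existing key is updated rather than duplicated; its cost is therefore proportional to the chain length.

```python
class ChainedHashTable:
    """Hash table sketch with chaining; integer keys and a prime slot count assumed."""

    def __init__(self, m=13):
        self.m = m                               # number of slots
        self.slots = [[] for _ in range(m)]      # one chain (list) per slot
        self.count = 0                           # number of stored keys

    def _slot(self, key):
        return key % self.m                      # division method

    def insert(self, key, value=None):
        chain = self.slots[self._slot(key)]
        for entry in chain:
            if entry[0] == key:                  # key already stored: update value
                entry[1] = value
                return
        chain.append([key, value])
        self.count += 1

    def find(self, key):
        for k, v in self.slots[self._slot(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        chain = self.slots[self._slot(key)]
        for i, entry in enumerate(chain):
            if entry[0] == key:
                del chain[i]
                self.count -= 1
                return
        raise KeyError(key)

    def load_factor(self):
        # Average chain length; performance degrades as this grows beyond 1.
        return self.count / self.m
```

With 13 slots, the keys 5 and 18 hash to the same slot and end up in one chain, which is the collision situation described above.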
5.4.5 Priority Queue
In many applications we need to maintain a set S of elements that changes dynamically over time. Each element has an associated value called a key. Furthermore, typically the following
operations are required: insert(S, x), which inserts element x into the set S; minimum(S), which returns the element of S with the smallest key; and extract_min(S), which removes and returns the element of S with the smallest key. A data structure with the aforementioned properties is called a priority queue. One application of priority queues is to schedule jobs on a shared computer. It also has great utility in VLSI design problems. Most (practical) implementations of efficient priority queues have amortized computational complexity bounded by $O(\log n)$ per operation, where $n$ is the current number of elements in the queue.
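The three operations can be sketched in Python on top of the standard `heapq` module, which maintains a binary heap in a list. The class name and the tie-breaking counter are assumptions of this example; a binary heap is only one of several possible implementations.

```python
import heapq


class PriorityQueue:
    """Binary-heap priority queue sketch: smallest key is served first."""

    def __init__(self):
        self._heap = []
        self._counter = 0          # tie-breaker so items themselves are never compared

    def insert(self, key, item):
        heapq.heappush(self._heap, (key, self._counter, item))
        self._counter += 1

    def minimum(self):
        return self._heap[0][2]    # peek at the item with the smallest key

    def extract_min(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Here `insert` and `extract_min` take logarithmic time in the number of stored elements, while `minimum` only inspects the first list entry.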
5.4.6 Other Advanced Data Structures
Research on data structures is a very active field. Therefore, it is virtually impossible to present all state-of-the-art works in a dissertation that focuses on the combination of high-performance data structures and electronic design automation. One reason is the fact that a good data structure in theory does not necessarily imply a good data structure in practice, and vice versa. Since we want to have the best of both, we have to settle for data structures that have been thoroughly investigated both in theory and practice, so we can rely on them and use them as building blocks. Another reason is the fact that research time is limited. As a consequence, a trade-off must be made between application-specific data structures (optimal for a smaller range of applications, but with higher performance) and more generic data structures (usable for a broader range of applications, but with lower performance). The interested reader might consult [Cormen et al., 1990] for a good overview of other advanced data structures such as Fibonacci heaps and red-black trees. Also, AVL trees [Adel’son-Vel’skii and Landis, 1962] are worth mentioning. Last but not least, the Van Emde Boas data structure, also known as a stratified tree, is an extended priority queue that supports its operations in $O(\log \log N)$ time. A fundamental limitation is the restriction that the universe of keys is the set {1,2,...,N} [van Emde Boas, 1975, Mehlhorn and Näher, 1990]. A significant improvement with respect to storage requirements was proposed by Mehlhorn and Näher [Mehlhorn and Näher, 1990], improving the previous space bound to $O(n)$, with $n$ the current number of elements in the tree.
5.5 Concluding Remarks
An overview is given of advanced data structures which can be successfully used in CAD tools; they are especially interesting, as will be clear from later chapters, in connection with mixed-signal layout generation. Based on specific properties in terms of computational complexity, a data structure (or a combination of data structures) can be selected to perform a specific task with as low computational complexity as possible. Practical considerations such as performance for typical instances and implementation complexity are also important points to consider.
Chapter 6
Placement
When a circuit has been designed in terms of a netlist connecting (properly sized) building blocks, the layout phase is next to follow. This part of the design cycle is called physical design, and for contemporary mixed-signal designs this phase is becoming increasingly important. In fact, it is a dominant limiting performance factor of any state-of-the-art integrated circuit. Two important issues in physical design are placement and routing. This chapter focuses on the placement problem.
First we define the placement problem. Then we give an overview of several approaches to solve the placement problem. Based on our requirements on placement quality and on placement-related issues such as substrate coupling and matching, a choice is made regarding the approach for tackling the placement problem. We will elaborate on an efficient placement representation, which is known as the sequence-pair structure. Its theoretical properties are discussed in detail. Moreover, we unify new findings with known theories and algorithms. Theoretical fundamental lower limits on computational complexity are given with respect to state-of-the-art approaches to placement computation using the sequence pair representation. Motivated by promising theoretical results, an incremental placement computation approach is devised which has very attractive features in a simulated annealing optimization environment. Experimental results are shown to demonstrate the effectiveness and efficiency of the incremental approach. We proceed by discussing an important extension to standard placement, which is constrained module placement, in which modules can be constrained to a prescribed location in the plane or forced to be placed at one of the chip boundaries. An improved robust approach is proposed and its effectiveness and superiority over the latest published works is demonstrated by experiments.
Let us first specify more exactly what is meant by a placement.
Definition 6 (Placement) A set of given rectangular blocks which are placed in a two-dimensional plane is called a placement.
Since no restrictions are put on possible overlap of blocks, clearly not every placement is practical. Therefore, a feasible placement is defined here as follows.
Definition 7 (Feasible Placement) A placement in which no overlap of blocks occurs is called a feasible placement; otherwise it is called infeasible.
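Definition 7 translates directly into a pairwise overlap test. The Python sketch below assumes axis-aligned blocks given by their lower-left corner, width and height; the `Block` type and the function names are introduced here for illustration only. Blocks that merely abut are not counted as overlapping.

```python
from itertools import combinations
from typing import NamedTuple


class Block(NamedTuple):
    x: float   # lower-left corner, horizontal coordinate
    y: float   # lower-left corner, vertical coordinate
    w: float   # width
    h: float   # height


def overlaps(a, b):
    # Two rectangles overlap if their interiors intersect in both directions.
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def is_feasible(placement):
    # Straightforward quadratic check over all pairs of blocks.
    return not any(overlaps(a, b) for a, b in combinations(placement, 2))
```

The pairwise test is quadratic in the number of blocks, which is acceptable for a sketch but not for use inside an optimization loop.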
The blocks that are used in a placement are normally of fixed size, but it is also possible to take blocks with flexible sizes. Those flexible blocks, also called soft blocks, can be taken from a given set of candidate blocks, under an aspect ratio constraint, or constrained by some other mathematical function. Here, the placement problem is defined as follows.¹
Problem: The placement problem
Instance: A set of blocks of given sizes. A set of pins, of which a subset is at the circumference of each of the blocks, representing the connectivity information between the blocks. An objective function which, for instance, captures the total length of interconnecting wires and/or the area of the smallest enclosing rectangle around all blocks.
Solutions: All feasible placements, with all possible orientations of the blocks.
Minimize: The objective function.
The classical term “floorplanning” is strongly related to placement in that it also deals with placement of objects. However, the approach of floorplanning is different, because it divides the two-dimensional plane into rooms which are big enough to hold all (flexible) objects. This way, overlap is avoided by construction. Moreover, empty area is not explicitly represented by a floorplan. Before proceeding, let us define a floorplan.
Definition 8 (Floorplan) A floorplan is a data structure that captures the relative positions of non-overlapping objects that fully cover a certain rectangle in the 2-dimensional plane.
The above definition is a sensible special case, in the current context, of the general definition which was re-coined by Otten recently [Otten, 2000]. Consequently, this notion of a floorplan is similar to relative topological placement representations which can be found in many recent works, e.g. [Murata et al., 1995, Guo et al., 1999, Nakatake et al., 1996, Takahashi, 2000]. In this respect, floorplanning can be compared with feasible placement computation using a topological placement representation. The main difference is that typical (feasible) placement computation deals with fixed-size blocks. When variable-size blocks (also called soft blocks) are used instead of fixed-size blocks, the placement problem is generalized into a floorplanning problem. The result of a floorplanning phase is a sized floorplan. The latter is defined as follows.
Definition 9 (Sized Floorplan) A sized floorplan is a floorplan in which each room contains exactly one block, and the block is not larger than the room.
Note that the word “floorplan”, instead of “sized floorplan”, is also used in the literature to denote the result of a placement phase which contains absolute position and size information. Hereafter, the term “placement” is used to denote a sized floorplan, even in conjunction with soft blocks.
Formally, the floorplanning problem can be defined as follows.
Problem: The floorplanning problem
Instance: A set of flexible blocks, and a sizing (or shape) function that selects a shape alternative for each block. An objective function which, for instance, captures the total length of interconnecting wires.
Solutions: All sized floorplans, with all possible combinations of shape alternatives, and all possible relative topologies.
Minimize: The objective function.
We will classify placement representations into slicing and non-slicing. The reason for this classification is the obvious difference in generality. Figure 6.1 shows an illustrative example of a non-slicing placement, which is defined as follows.
Definition 10 (Slicing) A placement is slicing if and only if it can be obtained by complete recursive bisection of the placement area. If slicing cannot be recursively continued up to the lowest level, a placement is called non-slicing.
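Definition 10 can be turned into a simple recursive test. The Python sketch below assumes a feasible placement of blocks with positive dimensions, each given as a tuple (x, y, w, h), and it allows a cut line to run through empty area; the function names are hypothetical. It looks for a horizontal or vertical line that no block crosses, splits the blocks along it, and recurses on both parts.

```python
def find_cut(blocks, axis):
    """Return a coordinate along `axis` (0 = x, 1 = y) where a straight cut
    separates the blocks without crossing any of them, or None."""
    intervals = sorted((b[axis], b[axis] + b[axis + 2]) for b in blocks)
    reach = intervals[0][1]               # farthest extent covered so far
    for lo, hi in intervals[1:]:
        if lo >= reach:                   # gap or abutment: a cut fits here
            return lo
        reach = max(reach, hi)
    return None


def is_slicing(blocks):
    """True if the blocks can be separated one by one by recursive bisection."""
    if len(blocks) <= 1:
        return True
    for axis in (0, 1):
        cut = find_cut(blocks, axis)
        if cut is not None:
            low = [b for b in blocks if b[axis] < cut]
            high = [b for b in blocks if b[axis] >= cut]
            return is_slicing(low) and is_slicing(high)
    return False


# A "pinwheel" of five blocks in a 3 x 3 area: the classic non-slicing example.
pinwheel = [(0, 0, 2, 1), (2, 0, 1, 2), (1, 2, 2, 1), (0, 1, 1, 2), (1, 1, 1, 1)]
# Four blocks that can be separated by recursive cuts.
sliced = [(0, 0, 2, 1), (0, 1, 1, 1), (1, 1, 1, 1), (2, 0, 1, 2)]
```

Taking the first cut that is found is sufficient, because every subset of a slicing placement is itself slicing.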
The main incentives for using slicing representations over non-slicing representations are the following: 1) some placement-related problems which are NP-hard for non-slicing placements can be reduced to polynomial-time problems for slicing placements; 2) several useful properties can be attributed to slicing placements, of which conflict-free channel routing sequence application is most prominent; 3) a hierarchical design methodology matches well with the slicing floorplan methodology. Hence, it is clear that both slicing and non-slicing representations have advantages and disadvantages. In the following sections we argue that the so-called sequence pair representation is most suitable for use in a mixed-signal layout generation framework.
¹ We use $2^{S}$ to denote the power set of $S$.
6.1 Previous Work
Numerous people have contributed to approaches to solve the VLSI placement problem. Algorithms based on principles from various fields have been introduced in order to find better solutions for this intrinsically difficult problem, which is known to be NP-hard [Sahni and Bhatt, 1980]. Due to this complexity it is impractical to try to find optimal solutions for any but the smallest problem instances. It is not our intention to give an exhaustive overview. Firstly, the amount of published literature is too large to describe extensively in this book. Secondly, it would lead us too far from the purpose of this section, which is to discuss candidate placement approaches. We refer the reader to good overviews in [Lengauer, 1990, Sherwani, 1993] and the references therein.
We use the phrase “placement approach” instead of “placement algorithm” because the former is more generic. For instance, a placement could be obtained using a force-directed method with a general (non-slicing, overlap-allowed) representation of blocks. The placement algorithm is then clearly the force-directed method, but the placement approach is the general representation of the blocks which is employed while placing using the force-directed method. Another combination could be to use the force-directed method with a slicing placement representation. Since the representation of a placement has great impact on the performance of a placement algorithm, both in terms of speed and solution quality, it is sensible to discuss this in more detail.
Otten [Otten, 1982] was among the first to introduce the notion of floorplanning in the early eighties. Motivated by this concept, researchers began to look for special cases which could be applied to digital VLSI circuits without limiting design freedom in a negative sense.
One of the most prominent special cases was the slicing floorplan structure, for which certain intractable problems reduce to polynomially solvable cases. This important property has been the main reason for using the slicing floorplan approach. An efficient floorplanning approach is described by Wong and Liu in [Wong and Liu, 1986]. Initially, the slicing floorplan approach was also applied to analog designs. However, it was soon realized that the slicing structure is too restrictive for analog layout [Cohn et al., 1994]. Consequently, a more general placement approach was adopted by members of the analog layout EDA community. Actually, the most general placement approach of all was initially used for this purpose; blocks were allowed to be placed at arbitrary positions in a 2-dimensional plane. Thus, the representation allowed overlap of blocks. One of the first works in this respect is due to Jepsen and Gelatt [Jepsen and Jr., 1983]. Subsequent works, which extended and refined the original concept, are among others due to Sechen [Sechen, 1988] and Lampaert [Lampaert, 1998].
Although the general overlapping placement approach gave promising results, its fundamental flaws prevented researchers from building a viable mixed-signal layout generation system for larger designs. Efforts to refine implementations and tune the layout system to improve performance have been, and can only be, successful up to a point. Fortunately, a great deal of research effort has been put into the design of efficient non-slicing placement representations. Murata et al. [Murata et al., 1996] developed one of the first efficient general placement representations, called the sequence pair structure. Some other relevant works are due to Nakatake et al. [Nakatake et al., 1996], who developed the bounded slice-line grid structure. Very recently, the O-tree structure was introduced by Guo et al.
[Guo et al., 1999] and independently by Takahashi [Takahashi, 2000]. A host of extensions and refinements of the original O-tree concept followed rapidly [Chang et al., 2000, Pang et al., 2000]. These representations were soon adopted by others for use in an analog layout generation system [Balasa and Lampaert, 1999].
6.2 Effective and Efficient Placement
Computational efficiency is of paramount importance in connection with the placement problem, since it is NP-hard. The mandatory use of heuristic methods, typically characterized by a massive number of iterations, to obtain an acceptable solution in a reasonable amount of time, leads to the intuitive thought of “using all available information as good as possible, without introducing useless redundancy”. The partial phrases “as good as possible” and “useless redundancy” will be made explicit in this and succeeding sections.
In order to achieve the goal of an effective and efficient placement method, a practical requirement on the abstract representation of a placement is so-called P-admissibility. We say that a solution space of a representation is P-admissible if it satisfies the following four requirements [Murata et al., 1996]: 1) the solution space is finite, 2) every solution is feasible, 3) the mapping of a representation into a placement can be performed in polynomial time (P), 4) the solution space contains an optimal solution (admissible). The first requirement is quite weak because finiteness can have a near-infinite appearance [Knuth, 1996]. Requirements two and four are obvious. The third requirement is also quite weak, since polynomial computational complexity includes a linear algorithm, but also an $O(n^c)$ algorithm, where $c$ can be a large constant. As a consequence of the first and the third requirements, we can distinguish various representations within the boundaries of P-admissibility.
The computational complexity associated with a complete placement representation is a combination of essentially two properties of the representation: 1) solution space size, and 2) computational complexity for computing a specific placement. Since scalability, i.e. the computational behavior of a system as a function of the input instance size, is becoming increasingly important, the use of asymptotic complexity measures is fully justified.
In order to make a proper choice on which type of placement representation to use, it is wise to create an overview of important and relevant representations. The final choice of placement representation is made based on a trade-off between computational effort, generality of representation, and ease of mathematical manipulation.
Let us first restate the requirements for an efficient mixed-signal layout generation tool from a conceptual point of view. First of all, it is well known that matching of both wiring and modules is extremely important in analog circuit layout. Therefore, representations that restrict the generation of matching-aware layouts should not be used. Thus, non-slicing placement representations are more suitable. Second, the system should have good scaling properties, which means that the computational complexity should be as low as possible. Moreover, in the light of an optimization algorithm which is going to be employed to compute a (near) optimal solution, preference might be given to a specific type of representation which can possibly exploit information efficiently. Third, to achieve efficient usage of computational power and ultimately obtain a high-quality layout in several respects, better understanding of the mechanisms and parameters that control the overall layout quality is required. Therefore, more insight into the representation, especially with respect to its mathematical properties, is of importance. A major benefit of identification with known mathematics is the possibility to use a host of existing off-the-shelf techniques and algorithms.
Table 6.1 gives an overview of known placement representations. It also shows the size of the associated solution space of each representation. Also, the generality of a representation
is indicated in the column with heading “NS” (non-slicing). From the above table it is clear that there is a big difference in the size of the solution space of the representations. Although an indication is given regarding slicing properties of the representations, it does not cover all aspects of a flexible placement representation. This will be further explained in the next section. When a placement representation is used in an iterative approach, it is of utmost importance that the computation of a placement from an abstract representation is very fast. Also, scalability of the placement evaluation step is a major concern [Kahng, 2000]. Therefore, the computational complexity of a single placement evaluation (PE) step is also shown in Table 6.1 in the column headed by “PE”. Obviously, linear complexity is the best possible when a “from scratch” computation is desired. Both SP and BSG have super-linear complexities. However, the given values are based on the latest published results. Since no proof of optimality is known for either the SP or the BSG algorithm, we may conclude that improvement is not impossible.
Summarizing, we have two types of representations: general non-slicing representations, which have no layout restrictions, and specific restricted representations, which have layout limitations; typically the latter are very efficient, albeit useful only in cases where such a limitation is allowed. In the context of mixed-signal layout generation, restrictions on the layout form a bottleneck. Thus, general non-slicing representations are preferable.
6.3 Representation Generality, Flexibility and Sensitivity
Normally, generality of a representation refers to whether the placement is slicing or non-slicing. We observe that there are more factors that determine the usefulness of a representation. Two important factors are flexibility and sensitivity of a representation. These two terms are explained hereafter. Flexibility refers to the property of representing most, preferably all, of the meaningful solutions. A meaningful solution is defined next.
Definition 11 (Meaningful Placement) A feasible placement in which blocks that are constrained to be adjacent can indeed be placed that way without changing orientation or topology of other blocks is called a meaningful placement.
Clearly, not every feasible placement is meaningful, as feasibility means no overlap, but it does not impose any constraints on proximity of certain blocks. In mixed-signal layouts, the possibility to enforce spatial proximity (or minimum distance) between certain blocks is of utmost importance. Thus, it is of interest to choose a placement representation which holds as many meaningful solutions as possible. Although recent attempts have been focused on finding a representation with a solution space as small as possible, it should be noted that a small size of the solution space does not necessarily imply desirable results. This is a major shortcoming of packing-centric views, i.e. the tightest non-overlapping placement is not always the best placement in a realistic design. For instance, in Figure 6.2(a) a (meaningful) placement is shown which cannot be represented by either the O-tree, LOT, or B*-tree representation. The underlying reason is that all three representations rely on a Tetris-like² block dropping procedure. Therefore, a block can never “hover in the air”. BSG and SP, on the other hand, can represent such a meaningful placement because they rely on relative 2-dimensional information, whereas the O-tree structure is essentially 1-dimensional.
Therefore, a block can be placed such that it is above another block. Figure 6.2(b) shows a placement that can be represented by any of the methods mentioned in Table 6.1.
² The popular computer game in which blocks are dropped down.
Another property that contributes to the usefulness of a representation is the sensitivity of a placement to (small) non-topological³ changes. For example, in the case of soft blocks, changing the aspect ratio of a block might dramatically change the positions of several other blocks. Rotation of a block might have the same effect in a very sensitive placement representation. Figure 6.3(a) shows an illustrative example of the large sensitivity of a labeled ordered tree (LOT) placement. If the width of one block is decreased a bit, such that the width of a second block is at least the sum of the widths of two other blocks, a block falls down to the bottom boundary of the chip area, which is an inherent property of LOT placement (and similar representations such as O-tree and B*-tree). In contrast, a sequence pair placement is shown in Figure 6.3(b). Clearly, the sequence pair representation is much less sensitive to small non-topological changes, since the location of a block is independent of, or at most linearly dependent on, the dimensions of another block.
Summarizing, Table 6.2 gives for the placement representations mentioned in Table 6.1 their major advantages and disadvantages. These results will have a significant impact on the overall best candidate for placement representation in the context of mixed-signal layout generation. Many of the representations in Table 6.2 have been generalized to placement of rectilinear objects at the cost of increased complexity [Fujiyoshi and Murata, 1999, Kang and Dai, 1998, Xu et al., 1998, Nakatake et al., 1998, Sakanushi et al., 1998]. Also, range constraints, which cover both pre-placed blocks and boundary-constrained blocks, have been considered [Murata et al., 1997, Murata et al., 1998, Nakatake et al., 1998, Tang and Wong, 2001]. A few of the representations have been adapted to take into account a very important analog-design-related issue which is called matching [Pelgrom et al., 1989, Balasa and Lampaert, 1999, Balasa, 2000].
Generally, if adding constraints reduces the size of the solution space, this comes at the cost of increased computation effort for a single placement. The final decision on which placement representation suits us best is based on our requirements, which in order of decreasing importance are: maximal generality and flexibility in order to fit mixed-signal and analog issues; low computational complexity to evaluate a placement; a small change in the abstract representation is associated with a small change in placement, essentially implying that the cost landscape is smooth; a small solution space so that searching for a good solution can be done more efficiently.
3 A topological change is a change in relative relationships between blocks.
6.3 Representation Generality, Flexibility and Sensitivity
The previous discussion justifies the choice of the sequence pair representation for use in a mixed-signal layout generation framework. Details on this representation will be given hereafter.
6.4 Sequence Pair Representation
The sequence pair (SP), recently introduced by Murata et al. [Murata et al., 1996], can efficiently represent any (topological) placement of rectangular modules, mainly because of its general non-slicing structure and its inherent property of representing meaningful solutions. To facilitate understanding of the abstract notion of the sequence pair representation, Figure 6.4 shows a few conceptual aspects of a placement of rectangular blocks in relation to sequence pair properties. The placement or layout space is the 2-dimensional plane spanned by the x-axis and the y-axis. The x-axis naturally corresponds to the horizontal direction and the y-axis corresponds to the vertical direction. The sequence-pair space, spanned by the two sequence axes, is a grid space where the grid size is M × M, with M the number of modules in the placement space. Furthermore, we define four disjoint directions, which we call “above”, “below”, “left of” and “right of”. The first two directions align with the vertical axis, and the last two directions align with the horizontal axis. For reasons which will become clear shortly, each direction also corresponds to a two-character identifier. It is intuitively clear that the grid space only represents relative information between modules. With an additional step, absolute information can be added. Combined, this is sufficient for general placement representation. Now the concept of a packing will be described.
Definition 12 (Packing) A packing is a minimum-area feasible placement of rectangular modules associated with a given SP.
A packing essentially adds absolute information to the relative representation of the SP. However, there are still a few degrees of freedom left (unexploited) within a packing. Therefore, the Left-Down packing is defined.
Definition 13 (LD-Packing) An LD-packing is a packing in which each module is moved left and down as much as possible while preserving the topology dictated by the sequence pair.
Except when noted otherwise, each packing is an LD-packing in the remainder of this chapter. We will explain the notion of sequence pair first by an example and then more formally. An SP consists of two ordered sequences (or permutations)
where the sequence elements are unique integers from These integers are identifiers of the modules in the placement problem. Wherever convenient, we synonymously use to denote a module. The sequences can be seen as two orthogonal axes that span a 2-dimensional grid-space. An example is shown in Figure 6.5. The ordering of the elements (modules) in both sequences determines the relative relationships between these elements. For each pair of elements we have a before/after relationship within each sequence. The combination of two sequences yields four relative relationships between module pairs: “after”, “before”, “below”, “above”. We say that a module is after (or right of) another module when it is located after that module in both sequences. A module is before (or left of) another module if it is located before that module in both sequences. If a module is after another module in the first sequence and before it in the second sequence, then we say that it is below that module. Similarly, if a module is before another module in the first sequence and after it in the second sequence, then we say that it is above that module. This is also clear from the visual representation of an example sequence pair shown in Figure 6.5. For example, module 1 is after module 5, and module 4 is above module 6.
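The four relationships can be sketched in a few lines of Python. This is a minimal illustration and not code from the original work: the function name `relation` and the three-module sequence pair are hypothetical, and the sketch assumes that "below" means after in the first sequence and before in the second.

```python
def relation(sp, a, b):
    """Return the position of module a relative to module b."""
    first, second = sp
    if a == b:
        raise ValueError("a and b must be different modules")
    after_in_first = first.index(a) > first.index(b)
    after_in_second = second.index(a) > second.index(b)
    if after_in_first and after_in_second:
        return "right of"   # after in both sequences
    if not after_in_first and not after_in_second:
        return "left of"    # before in both sequences
    if after_in_first:
        return "below"      # after in the first, before in the second
    return "above"          # before in the first, after in the second


# Hypothetical three-module sequence pair (not the one of Figure 6.5).
sp = ((0, 2, 1), (2, 0, 1))
print(relation(sp, 1, 0))  # right of
print(relation(sp, 0, 2))  # above
print(relation(sp, 2, 0))  # below
```

Every pair of distinct modules falls into exactly one of the four cases, which is why a sequence pair always describes a consistent, overlap-free topology.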
From and four sets can be derived, called the sets4, which define the topological relationships “right of”, “below”, “above”, and “left of”, respectively. The definition of these sets is:
where ∪ is the union operator and ∩ is the intersection operator. By we denote the element on position of sequence S, with a count index starting at 0. By we mean the index of element in sequence S. If the upper index value is lower than the lower index value, the union operator gives the empty set. Furthermore, we define For example, if then and If, in addition, then It can be easily seen that the sets contain a lot of redundant information. For example in Figure 6.5, the set tells that elements 8 and 1 are right of element 4. Since the set already tells us that element 1 is right of 8, it is unnecessary to record this information once again in because all relative relations are transitive. In other words, if 1 is right of 8 and 8 is right of 4, this implies that 1 is right of 4. As the information in the sets is stored for computation or later retrieval [Murata et al., 1996], it is more efficient to find a less redundant description. We introduce sets which are
derived from the sets as follows:
4 The subscripts denote a combination of “after” and “before”.
In essence, the sets are derived from the sets by removing all (redundant) transitive information. If we leave out the index then represents any of the sets given in (6.5) to (6.8); represents either of the sets given by (6.5) and (6.7); are defined analogously. For simplicity, we will use instead of if no confusion is possible. Note that the sets and sets are symmetrically related. This also applies to the and sets [Murata et al., 1996]. Formally this is written as:
where Thus all (relative) topological information is available in two orthogonal (non-symmetrical) sets; for instance the and sets. However, for practical applications it is very useful to maintain all sets, for instance to improve run-time performance. Note that the sets maintain local topological information, whereas the sets have a global character. An interesting property of the sets is that they are necessary and sufficient to calculate a packing based on constraint graphs [Lin and Leenaerts, 2000b], under the assumption that we have no a priori knowledge on the sizes of the modules. In practice, we do have this knowledge, but it will turn out that a complete dissection of relative and absolute placement computation is advantageous for incremental computation. This will be explained further on. Note that we are able to represent any packing of rectangular modules with the SP [Murata et al., 1996, Lin and Leenaerts, 2000a], because we can find a sequence pair for every packing. Another advantageous property of the sequence pair is that it can be uniquely visualized in two dimensions by an oblique grid representation. The -45° degree axis represents sequence and the +45° axis represents sequence Figure 6.6 shows an example of an SP and its oblique grid representation, denoted by “grid” hereafter. Furthermore, each module has four so-called views which uniquely correspond to the and sets. For example, the view of module 5 is the shaded area in Figure 6.6; It is clear that every set is a subset of its corresponding set. For example, Currently, two approaches exist to compute a packing from a sequence pair and a set of modules. The first approach, the graph-based method discussed in Section 6.5, is based on constraint graphs which contain the topological information given by a sequence pair. Within each constraint graph a longest path is sought. 
The longest path length that is found corresponds to the width of a packing (for the horizontal constraint graph) and the height of a packing (for the vertical constraint graph). The second approach, which can be classified as a non-graph-based method, is discussed in Section 6.6. This approach is based on so-called longest common subsequence (LCS) computation [Hunt and Szymanski, 1977]. The classical LCS problem is generalized into a maximum-weight common subsequence problem by Tang et al. [Tang et al., 2000]. A subsequence of a sequence is an ordered subset of the sequence elements in which the original relative order is preserved and adjacent subsequence elements need not be adjacent in the original sequence. We will give an overview of existing material on this topic and establish a few new links in this context, leading to a new lower bound on the computational complexity for computing a packing from scratch.
Although the graph-based packing approach does not yield the most efficient packing computation technique in terms of computational complexity, it provides a convenient stepping stone towards an incremental packing computation approach, which is described in detail in Section 6.7. The computational complexity of the incremental approach is The computational complexity of the graph-based packing approach is O(M log M) on average, and O(M²) in the worst case. The best known average-case and worst-case computational complexity for non-graph-based packing computation is O(M log log M). Furthermore, in Section 6.11 constrained placement computation is described in detail. The basic idea is to impose spatial constraints on specific modules. For instance, a module can be forced to be placed at the right boundary of the chip area. Details will be given in the following sections.
6.5 Graph-Based Packing Computation
An advantage of packing computation based on graphs is that the field of graph algorithms and analysis methods has been explored thoroughly in mathematics and computer science. Therefore, we are supported by a readily available set of tools which might facilitate the analysis and design of more efficient placement computation methods in the context of mixed-signal layout generation. A striking feature of the sequence-pair representation is that it separates the relative and the absolute placement information. Thus, these two issues can be handled and eventually optimized independently in an algorithmic sense, yielding a 2-step approach.
6.5.1 Relative Placement Computation
As mentioned earlier, the sets are obtained from the sets by the observation that there is redundancy in the latter sets. Since the number and topology of the edges in the constraint graphs is one-to-one related to the sets, and computing a packing by means of longest paths computations in a directed acyclic graph (DAG) requires [Cormen et al., 1990], it is important to have small sets. The natural question arises whether or not these sets can be further pruned. This is not the case as stated by the following theorem. Theorem 2 All sets [from (6.5) to (6.8)], e.g. the sets, are necessary and sufficient to compute a longest path through the horizontal or vertical constraint graph associated with an SP, with dynamically changeable module sizes. Proof It is easy to see that the transitive edges induced by the constraint graphs constructed from the sets are redundant, since we are looking for longest paths. Only these redundant transitive edges are removed in the sets. Thus, sufficiency follows. To prove that the sets are necessary, suppose that they are not necessary and we can do with less. Every element can have its weight increased to make it part of the longest path. Furthermore, the number of times an element occurs in an set is equal to the number of unique paths that include this element. Thus if we remove an arbitrary element from a subset of then the path containing this element can no longer exist. This path could, of course, be a longest path by proper adjustment of certain weights. This is a contradiction. Thus necessity follows. As a consequence, we can efficiently map an SP to its corresponding horizontal and vertical constraint graphs using the sets and the sets, respectively. These constraint graphs only represent the relative relationships. In order to obtain absolute placement information, i.e. all coordinates of all blocks, longest paths computations need to be performed. 
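The removal of transitive information can be illustrated with a small Python sketch. It is not the algorithm of the original work: the names `full_right_sets` and `reduced_right_sets` are hypothetical, only the "right of" relation is treated, and the four-module sequence pair is made up so that it reproduces the situation described earlier (8 and 1 right of 4, and 1 right of 8).

```python
def full_right_sets(sp):
    """For each module, all modules that come after it in both sequences."""
    first, second = sp
    pos1 = {m: i for i, m in enumerate(first)}
    pos2 = {m: i for i, m in enumerate(second)}
    return {a: {b for b in first if pos1[b] > pos1[a] and pos2[b] > pos2[a]}
            for a in first}


def reduced_right_sets(full):
    """Drop every relation that already follows from transitivity."""
    reduced = {}
    for a, right in full.items():
        implied = set()
        for b in right:
            implied |= full[b]      # everything right of a right neighbor
        reduced[a] = right - implied
    return reduced


# Hypothetical example: module 6 lies above the row 4, 8, 1.
sp = ((6, 4, 8, 1), (4, 8, 1, 6))
full = full_right_sets(sp)
reduced = reduced_right_sets(full)
print(full[4])     # {8, 1}
print(reduced[4])  # {8}
```

The reduced sets are what is left when only direct neighbors are kept, so each remaining pair corresponds to one edge of the horizontal constraint graph.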
First we discuss how the relative placement information is computed, that is, we propose an algorithm to compute the sets. After the way this algorithm approaches the problem, it is named the Direct View (DV) algorithm [Lin and Leenaerts, 2000b]. A few definitions to clarify some terminology are in place. For ease of discussion, the oblique grid is rotated 45 degrees clockwise. Moreover, we associate a “quadrant”, relative to a module, with each of four possible directions; quadrant 1 up, quadrant 2 left, quadrant 3 down, and quadrant 4 right. Definition 14 (Direct View) A module is said to have a direct view on module in a specific direction, if and only if is in the associated quadrant of and there is no other module in the rectangle spanned by and Module is called the directly viewed module (or simply viewed module if no ambiguity is possible), and is called the viewing module, denoted by For example in Figure 6.7, modules 1 and 4 are directly viewed by module 3 when we look to the right. So module 3 has exactly two modules in its direct view to the right. Note that these viewed sets are exactly the sets, and that every viewed set of size N induces N edges in the corresponding constraint graph. Now that we have shown, in the form of Theorem 2, that the information contained in the sets is necessary and sufficient to represent all relative relationships between modules in
a sequence pair context5, it is interesting to investigate the exact size of these sets. Since the size of the sets directly depends on the sequence pair at hand, we need to make an assumption in this respect. Intuitively, it is plausible that during the initial phase of stochastic optimization, no preference is given for a specific type of sequence pair. Therefore, the assumption that random sequence pairs are generated initially is perfectly valid. However, one may object to this assumption and pose that during the final phase of optimization, the optimization algorithm, which is simulated annealing in our case, converges to a specific sequence pair that may be some kind of worst-case sequence pair. Albeit imaginable, there is no clear reason why a final sequence pair should exhibit worst-case behavior. Indeed, our experiments in Section 6.9.2 unambiguously show that final sequence pairs exhibit average-case behavior. Even in the case where additional constraints are imposed on placement, there is no reason to assume some kind of adverse correlation between the structure of the constraint graphs and the quality of a placement. The previous discussion justifies taking a random grid distribution for analysis purposes, or more exactly, a randomly selected sequence pair from the sequence pair solution space with all elements being equiprobable. To facilitate the analysis, we define the following.
Definition 15 (Subsequence) A subsequence of a sequence S is an ordered subset of the elements of S, where the ordering is with respect to the element positions in S.
Furthermore, we observe the following.
Theorem 3 Each common subsequence of a sequence pair is equivalent to a unique strictly increasing subsequence of sequence which is a unique permutation of Conversely, each strictly increasing subsequence in corresponds to a unique common subsequence in
5 We do not use a priori knowledge on the module sizes.
Proof A common subsequence of where denotes the size of implies that is both a subsequence of as well as By construction of the constraint graph, each common subsequence is equivalent to a path through the constraint graph (from left to right). Define a relabeling function which maps the modules in such a way that is a strictly increasing sequence. Since is a subsequence of is a strictly increasing subsequence of And since is also a subsequence of is also a subsequence of Now choose Since is a strictly increasing sequence, it is clear (from an oblique grid visualization) that a path can only exist in the constraint graph if and only if the nodes on the path occur in strictly increasing order in Thus the nodes on the path are also in a common subsequence of which is easily written as using with
Corollary 1 We can analyze properties of sequence pair indirectly by using the simpler single-sequence approach.
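The equivalence stated in Theorem 3 can be checked by brute force on a small example. The sketch below is only an illustration under assumed names (`relabel`, `check_theorem3`) and a made-up four-module sequence pair. It enumerates all subsequences, so it is exponential and only meant for tiny inputs.

```python
from itertools import combinations


def subsequences(seq):
    """Yield every non-empty subsequence of seq as a tuple."""
    for k in range(1, len(seq) + 1):
        for idxs in combinations(range(len(seq)), k):
            yield tuple(seq[i] for i in idxs)


def is_subsequence(sub, seq):
    it = iter(seq)
    return all(x in it for x in sub)   # consumes seq from left to right


def relabel(sp):
    """Rename modules so that the first sequence becomes 0, 1, ..., M-1
    and return the second sequence under the same renaming."""
    first, second = sp
    new_name = {m: i for i, m in enumerate(first)}
    return [new_name[m] for m in second]


def check_theorem3(sp):
    first, second = sp
    new_name = {m: i for i, m in enumerate(first)}
    common = {tuple(new_name[m] for m in sub)
              for sub in subsequences(first) if is_subsequence(sub, second)}
    perm = relabel(sp)
    increasing = {sub for sub in subsequences(perm)
                  if all(x < y for x, y in zip(sub, sub[1:]))}
    return common == increasing


sp = ((2, 0, 3, 1), (0, 2, 1, 3))   # hypothetical example
print(relabel(sp))          # [1, 0, 3, 2]
print(check_theorem3(sp))   # True
```

After relabeling, only the single permutation returned by `relabel` has to be studied, which is the simplification that Corollary 1 refers to.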
Definition 16 [Maximal increasing subsequence] A maximal increasing subsequence of sequence S is a subsequence of S which can not be enlarged by adding elements from S without violating the monotonicity property. With this definition we arrive directly at the following definition. Definition 17 [Longest increasing subsequence] A longest increasing subsequence of sequence S is a maximum-cardinality subsequence over all maximal increasing subsequences of S. Consequently, a longest increasing subsequence of S is always maximal, but not vice versa. Let us denote a maximal increasing subsequence by and its size by We state a theorem taken from [Knuth, 1989]. Theorem 4 Given a random sequence of length M, which is a permutation of distinct integers. The expected length of the longest increasing subsequence is asymptotically Recapitulating, we want to compute the expected number of edges in the sparsified constraint graphs associated with the sets, under the assumption of uniformly random sequence pair selection. Let us consider the sets (associated with the horizontal constraint graph). The other sets can be treated similarly. Each pair is a directed edge in the constraint graph. So the number of outgoing edges from a node is equal to What we want to determine is the total number of edges N in the constraint graph. As mentioned before, N depends on the actual distribution of the modules in the grid, also called a pattern. The average number of edges is denoted by while the maximum number of edges is denoted by Note that the average and maximum are taken over all possible grid
patterns. Moreover, note that a pattern is equivalent to a permutation (Corollary 1), and that the set of patterns is the set of permutations of M elements. Hence, the total number of patterns is M! if we disregard the node labels. It can be easily verified that the maximum number of edges is obtained with a scenario such as shown in Figure 6.8, which consists of two columns of nodes consisting of M/2 nodes each (M is even, without loss of generality). Thus, the grid pattern has (M/2)² = M²/4
edges. Furthermore, if all nodes are vertically lined up in a single column, then N = 0. Thus,
the following holds: where P is the set of all M! grid patterns. This implies
For ease of explanation, we define what we mean by a string. Definition 18 [String] A string is a closed subsequence of a sequence, which is uniquely defined by two elements in the sequence to denote the start and end of the string, respectively. Determining is done in the following simplified way, using Corollary 1. For the example shown in Figure 6.7 we can use a simple linear-time algorithm to define a mapping that transforms into an increasing sequence (0, 1, . . . , 14). If the same mapping is applied to sequence we obtain the permutation
which is shown in Figure 6.9. In this permutation we are looking for strings and the numbers between and are smaller than or larger than Formally:
This is equivalent to the notion that the rectangle induced by elements and is empty. In other words, “sees” or equivalently, is an edge in the sparsified constraint graph. Consider a random pattern in an M × M grid. It is easy to see that the number of strings of length is exactly The probability that a string complies to (6.14) is equal to the probability that and are two consecutive elements of the string set. Let us denote this probability by For example, if the string is (4, 10, 5) then elements 4 and 5 are two consecutive elements and the string complies to (6.14). If the string is (4,10) then elements 4 and 10 are two consecutive elements. The number of ordered pairs of consecutive elements from a set is exactly Furthermore, the total number of pairs is So
where is the length of string minus 1. The expected (or average) number of edges in an M × M grid is now simply computed as:
After rewriting (6.15) with the identity [Graham et al., 1989]
where
is Euler’s constant, this gives the closed-form expression
which states the average number of edges in a constraint graph explicitly. We state this result in a theorem.
Theorem 5 The expected number of edges in a constraint graph is given by (6.17) if each sequence pair is equiprobable.6
Theorem 5 implies that no graph-based algorithm exists which has average computational complexity lower than O(M log M) for computing the mapping from sequence pair to a packing (from scratch), under the assumption that all sequence pairs are equally probable. It follows directly from (6.17) that the average number of edges per node in the constraint graph is (6.17) divided by M, which is essentially O(log M). This result stimulates us to search for an algorithm which performs about O(log M) of work per node, resulting in an overall (average) computational complexity of O(M log M) for all M nodes in a constraint graph.
6.5.2 An Efficient Relative Placement Algorithm
Motivated by the previous theoretical results, it is interesting to investigate the existence of an algorithm that can compute the sets in an efficient manner. That is, an algorithm that has computational complexity equal or very close to the number of nodes and edges in a constraint graph, which has been shown to be O(M log M) on average. The result of our investigation is the algorithm shown in Figure 6.10. We call it the Direct View (DV) algorithm, after the visual interpretation of the grid points that are visible from a given node, in accordance with Definition 14.
The DV algorithm performs a right-to-left scan and a left-to-right scan of the grid to gather enough information to construct the sets. During the scans so-called “bracket pairs” are used, which are defined to be pairs of nodes on adjacent horizontal grid lines. For every bracket pair the second node should lie above or below the first node for a right-to-left or left-to-right scan, respectively. An opening bracket is associated with the first node of such a pair, and the closing bracket with the second node. The first node of a bracket pair is also its identifier. For example, in Figure 6.7 the bracket pairs for the right-to-left scan are: [10,7], [1,3], [13,6], [2,13], [9,2], [12,14] and [8,5]. Function Find_Bracket_Pairs finds all bracket pairs in one vertical scan of the grid, which requires O(M) time and space. Function Initialize_BP_POS holds some accounting information on the closest viewing node associated with a bracket pair. This accounting information can also be obtained in O(M). During the right-to-left scan of the grid, two trees are maintained which hold the viewed nodes for the next node in the scan. The top-down tree holds all nodes for the view, and the bottom-up tree holds all nodes for the view. On line 8 a conditional statement checks if views which can be performed in constant time. If the statement is true then the appropriate set is updated. Update_BP_POS updates the accounting information for this bracket pair, which takes constant time. This loop breaks when the leaf node of the trace has been processed.
6 For simplicity, but without loss of generality, we disregard the edges coming from the source node and going to the target node, wherever convenient.
Lines 14 through 19 check whether a bracket pair should be opened or closed or not, and calls the functions to update the top-down tree and bottom-up tree accordingly. Analogously the left-to-right scan is performed. After both scans have been performed, the sets can be constructed. In Figure 6.11 and Figure 6.12 the update routines of the traces in the top-down and bottom-up trees are shown. When the DV algorithm finishes, all sets have been determined, and herewith the constraint graphs are also known. Using these constraint graphs, we show next how to compute the absolute placement information.
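As a reference for small inputs, the direct views can also be obtained with a naive empty-rectangle test on the relabeled permutation of Corollary 1. The sketch below is not the DV algorithm of Figure 6.10: it takes cubic time and uses hypothetical names. It also compares the exact average number of edges over all permutations with the expression (M + 1)·H_M − 2M, where H_M is the M-th harmonic number. That expression comes from a short counting argument made for this sketch (a pair of positions at distance d forms an edge with probability 1/(d + 1)) and is not quoted from (6.17).

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def direct_view_edges(perm):
    """Edges of the sparsified horizontal constraint graph.

    perm is the second sequence after relabeling the first one to
    0, 1, ..., M-1. Node perm[i] sees perm[j] if it is smaller, comes
    earlier, and no node in between has a value strictly between them.
    """
    edges = []
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] < perm[j] and not any(
                    perm[i] < perm[k] < perm[j] for k in range(i + 1, j)):
                edges.append((perm[i], perm[j]))
    return edges


def average_edges(m):
    """Exact average edge count over all m! permutations (small m only)."""
    total = sum(len(direct_view_edges(p)) for p in permutations(range(m)))
    return Fraction(total, factorial(m))


def harmonic_estimate(m):
    """(m + 1) * H_m - 2 * m, with H_m the m-th harmonic number."""
    h = sum(Fraction(1, k) for k in range(1, m + 1))
    return (m + 1) * h - 2 * m


print(average_edges(4), harmonic_estimate(4))       # 29/12 29/12
print(len(direct_view_edges((2, 1, 0, 5, 4, 3))))   # 9, the two-column case
```

The last line reproduces the two-column worst case for M = 6, where every node of one column sees every node of the other.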
6.5.3 Absolute Placement Computation
In order to evaluate the quality of a placement in any sense, we need to have absolute placement information. For example, absolute module information is necessary to derive exact locations of all pins connected to the modules in order to assess the (global) routing quality, to estimate the impact of substrate coupling between modules, and to determine the total chip area. Since a sequence pair and the derived constraint graphs or sets do not provide absolute placement information in themselves, an additional mapping step is required to obtain absolute placement information from the graph representation. The missing information required to compute the absolute coordinates of the modules in a packing is directly derived from the module sizes.
An efficient way to determine the absolute positions of all modules using the constraint graphs and the module sizes is by means of the longest-paths algorithm. This algorithm effectively determines, from a given source node, the longest-path distances to all reachable nodes in the constraint graph. Since the constraint graph is directed and acyclic, the longest-paths algorithm requires O(|V| + |E|) complexity for a constraint graph G(V, E) [Cormen et al., 1990]. This can be written as O(M + N), with N the number of edges. With Theorem 5 this leads to the result that the lower bound on the average complexity of computing the absolute module positions, using constraint graphs, is O(M log M). This result is a substantial improvement when compared with the original algorithm by Murata et al. [Murata et al., 1996], which has average (and worst-case) time and space complexity O(M²), where M is the number of modules to be placed.
In the following, we will show how to compute the absolute module positions from the predetermined sets and the module sizes, by a simple example. For simplicity, we only discuss the horizontal case. The vertical case is similar.
The sets, uniquely define the horizontal constraint graph, where all outgoing edges of a node are given by In order to compute absolute positions, every node is assigned a positive value (weight). This value is, for the horizontal case, equal to the width of the corresponding module. For example, with the sizes of the modules in this example shown in Table 6.3, and sequence pair ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)), the weighted constraint graph in Figure 6.13 is obtained. For didactic convenience, two additional nodes are introduced: a “start” node and an “end” node (both with zero weight).
They serve as start point and end point while walking through the constraint graph from left to right. With the length of an edge equal to the weight of the node inducing that edge, the result of this walk is that for each node, the longest-path distance to that node is recorded with the node. In practice, a very efficient longest paths algorithm can be used to compute these distances. The final distance values are the coordinates of the bottom-left corner points of the associated modules. Note that the distance recorded with the end node is equal to the width of the chip area. In Figure 6.14, both horizontal and vertical constraint graphs are shown for the example sequence pair. After the longest-path distances have been computed for all nodes in the vertical constraint graph, the coordinates of the modules are known and an actual absolute placement is conceived. Figure 6.15 shows the final packing. Let us annotate the previous notions in a formal manner. The horizontal constraint graph is defined by and the vertical constraint graph is defined by where
Hence
holds. If an efficient implementation based on an adjacency-list representation [Cormen et al., 1990] is used, the memory requirements and time complexity of the graph operations are O(|V| + |E|) for constructing the constraint graphs, where E are the edges of the constraint graph. Without loss of generality, we will only consider the horizontal constraint graph hereafter, denoted by G if no confusion is likely. Formally, the longest-paths information is described by a longest-paths forest denoted by where the weight function w associates a module dimension with its corresponding node, and where dist is a recursive distance function defined by
dist(v) = 0 if v is a start node,
dist(v) = max { dist(u) + w(u) : (u, v) is an edge of G } otherwise. (6.20)
A node is a start node if it does not have any incoming edges in or equivalently The equations given by (6.20) are so-called Bellman-Ford equations. Due to the fact that the constraint graph is directed and acyclic, the set of equations given by (6.20) can be solved uniquely by performing an ordering step (depth-first search) followed by a relaxation step, both requiring O(|V| + |E|) [Cormen et al., 1990]. Furthermore, it is clear that
Summarizing, we proposed a graph-based approach for sequence-pair-to-packing computation which has (approximate) average computational complexity O(M log M),
where M is the number of modules to be placed. The worst-case computational complexity of the approach is O(M²), but this can only occur in rare cases. The proposed algorithm is a significant improvement over the original O(M²) (worst-case and average-case) algorithm by Murata et al. [Murata et al., 1996].
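The ordering step followed by the relaxation step can be sketched as follows. This is a minimal illustration under several assumptions: the function names and the three-module example with its sizes are hypothetical (Table 6.3 is not reproduced here), and for brevity the constraint graphs are built from all pairwise relations rather than from the sparsified sets, which leaves the distances unchanged but costs more edges.

```python
def longest_path_distances(nodes, edges, weight):
    """dist(v) is 0 for start nodes, else max of dist(u) + weight[u]
    over all edges (u, v). The graph must be directed and acyclic."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    order, seen = [], set()

    def visit(u):                      # depth-first search (ordering step)
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)
    order.reverse()                    # topological order
    dist = {v: 0 for v in nodes}
    for u in order:                    # relaxation step
        for v in succ[u]:
            dist[v] = max(dist[v], dist[u] + weight[u])
    return dist


def pack(sp, width, height):
    first, second = sp
    p1 = {m: i for i, m in enumerate(first)}
    p2 = {m: i for i, m in enumerate(second)}
    # a left of b: a before b in both sequences
    h_edges = [(a, b) for a in first for b in first
               if p1[a] < p1[b] and p2[a] < p2[b]]
    # a below b: a after b in the first and before b in the second sequence
    v_edges = [(a, b) for a in first for b in first
               if p1[a] > p1[b] and p2[a] < p2[b]]
    x = longest_path_distances(first, h_edges, width)
    y = longest_path_distances(first, v_edges, height)
    chip_width = max(x[m] + width[m] for m in first)
    chip_height = max(y[m] + height[m] for m in first)
    return x, y, chip_width, chip_height


sp = ((0, 2, 1), (2, 0, 1))            # hypothetical sequence pair
width = {0: 4, 1: 2, 2: 3}             # hypothetical module sizes
height = {0: 1, 1: 5, 2: 2}
print(pack(sp, width, height))
# ({0: 0, 2: 0, 1: 4}, {0: 2, 2: 0, 1: 0}, 6, 5)
```

The returned distances are the bottom-left corner coordinates of the modules, and the chip width and height correspond to the distances that would be recorded at the "end" nodes of the two graphs.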
6.6 Non-Graph-Based Packing Computation
A non-graph-based approach in the context of packing computation was first proposed by Takahashi [Takahashi, 1996]. He formulated the packing computation problem as a problem of finding a maximum-weight 7 decreasing subsequence in a single sequence. Recently, Tang et al. [Tang et al., 2000] observed that a longest common subsequence in a sequence pair, is equivalent to a path through the constraint graph. Therefore, a well-known longest common subsequence (LCS) algorithm [Hunt and Szymanski, 1977] was employed to tackle the packing computation problem. It is intuitively clear that both approaches must be closely related. Actually, with the help of Theorem 3 and Corollary 1, we can argue that these approaches are essentially equivalent (from an abstract point of view). Figure 6.16 illustrates the previous ideas and shows their relationship. Surprisingly enough, the non-graph-based approach allows for a more efficient computation of a packing than the previously proposed graph-based method. The reason for this is that, given fixed known element weights, not all edges in the constraint graph are needed for proper longest paths computation. In other words, by exploiting a priori information on the actual node weights, some edges in the graph need not be generated.
6.6.1 Maximum-Weight Common Subsequence (MWCS) Problem
The approach taken by Tang et al. [Tang et al., 2000, Tang and Wong, 2001] is based on the longest common subsequence (LCS) computation technique [Hunt and Szymanski, 1977]. The standard longest common subsequence algorithm assigns unit weight to each sequence element. This is not appropriate in the context of packing computation. Therefore, the original LCS algorithm has been generalized to handle weighted sequence elements. This generalized algorithm solves the maximum-weight common subsequence (MWCS) problem. It can be easily verified that a solution of the MWCS algorithm corresponds one-to-one to a longest path in the horizontal constraint graph. In [Tang and Wong, 2001] a very efficient weighted LCS algorithm is introduced, which is in fact the same algorithm as in [Tang et al., 2000] but with a more efficient priority queue. Since the sequence elements are taken from a finite set, the Van Emde Boas data structure can be applied successfully here. The MWCS algorithm is given in Figure 6.17. For a detailed explanation of this algorithm we refer to the original paper [Tang et al., 2000]. From the amortized analysis given in [Tang et al., 2000] it is clear that the following theorem must hold for the MWCS algorithm.
Theorem 6 The asymptotic complexity of algorithm MWCS is $O(M \cdot T_Q)$, where $T_Q$ is the amortized complexity of the priority queue operations: insert(·), delete(·), successor(·), predecessor(·).
Proof Obviously the loop from line 2 to line 14 iterates M times. Let us denote the computational complexity of each of the queue operations by where is the first character of the queue operation name. Then it follows directly that the worst-case computational complexity of all operations, excluding the while loop from line 10 to 13, is equal to For the while loop, we can perform an amortized analysis which goes as follows. Since each element is inserted exactly once into Q, the total number of deletions is never more than M. Only if an element is deleted, the successor(·) operation on line 12 is executed. Therefore, the amortized computational complexity is As a result, the overall worst-case computational complexity of algorithm MWCS is which can also be expressed as
A direct consequence of Theorem 6 is that an $O(M \log \log M)$ time complexity and $O(M)$ space complexity implementation of algorithm MWCS is possible, using the Van Emde Boas data structure [van Emde Boas, 1975], which features $O(\log \log M)$ worst-case time complexity per queue operation. Note that the complexity values associated with the non-graph-based approach are worst case, as opposed to the average-case complexity of the graph-based approach.
7 A maximum-weight sequence is a sequence that has a maximum sum of the sequence element weights.
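The idea of computing a packing directly from the sequence pair, without generating a constraint graph, can be sketched as follows. This is a minimal illustration and not the algorithm of Figure 6.17: it swaps the priority queue for a Fenwick tree holding prefix maxima, which gives $O(M \log M)$ rather than $O(M \log \log M)$. The names `pack_x`, `seq_x`, `seq_y` and `width` are assumptions made for this example, as is the convention that a module lies to the left of another when it precedes it in both sequences.

```python
def pack_x(seq_x, seq_y, width):
    """Return (x-coordinates, total width) of the horizontal packing.

    Assumed convention: module b is left of module c when b precedes c
    in both sequences of the pair.
    """
    m = len(seq_x)
    pos_y = {module: k for k, module in enumerate(seq_y)}
    tree = [0] * (m + 1)  # Fenwick tree storing maxima of right edges

    def query(count):
        # Largest right edge among the first `count` positions of seq_y.
        best = 0
        while count > 0:
            best = max(best, tree[count])
            count -= count & -count
        return best

    def update(position, value):
        # Record a right edge at a 0-based position of seq_y.
        k = position + 1
        while k <= m:
            tree[k] = max(tree[k], value)
            k += k & -k

    x = {}
    for module in seq_x:  # modules seen so far precede `module` in seq_x
        p = pos_y[module]
        x[module] = query(p)  # ... and the query keeps those preceding it in seq_y
        update(p, x[module] + width[module])
    return x, query(m)


x, total = pack_x([1, 0, 2], [0, 1, 2], {0: 3, 1: 5, 2: 2})
print(x, total)
```

In this small instance modules 0 and 1 both start at the left boundary, and module 2 is pushed right by the wider of the two.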
6.6.2
Maximum-Weight Monotone Subsequence (MWMS) Problem
As observed by Takahashi [Takahashi, 1996], a longest path through the constraint graph is equivalent to a maximum-weight increasing or decreasing (sub)sequence (after relabeling). Note that a weighted increasing subsequence, also called an up-sequence, is associated with a path through the horizontal constraint graph, while a weighted decreasing subsequence, also called a down-sequence, is associated with a path through the vertical constraint graph. This observation can be proved easily with the aid of Theorem 3, which essentially states that a sequence pair can be easily mapped to an equivalent single sequence. Formally, the problem can be stated as follows.
Problem: Maximum-weight monotone subsequence (MWMS) problem
Instance: A permutation $\pi$ of the elements in $\{0, 1, \ldots, M-1\}$ and a weight function $w$ on these elements
Solutions: The set $\mathcal{S}$ of all monotone increasing or decreasing subsequences of $\pi$
Maximize: $\sum_{s \in S} w(s)$ with $S \in \mathcal{S}$, over $\mathcal{S}$
Since there are no fundamental differences between an increasing instance and a decreasing instance of the MWMS problem, we will simply call it the MWMS problem.
Although a clear link has been established between computation of a packing and the maximum-weight monotone subsequence problem [Takahashi, 1996], only incomplete links have been established between the maximum-weight monotone subsequence problem and related work in mathematics and computer science. The elegance and conceptual simplicity of the MWMS problem almost dictates that this problem is known and has been tackled before. Indeed, Mäkinen [Mäkinen, 1999] surveyed the up-sequence problem and commented on the relationship between the MWMS problem and the maximum-weight clique problem in permutation graphs. It turns out that the maximum-weight clique problem in permutation graphs is equivalent to the MWMS problem. The former had been investigated by Chang and Wang [Chang and Wang, 1992] well before the introduction of the sequence pair representation. They proposed efficient algorithms for both the maximum-weight clique and maximum-weight independent set problems on permutation graphs with complexity $O(M \log \log M)$. Thus, in principle, the first and fastest known algorithm in terms of computational complexity for non-graph-based placement computation using the sequence pair representation was left undiscovered until now. For completeness, we will mention the approach taken in [Chang and Wang, 1992]. First, we define a clique.
Definition 19 (Clique) A clique in a graph G is a complete subgraph of G.
In Figure 6.18, the set of nodes {3,4,5,6} forms a clique since every node in the set is connected to every other node in the set. In order to find a maximum-weight (sum of all clique element weights) clique in a permutation graph, Chang and Wang observe that an isomorphic interval graph can be constructed in linear time from a permutation, which is a compact equivalent representation of a permutation graph.
The obtained interval graph is then used to find a maximum-weight set of weighted intervals with a known algorithm due to Hsu [Hsu, 1985] with complexity $O(M \log \log M)$, where M is the number of intervals (and also the size of the permutation). Effectively, a maximum-weight decreasing⁸ subsequence is obtained in $O(M \log \log M)$ time. For didactical purposes, let us consider again the sequence pair
visualized in Figure 6.5. Using Corollary 1, we map this representation to a single sequence representation with the mapping defined by
which turns into a strictly increasing sequence. If this mapping is applied to we arrive at the single-sequence representation (or permutation)
8 The approach for finding a maximum-weight increasing subsequence is similar.
A permutation graph $G[\pi]$ associated with a permutation $\pi$ is defined by $G[\pi] = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges defined by
$$E = \{(i, j) \mid (i - j) \cdot (\pi^{-1}(i) - \pi^{-1}(j)) < 0\} \qquad (6.23)$$
with $i, j \in V$, and where $\pi^{-1}(i)$ denotes the position of element $i$ in $\pi$. In words this means that an edge exists between two nodes $i$ and $j$ if and only if $i$ is larger than $j$ and $i$ is located before $j$ in the permutation, or $i$ is smaller than $j$ and $i$ is located after $j$ in permutation $\pi$. Obviously, a one-to-one relationship exists between the permutation graph and the permutation. It can be verified, using (6.23), that a clique in a permutation graph corresponds uniquely to a strictly decreasing subsequence within the associated permutation. For example, (3,2,0) is a decreasing subsequence of $\pi$ and (3 - 2) · (1 - 5) < 0, (3 - 0) · (1 - 6) < 0, and (2 - 0) · (5 - 6) < 0. Thus, set {0,2,3} forms a clique, as expected. Also, (3,7) is not a decreasing subsequence. Consequently, there should not be an edge between 3 and 7 in the permutation graph $G[\pi]$. This is indeed the case, as (3 - 7) · (1 - 2) > 0. This notion can be easily generalized to the situation of weighted nodes. In that case, a maximum-weight clique corresponds to a maximum-weight decreasing subsequence, and vice versa.⁹
The technique proposed in [Chang and Wang, 1992] is to map the permutation graph (with permutation $\pi$ given) to a so-called isomorphic interval graph representation for which Hsu [Hsu, 1985] presented an algorithm to compute maximum-weight cliques. The crucial point here is the computational complexity of the isomorphic transformation. It is proven in [Chang and Wang, 1992] that this transformation can be performed in linear time. The formal transformation is discussed after illustrating the above ideas with an example. For ease of understanding, we use unweighted nodes in the following example. Furthermore, since we want to discuss the horizontal subcase (equivalent to increasing subsequences) of packing computation and the algorithm given in [Chang and Wang, 1992] works by default on decreasing subsequences, we reverse the sequence of (6.22) and get
Figure 6.18(a) shows the permutation graph for this permutation. Figure 6.18(b) shows the associated interval graph representation for this permutation graph. The construction of this graph is discussed shortly. The reader can easily verify that each pair of partially overlapping interval segments in Figure 6.18(b), with each segment associated with an element of the permutation, corresponds uniquely to an edge in Figure 6.18(a). Note that we deliberately put the nodes in the graph of Figure 6.18(a) in the same positions as given by the original sequence pair. Comparing this graph with the horizontal constraint graph of Figure 6.13 should directly reveal similarities. It is important to note at this point that a clique in the permutation graph of Figure 6.18(a) corresponds uniquely to a horizontal path in the constraint graph of Figure 6.13. As discussed before, an increasing subsequence corresponds uniquely to a clique. As a consequence, these notions are fully equivalent. Formally, a given permutation graph, with permutation given, is mapped to an interval graph representation as follows. Each interval is defined as
Add a super-interval, which is required for Hsu’s algorithm.
9 If we want to apply the ideas to strictly increasing subsequences, we can simply reverse the permutation sequence. Another interesting equivalent problem in connection with sequence reversing is left out of this discussion. The interested reader is referred to [Chang and Wang, 1992].
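The correspondence between cliques and decreasing subsequences can be checked on a small instance. The sketch below is only an illustration: the permutation and the weights are hypothetical and are not the example of Figure 6.18, and the function names are made up for this purpose. The edge test uses the product rule from the worked examples above: two elements are adjacent when the product of their value difference and their position difference is negative. The quadratic dynamic programme is a simple stand-in, not the interval-graph algorithm of Chang and Wang.

```python
from itertools import combinations


def permutation_graph_edges(perm):
    """Edges (i, j), i < j, with (i - j) * (pos(i) - pos(j)) < 0."""
    pos = {value: k for k, value in enumerate(perm)}
    return {(i, j) for i, j in combinations(sorted(perm), 2)
            if (i - j) * (pos[i] - pos[j]) < 0}


def is_clique(nodes, edges):
    """True when every pair of the given nodes is joined by an edge."""
    return all((min(a, b), max(a, b)) in edges
               for a, b in combinations(nodes, 2))


def max_weight_decreasing_subsequence(perm, weight):
    """O(M^2) dynamic programme; returns (total weight, subsequence)."""
    best = []  # best[k]: heaviest decreasing subsequence ending at perm[k]
    for k, value in enumerate(perm):
        prev = max((best[j] for j in range(k) if perm[j] > value),
                   key=lambda entry: entry[0], default=(0, ()))
        best.append((prev[0] + weight[value], prev[1] + (value,)))
    return max(best, key=lambda entry: entry[0], default=(0, ()))


perm = [2, 0, 3, 1]                 # hypothetical permutation
weight = {0: 1, 1: 4, 2: 2, 3: 3}   # hypothetical node weights
edges = permutation_graph_edges(perm)
total, subsequence = max_weight_decreasing_subsequence(perm, weight)
print(sorted(edges), total, subsequence, is_clique(subsequence, edges))
```

For this instance the heaviest decreasing subsequence is (3, 1), and its elements are indeed pairwise adjacent in the permutation graph.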
An edge exists between two nodes in the permutation graph if and only if their intervals partially overlap. With the constructed interval graph, the algorithm proposed by Hsu [Hsu, 1985] can be used to compute a maximum-weight clique in the interval graph in $O(M \log \log M)$ time and $O(M)$ space complexity, essentially similar to the approach and results of Tang et al., which were published many years later. However, it must be noted that the algorithm of Tang et al. is conceptually easier to understand. Whether or not the MWMS problem can be solved in linear time within our standard model of computation is posed as an open problem. However, we can derive that (in theory) it is possible to solve the MWCS problem in smaller complexity than $O(M \log \log M)$, which is obtainable through the use of existing practical data structures. In [Beame and Fich, 1999] optimal bounds on the predecessor problem are established. The theoretical result of that paper is a new data structure which stores $n$ integers from a universe of size $N$ in $n^{O(1)}$ space and performs predecessor queries in
$$O\left(\min\left\{\frac{\log \log N}{\log \log \log N}, \sqrt{\frac{\log n}{\log \log n}}\right\}\right)$$
time. In conjunction with Theorem 6, we may conclude that the computational complexity of algorithm MWCS can be improved to
$$O\left(M \cdot \min\left\{\frac{\log \log M}{\log \log \log M}, \sqrt{\frac{\log M}{\log \log M}}\right\}\right)$$
Since a solution of the MWCS problem is also a solution to the MWMS problem, the same achievable computational complexity holds for the latter. Summarizing, we can say the following.
The non-graph-based placement computation approach is computationally more efficient than a graph-based approach. The former can be practically implemented with $O(M \log \log M)$ worst-case complexity, whereas the complexity of the latter is higher.
Owing to the fact that some redundancy is incorporated in the graph-based placement computation approach, it can be more easily generalized to an incremental approach. We do not claim that this is impossible with the non-graph-based approach. However, it is surely much more difficult. Another advantage is that the division between relative and absolute computation of the graph-based approach yields better (visual) insight into the problem. Consequently, analysis and design of relevant algorithms is substantially facilitated.
From both a theoretical and a practical point of view, it is more interesting to investigate an incremental generalization of the graph-based placement computation approach. From theoretical analyses, we might find interesting and exploitable properties that were previously unknown. Also, fundamental links with other approaches might be indirectly established. From practice, we gain important experience on how the incremental approach relates to the non-incremental approach in terms of run-time performance. From this information, practical guidelines can be derived for the usage of the incremental algorithms.
6.7
Graph-based Incremental Placement Computation
In the context of a stochastic optimization framework such as simulated annealing, typically very small changes (perturbations) are applied during each iteration (move generation). It is intuitively clear that a small change in the input is usually associated with a small change in the output. More specifically, a small change in the topology of the constraint graphs or the weight of a node typically causes only a part of the absolute placement to change. Recomputing the entire absolute placement after each small change is therefore a waste of computation time, and it is interesting to find more efficient means to compute the absolute placement after applying a small change to the abstract sequence pair representation. This efficient manner of updating strictly necessary information is called incremental computation. Note that the incremental computation approach does not involve any approximations and hence no induced errors; it is exact. This is often not the case in contemporary literature. A change in the placement can come about in several ways, induced by the perturbation operator set we have defined within the simulated annealing environment. In our case, essentially, a distinction can be made between the following perturbations:
the topology of the constraint graph generally changes, and also the longest paths information must be updated;
swap: the topology of the constraint graph is unaffected, but the longest paths graph must be updated;
rotate (over a multiple of $90^\circ$): there is no change in relative relationships, and the longest paths graph must be considered for an update only if the rotation angle is $90^\circ$ or $270^\circ$;¹⁰
mirror (horizontally or vertically): the constraint and longest paths graphs are unaffected.
Typically, only a small part of a placement is actually affected by a perturbation. This fact can be exploited by an incremental computation approach. Furthermore, as will be clear shortly, because no error in the placement is introduced, the incremental approach is exact. As a consequence, the quality of a placement obtained by incremental techniques is essentially the same as one that is obtained by compute-from-scratch techniques. It is convenient to specify exactly what is meant by an affected module and a moved module in this context.
Definition 20 (Affected module) A module is affected if it is an operand during a perturbation, or if its location can be influenced by that perturbation.
Definition 21 (Moved module) A module is called a moved module if its location has actually changed due to a perturbation between consecutive iterations.
If a module changes orientation or is moved, generally all nets connected to that module have to be re-routed. As a result, in all of the above perturbation cases, the routing has to be recomputed in some way. Generally, we can state that the more modules are affected, the more routing effort is required. We split the incremental packing computation in two parts. The first step computes the modified constraint graph in an incremental way. The second step computes the longest paths information in an incremental way.
10 Note that rotation over $180^\circ$ does not have any influence on the placement, but that it does influence the routing because pin positions are changed.
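Definition 21 can be made concrete by comparing the module locations of two consecutive iterations. The sketch below is only an illustration under assumed data structures: each placement is a dictionary from a module name to its lower-left corner, and `moved_modules` is a hypothetical helper name. The affected modules of Definition 20 form a superset of its result, because an affected module may end up at its old location.

```python
def moved_modules(before, after):
    """Modules whose location differs between two consecutive placements."""
    return {module for module in before if before[module] != after[module]}


# Hypothetical placements: module name -> (x, y) of the lower-left corner.
before = {"a": (0, 0), "b": (3, 0), "c": (3, 2)}
after = {"a": (0, 0), "b": (4, 0), "c": (3, 2)}
print(moved_modules(before, after))
```

Only the nets connected to the reported modules, plus any module whose orientation changed, would then need re-routing.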
6.7.1
Incremental Relative Placement Computation
The incremental computation of the constraint graphs is shown for the horizontal case. The vertical case can be treated similarly. For the constraint graph computation we need only consider perturbations or as they are the only ones that affect the topology of the constraint graph. The swap operation does not change the topology of the constraint graph. The only thing that needs to be done is interchanging the labels of two nodes (and their associated fields). As shown in Figure 6.19 and Figure 6.20, the change in topology depends on the type of swap and the relative orientation of the nodes under consideration. Let us call these two nodes and and let and divide the grid in nine regions denoted by A, B, C, D, E, F, G, H, I. The gridlines belonging to nodes and are not part of these regions. Furthermore, for ease of discussion, we will rotate the oblique grid 45 degrees so that we get a grid with horizontal and vertical lines, in which sequence is vertically aligned with the left side (top down) and sequence is horizontally aligned with the bottom side (left to right). Figure 6.21 visualizes this idea. The nodes in these regions are elements of nine disjoint sets denoted by
From Figure 6.19 and Figure 6.20 it is clear that there are 4 possible scenarios for computing the new sets after a perturbation operation. In order to construct a fast and efficient algorithm we assume that all elements of an set are stored in order. The order is defined by the position of the element in sequence This could, for instance, be achieved using an implementation based on balanced binary search trees, such as splay trees [Sleator and Tarjan, 1985]. We discuss an example to illustrate the way in which sets are updated. Suppose we have the situation as shown in Figure 6.22. The situation before perturbation is denoted by and the situation after is denoted by The two nodes and are “vertically oriented” (in the oblique grid) at time and after the at time they are “horizontally oriented” (in the oblique grid). Assume we want to update the sets and then the following two cases can occur:
1. where the algorithm in Figure 6.25 can be applied, and
2. where the algorithm in Figure 6.26 can be applied.
First an example is discussed to illustrate the approach, which is subsequently formalized in an algorithm. Figure 6.23 depicts an illustrative example in which the computation of the set is shown. The time order in which the nodes are processed is indicated by a number; the node with number 1 is processed first, and the node with number 6 is processed last. Furthermore, the dotted arrows visualize that a node is directly viewed by another node, and thus is an element of that node’s set. At the left-hand side of Figure 6.23, the situation before the of nodes and is depicted. We start from this scenario to find the nodes that will be in set after the perturbation, in other words the set. The search for the nodes in is initiated by looking for the rightmost node in region I which is not directly viewed by another node in region I. Once found, the nodes that are directly viewable by node at time are added from right to left to (which is initially empty). In Figure 6.23, the sequentially added nodes are numbered 2, 4 and 5. These steps deserve some additional explanation. Searching for the rightmost node in region I which is not directly viewed by another node in region I is easily accomplished as follows. Start looking for the rightmost node in Assume, without loss of generality,
that this node (node 1 in Figure 6.23) lies in region H. Clearly, none of the nodes in region H can be an element of Therefore, if the rightmost node that has been found is not in region I, iteratively look for the rightmost node in the set of the last found rightmost node until the newly found node is in region I. Suppose this node is (node 2 in Figure 6.23). Search for the node just left of in Suppose this node is At this point, we distinguish two cases: and In the latter case, we simply add to So assume (node 3 in Figure 6.23). In this case, we set and proceed looking for the node in just left of the last found node in region I (node 4 in Figure 6.23). This process is iterated (node 5 in Figure 6.23) until the node (node 6 in Figure 6.23) does not contain an element in its set which is left of the last added node (node 5). This completes the computation of Let us proceed with an example in which we want to compute Assume, without loss of generality, that region E is non-empty. Figure 6.24 depicts an example scenario. The determination of is performed in two phases. First, all directly viewable nodes in regions E and H are determined. Second, all directly viewable nodes in region F are determined. Note that if and only if region E is empty, node is an element of At the left-hand side of Figure 6.24, we see that the nodes in regions E and H which are to be added to are searched for in the view direction. We start with node and search for the leftmost node right of in (node 1 in Figure 6.24). This node is called the reference node. Now we proceed by using the last found node, say to find the leftmost node right of node in (node 2 in Figure 6.24). This step is iterated (node 3 in Figure 6.24) until no such node can be found. The remainder of the elements in is located in region F. Finding those nodes is straightforward. A strict requirement is that all those nodes should lie above the reference node (node 1 in Figure 6.24).
First find the rightmost node, say in (node 4 in Figure 6.24). Now find the rightmost node in which is left of the last found node (node 5 in Figure 6.24). Repeat this step until the previously given requirement is violated or no further nodes can be found. Formally, the algorithm that computes the new sets for this specific orientation and perturbation scenario is shown in Figure 6.25. Function find_max searches for the element in with largest position index in sequence The execution of find_max on line 2 finds the rightmost node in regions H and I (if it exists). If no such element exists, then In the first while loop from line 4 to 7, the algorithm efficiently searches for the rightmost element in region I. If no such element exists then
The second while loop from line 8 to line 19 determines all elements in If a node is found in region I using function find_max on a previously determined reference node then this node must be in which is accomplished on line 11. Once the rightmost node has been found, the rightmost node which is left of a previously added node to and directly viewable by the reference node is searched for. Computation of the is performed on lines 20 through 33. On line 22 the algorithm searches for the leftmost node which is right of and in This implies that this node must be located in region E or H. Moreover, this node must be an element of which is established on line 25. Figure 6.24 illustrates the search for the nodes in regions E and H which must be included in However, the construction of is not complete yet, as region F might contain more nodes that should be in The determination of these nodes is accomplished by the while loop from line 30 to 33. Node which is defined on line 23, acts as a reference node. All nodes in region F must lie above this node and be an element of which is a requirement to be part of Construction of the and sets starting from an initially horizontal orientation of and proceeds according to the algorithm shown in Figure 6.26. With the explanation of the previous algorithm, it is easy to understand the actions of the algorithm in Figure 6.26. Note that for notational convenience a dummy node is introduced, with Analogously, the other sets can be computed by properly adapting the previously discussed algorithms to the specific situations. It is also possible to use the symmetry relationship given by (6.9) and (6.10) to compute the sets for the other viewing directions. Due to the perturbation of nodes and nodes in the regions A, B, D, and E might have been affected too, in terms of their sets. Therefore, we have to trace which of these nodes need to update their sets.
Fortunately, by virtue of the symmetry relationships (6.9) and (6.10), these nodes can be determined easily. For the discussed scenario, the set is at most
It is not sufficient to use the union of difference sets given by
where set difference is defined by This is demonstrated by the scenario shown in Figure 6.27. Clearly, updating only the nodes in the set difference would ignore node due to the fact that node c prevents node from being in and node is not in It is clear that definitely needs updating because node should not be in Therefore, it is clear that using (6.26) is not adequate. Obviously, the use of (6.25) is sufficient. However, it may be possible to exploit a more clever technique based on the symmetry properties of the sets given by (6.9) and
(6.10) which minimally extends the set (6.26) to render it sufficient. This can be intuitively understood by the observation which we derive from the following theorem. Theorem 7 If, due to a perturbation, a non-empty rectangle induced by unperturbed nodes and becomes empty, or vice versa, both the set and the require updating. The view direction is from node to node and the opposite view direction is from node to node
Proof By definition, if and only if the rectangle induced by and is empty. Therefore, if a non-empty rectangle becomes empty, or vice versa, we have a change and consequently the and sets must be updated. A direct consequence of Theorem 7 is the following. Corollary 2 If a previously empty rectangle becomes non-empty or a non-empty rectangle becomes empty, and is the moving node, then and must be updated. The directions and are defined in Theorem 7. We can use Corollary 2 as an aid to identify pairs of nodes such as defined by Theorem 7. In other words, if a node requires an update of its set, we determine node and update its set. Finding can be accomplished with The complete set of unperturbed (static) nodes for which we have to update the and sets must be located in regions A, B, D, E. For ease of discussion, let us assume that we have a node in located in region D, as shown in Figure 6.28. Without loss of generality, assume nodes and are in region G and E, respectively, and both nodes are viewed by After the perturbation, the shaded area must be explored in order to find nodes which should be included in With the aid of previously discussed techniques it is quite easy to identify those nodes efficiently. The steps to be taken are formalized in the
algorithm shown in Figure 6.29, which efficiently computes the set. On line 2, we search for node If it exists, variable records its position in sequence If not, the shaded region extends to the lower boundary of the grid and is assigned a very large number. Line 4 determines node If a node is found in region E then, on line 7, we look for a module in view at time which is left of If the node is located in region F or does not exist then node will be viewed by after the perturbation. In this case, lines 9 and 10 are executed, finding a node in the shaded region which extends to the right of region H and in view at time Finally, in the loop from line 12 to line 15, we repeatedly add nodes to from view, if they are above node This is done in a right to left fashion using the predecessor approach. It is somewhat elaborate but quite straightforward to adapt the algorithm in Figure 6.29 to the other cases.
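The emptiness test underlying Theorem 7 can be illustrated with a brute-force check. This is a sketch under assumptions, not the algorithm of Figure 6.29: a node is taken to lie inside the rectangle induced by two nodes when its position is strictly between theirs in both sequences of the pair, and the names `positions` and `rectangle_is_empty` are hypothetical. The linear scan costs $O(M)$ per query, whereas the algorithms of this section avoid scanning all nodes.

```python
def positions(sequence):
    """Map each node to its index in the sequence."""
    return {node: k for k, node in enumerate(sequence)}


def rectangle_is_empty(a, b, pos_first, pos_second):
    """True when no other node lies strictly inside the rectangle of a and b."""
    lo1, hi1 = sorted((pos_first[a], pos_first[b]))
    lo2, hi2 = sorted((pos_second[a], pos_second[b]))
    return not any(
        lo1 < pos_first[n] < hi1 and lo2 < pos_second[n] < hi2
        for n in pos_first if n not in (a, b)
    )


# Hypothetical sequence pair; nodes 0 and 3 stay put, nodes 1 and 2 are perturbed.
first, second = [0, 1, 3, 2], [0, 1, 3, 2]
print(rectangle_is_empty(0, 3, positions(first), positions(second)))

# Exchanging nodes 1 and 2 in the first sequence only empties the rectangle.
first_after = [0, 2, 3, 1]
print(rectangle_is_empty(0, 3, positions(first_after), positions(second)))
```

In this instance the rectangle induced by the static nodes 0 and 3 changes from non-empty to empty, which is the situation in which Theorem 7 calls for an update of their sets.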
6.7.2
Incremental Relative Placement Computational Complexity
The computational complexity of the complete incremental update algorithm for computing the new constraint graphs after an or has been applied, is derived next. The analysis is based on the algorithm shown in Figure 6.25, where we assume that the perturbed nodes and at time are vertically oriented. Recall that each set is stored in a separate balanced binary search tree. The balanced binary search tree operation find_max(·) runs in $O(\log N)$, where N is the number of elements in the tree (see Chapter 5). This complexity also holds for other basic operations on the tree such as: find_min(·), find_predecessor(·), find_successor(·), insert(·), delete(·). From (6.18), we have
for the average size of an set for a randomly picked (with each equiprobable). Therefore, the search will take If (and only if) the function returns The loop from line 4 to line 7 will be iterated at most times due to (6.27). Moreover, the member check on line 4 can be performed in constant time. Consequently the first while loop has complexity The second while loop from line 8 to 19 is also iterated at most times due to (6.27). At most elements are added on line 11. Adding elements to an (initially empty) set takes because
where is some constant. Since find_predecessor(·) on lines 13 and 17 within this loop takes the total computational complexity for constructing the updated set is
Similar arguments hold for the computational complexity of the while loops from lines 24 to 28 and from 30 to 33; both run in The resulting total computational complexity of the incremental update algorithm to compute the updated sets, is The same line of reasoning can be applied to the algorithm of Figure 6.26, which computes the sets, but where the nodes and are horizontally oriented before perturbation. To complete the analysis, the algorithm shown in Figure 6.29 should be included. This algorithm is the basis of the overall approach to compute the updated sets for all On line 1 the old set is copied to the new set. This takes using (6.28). Operations find_predecessor(·) and find_successor(·) on lines 2 and 4, respectively, use The same complexity applies to the operations on lines 7, 9 and 10. The dominant part of the algorithm is the while loop from line 12 to 15. By (6.27), the loop iterates times. Within each iteration we have to add an element to and run find_predecessor(·). Thus, by similar arguments as before, the result sums up to total computational complexity. Since there are nodes for which the aforementioned complexity is required, an upper bound on the average computational complexity is given by
Using (6.25), (6.27), and
an upper bound for the expression in (6.29) is
Hence, the resulting grand total for the complete incremental constraint graph computation approach is
Clearly, an absolute lower bound on the average computational complexity is
Note that both complexities in (6.31) and (6.32) cannot be reduced by using the more efficient Van Emde Boas data structure [van Emde Boas, 1975]. The reason for this is that the universe of elements within a single set has size In theory, the aforementioned complexities can be improved but from a pragmatic point of view this is at least impractical since no implementations of theoretically more efficient data structures have been reported as yet.
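The ordered-set interface assumed in the analysis above (find_max, find_min, find_predecessor, find_successor, insert, delete) can be mimicked with a sorted list. This is a minimal sketch and not a balanced binary search tree: the class name `OrderedSet` is made up for illustration, its searches run in $O(\log N)$ through the standard `bisect` module, but its insertions and deletions shift list elements and are therefore $O(N)$. A balanced tree, such as the splay trees mentioned earlier, would be needed to match the stated bounds for every operation.

```python
import bisect


class OrderedSet:
    """Sorted-list stand-in for the balanced tree assumed in the analysis."""

    def __init__(self, items=()):
        self._keys = sorted(set(items))

    def insert(self, key):
        k = bisect.bisect_left(self._keys, key)
        if k == len(self._keys) or self._keys[k] != key:
            self._keys.insert(k, key)  # O(N) here, O(log N) in a balanced tree

    def delete(self, key):
        k = bisect.bisect_left(self._keys, key)
        if k < len(self._keys) and self._keys[k] == key:
            del self._keys[k]

    def find_min(self):
        return self._keys[0] if self._keys else None

    def find_max(self):
        return self._keys[-1] if self._keys else None

    def find_predecessor(self, key):
        """Largest stored key strictly smaller than `key`, or None."""
        k = bisect.bisect_left(self._keys, key)
        return self._keys[k - 1] if k > 0 else None

    def find_successor(self, key):
        """Smallest stored key strictly larger than `key`, or None."""
        k = bisect.bisect_right(self._keys, key)
        return self._keys[k] if k < len(self._keys) else None


s = OrderedSet([5, 1, 9])
s.insert(7)
print(s.find_max(), s.find_predecessor(7), s.find_successor(7))
```

Storing the position of each node in one of the two sequences as the key gives the right-to-left traversal by repeated find_predecessor calls that the update algorithms rely on.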
6.7.3
Incremental Absolute Placement Computation
Computing the absolute information in a placement boils down to computing the longest paths information. This can also be performed in an incremental manner, as will be shown next. In order to do this efficiently, the set of affected nodes is needed as an input parameter for the incremental longest paths algorithm. A new algorithm is given here to compute longest paths through a perturbed constraint graph. This part was published before in [Lin et al., 2001], but in a less extensive form. First, a simple algorithm is given to find a single strictly increasing sequence from a given sequence pair.
1. Find a unique permutation that maps sequence element-wise to a strictly increasing sequence (0, 1, 2, ..., M - 1).
2. Map sequence element-wise to a sequence using
It can be verified that the following property holds for sequence Going from left to right through sequence each sequence item that is smaller than all of its predecessors is a “start” node. The first sequence item is a start node by definition. It is easy to map this sequence element in back to the original module number using the inverse mapping For example, when we take the sequence pair shown in Figure 6.6, the mapping is given by
and the inverse mapping is obtained by reversing the direction of the arrows. When applied to sequence of the example, the following sequence is obtained for
is
The “start” nodes of this sequence are {8,3,2,0}, which can be found in linear time. Using the inverse mapping it is straightforward to find that the original “start” node numbers are {0,5,4,2}, which can be verified with Figure 6.6. It is clear that the aforementioned approach also works for sub-sequences, i.e. a permutation of a subset of This is also the benefit of this procedure, since the complexity is linear in the size of sequence ¹¹. Note that in general, does not hold for start nodes of a sub-sequence of
11 With current sorting algorithms, the worst-case complexity is increased to when
and only
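The single-sequence technique can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name mirrors find_start_nodes, but the argument names `gamma_plus` and `gamma_minus` and the example sequence pair are hypothetical (the pair is not the one of Figure 6.6). One sequence is mapped to (0, 1, ..., M – 1), the other is mapped with the same permutation, and every item that is smaller than all of its predecessors is reported as a start node.

```python
def find_start_nodes(gamma_plus, gamma_minus):
    """Return the start nodes of a sequence pair, in order of appearance.

    gamma_plus and gamma_minus are hypothetical names for the two
    sequences of the pair; both are permutations of the same modules.
    """
    # Step 1: permutation that maps gamma_plus onto (0, 1, ..., M - 1).
    rank = {module: i for i, module in enumerate(gamma_plus)}
    # Step 2: map the other sequence element-wise with that permutation.
    mapped = [rank[module] for module in gamma_minus]

    starts = []
    smallest_so_far = len(gamma_plus)  # larger than any mapped value
    for module, value in zip(gamma_minus, mapped):
        # An item smaller than all of its predecessors is a start node.
        if value < smallest_so_far:
            starts.append(module)  # zip keeps the original module number
            smallest_so_far = value
    return starts


# Hypothetical 4-module example: the mapped sequence is (2, 1, 3, 0).
print(find_start_nodes([2, 0, 1, 3], [1, 0, 3, 2]))
```

The scan itself is linear in the length of the sequence that is passed in, so the same function can be called on a sub-sequence of candidate modules.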
The new incremental algorithm, which computes the longest paths through the constraint graph after a perturbation, is discussed hereafter. Essentially, the longest-paths forest is made inconsistent by the perturbation, and the purpose of the incremental algorithm is to recompute (partial) longest paths in order to make it consistent again. We define four types of inconsistencies:
1. under-consistent; applies when the distance value of a node is lower than its consistent value given by (6.20),
2. over-consistent; applies when the distance value of a node is higher than its consistent value given by (6.20),
3. LP-underconsistent; applies when […] and dist […] and […] is distance-consistent,
4. LP-overconsistent; applies when […], where dist […] and […].
We also refer to the first two inconsistencies as distance-inconsistencies, while the last two inconsistencies are called LP-inconsistencies. A graph is called distance-consistent when all distance values comply with (6.20). A graph is called consistent when it is both distance-consistent and LP-consistent.
The incremental longest-paths (ILP) algorithm is given in Figure 6.30 and operates as follows. On lines 1-3 the distance fields of all affected nodes are set to zero so as to force correct computation of their new distance due to (new) incoming edges. The outer loop, starting on line 5 and ending on line 29, checks whether all candidate nodes in set C have been processed. Each processed candidate node is eligible for annotation as a moved module. Consequently, the number of moved modules is at most equal to the total number of candidate nodes. On line 6 the start nodes are found for set C using the single-sequence approach described earlier. Note that line 6 is executed at least once (with C = A), and possibly again when the priority queue Q is empty while C is not. This occurs when the start nodes propagate changes through the longest-paths forest, but not all affected nodes are processed during this update. The inner loop, starting at line 7, processes all (distance-)inconsistent nodes that are encountered during a single propagation wave. By virtue of the absence of cycles in […] (and […]), an edge is processed at most once. Furthermore, extracting the smallest-distance node from Q on line 8 guarantees that all candidate nodes are made consistent exactly once. The latter is performed on lines 9 through 14. For each node that is made consistent, all its outgoing edges are processed and the corresponding nodes are checked for inconsistency on lines 15 through 23. Each successor of the extracted node is checked for under-consistency on line 17 and for over-consistency on line 18. Note that over-consistency can only occur if […] of the inconsistent input graph.
An inconsistent node has its distance updated on line 20 and is put (back) on the heap with its new distance value for further processing on line 21. Line 22 tags an inconsistent node so that it will be made consistent in the iteration where it is extracted from the priority queue Q. Lines 23 through 26 cover the case in which a node is LP-inconsistent. In these cases […] is added to […] to re-establish consistency. Below are brief descriptions of the functions that are used in the incremental longest-paths algorithm.
adjust_heap(H, key, value) inserts (key, value) in heap H if key is not an element of the key space of H; otherwise it adjusts the value field associated with key if it is smaller than value.
extract_min(H) removes the pair with minimal value field from the heap and returns the key field of that pair.
recompute_dist(·) computes the longest distance from all predecessors of the given node.
update_lp_pred(·) updates the longest-path information from the predecessors of the given node.
insert_lp_pred(·) adds node […] to […] as a longest-path predecessor node of node […].
find_start_nodes(C) finds all “start” nodes from set C using the single-sequence technique as described previously.
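The propagation idea behind the ILP algorithm can be illustrated with a deliberately simplified sketch. It is not the algorithm of Figure 6.30: instead of a distance-keyed heap combined with a start-node search, it visits nodes in a topological order that is assumed to be given (for a horizontal constraint graph induced by a sequence pair, the position of a module in one of the sequences can serve as such an order) and recomputes each distance from all predecessors. The names `preds`, `succs`, `weight`, `topo` and `dist` are hypothetical, and longest-path predecessor bookkeeping is omitted.

```python
import heapq


def incremental_longest_paths(preds, succs, weight, topo, dist, affected):
    """Repair dist in place after a perturbation; return the moved nodes.

    preds/succs: dicts mapping a node to its predecessor/successor lists.
    weight[u]:   node weight, i.e. the length of every edge leaving u.
    topo[v]:     index of v in some topological order (assumed given).
    affected:    nodes whose incoming edges or own weight have changed.
    """
    forced = set(affected)
    heap = [(topo[v], v) for v in forced]
    heapq.heapify(heap)
    queued = set(forced)
    moved = []
    while heap:
        _, v = heapq.heappop(heap)
        queued.discard(v)
        # All predecessors precede v topologically, so they are final here.
        new_dist = max((dist[u] + weight[u] for u in preds[v]), default=0)
        if new_dist == dist[v] and v not in forced:
            continue  # v was already consistent; the wave stops here
        if new_dist != dist[v]:
            dist[v] = new_dist
            moved.append(v)
        for w in succs[v]:
            if w not in queued:
                queued.add(w)
                heapq.heappush(heap, (topo[w], w))
    return moved
```

Only nodes reachable from the affected set through changed distances are visited, which is the property that the complexity analysis in the text tries to quantify.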
6.7.4 Incremental Absolute Placement Computational Complexity
Consider the (horizontal) constraint graph G induced by the sets. The graph G does not need to be maintained explicitly. Instead, we can keep a reduced version of G, denoted by […], which contains the minimal longest-paths information. A property of this reduced graph is that each node is reachable from a source node. A node is a source node if it does not have any incoming edges in G. Ramalingam and Reps [Ramalingam and Reps, 1996b, Ramalingam and Reps, 1996a] have shown that the computational complexity of the incremental single-sink-shortest-paths algorithm for general graphs with positive edge weights is bounded by $O(\|\delta\| + |\delta| \log |\delta|)$, where $\delta$ is an adaptive parameter that captures the set of vertices with a changed input or output value. Moreover, $|\delta|$ is the number of vertices of which the input or output value changes, and $\|\delta\|$ is equal to $|\delta|$ plus the number of edges incident on some node in $\delta$. Because the single-sink-shortest-paths problem is similar to the single-source-longest-paths problem, the algorithm is suitable for incremental computation of longest paths in the constraint graphs. Fortunately, the constraint graphs under consideration are directed acyclic graphs (DAGs), and we know that for this subclass of graphs the longest-path algorithm runs in $O(|V| + |E|)$. As a consequence of this property, we are able to use algorithms that have incremental computational complexity […].
A practical question remains on this adaptive parameter: how does it relate to the problem size? In general, it is an unknown parameter; it can only be quantified after the actual computation. However, we know […]. In the specific case of a constraint graph induced by a sequence pair, we are able to say some things about it in a quantitative way, under certain presumptions. Note that the impact of a random perturbation on […] is of a global nature, as opposed to the impact on the sets, which is of a local nature. The underlying reason for this fact is that […] essentially embodies both relative and absolute information changes, whereas the sets only represent relative information. Analyzing the average […] in terms of a random perturbation is most convenient from a mathematical point of view. The validity of this approach is demonstrated in Section 6.9.
Note that we have a bijective mapping between constraint graphs and sequence pairs. As a consequence we can write every property of the constraint graph as a property of the corresponding sequence pair. Furthermore, the sequence-pair properties can be analyzed in a simplified way using a single permutation (Corollary 1). We want to quantify the average size of […], which is the expected number of nodes in a longest-paths subtree of the reduced constraint graph. We know that the constraint graphs have the property of being node-weighted, i.e. the outgoing edges of a node all have the same weight, determined by the corresponding node. However, to simplify the analysis we assume that on average the node weights do not determine the (average) topology of the longest-paths subtrees, but the depth values of the vertices (determined by a depth-first search from the source nodes) do. This statement deserves some additional explanation. A graphical representation of an example packing of 10 modules, its horizontal constraint graph G and its associated longest-paths subtree is shown in Figure 6.31. Intuitively, the previous assumption is quite reasonable, as the weight of a node does not change the topology of the constraint graph G; only the reduced graph might be affected. On the other hand, a change in the depth value of a node (caused by a perturbation) does affect the topology of G, and therefore the reduced graph is always affected.
Figure 6.32 shows the impact of a change in weight of a node. In this example, node 6 (module 6) has its weight (width) increased from 58 to 78. We see directly from Figure 6.32(b) that the topology of the constraint graph does not change compared to Figure 6.31(b). However, note the change in the longest-paths subtree: the longest-path edge (3,8) is removed and edge (6,8) has become part of a longest path. This is also clearly visible from the packing in Figure 6.32(a), where modules 8 and 1 are moved to the right to allow module 6 to expand in width. Note that the width of the chip area increases from 199 to 206, as indicated by the distance value of dummy node […]. Figure 6.33 shows the packing and the associated constraint and longest-paths graph after performing an […] of nodes 4 and 7 on the state represented by Figure 6.31. We see directly from Figure 6.33(a) that the packing is quite substantially affected. Moreover, Figure 6.33(b) shows that the constraint graph and the longest-paths subtree are both affected.
Define the size of a subtree in the reduced graph as the number of nodes reachable from its root node, plus one. Writing $s(v)$ for the size of the subtree rooted at node $v$ and $V$ for the set of nodes, the average subtree size is defined by

$$\bar{s} = \frac{1}{|V|} \sum_{v \in V} s(v). \qquad (6.33)$$
Each node in a subtree contributes to the total the number of times it occurs in any subtree. Let us call the total number of occurrences of a node in any subtree the multiplicity of that node. Writing $m(v)$ for the multiplicity of node $v$ and $V$ for the set of nodes, we can write the average subtree size (6.33) as

$$\bar{s} = \frac{1}{|V|} \sum_{v \in V} m(v). \qquad (6.34)$$

The multiplicity of a node is exactly the number of ancestor nodes of that node plus one. We assume without loss of generality that each node has at most one parent. If a node has more than one parent in the reduced graph, this means that there is more than one longest path to this node. If this situation occurs frequently on average, the diversity of the module dimensions must be low. In a practical problem instance this is highly unlikely, and thus the probability that a node has more than one parent is negligible (footnote 12). So the expected multiplicity of a node is also equivalent to the expected length of a maximal common subsequence of the sequence pair. The expectation should be taken for a given element over all possible configurations for a (typical) fixed topology of […]. The latter is done for simplicity but without loss of generality. Note that the average is taken over all possible set elements for a fixed topology. Thus, we have […].
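The counting argument above (the sum of all subtree sizes equals the sum of all multiplicities) can be checked on a small forest. The sketch below assumes, as the text does, that every node has at most one longest-path parent; the `parent` dictionary and the five-node example are hypothetical.

```python
def num_ancestors(parent, v):
    """Number of proper ancestors of v in a forest given by parent links."""
    count = 0
    while parent[v] is not None:
        v = parent[v]
        count += 1
    return count


def subtree_sizes(parent):
    """Size of the subtree rooted at each node (descendants plus one)."""
    size = {v: 1 for v in parent}
    # Visit deeper nodes first so every child is final before its parent.
    order = sorted(parent, key=lambda v: num_ancestors(parent, v), reverse=True)
    for v in order:
        if parent[v] is not None:
            size[parent[v]] += size[v]
    return size


def multiplicities(parent):
    """Multiplicity of each node: its number of ancestors plus one."""
    return {v: num_ancestors(parent, v) + 1 for v in parent}


# Hypothetical longest-paths tree: 0 -> {1, 2}, 1 -> 3, 3 -> 4.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
sizes = subtree_sizes(parent)
mults = multiplicities(parent)
print(sum(sizes.values()) / len(parent), sum(mults.values()) / len(parent))
```

Both averages coincide because a node is counted once in its own subtree and once in the subtree of each of its ancestors.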
From Theorem 3 we know that a maximal common subsequence is equivalent to a maximal increasing subsequence, which is denoted by […] (see Definition 17). As a consequence, we have […]. Thus, given a random sequence pair (with an associated constraint graph), the following can be derived. Using (6.36) we have […].
Footnote 12: This implies that […] on average.
With the aid of Theorem 4 and given […], this results in […]. Applying the Euler-Maclaurin summation formula to this finite sum, the approximation […] is readily obtained. Finally, again using […], we have […].
So the expected size of a subtree of […] is […]. As a consequence, on average […].
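The equivalence used in this derivation, between a common subsequence of the two sequences of a pair and an increasing subsequence of a single permutation, can be made concrete with a short sketch. The code below computes the length of a longest common subsequence by mapping one sequence onto (0, 1, ..., M – 1) and running the standard binary-search method for the longest increasing subsequence on the mapped other sequence. The function and argument names are hypothetical, and the code illustrates the equivalence only; it does not reproduce the expected values derived in the text.

```python
from bisect import bisect_left


def longest_common_subsequence_length(gamma_plus, gamma_minus):
    """Length of a longest subsequence common to both sequences of a pair."""
    rank = {module: i for i, module in enumerate(gamma_plus)}
    # tails[k] is the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for module in gamma_minus:
        value = rank[module]
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)   # extends the longest subsequence found
        else:
            tails[k] = value      # a smaller tail for length k + 1
    return len(tails)


# Hypothetical pair: the mapped sequence is (2, 1, 3, 0).
print(longest_common_subsequence_length([2, 0, 1, 3], [1, 0, 3, 2]))
```

In the horizontal constraint graph such a common subsequence corresponds to a chain of modules that lie left of one another.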
However, this is the average computational complexity which is needed in the worst case to update […] after deleting or inserting an edge or changing an edge length, hereafter collectively denoted by “edge change” [Ramalingam and Reps, 1996a]. Intuitively it is plausible that efficient incremental algorithms for single-source longest paths in general graphs, such as described in [Ramalingam and Reps, 1996b, Ramalingam and Reps, 1996a, Frigioni et al., 1997], are not the most efficient when applied to our restricted problem instances. It turns out that a more efficient approach can be used on the constraint graphs, by exploiting the knowledge that there is a lot of correlation between the longest paths induced by edge changes. Under the assumption that each module is equally probable to be affected, the expected size of a longest-paths subtree T rooted at a randomly chosen node is directly obtained by taking the expectation of both sides of (6.34) and combining with (6.37): […]. Finally, applying (6.35) we obtain […].
This implies that the expected amount of change in […] is […] per affected node. This can be seen as a lower bound for the expected amount of work to be performed for a single affected node. Let us now analyze the incremental longest-paths (ILP) algorithm shown in Figure 6.30. The initialization on lines 1-3 takes |A| steps. The set of affected nodes A is assigned to C, the set of candidate modified nodes, on line 4. The actions on lines 11-13 are only performed for the candidate nodes, for which […] holds. Taking […] yields a too optimistic estimation. Under the reasonable assumption that a longest-paths subtree rooted at any one of the start nodes derived from set A is highly likely to contain other affected nodes, the expected total number of iterations of the while loop at line 7 can be approximated by […] as a consequence of (6.38). Thus […] is a better estimation. This also implies that the expected number of times that the while loop at line 5 is executed is […], since candidate (and affected) nodes are removed from C as they are encountered during the recomputation of the longest-paths subtrees. In other words, each time the algorithm executes line 5, |C| is smaller than the previous time. Each invocation of line 15 explores […] nodes. Function adjust_heap(·) operates within […] and is called at most […] times as a consequence of the assumption. Furthermore, the average computational complexity of function find_start_nodes(·) implemented with splay trees is at most […]. All other operations have […] complexity. Summing up these results, an approximation of the average computational complexity of the incremental longest-paths algorithm is […].
Note that all […] and […] operations are conditional. This completes the analysis. It can be concluded that under reasonable assumptions, the ILP algorithm has near-optimal computational complexity.
6.7.5 Average Incremental Computational Complexity
A […] perturbation does not affect the constraint graph topology, because it is essentially equivalent to two node weight (inter)changes. However, generally the longest-paths graph needs to be updated because […]. The incremental longest-paths algorithm can be used for this purpose, with the initial set of affected nodes […]. When a module is rotated over an angle of 90° or 270°, the height and the width of the module are interchanged. Therefore, the weights in the respective constraint graphs are changed, resulting in a change of the longest-paths information. The longest-paths graph can be updated using the incremental longest-paths algorithm with the initial affected set equal to […]. It is clear that the […] induces most work for the incremental update algorithms. Therefore, the associated computational complexity is an upper bound on the average-case complexity over all perturbations.
6.8 Implementation Considerations
In principle, it is not necessary to construct the constraint graphs explicitly, as the sets represent the nodes and edges in the constraint graph in a bijective manner. However, for the sake of modularity and re-usability of code, the sets and the longest-paths graphs are maintained separately and explicitly. A great advantage of the sets is the symmetry property given by (6.9) and (6.10). Storing the longest-paths information in a separate graph makes longest-paths information lookup possible in essentially constant time. In terms of complexity this approach does not incur any overhead, but clearly it is a trade-off between space, time, and flexibility.
6.9 Experimental Results
It is interesting to evaluate the new algorithms from two points of view: first, to verify the validity of the theory, and second, to compare the performance with known algorithms. Performance is not only measured in execution speed, which of course is very important, but also in terms of scalability. That is, how does the running time increase as a function of the problem instance size? Although both parameters depend on implementation quality of the programs, optimization quality of the compiler, computing hardware platform, et cetera, scalability is easier to evaluate in an absolute sense. It is much more difficult to provide reliable results for comparison with other published results. Therefore, it is advisable to view these results in a relative way, for instance by comparing CPU time and solution quality pair-wise. A quality measure which is commonly used is the percentage of dead or slack space, defined by

$$\text{slack space} = \frac{A_{\text{chip}} - \sum_{i} A_i}{A_{\text{chip}}} \times 100\%,$$

where $A_{\text{chip}}$ is the area of the enclosing chip rectangle and $A_i$ is the area of module $i$.
The above definition of slack space implies that a 0% slack space packing is an optimal packing. A 50% slack space packing contains the same amount of empty and non-empty space. Larger slack space values are associated with progressively worse packings. Note that an optimal packing is not always a 0% slack space packing. Summarizing, we conduct various experiments to establish experimental evidence of
1. the correctness of the theoretical analyses for a single random iteration,
2. the practical computational complexity under the assumption of equiprobable sequence-pair selection,
3. the validity of the theoretical assumptions in a practical SA optimization environment which employs a large sequence of iterations,
4. the efficiency of an incremental approach over a conventional approach in a practical simulated annealing environment.
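As a small worked example of the slack-space measure, the sketch below computes the percentage of empty area for a list of module dimensions and a given chip rectangle. It follows the properties stated in the text (0% when the modules fill the chip completely, 50% when empty and non-empty area are equal); the function name and the module list are hypothetical.

```python
def slack_space_percent(modules, chip_width, chip_height):
    """Percentage of the chip area that is not covered by modules.

    modules is a list of (width, height) pairs; overlap-free placement
    inside the chip rectangle is assumed and not checked here.
    """
    chip_area = chip_width * chip_height
    module_area = sum(w * h for w, h in modules)
    return 100.0 * (chip_area - module_area) / chip_area


# Two hypothetical 2x2 modules in a 4x2 chip leave no slack space,
# while the same modules in a 4x4 chip leave half of the area empty.
print(slack_space_percent([(2, 2), (2, 2)], 4, 2))
print(slack_space_percent([(2, 2), (2, 2)], 4, 4))
```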
6.9.1 A Single Iteration
In Figure 6.34(a), experimental results are shown for a single iteration of an incremental placement computation. The perturbation is chosen randomly and equiprobably from […]. The two modules to be swapped are drawn randomly from […], independently and uniformly distributed. All values are averaged over 10,000 iterations, and the problem instance sizes under test range from 20 to 300. The actual behavior of the plotted curve is clarified by three additional plots shown in Figure 6.34(b)-(d). They are obtained from the original curve […] by […], respectively. If, indeed, […], then it is expected that […] is equal to […]. From Figure 6.34(c) we can see that […] is quite “noisy” with a very small positive trend, implying that the original curve follows (6.39) quite well. This is confirmed by Figure 6.34(d), showing a decreasing trend towards zero. Hence, we may conclude that the average computational complexity of the implemented algorithms is near optimal.
6.9.2 Packing Optimization
We compute optimized packings for a series of problem instance sizes with randomly generated modules (within a specified range). Note that these results have been obtained using an implementation of the sequence-pair-to-packing mapping. The hardware/software configuration used is: Intel PIII 800 MHz CPU, 512 MByte RAM, SuSE 7.0 Linux OS. The optimization program is executed three times for each of the problem instance sizes in the sequence 20, 40, 80, 160, 320. Furthermore, a standard MCNC benchmark is used [Kozminski, 1990]. The benchmark name is ami49 and it consists of 49 modules, 408 nets and 953 pins. For the packing experiment only the number of modules and their sizes are relevant. The experimental results are summarized in Table 6.4.
A plot of the slack space of the optimized packings for three independent runs is shown in Figure 6.35. It is clear from this figure that there is quite some variation among the results of different runs of the same problem instance. Apparently, our implementation of the SA optimization algorithm has problems getting out of local optima when the number of modules is small. This could be explained intuitively by the fact that with a small number of modules, a perturbation easily leads to a relatively large change in the cost value. The net result is a very irregular cost landscape and, therefore, worse convergence properties of the optimization algorithm. Furthermore, we can see from the figure that the amount of slack space increases significantly as a function of the problem instance size. This phenomenon can be explained by the relatively simple perturbation scheme that is used in our optimization framework. As we only consider relative perturbations with no knowledge of the absolute positions of the modules, many generated moves will affect modules which are spatially far apart. In the current approach there is no way of choosing modules which are relatively close together, even when the optimization is in its final phase. In other words, we cannot force the SA algorithm to sample the solution space more smoothly as the optimization proceeds. Clearly, this unwanted effect becomes more pronounced with increasing M. We may conclude that the sampling behavior of our SA optimization scheme becomes increasingly inefficient for problem instances containing more than roughly 50 modules. Note that no tuning was involved in obtaining these results.
Figure 6.36 shows the CPU time for a single packing optimization run for a range of problem instances. The plot indicates a super-linear growth in CPU time for computing packings as a function of the number of modules M in a packing. A closer inspection reveals that the trend is quadratic. One might wonder whether there is a direct relationship between the computational complexity of a complete optimization run and the complexity of a single iteration. As will become clear shortly, a close correlation between these two complexities can indeed be observed.
We also verify the validity of the equiprobable selection assumption of Theorem 5 by plotting the average longest-paths tree size of the final optimization result for a wide range of problem instances. The program is run three times for each problem instance size. The average subtree size as a function of the problem instance size M is plotted in Figure 6.37. The plot shows a clear sub-linear trend as a function of M. Indeed, the trend is according to […], which is evident from the plot shown in Figure 6.38, which is the result of dividing the values plotted in Figure 6.37 by […]. Additionally, it is interesting to verify whether these results also hold under different circumstances. For instance, we can change the cost function (4.3) to include routing issues. For the moment, we only mention that a sophisticated (global) routing scheme, denoted by SPBH_I, is used, which will be discussed in detail in Chapter 7. The cost function weights are set to: […] and […]. The obtained results indicate that the average subtree size grows according to a function which lies in between […] and […].
Finally, we show in Figure 6.39 the CPU time of incremental packing optimization versus non-incremental packing optimization for a range of randomly generated benchmarks and the largest MCNC benchmark, ami49. It is clear from this plot that the incremental placement computation approach outperforms the non-incremental placement computation approach starting from about M = 100. This means that the incremental approach is practically feasible and leads to increasingly larger improvements as M grows larger.
6.9.3 Conclusions
The assumption of equiprobable sequence-pair selection for analyzing the average computational complexities in connection with incremental sequence-pair-to-packing computation is justified. With and without consideration of global routing, the average subtree size of a longest-paths graph with M nodes lies between […] and […]. The previous complexities hold both for a single sequence-pair-to-packing iteration and for an actual sequence of iterations within a practical SA optimization run.
From the experimental results we can clearly observe that the performance of the optimization framework depends on the size of the problem instance at hand, and is arguably dependent on the quality of the generated perturbations (moves). It is likely that a more sophisticated perturbation scheme which generates (mostly) better moves, in terms of a higher probability of acceptance, will improve the overall performance of the SA optimization framework.
Finally, we observed an approximate one-to-one correlation between the complexity of a single sequence-pair-to-packing iteration and the time required for the optimization process to arrive at a final solution. Since the average computational complexity of a single incremental placement computation iteration is […], which is better than the computational complexity of any “from scratch” placement computation algorithm (either one of […] and […]), the overall run time of an incremental SA optimization run must be better for all […]. Indeed, when we compare a small-constant-factor quadratic implementation with our (unoptimized) incremental implementation, we find […].
6.10 Placement-to-Sequence-Pair Mapping
For several reasons, an efficient mapping of a given placement of modules to a sequence-pair representation is useful. A few of the many possible applications relate directly to the following scenarios. When a placement is input via a graphical user interface, one needs to translate it to a sequence-pair representation before any automated concepts can be applied to the placement. An actual placement gives exact information on the modules that are spatially close to a specific module, as opposed to the relative sequence-pair representation or the (equivalent) oblique grid representation. Therefore, the actual placement is better suited for choosing distant or close modules, which is useful for selecting perturbation types in a sophisticated implementation of simulated annealing. As will be demonstrated in Chapter 8, Section 8.7, efficient enumeration of all modules in a placement is very useful for the minimization of performance-degrading physical coupling phenomena. We argue that efficient enumeration is a key ingredient for placement-to-sequence-pair computation.
In [Murata et al., 1996] a method called “gridding” was introduced to map a packing to an equivalent sequence pair (SP) representation. However, the described approach is quite ambiguous, in the sense that the packing can be mapped to more than one SP. Furthermore, if modules are placed with cutting zones of slack space such as in Figure 6.40, it is impossible to apply Murata’s gridding procedure. As argued in [Nakatake et al., 2001], it is possible to determine a sequence pair from a given packing by the following procedure. To determine one sequence of the pair, push the modules out of the packing in a top-left order without having to move aside any other module. For the other sequence, the same push-out procedure can be applied, except that it should now occur in a bottom-left order. It is known that this procedure does not always yield a unique solution, which can be easily seen from the example cases in Figure 6.41. We note a serious shortcoming in the previous push-out algorithm, namely that not all placements can be mapped to the original (or equivalent to the original) sequence
pair [Nakatake et al., 2001], and, as a result, establish a new observation which is called the idempotent property of the placement-to-sequence-pair mapping. Formally, consider the mapping from a sequence pair to a packing and the mapping from a packing to a sequence pair. The latter is said to establish an idempotent mapping, or is simply called idempotent, if applying it to a computed packing and re-computing the packing from the resulting sequence pair yields the original packing (6.41). Due to the fact that many sequence pairs can map to exactly the same packing (depending on the sizes of the modules), the inverse of the sequence-pair-to-packing mapping is not unique. In [Nakatake et al., 2001] an attempt is made to formalize this idea under the ambiguously defined notion of 1-dimensional compaction. We circumvent the non-uniqueness in this discussion by stating (6.41) as a natural requirement for the packing-to-sequence-pair mapping. In words, (6.41) means that when a packing is re-computed for a sequence pair which is obtained by applying the packing-to-sequence-pair mapping to the originally computed packing, then these two packings should be equal in every sense. The procedure proposed in [Nakatake et al., 2001] does not guarantee an idempotent mapping as defined by (6.41). This can be easily seen from the packing in Figure 6.42, which is the result of that procedure applied to the packing of Figure 6.40, yielding sequence pair ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 8, 1, 2, 3)). The discrepancy is caused by the cutting zone of slack space, which isolates a module or group of modules from its left or lower neighbor module while these modules could be shifted leftward or downward without incurring overlap (footnote 13). At least two ways exist to resolve the problem.
1. Remove cutting zones of slack space by (virtually) enlarging the size of specific modules.
2. Adapt the naive packing-to-sequence-pair push-out method so that it takes cutting zones of slack space into account.
First, we formalize the packing-to-sequence-pair (P2SP) method into an algorithm. With a slightly adapted version of the area enumeration operation of the corner stitching data structure (see Chapter 5), it is possible to realize the packing-to-sequence-pair algorithm. The aforementioned algorithm, shown in Figure 6.43, has computational complexity […],
where M is the number of modules in the packing. This can be easily understood as follows. Step 2 enumerates all modules at the left boundary of the chip area. In the worst case this takes […], and on average this is […]. During step 3, the modules are extracted in a first-in-first-out manner. Hence, extracting all modules of L takes […] in total in the worst case. The most time-consuming operation of step 4 is the traversal of all neighboring modules at the right side of a given module. As neighbor enumeration is performed once for every pushed-out module, and the computational complexity (with hint) for neighbor enumeration is […], the amortized computational complexity of step 4 is […] when implemented efficiently. As a result, the overall (average) computational complexity is […]. The other sequence of the pair can be determined in a similar manner, except that the modules are now pushed out starting from the bottom-left corner.
Footnote 13: Note that the relative relationships dictated by the sequence pair are violated by doing so.
We propose here a method to guarantee an idempotent mapping by virtually expanding specific modules. Therefore, we call this approach the expansion method. For the sake of simplicity, but without loss of generality, the algorithm is discussed for one dimension, i.e. the horizontal expansion case. Expansion uses the steps shown in Figure 6.44. Note that the placement is assumed to be converted into an equivalent corner-stitching data structure (see Chapter 5). As a result, every piece of space in the packing is represented by an empty rectangle or a non-empty rectangle (module). Clearly, the algorithm has computational complexity equal to […], since the modules are processed in an efficient order given by […] and step 3 is performed with amortized computational complexity […], since neighbor enumeration with hint takes […] (see Table 5.1). Note that an inefficient processing order of the modules could easily result in […] complexity. This occurs, for instance, when we need to search back and forth in the corner stitching data structure for certain modules. Figure 6.45 shows the packing of 10 modules after (horizontal) expansion, which is the equivalent packing of Figure 6.40. Note that in most cases there will be no slack space left after full expansion of a packing in two dimensions. The reader can easily verify that expansion implies that algorithm P2SP is equivalent to an idempotent packing-to-sequence-pair mapping. The alternative approach, as mentioned earlier, is to guarantee an idempotent packing-to-sequence-pair mapping by refining algorithm P2SP such that ambiguity is prevented by making explicit use of the empty rectangles. This is not further elaborated in this book.
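The idempotency requirement (6.41) discussed in this section can be phrased as an executable check. The sketch below is not the P2SP or expansion algorithm of Figures 6.43 and 6.44. It contains a plain quadratic sequence-pair-to-packing evaluation under the common sequence-pair convention (a module lies left of another if it precedes it in both sequences, and below it if it follows it in the first sequence and precedes it in the second), which is assumed here to match the book's convention. The packing-to-sequence-pair mapping is passed in as a hypothetical callable `p2sp`.

```python
def sequence_pair_to_packing(gamma_plus, gamma_minus, sizes):
    """Return {module: (x, y)} for a sequence pair; sizes[m] = (w, h)."""
    pos_plus = {m: i for i, m in enumerate(gamma_plus)}
    pos_minus = {m: i for i, m in enumerate(gamma_minus)}
    x, y = {}, {}
    for b in gamma_plus:      # left neighbours precede b in both sequences
        x[b] = max((x[a] + sizes[a][0] for a in x
                    if pos_minus[a] < pos_minus[b]), default=0)
    for b in gamma_minus:     # lower neighbours: earlier in gamma_minus,
        y[b] = max((y[a] + sizes[a][1] for a in y   # later in gamma_plus
                    if pos_plus[a] > pos_plus[b]), default=0)
    return {m: (x[m], y[m]) for m in gamma_plus}


def is_idempotent(pair, sizes, p2sp):
    """Check (6.41): mapping a packing back and re-packing changes nothing.

    p2sp is a hypothetical callable taking (packing, sizes) and returning
    a sequence pair (gamma_plus, gamma_minus).
    """
    packing = sequence_pair_to_packing(pair[0], pair[1], sizes)
    recovered = p2sp(packing, sizes)
    return sequence_pair_to_packing(recovered[0], recovered[1], sizes) == packing
```

A mapping that returns a sequence pair with different relative relations for some modules fails this check as soon as the re-computed coordinates differ, which is the kind of discrepancy the text attributes to cutting zones of slack space.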
6.11 Constrained Block Placement
In a real-world layout problem, additional constraints are imposed on subsets of blocks that have to be placed. A common situation is the constraint on I/O blocks, which must be placed at the periphery of the chip. Moreover, in some cases it might be preferable to place blocks within a pre-specified area of the layout. Especially in the context of mixed-signal layout generation, such constraints are very important. Up to now we have ignored such spatial constraints which are imposed on the blocks in a placement, apart from the sequence-pair constraints. However, in practice they must be dealt with. Of course, the inclusion of such constraints should induce as little computational overhead as possible. Fortunately, the sequence-pair approach allows for the incorporation of constraints in an efficient manner. Let us first specify exactly what type of constraints we have.
Definition 22 (Range constraint) A block has a range constraint if a rectangular region is defined for that block. A constrained block adheres to its range constraint if it lies inside that region, otherwise it violates its range constraint.
Definition 23 (Boundary constraint) A block has a boundary constraint if the block is to be placed at one of the four sides of the packing area. A corner constraint is also a boundary constraint; in that case the module is constrained to two adjacent boundaries.
Some authors use a pre-placed module constraint, but this constraint can be seen as a special case of either a range constraint or a boundary constraint. For example, if the range of a range constraint is set to the actual width and height of a module, the module is effectively pre-placed. Note that a pre-placed module can also be seen as an obstacle in the placement area [Murata et al., 1998].
We will discuss hereafter the incorporation of range and boundary constraints into both the non-graph-based placement computation approach and the graph-based placement computation approach. The original idea for the former approach was introduced recently by Tang and Wong [Tang and Wong, 2001]. Also, the incorporation of range and boundary constraints into the incremental graph-based placement computation approach is discussed. Note that matching does not fall under range constraints because a range constraint requires a priori knowledge of absolute placement information. This information is not always available for matched modules. However, as discussed in Chapter 4, matching constraints can be taken into account using techniques such as described in [Balasa and Lampaert, 1999].
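To make Definitions 22 and 23 concrete, the following minimal Python sketch checks whether a placed block adheres to a range constraint or a boundary constraint. The `Rect` type, the function names and the tolerance `eps` are illustrative assumptions and do not appear in the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle given by its lower-left corner and size (hypothetical helper)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def right(self) -> float:
        return self.x + self.w

    @property
    def top(self) -> float:
        return self.y + self.h


def satisfies_range(block: Rect, region: Rect) -> bool:
    """Definition 22: the block lies completely inside its rectangular region."""
    return (region.x <= block.x and region.y <= block.y
            and block.right <= region.right and block.top <= region.top)


def touched_sides(block: Rect, chip: Rect, eps: float = 1e-9) -> set:
    """Return the sides of the packing area that the block abuts."""
    sides = set()
    if abs(block.x - chip.x) <= eps:
        sides.add("left")
    if abs(block.right - chip.right) <= eps:
        sides.add("right")
    if abs(block.y - chip.y) <= eps:
        sides.add("bottom")
    if abs(block.top - chip.top) <= eps:
        sides.add("top")
    return sides


def satisfies_boundary(block: Rect, chip: Rect, required: set) -> bool:
    """Definition 23: one required side is a boundary constraint, two adjacent sides a corner constraint."""
    return set(required) <= touched_sides(block, chip, eps=1e-9)
```

With this representation a pre-placed module is simply a range constraint whose region coincides with the block itself, so `satisfies_range(block, block)` holds trivially.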
6.11.1
Non-Graph-Based Constrained Placement
The original idea of sequence-pair-based block placement with constraints in the context of a non-graph-based approach is sketched briefly here. Tang and Wong [Tang and Wong, 2001] proposed to use so-called dummy blocks to enforce range and boundary constraints. The dummy blocks have no area but only a length or a width. Figure 6.46 shows this idea graphically. In Figure 6.46(a) a block has a range constraint defined by four dummy modules: one to its left and one to its right, each with a given width and height 0, and one at its bottom side and one at its top side, each with a given height and width 0. Note that the height of the top dummy module depends on the height H of the chip, and the width of the right dummy module depends on the width W of the chip. Actually these values are pre-set desired values and adapted during optimization. In [Tang and Wong, 2001] this issue is handled by defining initial values for both W and H such that W · H equals 150% of the sum of the individual
block areas. During optimization by means of simulated annealing both or either one of W and H are randomly chosen to be decreased by a certain amount when a constraint-violation-free placement is found. Furthermore, the cost function used in [Tang and Wong, 2001] is defined as where and are the actual width and height of the chip area, WL stands for wire-length, and and are weight factors. Ideally, at the end of the optimization process, and However, it is clear that during optimization, and frequently occurs. In such a case we have the situation that the dimensions of the current placement do not comply with the desired dimensions W and H. A straightforward solution would be to penalize such cases with very high cost values. A direct consequence of this measure is that the cost landscape is rendered unnecessarily irregular. This, in turn, badly affects the convergence behavior of the simulated annealing algorithm. Fortunately, there is a more elegant solution. Since by construction we have and we can simply put and directly in the cost function of (6.42). Minimizing the cost function implies minimizing and This approach seems to work remarkably well as evidenced by the results in [Tang and Wong, 2001].14
14 Unfortunately, the authors of [Tang and Wong, 2001] did not publish detailed information on their implementation of the simulated annealing algorithm. Hence, comparison with results of other works should be done with care.
For completeness, and for the sake of overview, we give the general non-graph-based placement computation algorithm of Figure 6.17 again in Figure 6.47. We call it the constrained maximum-weight common subsequence (CMWCS) algorithm. The essential difference is located on lines 12 and 14 which enforce the placement constraints for the horizontal case. The vertical case is similar. Line 12 checks if a module has a constraint by verifying (in constant time) whether or not it is an element of the set of constrained modules which consists of the range-constrained modules and the boundary-constrained modules.15 Of course, Line 2 assigns the correct values to left- and right-side constraints associated with a range-constrained module. Lines 3 and 4 establish similar results for left-boundary and right-boundary constrained modules, respectively. If a module is constrained then the position of the module is adapted such that the constraint is adhered to. Essentially, the dummy modules associated with the constraints force the module-under-constraint into a certain preferred region of the placement area. At line 12 a left-side constraint on a module is enforced. At line 14 we check again if it is a constrained module. This time the width of the total chip area is adapted in such a way that a violation of the right-side constraint will induce larger chip width. Hence, violations are penalized and thus minimized.
15 Note that the top and right chip boundary locations are unknown beforehand.
The overall computational complexity of algorithm CMWCS can be derived in a similar fashion as algorithm MWCS shown in Figure 6.17, being where is the amortized computational complexity of the priority queue operations. If we use a priority queue based on splay trees we obtain and if we use a Van Emde Boas data structure [van Emde Boas, 1975, Mehlhorn and Näher, 1990] a better result, is obtained.
We observe an important point which is worthy of further investigation because it can lead to a simplified overall optimization algorithm and give better placement results. The observation is that the use of stochastic adaptation of the target dimensions (W, H) adds to the computational complexity of the problem. Moreover, the impact of this stochastic adaptation on the overall performance is not known. Therefore, it is probably better to avoid it. We propose a modified 2-step algorithm to compute a constrained placement without the use of iterative adaptation of W and H. The algorithm is shown in Figure 6.48. Essentially, the algorithm, which we denote by CMWCS2, is the same as the original algorithm except for the fact that the input target dimension is not needed anymore. The algorithm finds this dimension by performing a first placement pass (from line 1 to 16). The found value of W guarantees that no redundant margin is introduced. Consequently, no estimation error is made which will eventually lead to better results in fewer iterations. Experimental results which confirm this claim are presented next.
Note that the boundary constraints can be easily generalized to include corner constraints. This is accomplished by enforcing both a top or bottom constraint and a left or right constraint. Experiments show that neither the run-time performance nor the solution quality deteriorates (under a reasonable number of constraints).
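The dummy-module mechanism and the two-pass idea can be illustrated for the horizontal case with the sketch below. It is one possible reading of the description above and not the algorithm of Figure 6.47 or Figure 6.48: it uses a plain quadratic longest-path evaluation of the sequence pair instead of the priority-queue-based common subsequence computation, and the function names, the `x_range` dictionary and the returned tuple are assumptions made for illustration. A left dummy module is modeled as a lower bound on the x-coordinate, and a right dummy module as extra width appended to the right of the constrained module.

```python
def x_positions(gamma_pos, gamma_neg, width, left_bound=None):
    """Left-to-right packing of a sequence pair (simple O(M^2) evaluation).

    Module a lies to the left of module b when a precedes b in both sequences.
    left_bound[b] plays the role of a zero-height dummy module of that width
    placed to the left of b, which pushes b to x >= left_bound[b].
    """
    left_bound = left_bound or {}
    rank_neg = {b: i for i, b in enumerate(gamma_neg)}
    x = {}
    for j, b in enumerate(gamma_pos):
        xb = left_bound.get(b, 0.0)
        for a in gamma_pos[:j]:              # a precedes b in gamma_pos
            if rank_neg[a] < rank_neg[b]:    # ... and also in gamma_neg
                xb = max(xb, x[a] + width[a])
        x[b] = xb
    return x


def two_pass_width(gamma_pos, gamma_neg, width, x_range):
    """Assumed two-pass scheme: pass 1 finds the target width, pass 2 penalizes violations.

    x_range[b] = (lo, hi) is the horizontal extent of the range constraint of b.
    """
    left = {b: lo for b, (lo, _) in x_range.items()}
    # Pass 1: place all modules with the left-side constraints enforced and
    # take the resulting chip width as the target width W.
    x = x_positions(gamma_pos, gamma_neg, width, left)
    target_w = max(x[b] + width[b] for b in gamma_pos)
    # Pass 2: a right dummy module of width (W - hi) is appended to each
    # constrained module, so a right-side violation enlarges the chip width.
    chip_w = target_w
    for b, (_, hi) in x_range.items():
        chip_w = max(chip_w, x[b] + width[b] + (target_w - hi))
    return x, target_w, chip_w
```

In this sketch the difference `chip_w - target_w` equals the largest right-side violation, so a cost function that minimizes the chip width also drives the violation towards zero without a separately tuned target dimension.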
6.11.2
Implementation Considerations
In practice there might be various reasons to choose a range constraint. For instance, a range constraint can effectively cluster a given set of modules within a prespecified area of the chip. When no intervening modules are allowed between the modules that need to be clustered, the following technique can be used to accomplish this. Choose the range constraint in such a way that the square-like area is about 5% larger than the total area of all separate modules in the cluster. Although the optimization algorithm will attempt to fit all modules of the cluster within the specified range, a (small) range-constraint violation may still be introduced in the final placement, for instance, in order to arrive at a smaller total chip area. Although this is a perfectly valid method to achieve effective clustering, the apparent dimensions of the chip area will be larger than the actual dimensions (W, H) in case of a constraint violation. This is a problem when the final placement is computed and compared with other results. Therefore, an additional placement computation step is required which ignores all dummy modules at the top of and right of a constrained module, in order to compute the actual chip dimensions.
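The sizing rule for a clustering range can be written down in a few lines. The sketch below assumes an exactly square range and a caller-chosen lower-left corner; both are simplifications of the "square-like" area mentioned above, and the function name and parameters are hypothetical.

```python
import math


def cluster_range(module_areas, margin=0.05, origin=(0.0, 0.0)):
    """Return (x_lo, y_lo, x_hi, y_hi) of a square range for a cluster.

    The range area is (1 + margin) times the summed module area; margin=0.05
    corresponds to the "about 5% larger" rule of thumb.
    """
    side = math.sqrt((1.0 + margin) * sum(module_areas))
    x0, y0 = origin
    return (x0, y0, x0 + side, y0 + side)
```

For example, three modules with areas 4, 5 and 11 give a range of area 21, i.e. a square with a side of roughly 4.58 length units.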
6.11.3
Experimental Results on Non-Graph-Based Constrained Block Placement
In order to verify the claimed improvement of performance (CMWCS2 versus CMWCS) in Section 6.11.1, we perform extensive experiments with the largest standard floorplanning MCNC benchmark ami49. Its use is a de facto standard in many contemporary placement works. Furthermore, in the context of constrained placement problems, it appears acceptable
to disregard all routing issues, as witnessed by many recent publications. For comparison purposes we will adhere to the same strategy and ignore routing. The ami49 benchmark contains 49 blocks with a diverse set of dimensions. Similar to the latest state-of-the-art publications on constrained placement [Tang and Wong, 2001], we set the following seven constraints. We select a block to be constrained to one of the four boundaries, and do this for each boundary. Moreover, we select three blocks to be constrained within the same preselected placement area. Furthermore, an attempt was made to choose blocks of the same size as in the reference publication but due to a different block labeling scheme there might be some inconsistency here.
We ran both CMWCS and CMWCS2 20 times on ami49. Since we still have a tuning parameter in CMWCS, embodied by the decrement size of the target dimension(s) each time a constraint-violation-free placement is seen, we compare the CMWCS2 results with the best results from a series of CMWCS runs with different values of the decrement parameter. The values we used for this parameter are: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99 and 0.9999. More specifically, when a constraint-violation-free placement is found, the chosen target dimension, say W, is adjusted as follows: with the decrement parameter equal to 0.98. Figure 6.49 shows the behavior of the solution quality over 20 runs of the SA algorithm for various values of the decrement parameter. Both the average and the best solutions in terms of chip area are plotted. For comparison purposes, the average and best chip area obtained by algorithm CMWCS2 are also plotted in the same figure. We can see that algorithm CMWCS2 consistently yields solutions close to the
best solution. Moreover, the best CMWCS2 solution is typically better than the best solution over all 220 runs of the CMWCS algorithm. In all cases the average CMWCS solution is significantly worse than the average CMWCS2 solution. It should be noted that a fairer comparison would compare 220 runs of algorithm CMWCS2 with the best solution obtained with algorithm CMWCS. Indeed, from additional experiments we obtained a value of after 80 runs which is already better than the best result obtained with algorithm CMWCS. Consequently, we may state that algorithm CMWCS2 is very robust, does not require (problem-dependent) tuning, and yields excellent solutions. However, as is clear from the structure of the algorithms, CMWCS2 in its present form is bound to be significantly slower than CMWCS. Indeed, this is confirmed by Table 6.5 which summarizes some additional information gathered from the experiments. When compared to the unconstrained
optimization results (see Table 6.4), there is no apparent degradation in solution quality due to the imposed constraints. This is quite surprising. It implies that the approach taken for including placement constraints does not (significantly) deteriorate the convergence properties of the overall SA optimization algorithm. The CPU time of CMWCS-based optimization is substantially smaller than that of CMWCS2-based optimization. This is the only drawback of algorithm CMWCS2. However, it may be possible to improve the latter algorithm by using sophisticated techniques. For instance, one could try avoiding recomputation of module positions which are not affected by constrained modules. This is not further explored in this book. The average number of iterations and rejections (of generated moves) gives a good indication of the quality of the overall optimization algorithm and enables platform-independent comparison of optimization times. An important feature which favors CMWCS2 over CMWCS is the fact that the former does not introduce additional and unnecessary tunable parameters which, among others, adversely affect the stochastic properties of the optimization. Additionally, the tunable parameter settings are likely to be problem-dependent. When we spend some more effort on the generation of a few additional constrained placement results, we can for instance obtain with CMWCS2 the placement shown in Figure 6.50. The result is significantly better than the current state-of-the-art [Tang and Wong, 2001] with reported slack space of 6%. It is interesting to note that the latter result is obtained with about 850,000 iterations and 600,000 rejections [Tang, 2001]. If we constrain module 4 to the top-right corner and module 6 to the bottom-right corner, and set the constraints on the other modules as before, a run of the SA optimization algorithm could give the placement result shown in Figure 6.51.
6.11.4
Incremental Graph-Based Constrained Placement
The previously discussed range and boundary constraints, which can be imposed on any of the blocks in the placement problem, were used in a “from scratch” computation scenario. We extend this idea to the incremental computation scenario. As before we perform incremental computations directly on the constraint graph representations. The incorporation of placement constraints into the incremental approach is quite straightforward. For simplicity only the horizontal case is discussed here. A module which has an associated constraint, i.e. is only processed when it is an affected module. The distance update step for a
constrained module is
which is very similar to the function which is used for the incremental computation of longest paths. For the dummy “end” node in the constraint graph, the following update step is sufficient:
However, (6.44) can be updated more efficiently when the affected range-constrained and right-boundary constrained modules are stored in a separate list which keeps both the old information as well as the new information. As a result, (6.44) can be computed incrementally. Thus the average computational complexity of incremental graph-based constrained placement is upper bounded by If the number of constrained modules is fixed and independent of the total number of modules M, it is easily seen that the placement computational complexity is not affected due to the addition of constraints. Experimental verification of the effectiveness of incremental constrained graph-based placement computation has not been performed. Based on the experimental results of the constrained non-graph-based placement computations, it is expected that the computational complexity is not significantly affected. It is plausible to assume that previously established properties may be extrapolated to the current situation.
6.12
Concluding Remarks
We have given an overview of state-of-the-art placement representations. These representations have been compared with each other with regard to important mixed-signal layout requirements such as generality and flexibility of module placement. Also with respect to the requirement for a well-behaving simulated annealing stochastic optimization process, small sensitivity of a placement representation is argued to be of importance. The sequence pair (SP) placement representation is selected because of its attractive features. A drawback of SP is a comparatively long computation time for a single placement computation iteration, originally $O(M^2)$ where M is the number of modules. We have shown how this quadratic complexity can be substantially reduced using fundamentally different approaches: graph-based algorithms, and weighted longest common subsequence algorithms. Although the graph-based algorithms are inferior in a worst-case scenario, they are very suitable to be generalized into an incremental approach. Efficient near-optimal incremental placement computation algorithms have been proposed and implemented. Experimental results demonstrated the validity of the theoretical analyses and have shown the feasibility of the incremental approach. However, we note that in a practical framework the usage of both incremental and non-incremental approaches should be considered depending on certain features of the stochastic optimization engine. Since the non-incremental approach is less sophisticated it has smaller constant factors, hidden by the big-Oh notation, as compared to the larger constant factors due to the elaborateness of the incremental algorithms. The difference, however, can be further reduced by optimizing the implementation.
We have shown that range and boundary constraints imposed on arbitrary modules can be easily taken into account without incurring computational complexity overhead. A modified algorithm is proposed which is easier to implement, is very robust and consistently yields significantly better solutions than those in current literature. Furthermore, it is shown how the idea of constrained module placement can be easily transferred to incremental placement computation.
Chapter 7
Routing
This chapter covers several aspects related to routing. The routing process is that part of the physical design step with the task of laying out the interconnect between preplaced geometrical structures, as defined in a point-to-point manner by the circuit netlist. The interconnect of a certain net, consisting of wires and vias, is also called the routing of that net. Placement without considering routing in a proper qualitative manner only makes sense in connection with designs where the quality of interconnect has negligible effect on system performance. Normally, these are low-performance designs. Contemporary state-of-the-art mixed-signal integrated circuits require high-quality layouts which are robust in any sense. With increasingly higher operating frequencies into the gigahertz range, and feature sizes going far into the ultra-deep submicron range, routing issues are becoming indisputably dominant. As a consequence, placement should take all quality aspects connected with routing into account. Unfortunately, a routing cannot be computed without having an idea of where the objects that have to be routed are located. But then, how can we find a good routing-aware placement? This problem naturally calls for an iterative approach. See Chapter 4 for a global overview of how placement and routing are integrated into an iterative optimization framework. First, the routing problem is defined exactly. Then, we give a general classification of routing approaches which facilitates categorization of relevant works. This is followed by a brief discussion of relevant previous work. A brief discussion on computational complexity is presented thereafter. We proceed by defining a routing model and routing algorithms which are most promising within a mixed-signal layout generation context.
Based on experimental results, we will choose the routing heuristic that performs best compared to other heuristics, relative to optimal routing solutions for a broad range of problem instances. The selection criteria are based on both run-time performance and routing quality. Also, emphasis is put on incremental capabilities of the adopted routing methodology consisting of a fast and effective graph-based routing heuristic in combination with an efficient irregular-grid routing model. For the chosen heuristic, we discuss extensions for incremental computations. The incremental routing heuristic then is evaluated and experimental results are reported. Finally, the overall routing methodology is integrated into the iterative optimization framework. Experimental results of the integrated placement and global routing approach are given and compared with existing state-of-the-art works. Furthermore, discrepancies in current works are exposed and discussed. The viability of the adopted methodology is further demonstrated by pinpointing and discussing areas for improvement. Finally, we end with some concluding remarks.
7.1
The Routing Problem
As mentioned earlier in Chapter 2, in our approach we do not allow over-the-cell routing. When a large number of metal layers is available, over-the-cell routing can be a solution to resolve routing problems, especially with regard to congestion. However, high-quality routing along module boundaries is always needed for any of the following reasons. Routing in higher level metal layers requires the use of additional vias which are expensive in terms of parasitics and yield. If intellectual property blocks are used in a module, parasitic interaction due to crossing wires with those blocks is best avoided because it is unknown how the circuit performance will be influenced. Furthermore, we assume that an interconnecting network of wires of which no wire runs over a module, and with minimal total length, is optimal.1 Moreover, we adopt a rectilinear wiring model in which wire segments are only allowed to go in either horizontal or vertical direction. Such a network is called an obstacle-avoiding rectilinear Steiner minimal tree [Ganley, 1995, Hwang et al., 1992]. Finding such a tree is an NP-hard problem [Garey and Johnson, 1977, Hwang et al., 1992]. A standard method to represent Steiner trees uses graphs, where the nodes represent junctions, bends, or crossings and the edges represent possible wiring segments. The graph approach can be applied without loss of generality thanks to Hanan’s theorem stating that a Steiner minimal tree exists in a Hanan grid [Hanan, 1966]. The Hanan grid is a rectilinear grid in which the grid lines are induced by the pins, and their crossings form nodes in the graph. Naturally the line segments are the edges in the graph. Formally stated the global routing problem is as follows.
Problem: Steiner minimal tree (SMT) in a graph (GSMT)
Instance: A graph G with node set V, edge set E and a weight function on the edges, and a set of pins (nodes in V) that form a net.
Solutions: All trees that connect the elements of the pin set in G.
Minimize: The total weight of the tree, i.e. the sum of the weights of its edges.
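The Hanan grid and the objective of the problem statement can be illustrated with a small sketch. The code below builds the grid induced by the pins only and ignores modules as obstacles, which is a simplification of the obstacle-avoiding setting of this chapter; the function names and the edge representation (a dictionary keyed by ordered node pairs) are assumptions made for illustration.

```python
def hanan_grid(pins):
    """Return (nodes, edges) of the Hanan grid induced by a list of (x, y) pins.

    edges maps ((x1, y1), (x2, y2)) to the rectilinear length of that grid
    segment, with the lower-left endpoint listed first.
    """
    xs = sorted({x for x, _ in pins})
    ys = sorted({y for _, y in pins})
    nodes = [(x, y) for x in xs for y in ys]
    edges = {}
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs):   # horizontal segment to the next grid line
                edges[((x, y), (xs[i + 1], y))] = xs[i + 1] - x
            if j + 1 < len(ys):   # vertical segment to the next grid line
                edges[((x, y), (x, ys[j + 1]))] = ys[j + 1] - y
    return nodes, edges


def tree_weight(tree_edges, edges):
    """Objective of the problem statement: the summed weight of the tree edges."""
    return sum(edges[e] for e in tree_edges)
```

For the pins (0, 0), (2, 3) and (5, 1) the grid has 9 nodes and 12 edges; the pins are the demand nodes and the remaining 6 grid nodes are candidate Steiner nodes.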
The nodes in the graph G are represented by the set V. The edges in the graph, represented by set E, are undirected. The pins are also called demand nodes in this context, whereas the other nodes in V are called Steiner nodes, which are candidate nodes for the trees in the solution set. There are two special cases which have polynomial-time complexity. The first case is a net with exactly two pins. This case is also known as the single-pair shortest-path problem. Dijkstra’s algorithm [Dijkstra, 1959] can be used to solve it in $O(|E'| + |V'| \log |V'|)$ time, for instance with a Fibonacci heap data structure. Here $E'$ and $V'$ are the set of edges and nodes, respectively, which are contained in the equivalent graph enclosing all modules in the rectangular region defined by the modules attached to the 2-pin net at hand. A more efficient target-directed path search algorithm called A* algorithm can be used to find an optimal path between two pins in time proportional to the number of edges (and nodes) on the path [Nilsson, 1980, Pearl, 1984]. The other special case occurs when every node in V is a pin, so that all the nodes in the graph need to be connected in a minimal sense. This is called the minimum spanning tree (MST) problem, and it can be solved in $O(|E| + |V| \log |V|)$ time with Fibonacci heaps using Prim’s algorithm [Prim, 1957]. In appearance, Prim’s algorithm is very similar to Dijkstra’s algorithm. The fundamental difference is that the former stores the edge weights associated with candidate extension nodes on the heap, while the latter stores the path lengths associated with candidate shortest-path extension nodes on the heap. Unfortunately, all other cases are known to be NP-hard. This even holds for many conceptually simplified versions of the Steiner minimal tree problem. For instance, the Steiner minimal tree in planar rectilinear graphs is NP-hard [Hwang et al., 1992].
1 Symmetry considerations can be taken into account by employing additional algorithmic steps.
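The similarity between the two polynomial-time special cases, and the single point where they differ, is easiest to see side by side. The sketch below is illustrative only: it uses Python's binary heap (`heapq`) with lazy deletion rather than a Fibonacci heap, so it does not attain the Fibonacci-heap bounds, and the adjacency-list format `adj[u] = [(v, weight), ...]` is an assumption.

```python
import heapq


def dijkstra(adj, source):
    """Shortest path lengths from source; heap keys are path lengths."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        for v, w in adj[u]:
            nd = d + w                    # key: length of the path to v
            if v not in done and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def prim(adj, root):
    """Minimum spanning tree of a connected graph; heap keys are edge weights."""
    in_tree = set()
    tree, total = [], 0
    heap = [(0, root, root)]
    while heap:
        w, u, parent = heapq.heappop(heap)
        if u in in_tree:
            continue                      # stale heap entry
        in_tree.add(u)
        if parent != u:
            tree.append((parent, u, w))
            total += w
        for v, wv in adj[u]:
            if v not in in_tree:
                heapq.heappush(heap, (wv, v, u))   # key: weight of edge (u, v) only
    return total, tree
```

The two loops are structurally identical; only the key pushed onto the heap changes, `d + w` in `dijkstra` versus `wv` in `prim`.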
7.2
Classification of Routing Approaches
A routing approach consists of a routing model and a routing algorithm. The model dictates how much information is stored of the routing space, essentially defining the solution space. The algorithm defines how we search for a solution in this space. Clearly, routing model and routing algorithm are strongly correlated. Which approach and choice is best depends on the desired routing quality in connection with a definition of an optimal routing. In the literature, a large variety of routing models and algorithms can be found. Typically, no explicit distinction is made between the two; a certain routing model is used implicitly by a given routing algorithm. From this literature we can, roughly, classify routing approaches as follows.
Single-step versus two-step approach. This is a classification based on hierarchy. The two-step approach essentially adopts a hierarchical divide-and-conquer strategy2 in order to (conceptually) simplify the problem and eventually find good solutions more efficiently.3
Regular-grid versus irregular-grid approach. This is a classification based on the efficiency and effectiveness of information representation; higher efficiency means less redundant information in the representation, and higher effectiveness means that (more) higher quality solutions can be found using the representation. An advantage of the regular-grid approach is its conceptual and practical simplicity. However, it is mostly very inefficient in terms of space and time requirements.
In our opinion it is advantageous to separate the routing model and the routing algorithm explicitly. Consequently, both items can be constructed, analyzed and improved separately. Also, we gain more insight into the properties of both items. In the context of mixed-signal layout generation we choose a two-step routing approach based on an irregular grid model. The reasons for this choice are as follows (first two reasons why a 2-step approach is preferred, followed by two reasons why an irregular-grid model is preferred):
A two-step approach consisting of a global routing step followed by a detailed routing step enables controllable routing quality refinement which is advantageous in an iterative optimization framework such as simulated annealing.
A two-step approach mitigates the problems in connection with properly handling all routing-related issues, which typically have complex interdependencies, at the same time. By introducing hierarchy, the problems can be made more manageable.
A routing strategy requires a routing model which has as little redundant information as possible and, at the same time, guarantees the existence of an optimal solution.
An overall efficient integrated placement and routing approach requires a low-complexity algorithm to compute necessary routing information from a given placement.
Furthermore, a very important additional requirement is an efficient update mechanism of (small) dynamic changes in the graph. For instance, after a small change in a placement due to a perturbation operation, it would be a waste of computation time to re-compute the whole routing graph again from scratch. Clearly, a significant gain is possible if the routing graph can be efficiently updated in an incremental sense. Below, the classifications based on routing hierarchy and routing model are discussed more in depth. Note that a choice with regard to the routing hierarchy is essentially independent of a choice with respect to the routing model.
2 The difference with the classical divide-and-conquer algorithm is that we do not impose a uniform conquer strategy.
3 In fact, the classification could be generalized into multi-step versus single-step but even in those cases the bisection is dominant.
7.2.1
Routing Hierarchy
At a high level, essentially two different approaches exist to accomplish routing of a circuit. These approaches are extremes in a hierarchical sense. In practice, a suitable combination of these approaches is conceivable to find a better routing.
Single-step Routing
A non-hierarchical single-step routing approach performs routing including exact determination of the wire segments in the plane. We define single-step routing to be synonymous with area routing. Although area routing is commonly associated with maze routing, we make a strict distinction between the two in this book.4 Area routing is defined as detailed routing on a global basis. Hence, the underlying routing model is not specified. A specific implementation of an area router could therefore use a regular-grid routing model or a tile-based routing model. The latter is a generalization of the former model in the sense that each tile can be associated with a set of grid elements. Furthermore, maze routing is defined as a routing approach based on a regular-grid model. Essentially, the actual routing algorithm is not specified. Usually, a shortest-path(s)-like algorithm is employed in connection with maze routing. Maze routing can be employed on a local as well as on a global level. Due to the fact that a single-phase routing approach lacks a global view of the routing problem, problems can easily arise. We name a few problems which are especially prominent in our mixed-signal layout generation framework.
It is very difficult to predict whether or not a net is routable with specified quality margins, due to previously routed nets which form obstacles for succeeding nets. Consequently, it is almost impossible to solve the routing problem adequately, i.e. to find near-optimal solutions for all nets with respect to the input specifications, for all but the simplest problem instances.
It is difficult to evenly spread the wires over the chip area while at the same time targeting good solutions for all nets.
Computational complexity increases very rapidly due to forced ripup-and-reroute strategies in order to achieve compliance with specifications. Furthermore, the computational complexity is very hard to analyze and bound.
From an algorithmic point of view, it is difficult to comprehend the impact on the output of a routing algorithm as a function of tunable algorithmic parameters. As a consequence, possible improvements are based mainly on a trial-and-error basis, which can overshadow possible fundamental improvements based on scientific insight.
As a side note it is questionable whether fixating on details, while the global line has not been formulated yet, is a fundamentally sound approach. We argued that for large circuits the area routing approach is infeasible because of the complexities that are involved in managing all details on a global level. However, regional area routing approaches can give excellent results when the size of the area that is routed in a single phase is limited [Tseng, 1997]. A practical implication of this fact is the importance of defining manageable regions for area routing. Essentially, this is accomplished by means of hierarchy.
4 Lengauer [Lengauer, 1990] already noted that the distinction between area routing and some detailed routing approaches can become very fuzzy.
Two-step Routing
An approach that uses hierarchy to split the overall problem into conceptually easier to grasp sub-problems is the well-known two-step routing approach consisting of global routing followed by a detailed routing step. Important advantages of two-step routing are as follows.
The optimization of global routes is conceptually simpler without considering detailed routing aspects, and the optimization of detailed routes is conceptually simpler without considering global aspects.
Algorithm design and analysis are simpler for spatially confined detailed routing problems. Moreover, the quality of routing results is more easily assessed as a function of algorithmic features.
In the context of an iterative optimization framework, the possibility to trade off runtime against solution quality is of paramount importance; typically, detailed routing becomes increasingly important as the global routes approach the status of being “good enough”.
An example scenario is appropriate to illustrate the idea of hierarchical routing. Imagine that the total layout area is divided into segments by overlaying a grid on the plane. During global routing it is determined through which grid cells of the layout area a global route will go. Every grid cell has an associated capacity which limits the maximum number of
global routes that can pass through that cell (footnote 5). The actual number of routes through a cell is called the demand. When the demand is larger than the capacity, we speak of routing congestion. Typically, more crucial nets are routed first, followed by less important nets, in order to satisfy imposed timing or wire-length constraints. After all global routing congestion has been resolved, the detailed routing step is started. During detailed routing, the actual geometric location of each wire is computed, guided by the global routing information. Generally, channel and switchbox generation is needed before the detailed routing step. It can occur that detailed routing is impossible with the given placement and global routing information. In such cases some of the global routes are removed and, with adjusted constraints, re-routed. This classical approach is called ripup-and-reroute.
We stress here that the aforementioned hierarchical approach serves only to illustrate the idea and is not the approach we advocate in this book. From the previous discussion it is clear that the best choice for mixed-signal layout generation is a two-step approach. Note that the two steps need not use the same routing model.
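The capacity and demand bookkeeping of the example scenario can be sketched in a few lines of Python. This is only an illustration of the terminology, not the approach advocated in this book; the function names `demand_map` and `congested_cells`, the grid size, and the uniform capacity of 2 are assumptions made for the example.

```python
from collections import Counter

def demand_map(global_routes):
    """Count how many global routes pass through each grid cell.

    Each route is a list of (column, row) cells. A route is counted
    once per cell, even if it visits that cell more than once.
    """
    demand = Counter()
    for route in global_routes:
        for cell in set(route):
            demand[cell] += 1
    return demand

def congested_cells(global_routes, capacity):
    """Return {cell: demand} for every cell whose demand exceeds its capacity.

    `capacity` maps a cell to the maximum number of routes allowed
    through it (hypothetical values chosen by the caller).
    """
    demand = demand_map(global_routes)
    return {cell: d for cell, d in demand.items() if d > capacity.get(cell, 0)}

# Three global routes on a 3x2 grid in which every cell has capacity 2.
capacity = {(c, r): 2 for c in range(3) for r in range(2)}
routes = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (1, 1), (1, 0), (2, 0)],
    [(1, 0), (1, 1)],
]
overflow = congested_cells(routes, capacity)
```

In this example only cell (1, 0) is congested, since all three routes pass through it. A ripup-and-reroute step would remove one of them and route it again around that cell.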
7.2.2
Routing Model
Depending on the desired routing accuracy, affordable computation time, affordable implementation effort, and the definition of the optimal solution, a certain routing approach can be employed. For the implementation of any routing approach, a representation of the routing information, i.e. the routing model, is required. From an information representation point of view these models can be classified into irregular-grid-based and regular-grid-based. Hereafter, in-depth information on these routing models is given, illustrated by their use within a typical routing environment.
Regular-Grid Routing Model
Plane-based routing uses the 2-dimensional plane as its routing space. Usually, only integer coordinates are allowed, which avoids numerical problems without adversely affecting the routing performance or limiting algorithmic flexibility. As a consequence, the relevant points in such a plane lie on a regular grid. This grid can be used directly to perform routing, by searching for a sequence of grid points which interconnect the pins of a net. The grid can also be used to find congestion-avoiding routing solutions by associating each grid-line segment with a certain capacity, which depends on how many routes we allow to cross that segment. The size of a single grid tile can be adapted according to, for instance, some hierarchical scheme. Usually, the congestion-minimization problem is cast into some kind of flow problem which is modeled by a graph [Chiang et al., 1990].
A well-known grid-based routing strategy is due to Lee [Lee, 1961]. The associated routing algorithm is also called a maze router. The main drawback of maze routing is its high memory usage and large computational complexity. The memory usage is proportional to the number of grid points, and so is the time complexity at best, namely when a linear-time algorithm is used to compute single-source shortest paths [Kanchanasut, 1994].
Footnote 5: Equivalently, we can treat each cell as a node, with adjacent cells inducing corresponding edges in a (grid) graph, and apply the same ideas.
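To make the maze-routing idea concrete, the following Python sketch performs a breadth-first wave expansion on a unit-cost grid, in the spirit of Lee's router. It is a simplified illustration rather than the algorithm of [Lee, 1961] in detail; the function name `lee_route` and the boolean obstacle map are assumptions of the example. The `parent` dictionary can grow to one entry per grid point, which illustrates the memory drawback mentioned above.

```python
from collections import deque

def lee_route(grid, source, target):
    """Breadth-first wave expansion on a regular grid.

    grid[r][c] is True for a blocked cell. Returns a shortest path as a
    list of (row, col) cells from source to target, or None if the
    target cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}          # doubles as the "visited" set
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            # Backtrack along the parent pointers to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not grid[nr][nc] and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None

# A 3x4 grid with a two-cell obstacle in the middle row.
blocked = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, False, False, False],
]
path = lee_route(blocked, (0, 0), (2, 3))
```

Because every step has unit cost, the first time the wave reaches the target the path found is a shortest one, and the path consists of exact grid locations.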
An advantage of maze routing is that it is precise in terms of wire locations, which also makes maze routing a useful candidate for detailed routing.
The plane space can also be used directly to find solutions to a routing problem, without considering each grid point or grid tile on a (partial) path separately. Finding an interconnecting network for multi-pin nets can be performed by solving the rectilinear Steiner minimal tree (RSMT) problem. Each (grid) point in the plane is then a candidate Steiner node. Hanan’s theorem tells us that it suffices to consider only the grid points which coincide with the Hanan grid [Hanan, 1966]. Since the RSMT problem is NP-hard [Garey and Johnson, 1977], we can also use an approximation algorithm whose running time depends only on the number of pins to be connected [Matsumoto et al., 1991]. Note that this complexity is independent of the grid size. Because the RSMT only considers pin locations and Hanan grid points (which generally have little in common with modules in the plane), it ignores obstacles (represented by modules) in the plane. Therefore, it violates our non-over-the-cell routing requirement. However, as we shall see in Section 7.7, RSMT routing solutions are typically less than 6% away from obstacle-avoiding routing solutions for a wide range of routing instances.
Irregular-Grid Routing Model
Motivated by the massive memory requirements of grid-based routing and the inability of plane-based routing to take obstacles into account (while guaranteeing a (near) optimal solution if it exists), researchers have thought of ways to minimize memory usage without precluding optimal or near-optimal routing solutions. Routing based on graphs has been shown to be very efficient in terms of computational effort. Furthermore, general graph theory is a broad and active field of research, with many useful techniques and algorithms that can be exploited.
The graph-based routing approach relies on the proper definition of nodes in the graph which correspond to locations in the plane. The edges in the graph are used for routing. Each such edge represents a routing path segment which can be used for interconnecting a set of pins. Furthermore, weights can be associated with each edge to denote its importance, Manhattan length, capacity, or a combination thereof. In the extreme case where each node in the graph is a grid point and vice versa, the graph-based approach degenerates into a grid-based approach. The efficiency of the graph representation is directly affected by the efficiency of the grid representation since, typically, the computational complexity of a graph-based approach is expressed in the number of nodes and edges in the graph. Intuitively it is clear that an irregular grid is more efficient than a regular grid, since the former uses a denser grid at locations in the plane where it is needed and a sparser grid at locations where this is allowed. A practical bottleneck of this non-uniform manner of information representation is finding out where, and by how much, the grid should be sparsified. Fortunately, efficient methods exist to perform this task, one of which is proposed in this book in Section 7.5.2. As a result, an irregular-grid routing approach is preferred over a regular-grid routing approach.
7.3
Previous Work
The field of routing research is very broad. In recent years, high-quality routing has attracted increasingly more attention due to the importance of routing in contemporary designs. In
this section, we will provide a global overview of previous works in the routing field that are considered of interest to mixed-signal designs. Especially those methodologies that apply to analog circuit routing are eligible candidates for application to mixed-signal layout generation. However, we do not confine ourselves a priori to analog routing methodologies, since there are no fundamental reasons why methodologies used in the digital domain cannot be useful in the mixed-signal domain.
Typical approaches in analog routing are based on area routing. That is, the exact wiring pattern of each net is determined, taking into account previously routed nets and analog constraints such as parasitic resistances, capacitances, and crosstalk. This approach is advocated in the works of Cohn et al. [Cohn et al., 1991], Lampaert [Lampaert, 1998], and Malavasi and Sangiovanni-Vincentelli [Malavasi and Sangiovanni-Vincentelli, 1993]. A promising approach in which the current through a wire is also taken into account to size the widths of interconnecting wires is reported by Adler and Barke [Adler and Barke, 2000]. Other area routing approaches, from a digital point of view, are described by Tseng [Tseng, 1997].
The works that advocate an essentially two-step routing approach, with a refined detailed routing step that incorporates or can incorporate timing, crosstalk, and parasitic RC, are [Sechen, 1988, Tseng, 1997, Liu and Sechen, 1999]. Generally, when a strategy is refined so as to take into account additional constraints related to performance degradation due to routing, it is referred to as performance-driven routing [Choudhury and Sangiovanni-Vincentelli, 1990, Cong and Madden, 1998, Charbon, 1995].
7.4
Computational Complexity
We observe that most routing approaches assume that a placement of modules is given. Under these circumstances, adjusting the placement to improve routing is typically not considered. The basic strategy is to find a placement based on some heuristic, and to space the modules in such a way that a suitable routing is expected to be found. The spacing can be based on given statistics, experience in the field, or a knowledge database. As the spacing usually contains some margin, an additional compaction step is usually employed to minimize area. Since compaction in itself is a very hard problem, it is better to avoid it, which is possible if a better estimation of routing resources can be obtained.
Our approach does not rely on a fixed placement for which a proper routing is to be found. Instead, we attempt to find a placement in which the compaction step can be skipped because the routing fits well within the given placement. This is only possible using iteration. Because the number of iterations in the employed simulated annealing environment can be excessively large, the computational complexity of a single routing iteration should be as low as possible without sacrificing too much quality. Therefore, efficient routing heuristics with low computational complexity and adequate quality are needed.
The computational complexity of a routing methodology depends on the routing model and the routing algorithm. Both should be efficient in order to arrive at an overall efficient routing strategy. For the sake of minimizing computational complexity, a two-step, irregular-grid approach is more suitable in an iterative optimization framework.
From the previous discussion it is clear that the routing model is a key element in efficient routing. We propose a global routing model which can efficiently capture all the information
of global routes. A detailed routing model can then be used to exploit the gathered information to compute the exact geometries of the wires. We do not focus on detailed routing in this book, but merely mention its importance to arrive at the final layout.
7.5
Global Routing Model
The proposed global routing model is similar to the routing model used by Cohoon and Richards [Cohoon and Richards, 1988], which is called an escape graph. Conceptually, the construction of an escape graph is easy. Each module boundary segment is extended maximally, either horizontally or vertically, until it hits another module or the boundary of the chip area. For the example placement shown in Figure 7.1(a), the corresponding escape graph is shown in Figure 7.1(b). Hereafter, we use the terms escape graph and global routing graph interchangeably. Note that the global routing graph must be extended for each net to include the pins of that net and their escape line segments. An extended global routing graph is shown in Figure 7.1(c). Of course, the extended graph has at least as many nodes and edges as the global routing graph itself.
Computing the global routing graph is not trivial. Cohoon and Richards proposed a line-sweep algorithm for this purpose. It is not clear from their approach how pins and their associated escape segments can be dynamically inserted into and deleted from the escape graph. Since construction of a static global routing graph is of limited use in the present context, we propose a new dynamic method based on corner stitching and a hash table.
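The extension rule for escape lines can be illustrated with a brute-force Python sketch that computes only the horizontal escape segments of a placement. It is neither the line-sweep method of Cohoon and Richards nor the corner-stitching method proposed in this book; the rectangle format `(x0, y0, x1, y1)`, the function names, and the convention that a module blocks a line only when the line lies strictly between its bottom and top edges are all assumptions of the example. Modules are assumed not to overlap.

```python
def extend_horizontally(y, x_left, x_right, modules, chip_width):
    """Extend the horizontal segment [x_left, x_right] at height y in both
    directions until a module or the chip boundary is hit.

    modules: list of (x0, y0, x1, y1) rectangles.
    Assumption of this sketch: a module blocks the line only if y lies
    strictly between its bottom edge y0 and its top edge y1.
    """
    lo, hi = 0, chip_width
    for x0, y0, x1, y1 in modules:
        if y0 < y < y1:
            if x1 <= x_left:        # blocker to the left of the segment
                lo = max(lo, x1)
            elif x0 >= x_right:     # blocker to the right of the segment
                hi = min(hi, x0)
    return lo, hi

def horizontal_escape_segments(modules, chip_width):
    """Return sorted (y, x_start, x_end) escape segments, one for each
    bottom and top module edge, with duplicates removed."""
    segments = set()
    for x0, y0, x1, y1 in modules:
        for y in (y0, y1):
            lo, hi = extend_horizontally(y, x0, x1, modules, chip_width)
            segments.add((y, lo, hi))
    return sorted(segments)

# Two modules on a chip of width 10.
modules = [(1, 1, 3, 5), (5, 2, 8, 4)]
segments = horizontal_escape_segments(modules, chip_width=10)
```

Here the bottom and top edges of the first module extend across the whole chip, while the two horizontal edges of the second module are stopped on the left by the first module at x = 3. The vertical segments follow by symmetry, and the crossings of horizontal and vertical segments give the nodes of the escape graph.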
7.5.1
Model Efficiency
The efficiency of a routing model is measured, among other things, by the amount of useful information it carries. In the case of a graph model, which is by far the most common, the number of nodes and the number of edges account for the efficiency. Both the storage requirements and the complexity of constructing the global routing graph follow from the previous discussion. It is possible to reduce the latter by using implicit connection graphs as proposed by Zheng et al. [Zheng et al., 1996], but since the routing process generally covers the complete packing space, this approach does not seem to provide any substantial gain for our purpose.
Another important feature is the ability of the global routing graph to represent effective solutions, i.e. solutions which are very close or equal to an optimal solution. Cohoon and Richards already showed that an optimal shortest path between any two nodes in the escape graph always exists [Cohoon and Richards, 1988]. More recently, Ganley proved the following interesting theorem [Ganley, 1995].
Theorem 8 The extended global routing graph contains an optimal rectilinear Steiner minimal tree.
This result is of interest when we want to find an optimal solution for a nontrivial multi-pin net where the number of pins is larger than two.
7.5.2
Global Routing Graph Computation
The recipe to construct the global routing graph from a given placement is given in Figure 7.2. The computational complexity of step 1 is determined by the packing complexity of a from-scratch computation (see Chapter 6). For step 2 we have to insert each module sequentially into a corner-stitching data structure. All absolute module positions are known, so we can insert the modules using the depth-first search order of the modules in, say, the horizontal constraint graph. When all modules are put into an equivalent corner-stitching data structure, the exact locations of all line segments in the placement are known. Furthermore, due to the maximally horizontal empty tile property of corner stitching, all horizontal escape line segments are generated automatically. Step 3 comprises the enumeration and segmentation of each empty and non-empty tile in the corner-stitching data structure. The required complexity is determined by the total number of empty and non-empty tiles. Since there are M non-empty tiles, this total is at least M. In the worst case it grows faster than that, but typically it is proportional to M. The underlying reason is that in a typical placement, each module will be shielded from the rest by a number of surrounding modules. As a consequence, a typical escape line affects only a limited number of surrounding tiles. Hence, the number of tiles affected by any single escape line is independent of M. Therefore, the cost of sorting the line segments typically depends only on M (footnote 6). Finally, in step 4, all line segments can be traversed sequentially, each pair generating at most three edges in the global routing graph. The latter property can be easily seen from the example segmentation of the right-side segment of module 5 and the left-side segment of module 6 in Figure 7.1(a), drawn with a dotted line. The resulting edges are marked in Figure 7.1(b). Insertion in the hash table requires constant time per node, with at most four edges being implicitly represented within each node. Summarizing, the cost of the whole procedure is typically dominated by the sorting step.
7.5.3
Supporting Dynamic Changes
Changes are applied continuously in the global routing graph, either due to insertion of pins of a specific net or due to a change in the placement. Consequently, updating the global routing graph after such a change should be done in an efficient manner.
Dynamic Net Change
One of the difficulties in dynamically maintaining a global routing graph is due to the insertion of new pins. The deletion of pins and their escape segments is relatively easy. As we know where an existing pin is located, we can iteratively delete an escape line segment and its incident nodes until we hit an obstacle. Figure 7.3 shows graphically what happens during deletion of a pin. Suppose a pin was inserted and caused the creation of a sequence of escape line segments (edges), as in Figure 7.3(a). When we want to delete the pin, first its incident escape edge has to be deleted. Then we can delete the node associated with the pin. After deletion of a node, the two perpendicular edges, if they exist, must be joined again into a single edge, unless the node must remain at its position because it is a module corner point or is induced by some module corner. If, after deleting the first edge, we have not hit an obstacle yet, we delete another edge and its (lower) incident node (Figure 7.3(b)). Finally, after deleting the last edge and its (lower) incident node, we notice that the other incident node is connected to a module. Thus, an obstacle has been found and the final node is deleted, after which the split edge is restored to its original condition (Figure 7.3(c)).
Footnote 6: Theoretically, the sorting step can be reduced to linear time with a sorting algorithm such as bucket sort [Cormen et al., 1990], under the assumption that the line segments are distributed (uniformly) at random. This results in a linear overall complexity of the algorithm.
Insertion of a pin and its escape line segments into the global routing graph requires much more thought, since we do not know where an escape segment might cross another line (edge). At such a crossing, the edge needs to be split and a node has to be inserted. Fortunately, the corner-stitching data structure can mitigate the problem because, given a hint (footnote 7), it can find a closest point in essentially constant time, and going to a neighboring edge is equally cheap. Thus, constructing the extended global routing graph after inserting a pin requires complexity essentially proportional to the number of escape line segments induced by the pin. And, following the same line of reasoning as above, this number is typically a constant. The total computational complexity required for performing all insertion and deletion steps is proportional to the number of escape line segments generated during insertion of the pin.
From experiments with randomly generated placements, we found that the number of nodes is usually smaller than 15 times the total number of modules in the placement, and that the number of edges is usually smaller than 30 times the number of modules. Since randomly generated placements are normally quite sparse (containing a lot of slack space), global routing graphs associated with optimized placements are much smaller. Figure 7.4 gives an impression of the number of nodes and edges in global routing graphs derived from randomly generated placements for a wide range of instance sizes. It is clear from Figure 7.4(a) that the number of nodes and edges in the global routing graph increases linearly with the placement instance size M. Furthermore, Figure 7.4(b) shows that the average ratio of the number of edges to the number of nodes becomes approximately 2 for increasing M. This observation is important enough to put in a Claim.
Footnote 7: In this case a good hint would be the pointer to the last found module.
Claim 1 The size of the global routing graph depends, on average, linearly on the number of modules M in a random packing.
A direct consequence is stated by the following Corollary.
Corollary 3 The number of nodes $\hat{V}$ and edges $\hat{E}$ in a connected subgraph of the global routing graph is a linear function of the number of modules in the associated confined routing region.
Based on these results, we may conclude that pin insertion and pin deletion (including escape line segment processing) take essentially constant time, on average.
Dynamic Placement Change
A concern in a dynamic environment where placements change often is the complexity of updating the global routing graph after a placement change. Fortunately, the corner-stitching (CS) data structure allows for dynamically inserting and deleting modules in an efficient way. Clearly, the global routing graph can be updated directly when a module is inserted into or deleted from the CS data structure. This requires some localized operations on the graph, which take essentially constant time per affected module with the aid of corner-stitching operations. A way to minimize computational complexity is to use a double-wave technique. The first wave clears the way for the second wave by deleting all affected modules. As soon as enough space has been cleared by the first wave, the second wave rebuilds the placement by inserting modules. This idea is shown in Figure 7.5. It is clear that the total update complexity is O(#affected modules).
7.6
Global Routing Algorithms
Now that an accurate global routing model has been defined, we have to define how to search for a tree which connects a set of given pins in the global routing graph. Ideally, this tree should be minimal in cost, which is defined to be a Steiner minimal tree in the extended graph. Unfortunately, finding such a tree is NP-hard [Garey and Johnson, 1977], even in a planar graph such as the global routing graph. Since the size of a routing problem instance can be large, we have to resort to efficient heuristics which can find near-optimal solutions in as little time as possible.
In the following subsections we give an overview of several existing heuristics, where we explicitly distinguish (exact) algorithms for 2-pin nets in Subsection 7.6.1 and (heuristic) algorithms for multi-pin nets, containing at least 3 pins, in Subsection 7.6.2 through Subsection 7.6.5. We also propose a few modified versions of these heuristics. The performance of these heuristics is compared with optimal solution values on a broad range of synthesized problem instances. The optimal solutions are obtained with the help of several state-of-the-art programs that incorporate advanced techniques [Warme, 2000, Koch et al., 2000]. The synthesized problem instances are directly derived from actual sequence-pair-based placement results using randomly generated block sizes and nets. We choose the best-performing heuristic as the basis of a cost-constrained pin-to-pin global router. Furthermore, the heuristic is modified to incorporate incremental routing of partially changed routing segments which can be, for instance, a consequence of placement-induced changes in the routing graph.
7.6.1
Two-pin Routing Algorithms
As pointed out before, the problem of finding an optimal (shortest) pin-to-pin route is solvable in polynomial time. Several exact algorithms exist that can perform this task efficiently within the proposed routing graph model. Probably the most widely known algorithm is Dijkstra’s single-source-shortest-paths (SSSP) algorithm [Dijkstra, 1959], which has complexity $O(|V| \log |V|)$ on planar graphs with $|V|$ nodes. A more efficient algorithm, and provably optimal in a wide sense, is the A* algorithm. This algorithm was originally introduced by Nilsson [Nilsson, 1980] and further elaborated by Pearl [Pearl, 1984]. The essential differences between the SSSP and A* algorithms are as follows.
The SSSP algorithm finds shortest paths between a single source pin and all (reachable) other nodes in the routing graph, whereas the A* algorithm finds a shortest path between a single source pin and a single target pin of a 2-pin net, thereby avoiding the exploration of a large number of irrelevant nodes.
The SSSP algorithm does not use a priori information on the location of the target pin, whereas the A* algorithm owes its efficiency to its target awareness.
We discuss both aforementioned algorithms briefly because they are used quite extensively in our framework. For instance, in the multi-pin net routing algorithms, two-pin routing problems are encountered and solved iteratively. Details will be given shortly.
Single-Source-Shortest-Paths (SSSP) Algorithm
A general-purpose version of Dijkstra’s SSSP algorithm explores each node in the graph that is input to the algorithm. In case of a pin-to-pin path search problem it makes sense to stop when a shortest path connecting the two pins has been found. Therefore, we propose a modified version of Dijkstra’s algorithm which is shown in Figure 7.6. The difference is essentially due to the confinement of routing space. Whereas Dijkstra’s original algorithm
processes all nodes and edges in the graph, our modified algorithm processes only the nodes and edges within the a priori defined relevant region. Hence, the computation time is significantly reduced. In the remainder of this book we will refer to the modified SSSP algorithm, simply as the SSSP algorithm, unless explicitly noted otherwise. First, we describe the basic operation of the algorithm on a more intuitive level. Thereafter we give a more formal description. Assume all node distances are initially set to a very large value. We start exploring the graph from the predefined source node. Each incident (weighted) edge is explored and the connecting nodes are conditionally put in a priority queue, keyed by their distance from the source node. We continue this process with the cheapest (shortest-distance) node extracted from the priority queue. Note that the extracted node might be a node that was discovered long before the expansion of the last node. One can see that the exploration of the nodes and edges occurs in a manner similar to an outward wave propagation induced by a falling drop of water at the source node location. The algorithm stops when the wave front hits the target node and a shortest path between source and target node has been established.
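The wave-like exploration described above can be sketched in Python with a binary heap as the priority queue. This is a minimal illustration under stated assumptions, not a transcription of Figure 7.6: the adjacency-list format, the name `sssp_route`, and the optional `region` argument (standing in for the a priori defined relevant region) are hypothetical, and the sketch stops when the target is extracted from the queue, at which point its distance is final for positive edge weights.

```python
import heapq

def sssp_route(graph, source, target, region=None):
    """Dijkstra search that stops once the target is extracted.

    graph:  dict mapping a node to a list of (neighbor, weight) pairs,
            with positive weights.
    region: optional set of nodes the search may visit; None means the
            whole graph is searched.
    Returns (distance, path), or (inf, None) if the target is unreachable.
    """
    dist = {source: 0.0}
    parent = {source: None}
    queue = [(0.0, source)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)      # cheapest node so far
        if u in done:
            continue                     # stale queue entry, skip it
        done.add(u)
        if u == target:
            path = []
            while u is not None:         # backtrack via parent pointers
                path.append(u)
                u = parent[u]
            return d, path[::-1]
        for v, w in graph.get(u, ()):
            if region is not None and v not in region:
                continue                 # outside the relevant region
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w          # relax v
                parent[v] = u
                heapq.heappush(queue, (d + w, v))
    return float("inf"), None

graph = {
    "s": [("a", 2), ("b", 5)],
    "a": [("s", 2), ("b", 1), ("t", 6)],
    "b": [("s", 5), ("a", 1), ("t", 2)],
    "t": [("a", 6), ("b", 2)],
}
length, route = sssp_route(graph, "s", "t")
confined_length, confined_route = sssp_route(graph, "s", "t", region={"s", "a", "t"})
```

In the unconfined search the shortest route runs via nodes a and b with length 5. When node b is excluded from the region, fewer nodes are processed and the route via a alone, with length 8, is returned, which shows that confining the region trades optimality over the full graph for speed.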
We proceed with a more formal discussion of the (modified) single-source-shortest-paths algorithm given in Figure 7.6. All nodes in the graph are initially set to a very large distance value, except for the source pin which has its distance value set to zero at line 2. On line 3 the source pin is inserted into the priority queue Q. Then in the loop from line 4 to line
14 the following actions are performed iteratively until either Q becomes empty or the target pin is encountered. The cheapest node in the priority queue Q is extracted for expansion using extract_min(·). Expanding a node means that all its adjacent nodes are explored. If the distance value of an adjacent node is larger than the distance value of the expanding node plus the weight of the edge connecting the two, then the adjacent node is relaxed on line 9. Relaxing a node means that its distance field is decreased to the lowest currently known distance value to that node. In case a relaxation step is performed, the relaxed node will have its parent pointer set to the expanding node. These parent pointers are useful for backtracking the shortest path nodes and edges when the algorithm finishes. Moreover, if a relaxed node is not in the queue yet, then it is inserted into Q on line 11. The loop from line 15 to line 25 makes sure that the path found between the source pin and the target pin is indeed shortest. In order to guarantee this, the algorithm extracts the nodes in the queue which could possibly lead to a shorter path. When a node is extracted from Q with a distance not smaller than the current distance value of the target pin, it is clear that no improvement is possible and the algorithm can stop. It can be verified that when the weight function is positive, which is the case when the weight of an edge is equal to its Euclidean length, the algorithm is guaranteed to find an optimal path. A bound on the worst-case computational complexity is given in [Cormen et al., 1990].
A* Algorithm
The A* algorithm is essentially a generalized best-first search strategy. Algorithms of this type are also called labeling algorithms because, during the algorithm execution, status labels are attached to nodes. If a node is a candidate for expansion, then we label it OPEN. If a node has been expanded, it is labeled CLOSED.
However, labeling is not strictly required in an implementation, because making a node part of a set can be effectively equivalent to labeling. But for the sake of clarity and ease of discussion, labels may be very useful. We will use labeling wherever appropriate. In the rest of this book we assume the global routing graph is connected.
Before discussing A* we state some definitions. The cheapest cost of a path between two nodes $n_i$ and $n_j$ is denoted by $k(n_i, n_j)$. In general we will use * to indicate a function which yields an optimal value, and ^ to indicate an estimating function. The algorithmic steps of A* are shown in Figure 7.7. The A* algorithm operates as follows. Initially, all nodes are labeled INITIAL and the backtracking parent fields of all nodes are cleared. The algorithm starts by labeling the source node OPEN and puts it in queue Q. By definition, all elements in Q are labeled OPEN. Then the algorithm proceeds by selecting the best node $n$, i.e. the node with the smallest distance value from Q, in the sense defined by
$$\hat{f}(n) = \hat{g}(n) + \hat{h}(n) \qquad (7.1)$$
where $\hat{g}(n)$ is the sum of edge costs along the current path of pointers from $n$ to the source node $s$, and $\hat{h}(n)$ is the estimate of the cheapest cost of paths going from node $n$ to the target node $t$.
If the selected node $n$ is the target node $t$, then we have found an optimal path and the algorithm will terminate. If $n \neq t$, then all neighboring nodes of $n$ are evaluated with respect to their current shortest path distance to the source node $s$ and their estimated distance to $t$ and, if appropriate, backtracking information is updated and nodes are (re-)inserted into the queue Q (or, equivalently, labeled OPEN). This procedure is repeated until the target node is found, which is guaranteed. If we choose $\hat{h}(n) \le h^*(n)$ for every node $n$, where $h^*(n)$ is the cheapest cost of a path from $n$ to $t$, then (by definition) A* is admissible, i.e. it is guaranteed that A* will yield an optimal solution [Nilsson, 1980, Pearl, 1984]. However, if we want to measure A*’s effectiveness by its ability to exclude as many nodes as possible from expansion, then admissibility alone is not sufficient. As proven in [Pearl, 1984, Dechter and Pearl, 1985], A* never reopens a CLOSED node under the following consistency condition:
$$\hat{h}(n_i) \le k(n_i, n_j) + \hat{h}(n_j) \quad \text{for all nodes } n_i, n_j \qquad (7.2)$$
where $k(n_i, n_j)$ is the cheapest cost of a path between $n_i$ and $n_j$. Note that consistency implies admissibility. If (7.2) holds, we have the following useful property [Nilsson, 1980]:
$$\hat{g}(n) = g^*(n) \quad \text{for every CLOSED node } n \qquad (7.3)$$
where $g^*(n)$ is the cheapest cost of a path from the source node to $n$. This means that the backtracking path from every CLOSED node to the source node is a least cost path. For our rectangle-packing-derived global routing graph, we use
$$\hat{h}(n) = d_M(n, t) \qquad (7.4)$$
where $d_M(n, t)$ denotes the Manhattan distance between the target node $t$ and $n$. Note that the distance measure need not necessarily be Euclidean or Manhattan in general. Actually, researchers tend to use a different (nonlinear) metric based on congestion and specific circuit
constraints [Malavasi and Sangiovanni-Vincentelli, 1993, Tseng, 1997]. However, this usually violates the consistency property, thus increasing the practical average complexity of A*. It can be verified (using case distinctions) that (7.4) preserves consistency.
The average computational complexity of the A* algorithm is substantially better than that of the SSSP algorithm. Typically, the complexity is proportional to the total number of nodes on the shortest path between the routed pins. In case we have $M$ modules in the relevant routing region, this results in $O(\sqrt{M})$ average computational complexity using Claim 1 and Corollary 3 (footnote 8). Although A* is better than SSSP in terms of computational complexity, it is not always suitable for finding a 2-pin shortest path. For instance, in cases where the location of the target node is unknown, A* cannot be used and we have to resort to the SSSP algorithm.
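A compact Python sketch of A* with the Manhattan-distance estimate may help to fix ideas. It is an illustration under stated assumptions rather than the algorithm of Figure 7.7: nodes are assumed to be `(x, y)` tuples, edge costs are assumed to equal the Manhattan length of the edge (so that the estimate is consistent), and the names `a_star`, `manhattan`, and `grid_graph` are hypothetical. The `closed` set plays the role of the CLOSED label, and the heap holds the OPEN nodes.

```python
import heapq

def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def a_star(graph, source, target):
    """A* search with the Manhattan distance to the target as estimate.

    graph: dict mapping an (x, y) node to a list of ((x, y), cost) pairs.
    Returns (cost, path), or (inf, None) if the target is unreachable.
    """
    g = {source: 0}                                   # cost from the source
    parent = {source: None}
    open_heap = [(manhattan(source, target), source)]  # keyed by g + estimate
    closed = set()
    while open_heap:
        _, n = heapq.heappop(open_heap)
        if n in closed:
            continue             # with a consistent estimate, never reopened
        if n == target:
            cost, path = g[n], []
            while n is not None:
                path.append(n)
                n = parent[n]
            return cost, path[::-1]
        closed.add(n)
        for m, cost in graph[n]:
            new_g = g[n] + cost
            if new_g < g.get(m, float("inf")):
                g[m] = new_g
                parent[m] = n
                heapq.heappush(open_heap, (new_g + manhattan(m, target), m))
    return float("inf"), None

def grid_graph(width, height, blocked=()):
    """Helper for the example only: unit-cost grid graph without blocked cells."""
    graph = {}
    for x in range(width):
        for y in range(height):
            if (x, y) in blocked:
                continue
            nbrs = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                q = (x + dx, y + dy)
                if 0 <= q[0] < width and 0 <= q[1] < height and q not in blocked:
                    nbrs.append((q, 1))
            graph[(x, y)] = nbrs
    return graph

graph = grid_graph(5, 3, blocked={(2, 0), (2, 1)})
cost, path = a_star(graph, (0, 0), (4, 0))
```

In this example the two blocked cells force a detour through the top row, so the returned cost is 8 although the Manhattan distance between the pins is only 4. The estimate never overestimates the remaining cost, which is what keeps the result optimal.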
7.6.2 Minimal Bounding Box (MBB) Routing
For completeness and comparison purposes a commonly used global routing algorithm is discussed here. The algorithm is very simple: it estimates the global routing length of a net by taking the half perimeter of the smallest rectangle that encloses all pins of the net. This method is also called minimal-bounding-box (MBB) routing estimation. Clearly, this estimated value is an absolute lower bound on the length of the interconnect. It is also obvious that the actual error with respect to the optimal value can be very large. Two illustrative examples are shown in Figure 7.8. In cases where over-the-cell routing is not allowed it can be seen that MBB routing estimation yields poor results in general. But even if over-the-cell routing is allowed, MBB routing performs poorly.
The total wire length is calculated by summing up the half-perimeter lengths for all nets. The computational complexity of this method is given by

$$\sum_{i} O(|Z_i|),$$

where $|Z_i|$ is the number of pins in net $i$. Since each pin is in exactly one net, the total complexity is linear in the total number of pins, $O\left(\sum_i |Z_i|\right)$. This computational complexity is very low. However, a major drawback of this method is the poor accuracy of the estimation. Therefore, it is not appropriate for use in our optimization framework, in which controlled speed and accuracy are of utmost importance.

8 We assume here that the relevant routing region has a squarish size. If this is not the case, the average computational complexity can be expected to be at most
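As a small illustration of the MBB estimate, the following sketch computes the half perimeter of the bounding box of each net and sums the results. The function names and the representation of a net as a list of (x, y) pin coordinates are assumptions made for this example.

```python
def half_perimeter(pins):
    """Half perimeter of the smallest rectangle enclosing all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_mbb_length(nets):
    """Sum of the MBB estimates over all nets; one pass over every pin."""
    return sum(half_perimeter(pins) for pins in nets)

# Two hypothetical nets given by their pin coordinates.
nets = [
    [(0, 0), (4, 3)],
    [(1, 1), (2, 5), (6, 2)],
]
print(total_mbb_length(nets))  # 7 + 9 = 16
```

Each pin is inspected a constant number of times, which is why the estimate is so cheap, and also why it cannot account for modules that block the direct connection.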
7.6.3 Minimum Spanning Tree (MST) Routing
An improvement in routing quality over the minimal-bounding-box estimation method is the computation of a minimum spanning tree (MST) in the plane of all pins in a given net. Note that this problem is fundamentally different from the Steiner minimal tree problem because no Steiner points are used. Similar to the Steiner minimal tree (SMT) problem in the plane, the minimum spanning tree problem ignores obstacles, which are the modules in the placement problem. A fundamental difference between the SMT problem in the plane and the MST problem is that the latter can be solved in polynomial time while the former is NP-hard. We do sacrifice solution quality, because the MST algorithm disregards possible improvements in tree length obtained by using maximally overlapping subtrees.

Another way to show the superficial analogy is to observe that a minimum-cost tree spanning the set of pins of a net in a graph is called a Steiner minimal tree (in a graph). As discussed before, finding such a tree is an NP-hard problem. Fortunately, the latter problem can be transformed in polynomial time into an easier minimum spanning tree problem whose size is independent of the original graph size. Of course, the solution of the latter problem is not an optimum, but it is normally not too bad. The approach is as follows. Regard the set of pins as the nodes in a complete graph G(V, E), i.e. a graph in which each node is connected to every other node, with the weight of each edge equal to the Manhattan distance between the nodes it connects. Formally this is written as

$$w(v_i, v_j) = |x_i - x_j| + |y_i - y_j|,$$

where $x_i$ is the x-coordinate of node $v_i$ and $y_i$ is the y-coordinate of node $v_i$. As we now have a graph in which the number of pins to be connected is equal to the number of nodes in the graph, Prim's minimum spanning tree algorithm can find a minimal tree connecting all nodes in which is also written as because G is complete. In terms of all nets to be routed the total computational complexity is
Note that the above procedure does not directly yield a solution in the form of a subgraph of the original graph. Additional steps must be taken for this. Very recently, an efficient line sweep algorithm was introduced which computes a rectilinear MST of $n$ points in the plane in $O(n \log n)$ time [Zhou et al., 2001], without the use of Delaunay triangulation. The latter is a well-known method for Euclidean MST computation in $O(n \log n)$ time, but it is not well defined for the Manhattan distance measure. Unlike the MBB estimation method, the MST error is bounded. Hwang [Hwang, 1976] proved that the ratio of rectilinear MST cost over rectilinear SMT cost is never more than 3/2. However, experimental results reported in et al., 2000] indicate that the difference between rectilinear MST cost and a solution produced by a good rectilinear Steiner minimal tree heuristic is more than 10% on average. This implies that the difference between the MST cost and the cost of an optimal routing solution in a graph is significantly more than 10%.
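The MST estimate can be illustrated with a simple array-based version of Prim's algorithm on the complete graph of pins, with Manhattan edge weights. The sketch below is an assumption-laden example (the function names and the pin list are hypothetical) and is not the line sweep algorithm of [Zhou et al., 2001].

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def rectilinear_mst(pins):
    """Prim's algorithm on the complete graph of pins; returns (cost, edges)."""
    n = len(pins)
    in_tree = [False] * n
    dist = [float("inf")] * n      # cheapest known connection to the tree
    link = [None] * n              # tree node realizing that connection
    if n:
        dist[0] = 0
    total, edges = 0, []
    for _ in range(n):
        # Select the pin outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=dist.__getitem__)
        in_tree[u] = True
        total += dist[u]
        if link[u] is not None:
            edges.append((pins[link[u]], pins[u]))
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(pins[u], pins[v])
                if d < dist[v]:
                    dist[v], link[v] = d, u
    return total, edges

# Four pins arranged as a "plus" sign around the point (1, 1).
pins = [(0, 1), (2, 1), (1, 0), (1, 2)]
cost, edges = rectilinear_mst(pins)
print(cost)  # 6
```

For these four pins every pairwise Manhattan distance is 2, so the MST has cost 6, whereas a rectilinear Steiner tree through the point (1, 1) has cost 4. The ratio 3/2 matches the bound of [Hwang, 1976] mentioned above.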
7.6.4 Path-Based Routing
The path-based routing heuristics grow a tree from a given node in the routing graph by sequentially adding a path to a pin until the tree spans all pins. The expansion is typically based on iterative addition of a shortest path between any node already in the tree and any pin not yet in the tree. There are many possibilities with respect to searching for a specific pin and connecting that pin to the tree built so far. Hence the large number of path-based heuristics that exist in current literature. We will only cover the most interesting ones from our point of view. Although some differences between the given path-based heuristics are rather subtle, we demonstrate in Section 7.7 that the routing results can differ significantly.

We start with the original shortest-paths heuristic (SPH) introduced by Takahashi and Matsuyama [Takahashi and Matsuyama, 1980]. A greedy variant is proposed next, which is called the shortest-paths-based heuristic I (SPBH_I). Another variant, which uses a constant-time distance estimation technique, is denoted by shortest-paths-based heuristic II (SPBH_II). Finally, we discuss a general method to improve the solution quality of any heuristic by repetition with different starting points. Of course, this occurs at the cost of increased computation time.

Shortest-Paths Heuristic (SPH)

Similar to the way in which Prim's minimum spanning tree algorithm constructively adds edges to a shortest spanning tree of all nodes in a graph, the shortest-paths heuristic (SPH) of [Takahashi and Matsuyama, 1980] explores shortest path segments leading to pins. The shortest-paths tree is constructed iteratively by adding a shortest path from the current tree to a closest unconnected pin. The current tree can also be viewed as a virtual node to which the closest pin is to be connected. When all pins have been connected this way, the heuristic finishes and returns the tree it found.
Let denote a path whose cost is minimal among all shortest paths from nodes in W to pin where and Denote the cost of by Furthermore, let the set of pins be Z. The essence of the algorithm is given in Figure 7.9.
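The following Python sketch illustrates the idea of the shortest-paths heuristic; it is not a transcription of Figure 7.9. The graph format (a dictionary mapping a node to a list of (neighbor, cost) pairs), the function name, and the example graph are assumptions made for illustration. In each round a Dijkstra-style search is started from all nodes of the current tree at distance zero, and a pin is accepted only when it is extracted from the queue, so the added path is a shortest one from the tree to that pin.

```python
import heapq

def shortest_paths_heuristic(graph, pins, start):
    """Grow a tree from `start` by repeatedly adding a shortest path
    from the current tree to a closest unconnected pin."""
    tree_nodes = {start}
    tree_edges = []
    remaining = set(pins) - {start}
    total = 0
    while remaining:
        # The whole tree acts as one virtual source node.
        dist = {n: 0 for n in tree_nodes}
        parent = {n: None for n in tree_nodes}
        queue = [(0, n) for n in tree_nodes]
        heapq.heapify(queue)
        closed = set()
        found = None
        while queue:
            d, n = heapq.heappop(queue)
            if n in closed:
                continue                      # stale queue entry
            closed.add(n)
            if n in remaining:                # closest unconnected pin
                found = n
                break
            for m, cost in graph[n]:
                if d + cost < dist.get(m, float("inf")):
                    dist[m] = d + cost
                    parent[m] = n
                    heapq.heappush(queue, (d + cost, m))
        if found is None:
            raise ValueError("at least one pin is unreachable")
        total += dist[found]
        n = found
        while parent[n] is not None:          # backtrack to the tree
            tree_edges.append((parent[n], n))
            tree_nodes.add(n)
            n = parent[n]
        remaining -= tree_nodes
    return total, tree_edges

# Hypothetical graph: three pins around a non-pin node "c".
graph = {
    "p1": [("c", 2), ("p2", 5)],
    "p2": [("c", 2), ("p1", 5)],
    "p3": [("c", 2)],
    "c": [("p1", 2), ("p2", 2), ("p3", 2)],
}
length, edges = shortest_paths_heuristic(graph, ["p1", "p2", "p3"], "p1")
print(length, edges)
```

In this example the heuristic routes all three pins through the non-pin node "c" instead of using the direct but longer edge between "p1" and "p2".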
As noted by Rayward-Smith and Clare [Rayward-Smith and Clare, 1986], the final shortest-paths tree can be improved by two additional steps.
4. Determine a minimum spanning tree for the sub-network of the routing graph induced by the nodes in the shortest-paths tree.
5. Delete from this minimum spanning tree all non-pins of degree 1 in a sequential manner.
Although the last two steps can improve the quality of the solution, they impose a substantial increase in computation time. From experiments, we found that the improvement is usually negligible in our framework. The underlying reason is that the sub-network induced by the nodes of the shortest-paths tree is not likely to contain better solutions if the pins lie relatively far from each other (in terms of intermediate nodes) and the number of alternative equally good paths is small. Generally, the shortest-paths heuristic yields good results [Hwang et al., 1992, Winter and Smith, 1992]. Furthermore, the worst-case computational complexity is
for a single net with pins. With the knowledge that for a planar graph the relationship holds, the worst-case computational complexity for all nets can be written
as
Typically the complexity is significantly lower because the addition of a new pin to the Steiner tree normally causes only a relatively small number of nodes, proportional to the number of nodes in the shortest path segment, to be re-processed. Furthermore, the error ratio (see footnote 9) is $2(1 - 1/|Z|)$, where $|Z|$ is the number of pins in a net.

9 The error ratio is defined to be the quotient of the worst-case solution quality and the optimal solution quality.

Shortest-Paths-Based Heuristic I (SPBH_I)

The aforementioned shortest-paths heuristic does not add a pin to the currently built tree as soon as it finds a new pin. Instead, it makes sure that the path it adds to the current tree is indeed a shortest one over all possible paths from the current tree to this pin. A way to guarantee this condition is to postpone the addition of the currently found path to a pin until all edges connected to that pin have been explored, which implies that no improvement in path length is possible from the current tree to that pin. We propose a modification of the aforementioned algorithm which entails adding a shortest path to a pin as soon as we encounter this pin. This is essentially a greedy approach. We name this algorithm the shortest-paths-based heuristic I (SPBH_I). Furthermore, we observe that an implementation of the shortest-paths heuristic of Figure 7.9 typically uses a priority queue to store the candidate nodes before extracting the cheapest ones (one at a time) during the pin search. As proven by Huijbregts [Huijbregts, 1996, Corollary 4.1], upon first extraction from the queue of a candidate node that reaches a pin, the actual shortest path to that pin is established through some node residing in the queue, which is not necessarily the first extracted node to reach the pin. Therefore, a greedy approach does not comply with the SPH condition of (7.7). In Figure 7.10 the algorithmic steps of SPBH_I are shown. The description of the algorithm in Figure 7.10 is self-explanatory. It is clear that the computational complexity of SPBH_I is never more than that of the original shortest-paths heuristic.

Figure 7.11 shows an example which demonstrates the different strategies of SPH and SPBH_I. The purpose of this example is to show that the greedy behavior of SPBH_I can yield worse solutions than the non-greedy behavior of SPH. However, as we will see from the experimental results, the average solution quality of SPBH_I is significantly better than that of SPH for a wide range of problem instances. The explanation of Figure 7.11 is as follows. We want to connect all
solid black circles which constitute the pins of this net. The white circles are regular nodes. We start with pin and Figure 7.11(a) shows the snapshot of the situation in which pins and have just been processed and added to the shortest-paths tree T. Consequently,
all pins and nodes in T reside in the priority queue with their distance (key) values set to zero. Subsequently, the cheapest element is extracted from the queue and that node (or pin) is expanded. First, all nodes and pins in tree T are extracted from the queue since they have a zero key value. During this process, pins and are also extracted and expanded. As a result, nodes and will be explored and put in the queue keyed by their distance from tree T. Thus, node has key 13 and node has key 5. This situation is shown in Figure 7.11(b), where the backtracking arrows are also drawn. Only nodes and reside in the queue at this moment, with keys 13 and 5, respectively. Therefore, node is extracted and expanded. We find pin at distance 56 from Algorithm SPBH_I stops at this point and returns the routing solution as depicted by thick lines in Figure 7.11(c). However, algorithm SPH will not stop once is encountered for the first time. Instead, it will find a shortest path via node because 12 + 5 > 13 + 3, where 12 + 5 is the length of path and 13 + 3 is the length of path Clearly, the routing solution found by algorithm SPH is better than the one found by algorithm SPBH_I. Despite this pessimistic scenario, the average performance of SPBH_I compared to SPH in terms of routing quality is remarkably good, as will be shown in Section 7.7.

Shortest-Paths-Based Heuristic II (SPBH_II)

Another variant of the original shortest-paths heuristic is as follows. Instead of determining a closest pin to the current tree by means of exploring adjacent nodes in an iterative manner, we estimate the distance to a closest-distance pin using the Manhattan distance measure. We call this variant of the original shortest-paths heuristic the shortest-paths-based heuristic II (SPBH_II). The algorithm is given in Figure 7.12. By we denote the minimal
Manhattan distance between all pairs of nodes A significant improvement in run-time is obtained if the A* point-to-point routing algorithm [Nilsson, 1980, Pearl, 1984] is used instead of Dijkstra's point-to-multipoint algorithm [Dijkstra, 1959]. The shortest-paths-based heuristic II is somewhat more expensive, with respect to computational complexity, than SPH although the A* algorithm is fundamentally faster than any other shortest-path algorithm. The underlying reason is that a straightforward implementation of the proposed algorithm processes all nodes in the currently built tree T when evaluating Consequently, the actual computational complexity is
which is clearly upper-bounded by
In case the pins are spread over the entire routing graph we obtain a worst-case scenario. However, in most cases the routing region of interest can be confined and is therefore substantially smaller. This implies that Note that in virtually all cases

Repetitive Shortest-Paths Heuristics

For comparison purposes we also cover repetitive shortest-paths heuristics. A repetitive heuristic is not innovative in the sense that it makes use of a clever technique. On the contrary, it repeats a known heuristic over a given set of initial conditions. The solution of the repetitive heuristic is then the best solution returned by the underlying shortest-paths heuristic over these initial conditions. Essentially, the class of repetitive variants is quite broad since in each heuristic some choice is made at some point in the algorithm. This can be an arbitrary choice, a greedy choice, or a choice based on some heuristic. Of course, a repetitive approach is only practical if a significant improvement in routing quality is obtained at the cost of a (preferably small) increase in run time. The trade-off between the two depends on the application. When naively implemented, the overall run time of a repetitive shortest-paths heuristic grows linearly with the number of repetitions. Therefore, a substantial amount of effort must be spent on clever techniques that exploit the incremental philosophy: only re-compute information when it is strictly necessary. From our point of view, the foremost reason for studying repetitive heuristics is to assess the practical improvement in routing quality. If the improvements are substantial, it proves worthwhile to consider the design of an efficient repetitive heuristic.

Winter and Smith [Winter and Smith, 1992] have proposed a class of repetitive shortest-paths heuristics and conducted extensive experiments with them. Summarizing, this class consists of the following variants:

SPH-N: determine the SPH tree $|Z|$ times, each time beginning with a different pin.
SPH-V: determine the SPH tree $|V|$ times, each time beginning with a different node.
SPH-zN: determine the SPH tree $|Z| - 1$ times, each time beginning with a shortest path from a fixed pin to another pin.
SPH-NN: determine the SPH tree $|Z|(|Z| - 1)/2$ times, each time beginning with a shortest path between a different pair of pins.
The investigations of Winter and Smith showed that SPH-V and SPH-NN perform particularly well on all instances from a large set of randomly generated problem instances. The quality improvements are of course paid for by longer computation times. A possible method to reduce computation time while preserving routing quality is heuristic identification of a good starting point. This could for instance be accomplished by close (visual) examination of many practical routing results. However, this falls outside the scope of this book.
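To make the difference between the non-greedy SPH, the greedy SPBH_I, and a repetitive variant in the spirit of SPH-N more tangible, the sketch below implements all three in one function. It is a hedged illustration, not the code behind Figures 7.9 and 7.10: the graph format, the function names, and the node names "t", "a", "b", "p" are hypothetical. Only the edge lengths 13, 5, 12 and 3 are borrowed from the discussion of Figure 7.11.

```python
import heapq

def grow_tree(graph, pins, start, greedy=False):
    """Length of a tree grown from `start`. With greedy=False a pin is
    accepted when it is extracted from the queue (SPH); with greedy=True
    it is accepted as soon as it is first encountered (as in SPBH_I)."""
    tree, remaining, total = {start}, set(pins) - {start}, 0
    while remaining:
        dist = {n: 0 for n in tree}
        parent = {n: None for n in tree}
        queue = [(0, n) for n in tree]
        heapq.heapify(queue)
        closed, found = set(), None
        while queue and found is None:
            d, n = heapq.heappop(queue)
            if n in closed:
                continue
            closed.add(n)
            if n in remaining:                      # non-greedy acceptance
                found = n
                break
            for m, cost in graph[n]:
                if d + cost < dist.get(m, float("inf")):
                    dist[m], parent[m] = d + cost, n
                    heapq.heappush(queue, (d + cost, m))
                    if greedy and m in remaining:   # greedy acceptance
                        found = m
                        break
        if found is None:
            raise ValueError("at least one pin is unreachable")
        total += dist[found]
        n = found
        while n is not None and n not in tree:      # add the path to the tree
            tree.add(n)
            n = parent[n]
        remaining -= tree
    return total

def best_over_start_pins(graph, pins, greedy=False):
    """Repetitive variant: try every pin as start and keep the best result."""
    return min(grow_tree(graph, pins, s, greedy) for s in pins)

# Pin "p" can be reached from pin "t" via "b" (5 + 12) or via "a" (13 + 3).
graph = {
    "t": [("a", 13), ("b", 5)],
    "a": [("t", 13), ("p", 3)],
    "b": [("t", 5), ("p", 12)],
    "p": [("b", 12), ("a", 3)],
}
pins = ["t", "p"]
print(grow_tree(graph, pins, "t"))                      # 16
print(grow_tree(graph, pins, "t", greedy=True))         # 17
print(best_over_start_pins(graph, pins, greedy=True))   # 16
```

Starting from "t", the greedy variant commits to the path of length 17 that it encounters first, whereas the non-greedy variant finds the path of length 16. Repeating the greedy variant from every pin recovers the better solution in this small example, at the price of one run per pin.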
7.6.5 Node-Based Routing
The major difficulty when solving the Steiner minimal tree problem is to identify the non-pins (see footnote 10) that must be included in the tree to arrive at an optimal solution. Once these non-pins are given, the Steiner minimal tree can be found easily; it is a minimum spanning tree of the subnetwork induced by the pins and selected non-pins. The general idea behind so-called node-based heuristics is to identify good non-pins quickly. This stands in contrast to the earlier described path-based heuristics, where the idea is to identify a good path quickly. The average-distance heuristic (ADH) is a node-based heuristic, which is discussed in this book for two reasons. First, from experiments conducted by various researchers, ADH seems to perform better than shortest-paths heuristics on various kinds of graphs [Hwang et al., 1992]. Therefore, it is interesting to verify if this is also the case for our type of sparse (routing) graphs. Second, it is good to get an impression of how different types of routing heuristics differ in performance within our framework, both from the perspective of routing quality and computation times.

10 Non-pins are nodes which are not pins.

Average-Distance Heuristic (ADH)

The average-distance heuristic is a promising algorithm which was first proposed by Rayward-Smith [Rayward-Smith, 1983]. The heuristic is given in Figure 7.13.

Conceptually, the algorithm works as follows. We start with a set of unconnected pins. The idea is now to constructively connect pins to each other such that the number of unconnected pins is decreased. Since we connect two pins during each iteration, after a finite number of iterations all pins are connected. ADH distinguishes itself in the choice of which subtrees, each containing at least one pin, to connect and how they should be connected. The average-distance node is the node which has the smallest average distance to all currently constructed subtrees. After such a node is computed, the closest subtree and the second-closest subtree to that node are connected via two paths originating from the average-distance node. As a consequence of this step, both subtrees are merged into a single subtree connecting all pins it contains. The average-distance node is computed again, and the previous steps are repeated until all subtrees are connected. Two additional steps, identical to steps 4 and 5 of the shortest-paths heuristic, can be applied to further improve the solution [Rayward-Smith and Clare, 1986]. Formally, we can define the average distance function by
where is the shortest-path distance between node and the currently built subtree Furthermore, is the iteration index as defined in the algorithm above, and is the total number of pins. A fast implementation of ADH is due to Chang and Lee [Chang, 1994]. Their main contribution is the identification of circumstances in which more than two subtrees can be joined together in a single iteration, hence reducing the total number of iterations. Nonetheless, the computational complexity of ADH is dominated by the evaluation of the average-distance function (7.11) during each iteration. It can be verified that ADH has computational complexity for planar graphs [Rayward-Smith and Clare, 1986]. Note that this complexity is a function of the total (confined) routing graph size and not the number of pins. A significant portion of this complexity can be attributed to the computation of shortest paths. Motivated by the urge to reduce computational complexity, we propose a modified version of ADH hereafter, which we call the average-distance-based heuristic (ADBH). Average-Distance-Based Heuristic (ADBH) The essential difference between the previously discussed ADH and the modified algorithm which we propose here, is the use of the Manhattan distance measure as used in
instead of the shortest-path distance measure used in (7.11). In addition, we use the A* algorithm to find an actual shortest path between a source and a target node. By virtue of faster Manhattan distance approximation and the use of the A* algorithm to find a shortest path, we can reduce the worst-case total computational complexity somewhat. This can be seen as follows. The time taken by the operation to find the Manhattan distance between a node and a subtree is proportional to the number of nodes in this subtree. The maximum size of a subtree is of course never larger than the total number of nodes in the entire graph. Since all subtrees are disjoint, the total complexity for evaluating the distance from a given node to all subtrees is Function is evaluated times for all nodes not in to all nodes in the subtrees. Furthermore, the addition of the best average-distance node and the (two) paths leading to that node takes in the worst case. This is done times. Consequently, the total computational complexity is Note that and normally does not depend on V. Because of this approximation it makes more sense to compare the results of ADBH with SPBH_II instead of comparing it to the original SPH.
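The selection step of ADBH can be sketched as follows. This is a simplified reading of the prose description only: the average is taken as the plain mean of the Manhattan distances from a candidate node to all current subtrees, which is an assumption and not necessarily identical to the average distance function (7.11). All function names and the example data are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def distance_to_subtree(node, subtree):
    """Manhattan distance from a node to the closest node of a subtree.
    The cost is proportional to the number of nodes in the subtree."""
    return min(manhattan(node, t) for t in subtree)

def average_distance_node(candidates, subtrees):
    """Return the candidate with the smallest mean distance to all subtrees,
    together with the indices of its closest and second-closest subtree."""
    def average(node):
        return sum(distance_to_subtree(node, s) for s in subtrees) / len(subtrees)
    best = min(candidates, key=average)
    ranked = sorted(range(len(subtrees)),
                    key=lambda i: distance_to_subtree(best, subtrees[i]))
    return best, ranked[:2]

# Three single-pin subtrees and three candidate non-pins (hypothetical data).
subtrees = [{(0, 0)}, {(4, 0)}, {(2, 6)}]
candidates = [(2, 0), (2, 2), (0, 4)]
node, closest_two = average_distance_node(candidates, subtrees)
print(node, closest_two)  # (2, 0) [0, 1]
```

In a full heuristic the two selected subtrees would next be connected to the chosen node by actual shortest paths (for instance with A*), merged, and the selection repeated until a single subtree remains.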
As a final remark, we note that the average distance of a node in G only changes due to the change in distance to a merged (and extended) subtree This fact can be exploited to yield a more efficient overall algorithm. However, this point is not explored in this book.
7.7 Benchmarking of Heuristics in Our Routing Model
In this section we compare the following routing heuristics with respect to computational complexity and routing quality (see footnote 11): minimal bounding box (MBB), shortest-paths heuristic (SPH), shortest-paths-based heuristic I (SPBH_I), shortest-paths-based heuristic II (SPBH_II), and average-distance-based heuristic (ADBH). Since worst-case performance of a heuristic can give an overly pessimistic indication of practical performance, and different heuristics perform differently on different problem instances, it is necessary to experimentally evaluate these heuristics on a set of representative problem instances. We first define the problem instances which we use to benchmark the routing heuristics. Then we evaluate these heuristics with respect to the following points: solution cost, percentage deviation from the optimal solution cost, and computation time. From these values we can derive some implications with respect to the following issues: worst-case solution quality as implied by the error ratio versus practical solution quality, and computational complexity versus practical performance. Finally, we draw some conclusions.
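The percentage deviation used in the comparison can be written down in one line. The sketch below assumes the usual definition relative to the optimal cost; the function name and the numbers are hypothetical and are not taken from the tables in this chapter.

```python
def percent_deviation(cost, optimal_cost):
    """Deviation of a solution cost from the optimal cost, in percent.
    Positive for heuristic solutions, negative for lower-bound estimates."""
    return 100.0 * (cost - optimal_cost) / optimal_cost

# Hypothetical values: a heuristic tree of length 104 and a lower-bound
# estimate of 20 for an instance whose optimal length is 100.
print(percent_deviation(104.0, 100.0))  # 4.0
print(percent_deviation(20.0, 100.0))   # -80.0
```

A lower-bound estimate such as MBB can never exceed the optimal length, so its deviation is zero or negative.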
7.7.1 Benchmark Problem Instances
From the point of view of truly integrated placement and global routing, a natural requirement on the performance of a routing heuristic is that it performs well (on average) on a routing graph derived from a representative placement. During the initial phase of optimization, random placements are generated. Generally, a randomly generated placement is sparse, meaning that it contains a large amount of unoccupied space. It is quite plausible to assume that a routing graph derived from such a placement is certainly not easier to deal with than a routing graph derived from a very compact placement. For one, the graph is usually much larger. Hence, we may conclude that a heuristic which performs well (on average) on a broad set of "difficult" graphs will also perform well on the easier graphs which are generated during the final stages of placement optimization. Thus, the results should give a good indication of typical routing performance. Table 7.1 shows the benchmark set of global routing graph instances we have defined, along with optimal routing solutions. The numbers shown in the shaded area are best known upper bounds at the time of writing, while the other numbers are optimal values. The problem instances are derived from placements with as few as 10 modules (lin01, lin02, lin03) to placements with 2560 modules (lin34, lin35, lin36, lin37). To give the reader a visual impression of such a problem instance, a visualization of lin23 is shown in Figure 7.14.

11 In fact, we have implemented and evaluated more heuristics but the given set is sufficient to show the essence of our results.
7.7.2 Experimental Results
The purpose of experimental evaluation of the routing heuristics is to get a good idea of the practical performance and thus the usefulness of a certain heuristic in an iterative stochastic environment. The following five heuristics are benchmarked: MBB, SPH, SPBH_I, SPBH_II, and ADBH. The MBB heuristic is the most widely used method to estimate wiring requirements in connection with iterative placement optimization.
The algorithms have been implemented in C. The test platform is an Intel Pentium III 800 MHz processor with 512 Mbytes of RAM running a Linux 2.4 operating system. All computation times are measured using the getrusage() system call. Table 7.2 shows the experimental results of several routing heuristics on the previously defined set of global routing graph instances. It is clear from these results that MBB routing is fastest of all, but the routing estimations it provides are disastrous; not only is the deviation from optimal very large, it also varies from -6% deviation to as much as -80% deviation. Also, we can see directly that ADBH performs very poorly, which is quite surprising. Because both run-time performance and solution quality are extremely poor for ADBH, we disregard it in our further discussion. We can also see that routing algorithm SPBH_I performs best. Not only does it produce the highest quality results, which are not more than 3% away from the optimum (on average), but its computation times are also very modest. Figure 7.15 shows the solution of algorithm SPBH_I to problem instance lin23. This result should be compared with the optimal solution shown in Figure 7.14. It should be noted that the shown routing solution with length 18341 deviates 4.45% from optimal. However, this is barely assessable by visual inspection. The shaded rectangles are modules; 320 in total. Furthermore, the thin black lines are unexplored edges while the thin grey lines are explored edges. We can see that the right side of the plot contains a vertical region which has been left unexplored by the search wave. Based on these results, which suggest SPBH_I as a most promising routing heuristic, a few additional experimental investigations are conducted. We tested the following variations
of SPBH_I:

ISPBH_I_ZZ: iterated version of SPBH_I in which, instead of starting with an arbitrary pin, we start with a shortest path between each distinct pair of pins, i.e. $|Z| \cdot (|Z| - 1)/2$ times;
ISPBH_I_Z: iterated version of SPBH_I in which, instead of starting with an arbitrary pin, we start $|Z|$ times, each time with a different pin;
ISPBH_I_Z_BIAS: version of ISPBH_I_Z in which, whenever an arbitrary decision needs to be taken, this decision is biased towards extending an edge in the direction of the center of gravity of the net.

Table 7.3 shows the results of the experiments conducted with these algorithms. For the moment, ignore the ISPBH_I_Z_BIAS results. Shortly, we will explain why. It is clear from the results in this table that ISPBH_I_ZZ produces overall the best results with 1.4% deviation from the optimum on average. However, the computation times of ISPBH_I_ZZ are quite large, increasing rapidly with larger nets. Therefore, the faster ISPBH_I_Z is more suitable for use in an iterative framework since the differences in routing quality are not that large. An average improvement over ISPBH_I_Z is obtained by ISPBH_I_Z_BIAS without additional computational overhead, by exploiting biasing information where normally arbitrary decisions are made by the algorithms. This biasing technique can also be applied to ISPBH_I_ZZ to improve the solution cost slightly, without increasing computation time. However, the computation time of ISPBH_I_ZZ is too large for the algorithm to be practical anyway. We see that the solution quality of ISPBH_I_Z_BIAS is comparable with that of ISPBH_I_ZZ while the computation time is two orders of magnitude lower.
Summarizing, SPBH_I and ISPBH_I_Z_BIAS are the most promising candidates for routing in an iterative optimization framework (see footnote 12). It depends, among other things, on the typical size of a net whether or not it is worth trading computation time for solution quality. For comparison purposes it is interesting to contrast the heuristic graph SMT results with optimal rectilinear SMT (RSMT) and Euclidean SMT (ESMT) results. Recall that the RSMT and ESMT (see footnote 13) solutions ignore modules in the plane. Consequently, wires can run over modules, which by definition is undesirable. However, the results do give a good indication of how much solution quality we lose by imposing a non-over-the-cell routing constraint. The optimal results have been obtained using Geosteiner 3.0 written by Warme et al. [Warme et al., 1999], which is considered a state-of-the-art tool for computing RSMTs and ESMTs.

12 In principle, it is also possible to apply biasing techniques to SPBH_I, thereby improving solution quality while not enlarging computation time.
13 The ESMT is similar to the RSMT except for the fact that edges are not restricted to horizontal and vertical directions.
Table 7.4 summarizes the outcomes of the experiments, which were performed on the same hardware platform as the other routing experiments. The average improvement of RSMT solutions over the near-optimal heuristic ISPBH_I_Z_BIAS solutions is 6.6%. Since the heuristic solutions themselves lie slightly above the optimum, this means that RSMT lengths are about 5% shorter than optimal graph SMT lengths. Of course, ESMT improves on these results. Note that the CPU times of RSMT and ESMT are substantially smaller than the CPU times of the ISPBH_I_Z_BIAS heuristic and comparable with the CPU times of the SPBH_I heuristic.
7.7.3 Concluding Remarks
We have shown that efficient graph-based routing heuristics are suitable for finding near-optimal approximations to the graph Steiner minimal tree problem. The best heuristics in terms of solution quality and running time are SPBH_I and ISPBH_I_Z_BIAS. The former can be improved somewhat by using a biasing technique similar to that of the latter. Since SPBH_I is orders of magnitude faster than ISPBH_I_Z_BIAS, it is certainly more suitable during the initial phase of simulated annealing optimization. Another issue worth investigating with respect to improvement of run-time performance is the idea of multiple wave expansion as elaborated in [Huijbregts, 1996] but explored in a somewhat different context. We already pointed out that the Geosteiner 3.0 tool produces optimal solutions much faster than a heuristic produces sub-optimal solutions. This seems paradoxical, but it should be noted that the Geosteiner 3.0 code is strongly optimized while our heuristic code is not. Moreover, Geosteiner 3.0 cannot compute optimal Steiner minimal trees in graphs, which is an important requirement in our framework. As yet, no previously published results are known on fast heuristics for finding near-optimal Steiner minimal trees in graphs derived from actual module placements. Moreover, little was known about their absolute performance in relation to optimal solutions. Our work on routing has filled this gap. Last but not least, a rigorous optimization of the heuristic code should significantly improve run-time performance, resulting in a very fast global routing heuristic that yields near-optimal results.
7.8 Incremental Routing

As mentioned earlier, incremental update techniques are of paramount importance in an iterative optimization environment where the number of iterations can be very large. It is clear that when only a small change in the placement of modules occurs, at least the nets connected to the modules that actually changed location have to be re-computed. Also the nets that are routed through the region of affected modules should be re-computed in order to have consistently good global routes for all nets. Thus, the total set of affected nets consists of two subsets. The first subset contains nets which have at least one pin connected to any of the moved modules. This set is implicitly identified via computation of all moved modules. The second subset contains nets which have no pin connected to any of the moved modules, but have a routing segment running through the region of moved modules. We cover these cases separately in the following subsections.
7.8.1 Re-routing Nets Connected to Moved Modules
When a module moves due to a perturbation operation, the global routing graph is always affected and thus has to be updated. The next step is to determine which nets have to be re-routed. Clearly, each net connected to any of the moved modules needs a routing update. A straightforward incremental algorithm for doing this consists of steps 1 and 2 below. Since we want to compute the total wire length WL of all nets in an incremental fashion, the algorithm is supplemented with step 3.
1. Enumerate all nets that have to be considered for a routing update by traversing all moved modules.
2. For each enumerated net, re-compute the global routing using a pre-defined routing heuristic.
3. Incrementally update the total wire length by first subtracting the length of the net before the perturbation, and then adding the length of the re-computed routing of that net after the perturbation.
Unfortunately, a nasty problem arises which is not easily discovered. The underlying reason for this problem is given next. In all of our global routing heuristics we use a priority queue to store candidate nodes for expansion. One property of priority queues that has been left untouched up to now is the action that has to be taken when multiple elements with the same key exist in the queue. We assume, as is the default in these cases, that when the priority queue has to decide which element to choose from a set of elements with equal keys, it will make an arbitrary choice among these elements. Generally, there is no reason to deviate from this (usually) implicit assumption. However, in the present context we can easily sketch a scenario in which an arbitrary choice is unwanted. Figure 7.16(a) shows a part of a global routing graph in which the three pins and have to be routed. Pin is the start pin (Figure 7.16(a)). After a few steps, we arrive at node which is explored and put in the priority queue. When the algorithm extracts node from the queue for expansion, it will find node relax it and put it in the queue, keyed by the distance from pin Node is found subsequently, relaxed, and put in the queue, keyed by the distance from pin Since edge has the same length as edge nodes and reside in the queue with the same key value, say dist. When dist is the smallest key in the queue, and an extract_min(·) queue operation is issued, it is not clear a priori whether element or element will be returned.
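The history dependence of an arbitrary tie-break can be made concrete with a small sketch. It does not use the splay-tree queue of Chapter 5. As an assumption for illustration only, it uses Python's `heapq` with an insertion counter as secondary key, and the node names `u` and `v` are hypothetical stand-ins for the two equal-keyed nodes of the scenario.

```python
import heapq
from itertools import count


def extraction_order(insertions):
    """Insert (key, name) pairs into a min-heap and return the names in
    extract-min order. Ties on `key` are broken by an insertion counter,
    i.e. by the history of the queue, not by any property of the node."""
    heap, ticket = [], count()
    for key, name in insertions:
        heapq.heappush(heap, (key, next(ticket), name))
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


# The same two nodes with the same key dist = 5, inserted in two different orders.
first = extraction_order([(5, "u"), (5, "v")])
second = extraction_order([(5, "v"), (5, "u")])
print(first)   # ['u', 'v']
print(second)  # ['v', 'u']
```

Both runs are fully deterministic, yet the node extracted first differs, because the outcome depends on what happened to the queue earlier rather than on the nodes themselves.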
Since we have implemented the priority queue with a splay tree (see Chapter 5), the priority queue is fully deterministic. Therefore, even though we cannot predict which of the elements and is going to be extracted first, when the priority queue is built up using the same sequence of elements and operations, the choice between and is arbitrary but static. Figure 7.16(b) shows that in the case where is extracted first from the priority queue, node will become the parent node of node because node is explored first via node This will eventually result in the routing of the three-pin net as shown in Figure 7.16(c), because when pin is found after expanding node the backtrack pointers (shown as arrows in the figure) go via nodes in that order. Pin is then connected to the routing tree using the traversed edges. On the other hand, when the priority queue is built up using a different sequence of elements and operations, for example due to the presence of a module it is possible that instead of element element is extracted first (even if both distance values relative to are equal again). This eventually results in the routing solution as shown in Figure 7.16(d). Due to the importance of this observation, we formulate it in the following theorem.
Theorem 9 The exact topology of a balanced search tree, such as the splay tree (see Chapter 5), depends on both the order and the total set of performed tree operations and tree elements.
Note that also the addition of a single element, directly followed by the deletion of that
element, results in a different sequence and thus a possibly different topology of the tree structure. The scenario in which a node is added to an already existing routing tree is easily conceivable. For instance, when a module changes position, its associated escape lines can induce new edges and nodes in the routing graph. It is important to note that this might occur without re-routing the routing tree, simply because we know in advance that the routing tree should not change (the routing tree does not run through the affected region, nor is it directly connected to any of the affected modules). Essentially, the arbitrary breaking of ties (for elements with the same key in the queue) causes this unwanted behavior of the routing algorithm. Although the tie-breaking choice
does not have any obvious preference at that specific moment, it is likely to affect the final outcome of the routing solution (and consequently, the routing length). The choices we have to break a tie are: choose the node independent of the contents and structure of the balanced search tree; or choose the node dependent on the contents and structure of the balanced search tree. Clearly, we must select the first choice. A practical implementation could be based on breaking a tie by choosing the node closest to the center of gravity of all pins in the net. For clarity, we show next what will happen when a naive tie-breaking approach is adopted. It is clear from the previous discussion that the routing result of a net depends on the environment around the region defined by the modules connected to that net. Therefore, incremental re-routing is badly affected, because the total wire length obtained by incremental means can deviate without bound from the real total wire length. This can be seen from the algorithm shown in Figure 7.17. It is clear that essentially the operations
on lines 3 to 5 are equal to the operations performed on lines 9 to 11. Using an equivalent representation for lines 3 to 5, in the form of
it can be easily seen that we have essentially which can be written as Consequently, considering a single generation-rejection scenario, i.e. REJECT always evaluates to true, the algorithm of Figure 7.17 can be simplified to the algorithm shown in Figure 7.18. From this simple algorithm we can directly see that in the case of a proper generation-rejection operation, should hold. Consequently, in effect, nothing happens to WL when the loop is iterated. However, from the previous discussion we know that the routing heuristic can cause It is evident that WL can
start drifting uncontrollably. Hence, this unwanted effect renders the optimization algorithm useless when straightforward incremental routing techniques are applied. The previously discussed solution to resolve environment-dependent routing results is to use some sort of biasing technique which decides ties in a predictable manner that is independent of the balanced-tree structure. A biasing technique which gives good routing results is preferred, of course. However, it is difficult to measure the quality of such a technique in a general context. An implementation of this technique could be as follows.
1. Extract all equal-keyed nodes from the queue and put them in a separate data structure H. Then compute a unique center of gravity which will be the reference point for the net in the current topology.
2. Choose the node from H with smallest or from In case of ties, choose the node with smallest first. If there is still a tie, choose the node with smallest to resolve all ties.
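The two steps above can be sketched as follows. The exact criteria of step 2 are not legible in this text, so the ordering used here is an assumption: smallest Manhattan distance to the center of gravity first, then smallest x-coordinate, then smallest y-coordinate. The function names and the coordinates are hypothetical.

```python
from itertools import permutations


def center_of_gravity(pins):
    """Arithmetic mean of the pin coordinates of one net."""
    n = len(pins)
    return (sum(x for x, _ in pins) / n, sum(y for _, y in pins) / n)


def tie_break_key(node, cog):
    """Assumed ordering: distance to the center of gravity, then x, then y.
    The key depends only on geometry, never on the state of the queue."""
    x, y = node
    return (abs(x - cog[0]) + abs(y - cog[1]), x, y)


def choose_node(equal_keyed_nodes, pins):
    """Step 2: pick one node from the set H of equal-keyed nodes."""
    cog = center_of_gravity(pins)
    return min(equal_keyed_nodes, key=lambda node: tie_break_key(node, cog))


pins = [(0, 0), (4, 0), (2, 6)]   # center of gravity is (2.0, 2.0)
H = [(3, 1), (1, 3), (1, 1)]      # all three lie at distance 2 from it
print(choose_node(H, pins))       # (1, 1)
```

Because the key is a total order on node coordinates, the selected node is the same for every ordering of H, which is the property needed for reproducible incremental routing.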
Actually, H is not strictly necessary for an efficient implementation. We can simply process the extracted nodes sequentially (in any order) and decide to adjust the parent node of an explored node with distance equal to the current exploration distance. The final decision is based on the criteria mentioned in step 2 of the previous algorithm. The validity of the sequential approach is a direct consequence of the associativity property of the mathematical min operator. This method can be simplified by discarding the center of gravity information and assigning fixed priorities to each of the four edges departing from a node; the edge with higher priority will always have preference. Since there is no good reason to assume that the more sophisticated approach will perform better in practice in terms of routing quality, we choose the simplest approach, which is also the fastest.
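A minimal sketch of the simplified scheme, processing candidates one at a time without the structure H, might look as follows. The priority values assigned to the four edge directions and the node names are hypothetical; only the idea of a fixed priority per edge comes from the text.

```python
from functools import reduce
from itertools import permutations

# Hypothetical fixed priorities of the four edges at a node (0 = most preferred).
EDGE_PRIORITY = {"north": 0, "east": 1, "south": 2, "west": 3}


def better(a, b):
    """Return the preferred of two candidates (distance, edge, parent):
    the smaller distance wins, equal distances are decided by edge priority."""
    return min(a, b, key=lambda c: (c[0], EDGE_PRIORITY[c[1]]))


def select_parent(candidates):
    """Fold the candidates sequentially; no separate container is needed."""
    return reduce(better, candidates)


candidates = [(7, "west", "n3"), (7, "east", "n1"), (9, "north", "n2")]
results = {select_parent(list(p)) for p in permutations(candidates)}
print(results)  # {(7, 'east', 'n1')}
```

All six processing orders yield the same parent, which illustrates why the sequential approach is valid: taking a minimum pairwise gives the same answer regardless of how the comparisons are grouped or ordered.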
7.8.2 Re-routing Affected Nets Not Connected to Moved Modules
Unfortunately, moved modules do not only trigger re-routing of the nets directly connected to them, but also nets that are routed through the region containing moved modules. There are two issues that play a significant role here.
1. Identification of all affected nets, with focus on the nets that are routed through the region with moved modules without being connected to any of the moved modules.
2. Efficient re-routing of these nets such that the quality of the obtained routing is not affected in a negative sense.
Clearly, when a routing segment runs along a side of a moved module, it should be considered for re-routing after moving the module. We propose two approaches to enumerate these nets efficiently.
1. Enumerate all nets that run along any of the sides of the moved modules. Since we accumulated all global routing information and assigned this information to appropriate module boundaries in a previous iteration, it is a straightforward task to perform the enumeration without incurring additional computational complexity.
2. Take into account only the modules at the perimeter of the affected region, which is the region containing the moved modules. If a routing segment of a net runs into this affected region, then that segment needs re-routing. Since each affected net that is not connected to any of the moved modules must cross this perimeter, it is sufficient to process all perimeter modules.
A clear drawback of the first method is that all moved modules must be processed in order to find all affected nets. When the number of moved modules becomes larger, it becomes more advantageous to consider solely the perimeter modules, since the number of perimeter modules grows roughly proportionally with the square root of the number of moved modules. Furthermore, the second method immediately identifies the exact boundary locations of the routing segments that penetrate the affected region. The usefulness of knowing these boundary locations will be made clear shortly. As a result of the previous discussion, the second method is preferred over the first method.
Full Re-routing of Indirectly Affected Nets
With respect to the nets that cross the affected region defined by the moved modules, the simplest way to recompute the routing of a net is to compute the entire routing again. Using this approach, a high-quality routing is generally maintained at the expense of higher computational complexity. In case of large nets with many pins, this approach may burden the overall algorithm to a large extent, since most of the pins and thus the largest part of the routing segments lie in the unaffected region.
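The square-root argument made above for the perimeter method can be checked on an idealized case. As an assumption for illustration only, suppose the moved modules form a k × k array of equal-sized modules; the function name is hypothetical.

```python
import math


def perimeter_modules(k):
    """Number of modules on the boundary of a k x k array of modules."""
    return 1 if k == 1 else 4 * k - 4


for k in (2, 5, 10, 30):
    moved = k * k
    perimeter = perimeter_modules(k)
    # The ratio perimeter / sqrt(moved) stays bounded (it approaches 4),
    # so the perimeter count grows like the square root of the moved count.
    print(moved, perimeter, round(perimeter / math.sqrt(moved), 2))
```

For 900 moved modules only 116 perimeter modules need to be inspected in this idealized arrangement; real affected regions are less regular, so the numbers are indicative at best.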
Partial Re-routing of Indirectly Affected Nets
A method to reduce the computational overhead of full re-routing of affected nets is through the use of partial re-routing. Essentially, only that part of the routing is re-computed that lies in the affected region. Besides a beneficial reduction in computational complexity for larger nets, there is an additional advantage with respect to the quality of a net. If the boundary crossings of the routing segments of a specific net are considered as virtual pins, we can actually obtain a gain in routing quality when we connect the virtual pins in a Steiner-minimal-tree-like manner. However, when naively viewing all virtual boundary pins induced by a net as pins of a single subnet which needs to be routed, a lurking danger is the introduction of loops in the interconnect of the total net. Clearly, measures should be taken to avoid the occurrence of loops, since this is unwanted by definition (see footnote 14). A way to solve this problem is by keeping track of subsets of boundary pins that are disconnected due to the removal of routing segments in the affected region. Consequently, partial re-routing of affected nets can be favorable over full re-routing, especially if the partial portion is relatively small compared to the total routing of the net.
14 Under some circumstances it might actually be desirable to introduce loops, motivated by electromagnetic considerations, but this topic is outside the scope of this book.
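One way to keep track of the disconnected subsets of boundary pins is a disjoint-set (union-find) structure, sketched below. This is an illustration under stated assumptions and not the routing heuristic of this chapter: candidate connections inside the affected region are accepted greedily by length, and the pin names, the `reconnect` function and its arguments are hypothetical.

```python
class DisjointSet:
    """Union-find over virtual pins; each set is one connected piece of the net."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already joined."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        self.parent[root_a] = root_b
        return True


def reconnect(virtual_pins, still_connected, candidates):
    """virtual_pins: boundary crossings of one net.
    still_connected: pin pairs that remain joined through routing outside the region.
    candidates: (length, pin_a, pin_b) connections available inside the region.
    Returns the accepted connections; a connection that would close a loop is skipped."""
    pieces = DisjointSet(virtual_pins)
    for a, b in still_connected:
        pieces.union(a, b)
    accepted = []
    for length, a, b in sorted(candidates):
        if pieces.union(a, b):
            accepted.append((length, a, b))
    return accepted


pins = ["p1", "p2", "p3", "p4"]
outside = [("p1", "p2")]  # p1 and p2 are still joined outside the affected region
inside = [(2, "p1", "p2"), (3, "p2", "p3"), (4, "p3", "p4"), (5, "p1", "p4")]
print(reconnect(pins, outside, inside))  # [(3, 'p2', 'p3'), (4, 'p3', 'p4')]
```

The shortest candidate, between `p1` and `p2`, is rejected because those pins are already connected outside the region; accepting it would create exactly the kind of loop described above.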
7.9 Impact of Routing on Placement Quality
Both placement and routing are NP-hard problems when considered separately. When these problems are combined, which we should do because of their strong interdependencies, finding a solution does certainly not become easier. Therefore, many researchers tend to neglect or oversimplify part of the problems involved. It is interesting to investigate whether or not these simplifications are justified and how much we have to pay for these simplifications in terms of solution quality. In this section, it is shown via experimental results that the de-facto standard way of estimating wire length which is called the minimal bounding box (MBB) method, does not result in high-quality placements. Moreover, the use of a more accurate routing heuristic yields significantly better final (global) routing results.
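For reference, a minimal sketch of a bounding-box wire-length estimate is given here. It assumes the common half-perimeter convention, which the text does not spell out, and the net names and pin coordinates are hypothetical.

```python
def mbb_wire_length(pins):
    """Half-perimeter of the smallest axis-aligned box enclosing all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_mbb(nets):
    """Sum of the bounding-box estimates over all nets."""
    return sum(mbb_wire_length(pins) for pins in nets.values())


nets = {
    "n1": [(0, 0), (4, 0), (2, 3)],  # box of 4 by 3, estimate 7
    "n2": [(1, 1), (1, 5)],          # box of 0 by 4, estimate 4
}
print(total_mbb(nets))  # 11
```

The estimate looks only at pin positions. A module lying between two pins does not change it, whereas an obstacle-avoiding route around that module becomes longer, which is one reason the two measures can disagree.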
7.9.1 Integrated Placement and Routing
The integrated placement and global routing idea has been integrated into a robust simulated annealing optimization framework. The main idea of the overall algorithm is shown in Figure 7.19.
With the previous discussion of the construction of the global routing graph and the extended global routing graph, the above algorithm speaks for itself. Based on former experimental results with respect to the routing heuristics, we have chosen the shortest-paths-based heuristic I (SPBH_I), which gives near-optimal routing results quickly. This
heuristic is used in conjunction with a non-incremental placement computation algorithm. Our main purpose is to show the impact of routing quality on placement quality when both are weighted equally, i.e. in an optimization environment. We have not put any effort into minimizing run time other than the most obvious implementation choices.
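The general shape of such an annealing loop, with the routing estimate evaluated inside the cost function, can be sketched as follows. This is a generic simulated annealing skeleton and not a reproduction of Figure 7.19: the cooling schedule, all parameter values, and the toy one-dimensional placement problem are assumptions chosen to keep the example short and runnable.

```python
import math
import random


def anneal(state, cost, perturb, t_start=10.0, t_end=0.01, alpha=0.9,
           moves_per_t=50, seed=0):
    """Generic simulated annealing; returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)  # placement plus routing is evaluated here
            delta = candidate_cost - current_cost
            # Always accept improvements; accept deteriorations with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling (an assumption)
    return best, best_cost


def wire_length(order, nets):
    """Toy routing cost: blocks sit in a row, each net spans its leftmost to rightmost block."""
    pos = {block: i for i, block in enumerate(order)}
    return sum(max(pos[b] for b in net) - min(pos[b] for b in net) for net in nets)


def swap_two(order, rng):
    """Perturbation: exchange two blocks in the row."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return tuple(new)


nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
start = ("a", "d", "b", "e", "c")
best, best_cost = anneal(start, lambda s: wire_length(s, nets), swap_two)
print(wire_length(start, nets), best_cost)
```

In the actual framework the cost would combine chip area with the wire length returned by the routing heuristic; swapping the routing estimator then amounts to swapping the `cost` callable.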
7.9.2 Experimental Results
We have implemented the integrated placement and routing optimization framework in C based on [Murata et al., 1996, Otten and van Ginneken, 1989]. The platform is SuSE Linux 7.0, running on a PIII at 800 MHz with 512 MB RAM. We use the best solution out of three runs with different random seeds, unless noted otherwise. A set of randomly generated problem instances is used, i.e. random net lists with at most one pin per block side, and random module sizes from a given range. The largest MCNC benchmark, ami49, has been included, too [Kozminski, 1990]. The primary purpose of including a commonly used benchmark is to set a reference level for comparison with existing works. Note that the MCNC benchmark data has been adapted slightly to be comparable with existing results and to make placement optimization easier, because we do not have to put additional boundary constraints on any module. The reason for not including more MCNC benchmarks is that we do not allow over-the-cell routing and thus cannot handle the in-cell pins of some MCNC benchmarks. The experimental setup is as follows.
1. We search for a (near) optimal packing without considering routing. This is mainly for comparison purposes with existing results and to show the impact of routing on placement quality.
2. We search for a (near) optimal packing using MBB routing. For each final packing, we compute the SPBH_I solution and check whether this is consistent with the MBB estimation.
3. We search for a (near) optimal packing using SPBH_I routing. For each final packing, we compute the MBB solution and check whether it is consistent.
Table 7.5 shows our main results on the randomly generated benchmark data and the MCNC benchmark. It is interesting to note that the final chip area of packing optimization without routing for ami49, being is better than any previously reported result [Chang et al., 2000, Guo et al., 1999, Tang and Wong, 2001].
This implies that the proposed optimization framework has excellent convergence behavior, resulting in near-optimal solutions in reasonable time without any tuning. Note that the intention of the experiments is to show the impact of routing quality on placement quality. Therefore, no efforts have been devoted to minimizing computation times. The attention should be focused on relative differences. Let us take a closer look at the last two columns of Table 7.5. The results in column “MBB” are the final total wire-length values measured by the MBB estimation method. The results in column “SPBH_I” are the final total wire-length values measured by algorithm SPBH_I. We clearly see a trivial difference when we compare the upper six values with the lower six values in the MBB column. This also holds for the values in the last column. A most remarkable observation is the fact that in almost all cases an increase in MBB value occurs when a decrease in SPBH_I value is obtained. Consequently, we may conclude that the correlation between MBB and the more accurate SPBH_I global routing is very poor. Moreover, MBB routing used in the optimization loop does not lead to a placement which
will be routable with a minimum amount of wire length (avoiding obstacles). To substantiate this statement, Figure 7.20 shows the ratio of the total wire length obtained by SPBH_I and MBB, as a function of the number of blocks M (for three independent runs per value of M). It is easy to see that the correlation between SPBH_I and MBB is heavily problem-size dependent. In general we may conclude that MBB routing is a bad total wire length predictor. Stronger still, MBB routing can significantly decrease placement quality. Furthermore, for the randomly generated problem instances, a clear trend towards a fixed ratio can be observed
as M grows larger. Despite the existence of a strong correlation between the wire-length estimations obtained by SPBH_I and MBB, it is merely a statistical measure which clearly does not guarantee that a decrease in the SPBH_I solution always corresponds to a decrease in the MBB solution. Therefore, the practical usefulness of MBB routing is highly arguable.
7.9.3 Conclusions
Summarizing, we can conclude the following.
Our implementation of the simulated annealing optimization algorithm produces excellent packings, cf. [Hong et al., 2000, Chang et al., 2000, Tang and Wong, 2001]. Note that we did not optimize for speed, for instance by applying faster sequence pair algorithms [Lin and Leenaerts, 2000b, Tang and Wong, 2001].
Coarse MBB routing does not correlate well with more accurate routing schemes such as SPBH_I routing. Therefore it is not wise to apply MBB routing as a standard routing method to evaluate the quality of a routing-aware placement tool (see footnote 15). However, we do observe quite a strong correlation between SPBH_I and MBB among several runs of the same problem instance. In other words, a “fixed” ratio can be computed, but unfortunately this gives a distorted notion because it is not guaranteed to hold for every final solution.
The accuracy of global routing estimation significantly impacts the quality of a block placement, since a substantial decrease of about 6.3% to 16.2% in wire length can be observed for SPBH_I-based optimization while the chip area increases by at most 3.3%.
Accurate global routing, as compared to MBB routing, incurs a large penalty on the run-time performance of the optimization framework. The main culprits for this are the explicit construction of a global routing graph, which has to be updated for each net, and the complexity of the accurate global routing algorithm itself.
Although the proposed optimization framework works well on pure block placement instances, it does not necessarily imply good behavior when additional constraints such as wire length are introduced. However, it is unlikely that MBB-based routing would render the optimization convergence behavior radically different from SPBH_I-based routing.
It should also be noted that for problem instances in which blocks have a large number of pins, the routing complexity starts dominating the behavior of the optimization tool. As a consequence, for accurate routing to be practical for large problem instances, the routing complexity should be reduced substantially. This could, for instance, be accomplished by employing incremental techniques in conjunction with thorough optimization of the source code.
7.10 Concluding Remarks
In this chapter we gave the main ingredients for efficient obstacle-avoiding global routing methods, consisting of a global routing model and a global routing algorithm. We showed which points are important to consider, and might thus be eligible for future improvements.
15 We observed that the ratio SPBH_I/MBB tends towards a value around 2.2 when M > 200 for our set of randomly generated benchmarks.
A very important requirement for enabling incremental computation is that all data structures are fully dynamic. Of course, much effort is needed to implement these concepts properly. It is even more difficult to implement these ideas with high run-time performance in mind. Since practical run times of in-loop operations are very important in an iteration-intensive environment such as simulated annealing, optimizing the implementation should be considered, too.
Integrated placement and global routing has been investigated. Although minimal-bounding-box routing is the de-facto standard in present literature, its adequacy is highly questionable. Our experiments clearly indicate that more accurate routing is required to arrive at better final routings.
Chapter 8
Dealing with Physical Phenomena: Parasitics, Crosstalk and Process Variations
The performance of high-frequency mixed-signal and analog designs relies heavily on the actual layout of the circuit components on device level. Therefore, proper placement and routing are of utmost importance. However, conventional constraints on placement and routing alone, such as minimal area and minimal wire length, respectively, are no longer sufficient. Taking into account previously neglected second-order effects has been acknowledged to be a necessity. This chapter deals with the most important phenomena that can be handled with proper placement and routing. Roughly stated, these phenomena can be classified into self-parasitics, crosstalk phenomena, and process variations. The aforementioned phenomena are discussed in detail and their role in the context of mixed-signal layout generation is made clear. In order to minimize the detrimental effects of these phenomena, accurate models are required. However, due to the iterative nature of the adopted stochastic optimization engine, the models must have low associated computational complexity.
We observe that in general very little effort has been dedicated to performance-driven optimization of layout in a pre-detailed-routing phase. We claim that performance issues should be taken into account as soon as possible in the optimization phase, preferably during placement and global routing, in order to obtain high-quality layouts. This claim is clearly supported by the approach taken in this book.
Substrate coupling is a crosstalk phenomenon which has not been considered much in the context of layout generation. Therefore, investigations are performed to gain more insight on this topic in connection with integrated placement and routing. A novel method is proposed which takes substrate coupling into account without increasing computational complexity. Experimental results demonstrate the practical feasibility of the method. Furthermore, we show that the approach can be easily mapped into an incremental framework.
8.1 Previous Work
The amount of prior art with respect to crosstalk-aware, parasitics-aware, and process-variations-aware layout generation is rather limited. Most related works are focused on efficient modeling of delay and crosstalk phenomena [Sakurai, 1993], parasitics-aware detailed routing [Lampaert, 1998], or post-placement yield improvement by enhanced routing techniques [Huijbregts, 1996, Lampaert, 1998]. Typically, process variations are dealt with using matching techniques in the context of analog layout [Cohn et al., 1991, Lampaert, 1998]. The weakness of this approach is that the design rules used reduce the actual problems to human-manageable notions which do not adequately take into account all spatial and electrical considerations. Consequently, much room for improvement lies here, from an algorithmic point of view. To the best of the authors’ knowledge, no previous work exists which handles or attempts to handle any of the above phenomena at the pre-detailed-routing level. An exception should be made here with respect to process variations. Several researchers have attempted to use matching rules [Cohn et al., 1991, Lampaert, 1998], which are established design rules in analog layout, in an automated environment to reduce the adverse effects of process variations on circuit blocks which should resemble each other in every respect as closely as possible. However, the improvement expressed in quantitative measures as a result of this matching has barely been assessed in published works. What renders this issue even more complicated is the (lack of knowledge on the) quality and impact of routing in this context. It is interesting to note that in state-of-the-art mixed-signal designs such as current-steering digital-to-analog converters, the matching problem due to process variations is very dominant [van der Plas et al., 1999], and thus needs to be dealt with. Recent work by Doris et al. [Doris et al., 2001] gave a fundamental theoretical basis to this problem, and the same researchers proposed an effective method to mitigate the influence of process variations in high-performance D/A converters.
8.2 Efficiency and Accuracy Requirements
Efficient and accurate modeling of significant performance-degrading phenomena is important for successful optimization of mixed-signal layout for the following reasons:
A very sophisticated model adds too much overhead to the overall computational complexity of the optimization framework, rendering the approach impractical.
A coarse inaccurate model can negatively impact performance of the optimization engine and cause convergence problems in the worst case.
A trade-off between accuracy and efficiency is in general unavoidable, but it is important to keep in mind that the model should produce a consistent estimation of reality. In other words, it is better to have a reasonably constant over-estimation of 30% than an apparently more accurate but fluctuating estimation accuracy between -10% and 10%.
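Why a consistent estimate matters more to an optimizer than a nominally accurate one can be seen from a small numeric example. The two candidate layouts and their cost values below are hypothetical; the error levels of 30% and 10% are those mentioned above.

```python
# Hypothetical true costs of two candidate layouts; A is the better one.
true_cost = {"A": 100.0, "B": 105.0}

# Model 1: constant over-estimation of 30% for every candidate.
consistent = {name: 1.30 * value for name, value in true_cost.items()}

# Model 2: error within -10% and +10%, but with a different sign per candidate.
fluctuating = {"A": 1.10 * true_cost["A"], "B": 0.90 * true_cost["B"]}


def preferred(costs):
    """The candidate an optimizer would keep: the one with the lowest estimated cost."""
    return min(costs, key=costs.get)


print(preferred(true_cost))    # A
print(preferred(consistent))   # A  (ranking preserved despite the 30% error)
print(preferred(fluctuating))  # B  (ranking flipped by the smaller, inconsistent error)
```

An optimization engine only compares candidates, so a constant relative error cancels out, whereas an error that changes from one candidate to the next can steer the search toward the worse layout.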
8.3 Self-Parasitics
Inherent physical properties of the materials which form a layout induce parasitic phenomena which are not adequately modeled by many automated layout generation systems. The self-parasitics, which consist of the resistance, capacitance and inductance of a wire, form a separate class of unwanted effects based on the observation that the value of a self-parasitic solely depends on the geometrical properties of a single wire, independent of neighboring objects.
8.3.1 Wire Resistance, Capacitance and Inductance
The simplest wiring scenario consists of a single wire routed in a single layer. Although this is not always possible in reality, this situation is the basis of more elaborate wiring scenarios. Figure 8.1 illustrates the sources of self-parasitics of a piece of interconnect. The area capacitance depends on the thickness T, width W, height H and length L of the piece of wire. Furthermore, a so-called fringing capacitance exists which depends on the same parameters but with a different weighting. Of course the actual type of material of which the piece of interconnect is made plays a role, too. Besides the capacitance, there is also a series inductance and a series resistance associated with every piece of interconnect. Depending on the type of signal carried through the wire, and the material and geometry of the wire, either one of them might be dominant over the other. To give the reader an impression of some typical values of parasitic elements for a CMOS process: for a metal1-metal2 scenario, for metal1, and for metal2. Normally, higher metal layers have smaller sheet resistances.
8.3.2 Via Resistance and Area
When completion of a net in a single layer is not possible, for instance due to congestion problems, we are forced to use another routing layer. A so-called via is used to locally connect two pieces of interconnect from different layers. Typically these layers must be adjacent. Going from one layer to another layer cannot occur without a penalty. The penalty is the relatively high cost of a via in terms of series resistance and area. For example, in a CMOS process, a via has a typical series resistance of with a maximum of
This is equivalent to a wide metal2 wire of length Moreover, typically the use of vias goes in pairs which means that the equivalent amount of additional wiring is equal to for every wiring “bridge”. Furthermore, yield generally decreases with increasing number of vias. Therefore, avoiding vias as much as possible is an important layout design rule.
8.4 Crosstalk
Crosstalk is the net effect of undesired signal propagation via parasitic coupling between objects in the layout. An effective remedy to lessen crosstalk is spatial separation of the objects that are subject to crosstalk. However, this is not a trivial problem to solve in most practical cases. In this book we discuss two different types of crosstalk sources: crosstalk due to substrate coupling and crosstalk due to parasitic coupling capacitance. A third source of crosstalk is magnetic coupling modeled with parasitic mutual inductance. This last issue lies outside of the scope of this book. However, we note that magnetic coupling effects can also be incorporated into our framework with some effort. By improving our understanding of the mechanisms of crosstalk and their effect on performance, we can find better means to reduce detrimental crosstalk effects in the proposed integrated placement and routing framework.
8.4.1 Substrate Coupling
The layout of an integrated circuit is embedded into a piece of silicon material. Ideally, this carrier, which is better known as the substrate, should not influence the operation of the IC. Unfortunately, due to non-ideal properties of the substrate, i.e. finite resistance, it is a conductive layer which propagates signals via parasitic coupling to and from many points in the circuit. The layout objects which have the lowest impedance to the substrate are also most severely affected by the signal(s) carried by the substrate. In principle, a MOS transistor can be coupled as strongly to the substrate as a piece of interconnect, but this strongly depends on the actual geometry of the objects. Moreover, in case of an NMOS transistor which lies in a P-type substrate, the voltage across the NP-junction determines the effective coupling capacitance to the substrate. Also, the backgate capacitance of a MOS transistor is a strong source of substrate coupling. We distinguish two different types of substrates: a high-resistivity substrate and a low-resistivity substrate. Figure 8.2 shows how the semiconductor materials in these substrate types are typically doped and layered. The high-resistivity substrate is composed of a lightly doped bulk region which is about 200 to thick and a thin epi-layer which has a lower resistivity. The low-resistivity substrate consists of a lightly doped p-type epi-layer grown on a heavily doped p-type bulk. The bulk is typically 100 to thick, and the epi-layer thickness varies from 5 to One of the many advantages of the use of a high-resistivity substrate is that this type of substrate preserves conventional circuit design techniques, whereas low-resistivity substrates require significantly different circuit design techniques because the substrate can be viewed as a single super-node [Gharpurey, 1995]. Another reason to choose a high-resistivity substrate is that this type of substrate allows for layout manipulation to decrease
8.4 Crosstalk
the adverse effect of substrate coupling on circuit performance, while layout techniques are of less use for substrate coupling reduction in connection with low-resistivity substrates. Also, high-resistivity substrates allow for the creation of better on-chip passive components. However, a drawback of high-resistivity substrates is the latch-up phenomenon, which introduces unwanted parasitic transistors into the circuit with all their consequences; low-resistivity substrates hardly suffer from latch-up. Fortunately, with the advent of higher-frequency circuits and lower supply voltages, latch-up is becoming less and less of a problem [Veendrick, 2000]. A simple model for a high-ohmic substrate, which was determined semi-empirically by Joardar [Joardar, 1994], is shown in Figure 8.3. This model fits perfectly in our stochastic optimization framework, since it can be evaluated quickly. It should be noted that this model holds for guarded modules, but application to unguarded modules is justified [Deferm et al., 1988] if we only want to minimize the influence of substrate coupling. Nodes A and B in the circuit are connected to specific points in the integrated circuit, for instance the drains of two separate MOS transistors. In case of a bulk contact, the capacitor should be replaced by a short circuit. The resistances and strongly depend on process parameters and
Dealing with Physical Phenomena: Parasitics, Crosstalk and Process Variations
the geometry of layout modules. This information can easily be stored in a parameterized manner, since the layout module shapes are known in advance and very regular. Resistance is of most interest to us since it depends on the actual module placement. A closed-form expression for is
where L is the effective lateral coupling length between two coupled objects, S is the spacing between these objects, and are constants for a given process. The form of the equation used to model is physically based, and obtained by solving the Laplace equation for two circular substrate contacts [Deferm et al., 1988]. The form is slightly complicated because three-dimensional effects are included. Furthermore, since no simple expression exists for rectangular geometries, the one available for circular contacts was used as an approximation. In the remainder of this chapter resistances and junction capacitances and will be ignored for simplicity, but without loss of generality.
8.4.2
Parasitic Coupling Capacitance
In practice, the interconnect of a layout consists of many adjacent wires within the same layer or on distinct layers. In the situation where the adjacent wires lie on the same layer, we speak of line-to-line capacitance or lateral capacitance. The other cases are covered by the area capacitance and fringing capacitance as shown in Figure 8.1. To complete the picture, a simplified scenario is drawn in Figure 8.4. It is clear that the lateral capacitance depends on the distance between the objects on the same layer and the longest common length of the parallel-running parts of the lateral objects. More information on the values of these capacitances can be found in specific technology files. It is also possible to derive reasonably accurate closed-form expressions for many important parasitic phenomena in connection with wiring [Sakurai, 1993].
8.5
Process Variations
Process variations consist of systematic and random deviations which occur due to non-idealities of IC manufacturing equipment. To name a few examples: non-uniform layer thickness or doping across the wafer, under-etching and over-etching, and mask misalignments. The impact of these deviations on circuit performance can be tremendous, leading to errors in functionality and yield issues. For instance, differences in threshold voltages of switching transistors can easily lead to signal skew. This, in turn, can lead to a reduced spurious-free dynamic range in the case of digital-to-analog converters. It is well-known that at least the errors induced by systematic deviations are strongly correlated with the location of the geometrical objects in a layout [Pelgrom et al., 1989]. Therefore, it is important to take these effects into account in the layout phase so that the detrimental effects of process variations can be reduced as much as possible. Proper matching of the transistors of a differential pair is a well-known issue in analog layout design. In general we can speak of matched circuit or layout modules, in which a module can consist of a single transistor, but can also be a passive element, or even a small subcircuit. Furthermore, it is important to note that matching constraints are essentially equivalent to relative placement constraints. This type of constraint can be taken into account by an efficient placement representation such as the sequence pair. However, the approach proposed by Balasa and Lampaert [Balasa and Lampaert, 1999] is not efficient in terms of computational complexity of a single constrained placement evaluation. Furthermore, their approach induces a more irregular cost landscape and, therefore, worse convergence of the simulated annealing optimization algorithm. An approach based on the constrained placement work of Tang and Wong [Tang and Wong, 2001] is likely to offer better results.
Finally, we note that although virtually all analog layout matching efforts have been focused on symmetric placement, symmetric routing is at least as important in this context. The latter has, to the best of our knowledge, never been explored in depth in the context of layout generation. In our notion, the term “symmetric” is not restricted to geometric symmetry. Geometric symmetry is sufficient but, we argue, not necessary, since symmetric signals through the matched interconnect are the ultimate goal. In this book we do not elaborate further on the issue of process variations in connection with mixed-signal layout generation.
8.6
Incorporating Crosstalk and Parasitics into Routing
The importance of crosstalk-aware and parasitics-aware routing is generally acknowledged by both industry and academia. However, the incorporation of these effects is mostly performed at the detailed routing level. Only a limited number of works have considered these effects at the global routing level [Parakh and Brown, 1999, Zhou and Wong, 1998]. To the best of the authors’ knowledge, no works have considered these effects at the pre-detailed routing level in an integrated placement and routing stochastic optimization framework. Since expensive rip-up-and-reroute strategies are to be avoided, it is better to estimate the amount of crosstalk beforehand as well as possible, and eventually perform detailed routing based on more flexible global routing information. Although the issue of incorporating crosstalk and parasitics into routing is of utmost importance for efficient detailed routing, it is not handled in the present work.
8.7
Incorporating Substrate Coupling into Placement
In this section we investigate the incorporation of substrate coupling into the placement phase of modules. In principle, substrate coupling also occurs for wiring, but this aspect is not covered in this book. In order to estimate the amount of substrate coupling, the simple substrate model of Figure 8.3 is used. For the calculation of resistance in this model, exact information on the geometry and location of each module connected to nodes A and B is needed. Therefore, we have to define exactly how a module is composed. The essentially 1-dimensional model of Joardar in the form of (8.1) is then generalized to two dimensions. In the context of sequence-pair-based block placement, we propose a novel method to handle the slack space that exists in most placements such that the impact of substrate coupling is minimized. The algorithm for accomplishing this is based on expansion of the core module and shifting the core module in the expanded module space so that the total impact of substrate coupling is minimized. More explicitly, in terms of the simple substrate coupling model of Figure 8.3, the task is to reduce to minimize the coupling, of which the actual impact is evaluated by means of sensitivity values obtained a priori. Note that the overall optimization procedure does not imply that placements with a large amount of slack space are seen as bad a priori. On the contrary, introducing additional slack space might reduce the overall impact of substrate coupling. With a properly chosen balance between chip area, total wire length, and substrate coupling impact, the simulated annealing algorithm stochastically searches for a placement which adheres to the given cost function (see (4.3)). Experimental results show the effectiveness and efficiency of the approach.
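The balance between chip area, total wire length, and substrate coupling impact can be illustrated with a small sketch. The weighted-sum form, the container `PlacementCost`, and the functions `total_cost` and `accept` are assumptions made for illustration only; the cost function actually used is the one referred to as (4.3), which is not reproduced here. The acceptance test is the standard Metropolis rule of simulated annealing.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementCost:
    """Cost components of one candidate placement (hypothetical container)."""
    area: float             # chip area W * H
    wire_length: float      # total estimated wire length
    coupling_impact: float  # total substrate coupling impact


def total_cost(c, w_area=1.0, w_wire=1.0, w_coupling=1.0):
    """Weighted sum of the three components (assumed form, not (4.3) itself)."""
    return (w_area * c.area
            + w_wire * c.wire_length
            + w_coupling * c.coupling_impact)


def accept(old_cost, new_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if new_cost <= old_cost:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-(new_cost - old_cost) / temperature)


# Two made-up placements: a compact one and one with extra slack space.
compact = PlacementCost(area=100.0, wire_length=40.0, coupling_impact=30.0)
spread = PlacementCost(area=110.0, wire_length=42.0, coupling_impact=10.0)
```

With equal weights the placement with more slack space has the lower total cost in this made-up example, which mirrors the remark that additional slack space is not bad a priori; with the coupling weight set to zero the compact placement wins.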
8.7.1
A Basic Module
The atomic elements that can be manipulated during optimization are rectangular modules. A module has connecting pins on all four sides, which means that modules are adequate to represent devices such as transistors, capacitors, inductors and resistors. The space occupied by a rectangular module can be subdivided into: core module space, routing space, and expansion space. Figure 8.5 depicts the basic module and the necessary data structure elements. The data structure of the basic module we use consists of: the lower left coordinate of the enclosing module, the width and height of the enclosing module, the offset of the core module’s lower left corner, the width and height of the core module, and the top, right, bottom, and left routing space widths. If and then we call the enclosing module tight, otherwise the enclosing module is loose and we have expansion space. Hereafter, routing issues are disregarded (at least the details); we included them here for completeness.
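The data structure elements listed above can be sketched as a small record type. All field and method names are hypothetical. The tightness test is an assumption as well: the exact condition is not legible in this text, so the sketch treats a module as tight when core space plus routing space fills the enclosing module in both directions.

```python
from dataclasses import dataclass


@dataclass
class BasicModule:
    # Enclosing module: lower left corner and size.
    x: float
    y: float
    width: float
    height: float
    # Core module: offset of its lower left corner inside the enclosing
    # module, and its size.
    core_dx: float
    core_dy: float
    core_width: float
    core_height: float
    # Routing space width reserved on each side of the core.
    route_top: float = 0.0
    route_right: float = 0.0
    route_bottom: float = 0.0
    route_left: float = 0.0

    def expansion_space(self):
        """Horizontal and vertical room not used by core or routing space."""
        ex = self.width - (self.core_width + self.route_left + self.route_right)
        ey = self.height - (self.core_height + self.route_bottom + self.route_top)
        return ex, ey

    def is_tight(self, tol=1e-9):
        """Assumed definition: no expansion space in either direction."""
        ex, ey = self.expansion_space()
        return abs(ex) <= tol and abs(ey) <= tol

    def core_rect(self):
        """Absolute (x_min, y_min, x_max, y_max) of the core module."""
        x0 = self.x + self.core_dx
        y0 = self.y + self.core_dy
        return (x0, y0, x0 + self.core_width, y0 + self.core_height)
```

A loose module is then simply one for which `expansion_space()` returns a positive value in at least one direction; that value bounds how far the core can be shifted.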
8.7.2
Generalized 2-Dimensional Substrate Coupling Model
Because two modules can be placed in many ways relative to each other, a simple 1-dimensional distance measure to estimate the coupling resistance is clearly not sufficient. Therefore, the value of resistance in the substrate model of Figure 8.3 should be made dependent on the amount of skew between two modules and and the distance between the two modules. This idea is visualized in Figure 8.6. We propose a simple geometrical method to compute the effective coupling in cases where modules are skewed. In addition, it is made plausible that the original expression for should be modified slightly in order to incorporate the proximity effects of modules with different dimensions. These notions are shown in Figure 8.7. It is clear that the refinement of substrate coupling resistance in a 2-dimensional setting is based on geometrical arguments. For the case shown in Figure 8.7(a), two additional terms should be added to the computation of giving it the following general shape when two modules are not fully skewed (Figures 8.7(a) and (b)):
where is an additional constant which is used for fitting. In the case of a fully skewed placement of two modules and the lateral parallel coupling between and in the original sense of (8.1) has vanished. Instead, the following formula which is derived from Figure 8.7(c) holds:
where and are treated similarly to and
It is a straightforward task to express (8.2) and (8.3) in terms of the geometrical and spatial properties of the basic modules. Finally, the amount of substrate coupling between modules and is inversely proportional to resistance and denoted by
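The geometric quantities this section relies on, namely the facing length L, the spacing S, and whether two modules are fully skewed, can be extracted from two axis-aligned rectangles as sketched below. This is only an illustration under stated assumptions: the labels "aligned", "partially skewed" and "fully skewed", the function names, and the use of the corner-to-corner distance for the fully skewed case are hypothetical, and the sketch does not evaluate expressions (8.1) to (8.3) themselves.

```python
import math


def interval_overlap(a_min, a_max, b_min, b_max):
    """Length of the overlap of two 1-D intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def interval_gap(a_min, a_max, b_min, b_max):
    """Distance between two 1-D intervals (0.0 if they touch or overlap)."""
    return max(0.0, max(a_min, b_min) - min(a_max, b_max))


def coupling_geometry(a, b):
    """Classify rectangles a and b, each given as (x_min, y_min, x_max, y_max).

    Returns (kind, L, S), where L is the length over which the modules face
    each other and S is the spacing between them.
    """
    ox = interval_overlap(a[0], a[2], b[0], b[2])
    oy = interval_overlap(a[1], a[3], b[1], b[3])
    gx = interval_gap(a[0], a[2], b[0], b[2])
    gy = interval_gap(a[1], a[3], b[1], b[3])
    if ox > 0.0 and oy > 0.0:
        raise ValueError("modules overlap")
    if oy > 0.0:    # side by side: the facing edges are vertical
        facing, spacing = oy, gx
        shorter = min(a[3] - a[1], b[3] - b[1])
    elif ox > 0.0:  # stacked: the facing edges are horizontal
        facing, spacing = ox, gy
        shorter = min(a[2] - a[0], b[2] - b[0])
    else:           # no facing edges at all: corner-to-corner distance
        return ("fully skewed", 0.0, math.hypot(gx, gy))
    kind = "aligned" if math.isclose(facing, shorter) else "partially skewed"
    return (kind, facing, spacing)
```

For example, `coupling_geometry((0, 0, 4, 4), (6, 2, 10, 6))` reports a partially skewed pair with a facing length of 2 and a spacing of 2, whereas moving the second module up to `(6, 5, 10, 9)` removes the facing length entirely.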
8.7.3
Substrate Coupling Impact Minimization
A high level of substrate coupling does not necessarily mean that circuit performance is badly affected. To map the amount of substrate coupling to circuit performance, circuit sensitivities are required (see Chapter 2). Assuming that these sensitivities are known, this brings us to the ultimate goal of minimizing the impact of substrate coupling. The impact (on a given performance measure) of substrate coupling from module to module is defined by¹
where is the substrate coupling sensitivity of performance function defined on module The sensitivities can, for instance, be obtained from circuit simulation, a priori. The noisiness of module depends on both amplitude and time-derivative of a predefined electrical property. Note that module is assumed to be fixed in location, while the optimal position of module is to be determined. The problem of minimizing the impact of substrate coupling can be stated as follows.
Problem: Substrate Coupling Impact Minimization
Problem Instance: A placement of M modules associated with a sequence pair with chip area dimensions W × H.
Solutions: All possible non-overlapping absolute placements of the M modules without violating the relative relationships dictated by sequence pair subject to
Minimize:
Note that the problem can be seen as a force-balanced constrained mechanical system with given initial conditions, but strongly nonlinear relationships between the components. Clearly it is too costly to solve this problem to optimality in the context of our stochastic optimization framework. Therefore, we simplify the problem in three respects.
1. We introduce a rectangular window around every module which limits the number of surrounding modules that affect the module to be shifted to an optimal location. This is a reasonable limitation since in practice the modules that lie further away will be “shielded” by closer modules.
2. We accept a sub-optimal solution due to the procedure of selecting the modules to be processed sequentially in the order of decreasing amount of expansion space. Also this is an acceptable limitation since subsequent modules can never be shifted more than the maximum allowable thus lessening the effect on previously shifted modules.
3. Within the constrained location problem for a single module any locally optimal solution is accepted. The underlying reason is that finding the global solution of this nonlinear function minimization problem of two variables is computationally expensive, while any additional gain might not be much.
¹ Note that in general, but
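The first simplification, the rectangular window, can be sketched as follows. The sketch assumes that the impact of one noisy module on a sensitive module is the product of sensitivity, noisiness and coupling; the exact definition is not legible in this text, so this product form is a guess. The coupling itself is a placeholder that is inversely proportional to a resistance growing linearly with distance, which stands in for the substrate model rather than reproducing it. All names are hypothetical.

```python
import math


def in_window(center, other, half_width, half_height):
    """True if point `other` lies inside the rectangular window around `center`."""
    return (abs(other[0] - center[0]) <= half_width
            and abs(other[1] - center[1]) <= half_height)


def coupling(p, q, k=1.0):
    """Placeholder coupling: 1 / R, with R = k * distance (assumed model)."""
    return 1.0 / (k * math.dist(p, q))


def windowed_impact(victim_pos, sensitivity, aggressors, half_width, half_height):
    """Sum the impact of all noisy modules that fall inside the window.

    `aggressors` is a list of (position, noisiness) pairs.
    """
    total = 0.0
    for pos, noisiness in aggressors:
        if in_window(victim_pos, pos, half_width, half_height):
            total += sensitivity * noisiness * coupling(pos, victim_pos)
    return total


# Two nearby noisy modules and one very noisy module far outside the window.
aggressors = [((2.0, 0.0), 3.0), ((0.0, 4.0), 1.0), ((50.0, 0.0), 100.0)]
impact = windowed_impact((0.0, 0.0), 2.0, aggressors, 10.0, 10.0)
```

Only the two modules inside the window contribute to `impact`; the distant module is ignored regardless of its noisiness, which is exactly the approximation the window introduces.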
8.7.4
An Efficient Substrate Coupling Impact Minimization Algorithm
In order to tackle the substrate coupling impact minimization problem, it is necessary to find the surrounding modules of a given module efficiently. For that purpose we use the corner stitching data structure, which enables efficient module enumeration. As discussed in Chapter 5, finding the neighbors of a given module can be performed in where is typically close to the selected area (range window) over total chip area ratio times the total number of modules M. For each set of enumerated neighboring modules, a function minimization problem has to be solved in a pre-defined order. Since we settle for a local minimum, an efficient off-the-shelf algorithm can be used for this [Forsythe et al., 1977, Press et al., 1992]. We consider the one-dimensional case of the problem for simplicity and ease of implementation, but without loss of generality with respect to the simplified version of the problem as described before. The overall algorithm is based on the ideas of module enumeration and expansion which have been explained in Chapter 6, Section 6.10. The most important implication of the proposed packing-to-sequence-pair algorithm is that the algorithm enumerates and expands all modules in linear computational complexity. With this information we can devise the algorithm given in Figure 8.8 to (locally) minimize the impact of substrate coupling. Clearly, the computational complexity of step 1 is the sum of enumerating and expanding all modules, and putting them in sorted order in a data structure. This can be performed in on average. Step 2 consists of extracting an unprocessed module with largest expansion space from the data structure and enumerating the neighbors of that module. Since the modules are stored in sorted order, the extraction step takes The computational effort to enumerate the neighbors of a module depends on the size of the range window and is given by (5.4). Furthermore, finding a (local) minimum of substrate coupling impact is roughly proportional to the number of terms in the function to be minimized. This, in turn, is proportional to the number of neighboring modules. Consequently, the total computational complexity is dominated by the latter. Since step 2 is repeated exactly M times, the overall computational complexity of the algorithm is equal to When is not dependent on M, this expression equals Compared with a from-scratch computation of a packing, we may conclude that no additional computational overhead is induced by the substrate coupling impact minimization problem.
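A minimal one-dimensional sketch in the spirit of the procedure described above is given below; it is not the algorithm of Figure 8.8. It processes modules in order of decreasing expansion space, collects the neighbors inside a window, and shifts each module to a local minimum of its coupling impact. Several things are assumptions made for illustration: the dictionary keys, the placeholder impact term (sensitivity times noisiness divided by distance), the linear scan that stands in for the corner-stitching range query, and the hand-written golden-section search that stands in for an off-the-shelf minimizer.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio


def golden_section_min(f, lo, hi, tol=1e-6):
    """Return x in [lo, hi] near a local minimum of f (golden-section search)."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - INV_PHI * (b - a)
        else:
            a, c = c, d
            d = a + INV_PHI * (b - a)
    return 0.5 * (a + b)


def scim_1d(modules, window):
    """Shift modules along one axis to (locally) reduce coupling impact.

    Each module is a dict with hypothetical keys:
      x       current centre of the core module
      lo, hi  range in which the centre may move inside the expanded module
      sens    sensitivity of the module
      noise   noisiness of the module
    """
    placed = [dict(m) for m in modules]  # leave the input untouched
    # Process modules in order of decreasing expansion space.
    order = sorted(range(len(placed)),
                   key=lambda i: placed[i]["hi"] - placed[i]["lo"],
                   reverse=True)
    for i in order:
        m = placed[i]
        if m["hi"] <= m["lo"]:
            continue  # tight module: no room to shift
        # Linear scan standing in for the corner-stitching range query.
        near = [n for j, n in enumerate(placed)
                if j != i and abs(n["x"] - m["x"]) <= window]
        if not near:
            continue

        def impact(x, m=m, near=near):
            # Placeholder impact: coupling taken as 1 / distance.
            return sum(m["sens"] * n["noise"] / max(abs(x - n["x"]), 1e-12)
                       for n in near)

        m["x"] = golden_section_min(impact, m["lo"], m["hi"])
    return placed
```

For a loose module that sits between two equally noisy tight neighbors, the search moves its core to the midpoint of the two, as one would expect from the symmetry of the placeholder impact function.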
8.7.5
Implementation Considerations
In an actual implementation, sorting of the modules with respect to their expansion space size must be handled carefully. Since a straightforward sorting procedure for M elements easily incurs complexity it is better to choose a sorting algorithm such as bucket sort. The use of bucket sort is justified, since we may assume that the input distribution of values is pseudo-random.
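As an illustration of this consideration, the sketch below orders module indices by decreasing expansion space using bucket sort. The function name and the choice of one bucket per module are assumptions; the expected linear running time relies on the values being spread roughly evenly over their range, as assumed above.

```python
def bucket_sort_desc(values):
    """Return the indices of `values` in order of decreasing value."""
    n = len(values)
    if n == 0:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(range(n))  # all values equal: any order is sorted
    buckets = [[] for _ in range(n)]
    scale = (n - 1) / (hi - lo)
    for i, v in enumerate(values):
        # Map the value linearly onto a bucket index in 0 .. n-1.
        buckets[min(n - 1, int((v - lo) * scale))].append(i)
    order = []
    for bucket in reversed(buckets):
        # Buckets hold few items on average, so this inner sort is cheap.
        bucket.sort(key=lambda i: values[i], reverse=True)
        order.extend(bucket)
    return order


# Made-up expansion space sizes for five modules.
expansion = [3.0, 0.0, 7.5, 1.2, 7.4]
order = bucket_sort_desc(expansion)
```

Here `order` lists the module with the largest expansion space first, so the modules can be extracted one by one in the order required by the minimization procedure.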
8.7.6
Experimental Results
The substrate coupling impact minimization algorithm has been implemented in C and evaluated on a Linux operating system running on a Pentium MMX 200 MHz CPU with 64 Mbytes of RAM. We will show how much the practical run-time of a simulated annealing optimization iteration increases after incorporating the expansion and substrate coupling impact minimization algorithm. We also show that practical run-times of the proposed method are linear in the size of the problem instance. The simulations are performed on a batch of randomly generated problem instances. The results are summarized in Table 8.1. Moreover, Figure 8.9 shows graphically the relations between the CPU times and the problem instance sizes M. Figure 8.9(a) shows a superlinear relationship between the CPU time and the number of modules M for one complete SA iteration, whereas Figure 8.9(b) shows a linear relationship for the substrate coupling impact minimization algorithm, as expected. Note that the packing computation algorithm is the algorithm originally proposed by Murata et al. [Murata et al., 1996]. For illustration, graphical representations of the standard packing results and the expanded and optimized packing results for a set of ten modules are shown in Figure 8.10. The optimization is performed using randomly generated substrate coupling sensitivities and noisiness values
8.7.7
Conclusions
We presented a new and efficient substrate coupling impact minimization (SCIM) algorithm that enables efficient incorporation of substrate problems into an iterative placement optimization loop. Substrate coupling has been recognized as one of the major physical design bottlenecks for high-performance high-frequency mixed-signal circuits. Therefore, minimizing the impact of substrate coupling will result in better designs in fewer design iterations. Results of simulations performed on randomly generated medium to large problem instances clearly show that the practical run-time of the SCIM algorithm is linear in the problem instance size, which is optimal in the context of a from-scratch computation of a packing. It should be noted that in order to incorporate the influence of the coupling capacitances, more iterations are needed.
8.8
Incremental Substrate Coupling Impact Minimization
In the same line as incremental placement and incremental routing, we propose to put substrate coupling impact minimization in the context of incremental computation. Indeed, it is almost trivial to compute the impact of substrate coupling in an incremental fashion. Once it is known which modules have changed position due to a perturbation of the simulated annealing algorithm, we only have to re-arrange that set of modules. Essentially, this comes down to applying the SCIM algorithm given in Figure 8.8 to the restricted set of moved modules. It is easy to see that the overall computational complexity of the incremental algorithm is proportional to the number of moved modules. Note that the latter holds under the assumption that enumerating these moved modules can be performed efficiently, i.e. proportional to the number of moved modules.
8.9
Concluding Remarks
We have given an overview of some very important physical phenomena that must be taken into account for the purpose of generating a high-quality mixed-signal layout. These phenomena are: self-parasitics, crosstalk and process variations. A specific type of crosstalk due to substrate coupling is considered in depth, and a novel approach to minimize the negative impact of substrate coupling is proposed. Experimental results show that the method is practically feasible, although the examples are not taken from real-life circuits. In the same line of thought as for placement and routing, the additional constraints induced by physical phenomena can also be taken into account in an efficient incremental manner.
Chapter 9
Conclusions
The implementation details of simulated annealing have a tremendous impact on the performance of the algorithm in practice. This is an issue which is left undiscussed in virtually all papers which employ simulated annealing for global optimization. We have shown that knowledge of efficient algorithms and advanced data structures is of utmost importance when designing (new) efficient algorithms for mixed-signal layout generation.
We have proposed and implemented an efficient incremental framework for computing accurate block placements under the constraint of several user-definable parameters. The efficiency of the incremental approach is backed up by concise theoretical arguments. The average computational complexity for a single incremental computation, being is better than any previously reported result.
A new consistent (idempotent) linear-time placement-to-sequence-pair mapping algorithm has been proposed. The algorithm is useful, for example, in the context of converting graphical user-interface data to an abstract format.
An improved, more robust, and easy to implement constrained block placement algorithm has been proposed which improves significantly over previous results. However, the naive implementation, which leaves room for improvement, is slower than the original tuned algorithm.
A new method for constructing an efficient global routing graph from a placement of modules has been proposed. The method has average computational complexity where M is the number of placed modules. Under some reasonably weak conditions this complexity can be reduced to An important feature of the new construction is the fact that dynamic changes in the graph are supported and can be performed efficiently.
We have devised new efficient global routing algorithms for finding obstacle-avoiding routes of multi-pin nets in the proposed global routing graph.
These heuristics have been extensively benchmarked for a large set of routing problem instances derived from sequence-pair placements. The heuristic results are compared with optimal results which have been obtained using state-of-the-art third party tools. The fact that not all problem instances were solvable to optimality demonstrates the difficulty of the problem instances (and the routing problem).
A set of tests has been performed with the integrated accurate sequence-pair placement representation and accurate obstacle-avoiding global routing heuristic in the simulated annealing optimization loop. The outcome of our experiments demonstrates unambiguously that the current de-facto standard minimal-bounding-box routing method does not qualify for finding good placements while minimizing actual global routing length.
Substrate coupling can be taken into account efficiently and in an incremental manner using a linear complexity algorithm. Using pre-computed sensitivity values, we show that the impact of substrate coupling can easily be (locally) minimized.
Although many important layout generation aspects have been dealt with in detail in this book, some topics still need further elaboration to complete the picture. In our view these are: global multi-net routing, detailed routing, and temperature aspects.
In connection with routing, it is expected that the use of multiple high-quality global routing solutions for a given net, as opposed to using only the best found solution, will improve the overall quality of the global routing result. This approach essentially tackles the net ordering problem, which is a very hard problem. Using multiple high-quality solutions for a single net will not mitigate the problem. On the contrary, this version of the multi-net routing problem can be proven to be NP-hard under the constraint of uniform wire spreading, by showing that it is in essence a max subset-sum problem, which is known to be NP-hard [Cormen et al., 1990].
Furthermore, we would generally like to have all wires uniformly distributed over the chip area. Several reasons can be brought forward in this respect. Firstly, uniform wire distribution implies that modules are expanded evenly. This in turn means that the quality of an interconnect will not suffer too much from the change in module positions. Although the quality of an interconnect might degrade due to longer length, compared to an optimal-length interconnect in the expanded scenario, the relative quality should be quite insensitive to a uniform expansion operation.
From a manufacturability/yield point of view, it can be advantageous to spread power-dissipating wires over a larger area so that the temperature is more evenly distributed over the chip; in addition, the occurrence of so-called hot spots, which can eventually cause performance degradation over time, might be prevented. Lastly, the detrimental effect of parasitic coupling can be somewhat lessened by proper wire spreading. At least the impact of wire coupling can be assessed and handled more easily when the number of wires in a single region is lessened.
In order to take the step to detailed routing, we need to make sure that enough routing space is reserved. Reserving enough space can be seen as a module expansion problem: how much do we need to expand each module? This, in turn, depends on which global route segments are assigned to which module, mapping the expansion problem into an assignment problem. The latter is an important problem which needs to be investigated in detail.
Last but not least, we note that temperature issues are also important to consider in the context of dealing with other physical constraints, since temperature gradients can be considered to be as bad as process variations, especially in connection with matched circuit components. However, to perform temperature analysis in an accurate way, we need to estimate power accurately. It is well-known that the latter is a non-trivial problem which is an active field of research. Efficient models to estimate power can help us in quickly determining (dominant) temperature profiles, which in turn can be used in an iterative optimization framework.
In this book we have shown that automatic layout generation is a very complex field. We strongly believe that the concepts and results presented in this book form a significant contribution to the progress in the mixed-signal layout generation arena. However, much research still needs to be performed to achieve “push-button solutions”.
Bibliography
[Aarts and Korst, 1989] Aarts, E.H.L. and Korst, J. (1989). Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing. Wiley-Interscience series in discrete mathematics and optimization. Wiley, Chichester.
[Adel’son-Vel’skii and Landis, 1962] Adel’son-Vel’skii, G. M. and Landis, Y. M. (1962). An algorithm for the organization of information. Doklady Akademii Nauk SSSR, 146:263–266. English translation in Soviet Math. Dokl., 3:1259–1262.
[Adler and Barke, 2000] Adler, T. and Barke, E. (2000). Single step current driven routing of multiterminal signal nets for analog applications. In Proc. Design, Automation and Test in Europe Conference and Exhibition 2000, pages 446–450.
[Baker et al., 1998] Baker, R.J., Li, H.W., and Boyce, D.E. (1998). CMOS Circuit Design, Layout and Simulation. IEEE Press Series on Microelectronic Systems. IEEE Press.
[Balasa, 2000] Balasa, F. (2000). Modeling non-slicing floorplans with binary trees. In Proc. International Conference on Computer Aided Design, pages 13–16.
[Balasa and Lampaert, 1999] Balasa, F. and Lampaert, K. (1999). Module placement for analog layout using the sequence-pair representation. In Proc. ACM/IEEE Design Automation Conference, pages 274–279.
[Beame and Fich, 1999] Beame, P. and Fich, F. E. (1999). Optimal bounds for the predecessor problem. In Proc. STOC’99, pages 295–304.
[Bliek et al., 2001] Bliek, C., Spellucci, P., Vicente, L.N., Neumaier, A., Granvilliers, L., Monfroy, E., Benhamou, F., Huens, E., van Hentenryck, P., Sam-Haroud, D., and Faltings, B. (2001). Algorithms for Solving Nonlinear Constrained and Optimization Problems: The State of The Art. http://www.mat.univie.ac.at/~neum/glopt/coconut/.
[Boese and Kahng, 1994] Boese, K.D. and Kahng, A.B. (1994). Best-so-far vs. where-you-are: Implications for optimal finite-time annealing. Systems and Control Letters, 22(1):71–78.
[Chalup and Maire, 1999] Chalup, S. and Maire, F. (1999). A study on hill climbing algorithms for neural network training. In Proc. 1999 Congress on Evolutionary Computation, pages 2014–2021.
[Chang, 1994] Chang, H. (1994). A Top-Down, Constraint-Driven Design Methodology for Analog Integrated Circuits. PhD thesis, University of California, Berkeley.
[Chang and Wang, 1992] Chang, M.-S. and Wang, F.-H. (1992). Efficient algorithms for the maximum weight clique and maximum weight independent set problems on permutation graphs. Information Processing Letters, 43:293–295.
[Chang et al., 2000] Chang, Y.-C., Chang, Y.-W., Wu, G.-M., and Wu, S.-W. (2000). B*-trees: A new representation for non-slicing floorplans. In Proc. Design Automation Conference, pages 458–463.
[Charbon, 1995] Charbon, E. (1995). Constraint-Driven Analysis and Synthesis of High-Performance Analog IC Layout. PhD thesis, University of California, Berkeley.
[Chiang et al., 1990] Chiang, C., Sarrafzadeh, M., and Wong, C.K. (1990). Global routing based on Steiner min-max trees. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(12):1318–1325.
[Choudhury and Sangiovanni-Vincentelli, 1990] Choudhury, U. and Sangiovanni-Vincentelli, A. (1990). Use of performance sensitivities in routing of analog circuits. In Proc. International Symposium on Circuits and Systems, volume 1, pages 348–351.
[Cohn et al., 1991] Cohn, J.M., Garrod, D.J., Rutenbar, R.A., and Carley, L.R. (1991). KOAN/ANAGRAM II: New tools for device-level analog placement and routing. IEEE Journal of Solid-State Circuits, 26(3):330–342.
[Cohn et al., 1994] Cohn, J.M., Garrod, D.J., Rutenbar, R.A., and Carley, L.R. (1994). Analog Device-Level Layout Automation. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers.
[Cohoon and Richards, 1988] Cohoon, J.P. and Richards, D.S. (1988). Optimal two-terminal wire routing. Integration: the VLSI Journal, 6:35–57.
[Cong et al., 2000] Cong, J., Kong, T., Liang, F., Liu, J.S., Wong, W.H., and Xu, D. (2000). Dynamic weighting Monte Carlo for constrained floorplan designs in mixed signal application. In Proc. ASP-DAC 2000, pages 277–282.
[Cong and Madden, 1998] Cong, J. and Madden, P.H. (1998). Performance driven multilayer general area routing for PCB/MCM designs. In Proc. Design Automation Conference, pages 356–361.
[Cormen et al., 1990] Cormen, T.H., Leiserson, C.E., and Rivest, R.L. (1990). Introduction to Algorithms. McGraw Hill.
[Darwin, 1859] Darwin, C. (1859). The origin of species. John Murray, London.
[Dawkins, 1976] Dawkins, R. (1976). The selfish gene. Oxford University Press, Oxford.
[Dechter and Pearl, 1985] Dechter, R. and Pearl, J. (1985). Generalized best-first search strategies and the optimality of A*. Journal of the Association of Computing Machinery, 32(3):505–536.
[Deferm et al., 1988] Deferm, L., Claes, C., and Declerck, G.J. (1988). Two- and three-dimensional calculation of substrate resistance. IEEE Transactions on Electron Devices, 35(3):339–352.
Bibliography
191
[Diaconis and Stroock, 1991] Diaconis, P. and Stroock, D. (1991). Geometric bounds for eigenvalues of Markov chains. The Annals of Applied Probability, 1(1 ):36–61. [Dijkstra, 1959] Dijkstra, E.W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271. [Doris et al., 2001] Doris, K., Lin, C, and van Roermund, A.H.M. (2001). D/a conversion: Amplitude and time error mapping optimization. In Proc. 1CECS 2001, pages 863–866. [Forsythe et al., 1977] Forsythe, G.E., Malcolm, M.A., and Moler, C.B. (1977). Computer methods for mathematical computations. Prentice-Hall. [Francken et al., 2000] Francken, K., Vancorenland, P., and Gielen, G. (2000). DAISY: a simulation-based high-level synthesis tool for modulators. In Proc. IEEE International Conference on Computer Aided Design, pages 188–192. [Frigioni et al., 1997] Frigioni, D., Ioffreda, M., Nanni, U., and Pasqualone, G. (1997). Experimental analysis of dynamic algorithms for the single source shortest path problem. In Proc. Workshop on Algorithm Engineering, pages 54–63. [Fujiyoshi and Murata, 1999] Fujiyoshi, K. and Murata, H. (1999). Arbitrary convex and concave rectilinear block packing using sequence-pair. In Proc. ISPD’99, pages 103–110. [Ganley, 1995] Ganley, J.L. (1995). Geometric Interconnection and Placement Algorithms. PhD thesis, University of Virginia. [Garey and Johnson, 1977] Garey, M.R. and Johnson, D.S. (1977). The rectilinear steiner tree problem is NP-complete. SIAM J. Appl. Math., 32:826–834. [Garey and Johnson, 1979] Garey, M.R. and Johnson, D.S. (1979). tractability : a guide to the theory of NP-completeness. Freeman.
Computers and in-
[Gharpurey, 1995] Gharpurey, R. (1995). Modeling and Analysis of Substrate Coupling in Integrated Circuits. PhD thesis, University of California, Berkeley. [Graham et al., 1989] Graham, R.L., Knuth, D.E., and Patashnik, O. (1989). Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley Publishing Company. [Guo et al., 1999] Guo, P.-N., Cheng, C.-K., and Yoshimura, T. (1999). An O-tree representation of non-slicing floorplan and its applications. In Proc. DAC’99, pages 268–273. [Hajek, 1988] Hajek, B. (1988). Cooling schedules for optimal annealing. Mathematics of operations research, 13(2):311–329. [Hanan, 1966] Hanan, M. (1966). On Steiner’s problem with rectilinear distance. J. SIAM Appl. Math., 14:255–265. [Holland, 1975] Holland, J.H. (1975). Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor.
192
Bibliography
[Hong et al., 2000] Hong, X., Huang, G., Cai, Y., Gu, J., Dong, S., Cheng, C.-K., and Gu, J. (2000). Corner block list: An effective and efficient topological representation of nonslicing floorplan. In Proc. International Conference on Computer Aided Design, pages 8–12. [Hsu, 1985] Hsu, W.L. (1985). Maximum weight clique algorithms for circular-arc graphs and circle graphs. S1AM J. Comput., 14:224–231. [Huang and Sangiovanni-Vincentelli, 1986] Huang, M. and Sangiovanni-Vincentelli, A. (1986). An efficient general cooling schedule for simulated annealing. In Proc. International Conference on Computer-Aided Design, pages 381–384. [Huijbregts, 1996] Huijbregts, E.P. (1996). A Complete Design Path for the Layout of Flexible Macros. PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands. [Hunt and Szymanski, 1977] Hunt, J.W. and Szymanski, T.G. (1977). A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5):350–353. [Hwang, 1976] Hwang, F.K. (1976). On Steiner minimal trees with rectilinear distance. SIAM Journal of Applied Mathematics, 30(1):104–114. [Hwang et al., 1992] Hwang, F.K., Richards, D.S., and Winter, P. (1992). The Steiner Tree Problem, volume 53 of Annals of Discrete Mathematics. North-Holland, Amsterdam. [Ingber, 1989] Ingber, L. (1989). Very fast simulated re-annealing. Journal of Mathl. Comput. Modelling, 12:967–973. [Ingber, 1993] Ingber, L. (1993). Simulated annealing: practice versus theory. Mathl. Comput. Modelling, 18(11):29–57. [Ingber and Rosen, 1992] Ingber, L. and Rosen, B. (1992). Genetic algorithms and very fast simulated reannealing: A comparison. Mathematical Computer Modeling, 16(11):87–100. [Jepsen and Jr., 1983] Jepsen, D.W. and Jr., C.D. Gelatt (1983). Macro placement by monte carlo annealing. In Proc. IEEE International Conference on Computer Design, pages 495–498. [Joardar, 1994] Joardar, K. (1994). A simple approach to modeling cross-talk in integrated circuits. 
IEEE Journal of Solid-State Circuits, 29:1212–1219. [Jusuf et al., 1990] Jusuf, G., Gray, P.R., and Sangiovanni-Vincentelli, A.L. (1990). CADICS - cyclic analog-to-digital converter synthesis. In Proc. IEEE International Conference on Computer Aided Design, pages 286–289. [Kahng, 2000] Kahng, A.B. (2000). Classical floorplanning harmful? In Proc. ISPD, pages 207–213. [Kahng and Robins, 1995] Kahng, A.B. and Robins, G. (1995). On Optimal Interconnections for VLSI. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers.
Bibliography
193
[Kanchanasut, 1994] Kanchanasut, K. (1994). A shortest-path algorithm for manhattan graphs. Information Processing Letters, 49:21–25. [Kang and Dai, 1998] Kang, M.Z. and Dai, W.W-M. (1998). Arbitrary rectilinear block packing based on sequence pair. In Proc. ICCAD’98, pages 259–266. [Kirkpatrick et al., 1983] Kirkpatrick, S., Jr., C.D. Gelatt, and Vecchi, M.P. (1983). Optimization by simulated annealing. Science, 220(4598):671–680. [Knuth, 1989] Knuth, D.E. (1989). The Art of Computer Programming, volume 3. AddisonWesley Publishing Company, edition. [Knuth, 1996] Knuth, D.E. (1996). Selected Papers on Computer Science (CSLI Lecture Notes, No. 59). C S L I Publications. [Koch et al., 2000] Koch, T., Martin, A., and Voß, S. (2000). SteinLib: An updated library on steiner tree problems in graphs. Technical Report ZIB-Report 00-37, Konrad-ZuseZentrum für Informationstechnik Berlin, http://elib.zib.de/steinlib. [Kozminski, 1990] Kozminski, K. (1990). Mcnc benchmark data. In International Workshop on Layout Synthesis 1990. http://www.cbl.ncsu.edu/CBL_Docs/lys90.html. [Kruiskamp, 1996] Kruiskamp, W. (1996). Analog design automation using genetic algorithms and polytopes. PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands. [Lampaert, 1998] Lampaert, K. (1998). Analog Layout Generation for Performance and Manufacturability. PhD thesis, Katholieke Universiteit Leuven. [Lee, 1961] Lee, C.Y. (1961). An algorithm for path connections and its applications. IRE Transactions Electronic Computers, EC-10(3):346–365. [Lengauer, 1990] Lengauer, T. (1990). Combinatorial algorithms for integrated circuit layout. Wiley, Chichester. [Lin and Leenaerts, 2000a) Lin, C. and Leenaerts, D.M.W. (2000a). A new efficient method for substrate-aware device-level placement. In Proc. ASP-DAC 2000, pages 533–536. [Lin and Leenaerts, 2000b] Lin, C. and Leenaerts, D.M.W. (2000b). A new faster sequence pair algorithm. In Proc. ISCAS 2000, volume III, pages 407–10. 
[Lin et al., 2001] Lin, C., Leenaerts, D.M.W., and van Roermund, A.H.M. (2001). Faster incremental VLSI placement optimization. In Proc. European Conference on Circuit Theory and Design, volume II, pages 153–156. [Liu, 2001 ] Liu, J.S. (2001). Monte Carlo Strategies in Scientific Computing. Springer Series in Statistics. Springer Verlag. [Liu and Sechen, 1999] Liu, L.-C.E. and Sechen, C. (1999). Multilayer chip-level global routing using an efficient graph-based Steiner tree heuristic. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(10): 1442–1451.
194
Bibliography
[Mäkinen, 1999] Mäkinen, E. (1999). On the longest upsequence problem for permutations. Technical Report A-1999-7, University of Tampere, Finland. [Malavasi, 1993] Malavasi, E. (1993). Techniques for performance-driven layout of analog integrated circuits. Master’s thesis, University of California, Berkeley. [Malavasi and Charbon, 1999] Malavasi, E. and Charbon, E. (1999). Constraint transformation for IC physical design. IEEE Transactions on Semiconductor Manufacturing, 12(4):386–395. [Malavasi and Sangiovanni-Vincentelli, 1993] Malavasi, E. and Sangiovanni-Vincentelli, A. (1993). Area routing for analog layout. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 12(8):1186–1197. [Martínez and Roura, 1998] Martínez., C. and Roura, S. (1998). Randomized binary search trees. J. ACM, 45(2):288–323. [Matsumoto et al., 1991] Matsumoto, T., Saigan, N., and Tsuji, K. (1991). Two new efficient approximation algorithms for the Steiner tree problem in rectilinear graphs. In Proc. Int. Symp. on Circuits and Systems, volume 2 of 5, pages 1156–1159. [Mehlhorn and Näher, 1990] Mehlhorn, K. and Näher, S. (1990). Bounded ordered dictionaries in time and space. Information Processing Letters, 35:183–189. et al., 2000] I.I., Vazirani, V.V., and Ganley, J.L. (2000). A new heuristic for rectilinear Steiner trees. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(10): 1129–1139. [Murata et al., 1995] Murata, H., Fujiyoshi, K., Nakatake, S., and Kajitani, Y. (1995). Rectangle-packing-based module placement. In Proc. ICCAD, pages 472–479. [Murata et al., 1996] Murata, H., Fujiyoshi, K., Nakatake, S., and Kajitani, Y. (1996). VLSI module placement based on rectangle-packing by the sequence-pair. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15:1518–1524. [Murata et al., 1998] Murata, H., Fujiyoshi, K., Nakatake, S., and Kajitani, Y. (1998). VLSI/PCB placement with obstacles based on sequence pair. 
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17:61–68. [Murata et al., 1997] Murata, H., Fujiyoshi, K., Watanabe, T., and Kajitani, Y. (1997). A mapping from sequence-pair to rectangular dissection. In Proc. ASP-DAC’97, pages 625– 633. [Nakatake et al., 1996] Nakatake, S., Fujiyoshi, K., Murata, H., and Kajitani, Y. (1996). Module placement on bsg-structure and ic layout applications. In Proc. ICCAD’96, pages 484–491. [Nakatake et al., 1998] Nakatake, S., Furuya, M., and Kajitani, Y. (1998). Module placement on bsg-structure with pre-placed modules and rectilinear modules. In Proc. ASP-DAC’98, pages 571–576.
Bibliography
195
[Nakatake et al., 2001] Nakatake, S., Kubo, Y., and Kajitani, Y. (2001). Consistent floorplanning with super hierarchical constraints. In Proc. ISPD’01, pages 144–149. [Nilsson, 1980] Nilsson, N.J. (1980). Principles of Artificial Intelligence. Tioga Publishing Company, Palo Alto, CA. [Ochotta et al., 1996] Ochotta, E.S., Rutenbar, R.A., and Carley, L.R. (1996). Synthesis of high-performance analog circuits in astrx/oblx. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(3):273–294. [Onodera et al., 1991] Onodera, H., Taniguchi, Y, and Tamaru, K. (1991). Branch-andbound placement for building block layout. In Proc. ACM/IEEE Design Automation Conference, pages 433–439. [Otten, 1982] Otten, R.H.J.M. (1982). Automatic floorplan design. In Proc. DAC’82, pages 261–267. [Otten, 2000] Otten, R.H.J.M. (2000). What is a floorplan? In Proc. ISPD 2000, pages 212–217. [Otten and van Ginneken, 1989] Otten, R.H.J.M. and van Ginneken, L.P.P.P. (1989). The annealing algorithm, volume 72 of The Kluwer International Series in Engineering and Computer Science. Kluwer Academic. [Ousterhout, 1984] Ousterhout, J.K. (1984). Corner stitching: A data-structuring technique for VLSI layout tools. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 3(1):87–100. [Pang et al., 2000] Pang, Y, Balasa, F., Lampaert, K., and Cheng, C.-K. (2000). Block placement with symmetry constraints based on the o-tree non-slicing representation. In Proc. Design Automation Conference, pages 464–467. [Parakh and Brown, 1999] Parakh, P.N. and Brown, R.B. (1999). Crosstalk constrained global route embedding. In Proc. ISPD’99, pages 201–206. [Pearl, 1984] Pearl, J. (1984). Heuristics: Intelligent Search Strategies for Computer Problem Solving. The Addison-Wesley Series in Artificial Intelligence. Addison-Wesley, Reading, Mass. [Pelgrom et al., 1989] Pelgrom, M.J.M., Duinmaijer, A.C.J., and Welbers, A.P.G. (1989). Matching properties of MOS transistors. 
IEEE Journal of Solid-State Circuits, 24(5): 1433–1440. [Press et al., 1992] Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (1992). Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2nd edition. [Prim, 1957] Prim, R.C. (1957). Shortest connection networks and some generalizations. Bell System Technical Journal, 36:1389–1401. [Pugh, 1990] Pugh, W. (1990). Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, 33(6):668–676.
196
Bibliography
[Ramalingam and Reps, 1996a] Ramalingam, G. and Reps, T. (1996a). An incremental algorithm for a generalization of the shortest-path problem. Journal of Algorithms, 21:267– 305. [Ramalingam and Reps, 1996b] Ramalingam, G. and Reps, T. (1996b). On the computational complexity of dynamic graph algorithms. Theoretical Computer Science, 158:233– 277. [Rayward-Smith, 1983] Rayward-Smith, V.J. (1983). The computation of nearly minimal steiner trees in graphs. Int. J. Math. Educ. Sci. Technol., 14:15–23. [Rayward-Smith and Clare, 1986] Rayward-Smith, V.J. and Clare, A. (1986). On finding Steiner vertices. Networks, 16:283–294. [Rechenberg, 1973] Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog. [Rönngren and Ayani, 1997] Rönngren, R. and Ayani, R. (1997). A comparative study of parallel and sequential priority queue algorithms. ACM Transactions on Modeling and Computer Simulation, 7(2): 157–209. [Sahni and Bhatt, 1980] Sahni, S. and Bhatt, A. (1980). The complexity of design automation problems. In Proc. IEEE/ACM Design Automation Conference, pages 402–411. [Sakanushi et al., 1998] Sakanushi, K., Nakatake, S., and Kajitani, Y. (1998). The multibsg: Stochastic approach to an optimum packing of convex-rectilinear blocks. In Proc. ICCAD’98, pages 267–274. [Sakurai, 1993] Sakurai, T. (1993). Closed-form expressions for interconnection delay, coupling, and crosstalk in VLSI’s. IEEE Transactions on Electron Devices, 40(1): 118–124. [Sechen, 1988] Sechen, C. (1988). VLSI placement and global routing using simulated annealing, volume 54 of The Kluwer international series in engineering and computer science. Kluwer Academic, Dordrecht. [Semiconductor Industry Association, 1998] Semiconductor Industry Association (1998). National Technology Roadmap for Semiconductors. [Sherwani, 1993] Sherwani, N.A. (1993). Algorithms for VLSI physical design automation. Kluwer Academic. 
[Sleator and Tarjan, 1985] Sleator, D.D. and Tarjan, R.E. (1985). Self-adjusting binary search trees. Journal of the Association of Computing Machinery, 32(3):652–686. [Stepniewski and Keane, 1997] Stepniewski, S.W. and Keane, A.J. (1997). Pruning backpropagation neural networks using modern stochastic optimization techniques. Neural Computing & Applications, 5:76–98. [Takahashi and Matsuyama, 1980] Takahashi, H. and Matsuyama, A. (1980). An approximate solution for the Steiner problem in graphs. Math. Japonica, 24(6):573–577.
Bibliography
197
[Takahashi, 1996] Takahashi, T. (1996). An algorithm for finding a maximum-weight decreasing sequence in a permutation, motivated by rectangle packing problem. Tech. Rep. IEICE, VLD96(201):31–35. [Takahashi, 2000] Takahashi, T. (2000). A new encoding scheme for rectangle packing problem. In Proc. ASP-DAC 2000, pages 175–178. [Tang, 2001] Tang, X. (2001). Constrained sequence-pair-based placement. private communication. [Tang et al., 2000] Tang, X., Tian, R., and Wong, D.F. (2000). Fast evaluation of sequence pair in block placement by longest common subsequence computation. In Proc. DATE 2000, pages 106–111. [Tang and Wong, 2001 ] Tang, X. and Wong, D.F. (2001). Fast-sp: A fast algorithm for block placement based on the sequence pair. In Proc. ASP-DAC 2001, pages 521–526. [Tseng, 1997] Tseng, H.-P. (1997). Detailed Routing Algorithms for VLSI Circuits. thesis, University of Washington, Seattle.
PhD
[van der Plas et al., 1999] van der Plas, G.A.M., Vandenbussche, J., Sansen, W., Steyaert, M.S.J., and Gielen, G.G.E. (1999). A 14-bit intrinsic accuracy random walk CMOS DAC. IEEE Journal of Solid-State Circuits, 34(12):1708–1718. [van Emde Boas, 1975] van Emde Boas, P. (1975). Preserving order in a forest in less than logarithmic time. In Proc. Annual Symposium on Foundations of Computer Science, pages 75–84. [Veendrick, 2000] Veendrick, H. (2000). ASICs. Kluwer Academic Publishers,
Deep-Submicron CMOS ICs: From basics to edition.
[Warme, 2000] Warme, D. (2000). Geosteiner extensions for steiner minimal trees in graphs. private communication. [Warme et al., 1999] Warme, D.M., Winter, P., and Zachariasen, M. (1999). Geosteiner 3.0. http://www.diku.dk/geosteiner. [Winter and Smith, 1992] Winter, P. and Smith, J. MacGregor (1992). Path-distance heuristics for the Steiner problem in undirected networks. Algorithmica, 7:309–327. [Wong and Liu, 1986] Wong, D.F. and Liu, C.L. (1986). A new algorithm for floorplan design. In Proc. DAC’86, pages 101–107. [Xu et al., 1998] Xu, J., Guo, P.-N., and Cheng, C.-K. (1998). Rectilinear block placement using sequence pair. In Proc. ISPD, pages 173–178. [Zheng et al., 1996] Zheng, S.Q., Lim, J.S., and Iyengar, S.S. (1996). Finding obstacleavoiding shortest paths using implicit connection graphs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(1): 103–110. [Zhou et al., 2001 ] Zhou, H., Shenoy, N., and Nicholls, W. (2001). Efficient minimum spanning tree construction without delaunay triangulation. In Proc. ASP-DAC 2001.
198
Bibliography
[Zhou and Wong, 1998] Zhou, H. and Wong, D.F. (1998). Optimal river routing with crosstalk constraints. ACM Transactions on Design Automation of Electronic Systems, 3(3):496–514.
About the Authors
Chieh “Achie” Lin was born on December 2, 1972 in Ruian City, China. In September 1991 he began studying “Informatietechniek” at the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands. In June 1997 he received the “Ingenieur” (Ir.) degree from this institute. Thereafter, he worked towards a Ph.D. degree in the Mixed-signal Microelectronics research group of the Department of Electrical Engineering at Eindhoven University of Technology. For this work he received the Ph.D. degree on February 20, 2002. This book is a compact compilation of the research results. Since June 2001 he has been with Philips Research, Electronic Design & Tools. He currently focuses on the development of CAD tools for analog simulation and synthesis, with emphasis on radio-frequency issues, optimization, and layout aspects.
Arthur H.M. van Roermund was born in Delft, The Netherlands, in 1951. He received the M.Sc. degree in electrical engineering in 1975 from the Delft University of Technology and the Ph.D. degree in Applied Sciences from the K.U.Leuven, Belgium, in 1987. From 1975 to 1992 he was with Philips Research Laboratories in Eindhoven. From 1992 to 1999 he was a full professor at the Electrical Engineering Department of Delft University of Technology, where he was chairman of the Electronics Research Group and a member of the management team of DIMES. From 1992 to 1999 he was also chairman of a two-year postgraduate school for “chartered designer”. From 1992 to 1997 he was a consultant for Philips. In October 1999 he joined Eindhoven University of Technology as a full professor, chairing the Mixed-signal Microelectronics group. Since 2001 he has also been a member of the departmental board, with the research portfolio. He is chairman of the board of ProRISC, a nation-wide microelectronics platform, and a senior member of the IEEE.
Domine M.W. Leenaerts studied electrical engineering at Eindhoven University of Technology. He gained his Ph.D. in 1992. Currently he is a principal scientist at Philips Research Laboratories, Eindhoven, where he is involved in RF integrated transceiver design. Dr. Leenaerts is an IEEE Distinguished Lecturer and an Associate Editor of the IEEE Transactions on Circuits and Systems, Part I. His research interests include nonlinear dynamic system theory, ADC/DAC design, and RF and microwave techniques. He has published over 100 papers in scientific and technical journals and conference proceedings. He has written two books, including “Circuit Design for RF Transceivers” (Kluwer Academic Publishers).