Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC


Author: Michiel Steyaert | Arthur van Roermund | Andrea Baschirotto


Analog Circuit Design

Michiel Steyaert • Arthur van Roermund Andrea Baschirotto Editors

Analog Circuit Design Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC


Editors Michiel Steyaert K.U. Leuven Department of Elektrotechniek ESAT-MICAS Kardinaal Mercierlaan 94 B-3001 Heverlee Belgium [email protected]

Arthur van Roermund Electrical Engineering Technical University Eindhoven Mixed-signal Microelectronics Group Eindhoven Netherlands [email protected]

Andrea Baschirotto Department of Physics University of Milan-Bicocca Milan Italy [email protected]

ISBN 978-94-007-1925-5 e-ISBN 978-94-007-1926-2 DOI 10.1007/978-94-007-1926-2 Springer Dordrecht Heidelberg London New York Library of Congress Control Number: 2011937971 © Springer Science+Business Media B.V. 2012 No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This book is part of the Analog Circuit Design series and contains the contributions of the speakers at the 20th workshop on Advances in Analog Circuit Design (AACD), which was organized by KULeuven ESAT-MICAS. The workshop was held in Leuven, Belgium, from April 5 to April 7, 2011. I would also like to thank Danielle Vermetten, Chris Mertens, Ben Geeraerts, Valentijn De Smedt and Hans Meyvaert for their help in organizing the workshop.

This book comprises three Parts, covering advanced analog and mixed-signal circuit design fields that the circuit design community considers very important:

• Low-Voltage Low-Power Data Converters
• Short Range Wireless Front-Ends
• Power Management and DC-DC

Each Part is made up of six papers from experts in the field. The aim of the AACD workshop is to bring together a group of expert designers to discuss new developments and future options. Each workshop is then followed by the publication of a book by Springer in their successful series of Analog Circuit Design. This book is number 20 in this series. The books can be seen as a reference for all people involved in analog and mixed-signal design. The full list of the previous books and topics in the series is given next. We sincerely hope that this 20th book continues the tradition and provides a valuable contribution to our Analog Design Community.

Michiel Steyaert


The topics covered before in this series:

2010, Graz (Austria): Robust Design; Sigma Delta Converters; RFID

2009, Lund (Sweden): Smart Data Converters; Filters on Chip; Multimode Transmitters

2008, Pavia (Italy): High-speed Clock and Data Recovery; High-performance Amplifiers; Power Management

2007, Oostende (Belgium): Sensors, Actuators and Power Drivers for the Automotive and Industrial Environment; Integrated PAs from Wireline to RF; Very High Frequency Front Ends

2006, Maastricht (The Netherlands): High-speed AD Converters; Automotive Electronics: EMC Issues; Ultra Low Power Wireless

2005, Limerick (Ireland): RF Circuits: Wide Band, Front-Ends, DACs; Design Methodology and Verification of RF and Mixed-Signal Systems; Low Power and Low Voltage

2004, Montreux (Switzerland): Sensor and Actuator Interface Electronics; Integrated High-Voltage Electronics and Power Management; Low-Power and High-Resolution ADCs

2003, Graz (Austria): Fractional-N Synthesizers; Design for Robustness; Line and Bus drivers

2002, Spa (Belgium): Structured Mixed-Mode Design; Multi-Bit Sigma-Delta Converters; Short-Range RF Circuits

2001, Noordwijk (The Netherlands): Scalable Analog Circuits; High-Speed D/A Converters; RF Power Amplifiers

2000, Munich (Germany): High-Speed A/D Converters; Mixed-Signal Design; PLLs and Synthesizers

1999, Nice (France): XDSL and other Communication Systems; RF-MOST Models and Behavioural Modelling; Integrated Filters and Oscillators

1998, Copenhagen (Denmark): 1-Volt Electronics; Mixed-Mode Systems; LNAs and RF Power Amps for Telecom

1997, Como (Italy): RF A/D Converters; Sensor and Actuator Interfaces; Low-Noise Oscillators, PLLs and Synthesizers

1996, Lausanne (Switzerland): RF CMOS Circuit Design; Bandpass Sigma Delta and Other Data Converters; Translinear Circuits

1995, Villach (Austria): Low-Noise/Power/Voltage; Mixed-Mode with CAD tools; Voltage, Current and Time References

1994, Eindhoven (Netherlands): Low-Power Low-Voltage; Integrated Filters; Smart Power

1993, Leuven (Belgium): Mixed-Mode A/D Design; Sensor Interfaces; Communication Circuits

1992, Scheveningen (The Netherlands): OpAmps; ADC; Analog CAD

The book contains the contributions of 18 tutorials of the 20th workshop on Advances in Analog Circuit Design. Each part discusses a specific, up-to-date topic on new and valuable design ideas in the area of analog circuit design. Each part is presented by six experts in that field, and state-of-the-art information is shared and overviewed. This book is number 20 in this successful series of Analog Circuit Design, providing valuable information and excellent overviews of:

• Low-Voltage Low-Power Data Converters – Chaired by Prof. Andrea Baschirotto, University of Milan-Bicocca
• Short Range Wireless Front-Ends – Chaired by Prof. Arthur van Roermund, Eindhoven University of Technology
• Power Management and DC-DC – Chaired by Prof. M. Steyaert, Katholieke Universiteit Leuven

Analog Circuit Design is an essential reference source for analog circuit designers and researchers wishing to keep abreast of the latest developments in the field. The tutorial coverage also makes it suitable for use in an advanced design.

Contents

Part I Low-Voltage Low-Power Data Converters

1 Power Minimization in ADC Design
Willy Sansen

2 Low-Power Pipelined A/D Conversion
Boris Murmann

3 Low-Power Successive Approximation ADCs for Wireless Applications
Jan Craninckx

4 Oversampling Converters Beyond Continuous-Time Sigma-Delta for Nanometer CMOS Technologies
A. Di Giandomenico, L. Hernandez, E. Prefasi, S. Paton, A. Wiesbauer, R. Gaggl, and J. Hauptmann

5 Considerations for Cost-Efficient Calibration of Scaled ADCs
Marian Verhelst, Erkan Alpman, and Hasnain Lakdawala

6 A 12b 2.9 GS/s DAC with IM3 < -60 dBc Beyond 1 GHz in 65 nm CMOS
K. Bult, C.-H. Lin, F. van der Goes, J. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci, and X. Liu

Part II Short-Range Wireless Front-Ends

7 Short Range Radio Communication – Novel Applications and Their Physical Layer Requirements
William G. Scanlon

8 Ultra Low-Power Wireless Body-Area Sensor Networks
G. Dolmans, F. Bouwens, A. Breeschoten, B. Busze, P. Harpe, L. Huang, X. Huang, M. Konijnenburg, V. Pop, M. Vidojkovic, Y. Zhang, C. Zhou, and H. de Groot

9 Low Power RF Power Harvesting Enabling More Active Tag Functionality
Tim Piessens, Yves Geerts, Wim Vanacken, Eldert Geukens, Bram De Muer, Tim Butler, and Bob Hamlin

10 Low Power RF Frontend for Wireless Sensor Networks
Frank Henkel, Thomas Leineweber, Mohamed Gamal El-Din, and Ralf Wilke

11 Ultra High Data Rate CMOS Front Ends
Reza Mahmoudi and Arthur van Roermund

12 Extremely Wideband CMOS Circuits For Future THz Applications
Lorenzo Tripodi, Marion K. Matters-Kammerer, Dave van Goor, Xin Hu, and Anders Rydberg

Part III Power Management and DC-DC

13 State-of-the-Art of Integrated Switching Power Converters
Gerard Villar Piqué and Henk Jan Bergveld

14 Data Conversion Pulse-Width Modulators for Switch-Mode Power Converter Digital Control
Eduard Alarcón, Vahid Yousefzadeh, Aleksandar Prodić, and Dragan Maksimović

15 Advanced Power Management for Low Power Medical Applications
Kristof Quaegebeur and Jan Crols

16 Feedforward Control of Switching Regulators
Richard Redl

17 Device Optimization to Assess Losses and Ringing Issues in Integrated Synchronous Buck Converters
J. Roig and F. Bauwens

18 Control of Fully Integrated DC-DC Converters in CMOS
Tom Van Breussegem, Mike Wens, and Michiel Steyaert

Part I

Low-Voltage Low-Power Data Converters

The first part of the book deals with the design and implementation of “low-voltage low-power data converters”. The papers address the different data-converter topologies and the different implementation issues, from topology and system level down to circuit level. Moreover, the presented solutions are always analyzed in light of CMOS technology scaling, which reduces analog device performance but offers efficient digital signal processing that can be used to improve analog performance.

In the first paper, Willy Sansen gives an overview of the different ADC topologies, emphasizing the aspects related to minimizing power consumption. This contribution surveys the main ADC topologies, namely flash (using interpolation and folding), pipeline, SAR and Sigma-Delta, which are introduced and compared using data from the implementations most recently reported in the literature.

The second paper, from Boris Murmann, reviews recent developments and low-power design techniques for high-speed pipelined ADCs. The fundamental operation principles are introduced, and then widely used low-power techniques are summarized. Finally, some ideas that have been proposed in recent research publications are outlined.

In the third paper, Jan Craninckx discusses the advancements in SAR ADC design, in particular for wireless transceiver applications. An overview is given of recent techniques that reduce the switching power in the capacitive DAC and thereby improve the ADC power efficiency to levels that are out of reach of the typically used pipeline architecture. Moreover, this paper discusses the charge-sharing SAR ADC architecture, which proposes a new signal processing method in the charge domain that removes the often-neglected but tough requirements for the reference buffer.

In the fourth paper, Antonio Di Giandomenico et al. propose low-power, large-bandwidth implementations of Continuous-Time Sigma-Delta ADCs, in which cascaded architectures and time-encoding signal processing have been successfully applied. Two different implementations, PWM-based and VCO-based, are finally described.


In the fifth paper, Marian Verhelst et al. discuss digitally assisted performance enhancement strategies to overcome ADC component mismatch limitations, which are otherwise addressed by increased component sizes and increased power consumption. The trade-off between mismatch compensation in the analog domain (digitally assisted trimming, possibly in combination with up-scaling) and in the digital domain (digital post-distortion) is analyzed. The increasing use of digitally enhanced ADC architectures proves to be the main driver for the observed improvement in area and power with CMOS technology scaling.

Finally, the sixth paper, from Klaas Bult et al., analyzes the aspects related to power reduction in very high-frequency DACs. A 12b 2.9 GS/s DAC is presented as a case study and benchmark. Several effects that limit DAC performance are introduced and possible solutions are developed.

Andrea Baschirotto

Chapter 1

Power Minimization in ADC Design Willy Sansen

Abstract An overview is given of different ADCs in which power consumption has been minimized. First, flash ADCs are examined, in which interpolation and folding are used to reduce the number of comparators. Then pipeline and SAR ADCs are briefly reviewed. Oversampling ADCs are discussed in more detail. The noise shaping is carried out with switched-capacitor filters and with opamp/GmC filters. The text concludes with TDC-based ADCs.

W. Sansen, KULeuven, Groenstraat 124, 3001 Heverlee, Belgium, e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_1, © Springer Science+Business Media B.V. 2012

1 Introduction

Excellent texts exist which give a good introduction to ADCs. Examples are the books by Van de Plassche [1], Razavi [2], Maloberti [3] and Johns and Martin [4]. A summary is given in the slide-based book of Sansen [5]. They all provide details on both system and transistor level. All of them compare ADC performance in terms of a figure of merit (FOM) in pJ per conversion step, which is determined by the effective number of bits of resolution (ENOB), the bandwidth and the power consumption. The best overview of the state of the art is given on the website of Murmann [6].

It is clear that the minimization of the power consumption is the biggest concern of the designer. This is why this text tries to highlight a number of design techniques which allow exactly that. Examples are drawn from all major categories of ADCs, starting with flash ADCs in which interpolation and folding are used to limit the number of comparators. Pipeline and successive-approximation ADCs follow briefly. Considerable attention is then paid to oversampling ADCs in which both switched-capacitor and continuous-time filters are used for noise shaping. Finally, the capabilities of ADCs based on Time-to-Digital converters are discussed, as they are most promising for nanometer CMOS technologies.
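The FOM used throughout this chapter can be sketched in a few lines. The block below assumes the commonly used definition FOM = P / (2^ENOB · 2·BW), which the text implies but does not spell out, together with the ideal-quantizer relation between SNDR and ENOB. The function names and the example numbers are hypothetical and are not taken from any of the cited designs.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (ideal-quantizer relation)."""
    return (sndr_db - 1.76) / 6.02


def fom_per_conversion_step(power_w: float, bandwidth_hz: float, enob: float) -> float:
    """Energy per conversion step in joules, assuming FOM = P / (2**ENOB * 2 * BW)."""
    return power_w / (2.0 ** enob * 2.0 * bandwidth_hz)


# Hypothetical converter: 1 mW, 10 MHz bandwidth, 9.0 effective bits.
fom = fom_per_conversion_step(1e-3, 10e6, 9.0)
print(f"FOM = {fom * 1e15:.1f} fJ/conv.step")
```

Because the resolution enters as 2^ENOB, one extra effective bit at the same power and bandwidth halves the FOM, which is why low-resolution, high-speed converters and high-resolution, low-speed converters can end up with similar figures.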


2 Flash/Interpolation and Folding

Only three examples are given. The first one, by Verbruggen [7], avoids preamplifiers for lower power consumption. The combination of 1-bit folding and a 4-bit flash ADC, using 15 comparators (see Fig. 1.1), leads to an impressive FOM of 50 fJ/conv.step. The bandwidth is 878 MHz and the converter is realized in 90 nm CMOS. The ENOB is only 4.7 bit, as can be expected for high-speed flash converters.

Another example is the 6-bit flash ADC which was realized in 45 nm CMOS [8]. Averaging and folding are realized with 65 dynamic comparators, which are all simple differential pairs. The realization (C045 in Fig. 1.2) is about a factor of 8 worse than the simulation because of this averaging and folding circuitry, without calibration.

The third flash ADC has been selected because of its low supply voltage [9]. Optimum performance is reached at 0.4 V but it operates down to 0.2 V. At 0.4 V the FOM is about 0.1 pJ/conv.step. It is realized in 0.18 µm CMOS. This low supply voltage is only possible when dynamic CMOS inverters are used as amplifiers and all transistors operate in weak inversion. This is discussed in more detail in the next Chapter (Fig. 1.3).

Fig. 1.1 A 5-bit folding flash ADC with 50 fJ/conv.step

Fig. 1.2 Five-bit folding flash ADC in 45 nm CMOS

Fig. 1.3 Highly-digital flash ADC at minimum 0.2 V supply voltage
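The comparator saving quoted for the first example follows from simple counting. The sketch below assumes the textbook rule that an N-bit flash ADC needs 2^N - 1 comparators and that each folding bit removes one bit from the fine flash. The helper names are hypothetical, and the circuit that makes the folding decision itself is deliberately left out of the count.

```python
def flash_comparators(bits: int) -> int:
    """A plain N-bit flash ADC needs one comparator per decision level."""
    return 2 ** bits - 1


def folded_flash_comparators(total_bits: int, folding_bits: int) -> int:
    """Comparators in the fine flash when folding resolves `folding_bits`.

    The circuit that decides the folding itself is not counted here.
    """
    return flash_comparators(total_bits - folding_bits)


print(flash_comparators(5))            # plain 5-bit flash
print(folded_flash_comparators(5, 1))  # 1-bit folding followed by a 4-bit flash
```

Under these assumptions the 5-bit converter drops from 31 to 15 comparators, matching the 15 comparators mentioned for the 1-bit folding plus 4-bit flash combination.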

3 Pipeline ADCs

All pipeline ADCs need precision amplification to provide the amplified residue to the next stage. This is easily achieved by switching matched capacitors. The amplifiers, however, take most of the current. A comparison of the types of amplifiers (for 100 MHz GBW and 2 pF CL) shows that the CMOS inverter is by far the best choice, despite its higher noise and its zero PSRR [5]. Its current consumption ITOT is lowest and its swing is excellent. Moreover, it acts as a simple class-AB stage, which allows power savings beyond what is given in Fig. 1.4. Class-AB amplifiers are used more and more in analog blocks, as they only use power when solicited. Multistage amplifiers provide even higher GBW·CL/ITOT ratios [5] but may not be the best choice if other specifications are taken into account.

In nanometer CMOS technologies, however, weak-inversion operation becomes evident, as shown in Fig. 1.5. It gives the inversion coefficient (the ratio of the current to the current at the crossover between strong and weak inversion) for a two-stage amplifier with 3 GHz GBW (with GBW = fT/16 [5]). The weak-inversion asymptote is given as well. It is clear that important power savings can be achieved in nanometer CMOS because of its high fT values.

Other techniques to save power are opamp sharing, capacitor sharing, removing the S/H input stage, etc. They are summarized in [10].
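The comparison point quoted above (100 MHz GBW into 2 pF) can be turned into a rough current estimate. The sketch below assumes the standard single-stage relation GBW = gm / (2π·CL) and uses two illustrative gm/ID values that are assumptions, not figures from the text. Only the GBW = fT/16 rule is taken from the chapter.

```python
import math


def required_gm(gbw_hz: float, c_load_f: float) -> float:
    """Transconductance of a single-stage amplifier: gm = 2*pi*GBW*CL."""
    return 2.0 * math.pi * gbw_hz * c_load_f


def bias_current(gm_s: float, gm_over_id: float) -> float:
    """Drain current for a given gm and transconductance efficiency gm/ID (in 1/V)."""
    return gm_s / gm_over_id


gm = required_gm(100e6, 2e-12)  # comparison point used in the text
for gm_id in (10.0, 20.0):      # assumed values: stronger vs. weaker inversion
    current_ua = bias_current(gm, gm_id) * 1e6
    print(f"gm/ID = {gm_id:.0f} /V -> ID = {current_ua:.1f} uA")

# Rule quoted in the text for the two-stage amplifier: GBW = fT / 16.
print(f"fT needed for 3 GHz GBW: {16 * 3e9 / 1e9:.0f} GHz")
```

Doubling gm/ID halves the current for the same gm, which is the sense in which operation closer to weak inversion saves power, as long as the technology's fT leaves enough margin over the required GBW.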

Fig. 1.4 Comparison of types of amplifiers

Fig. 1.5 Nanometer CMOS technologies push weak inversion operation

A good example of an inverter-based amplifier for a low-power pipeline ADC is shown in Fig. 1.6 [11]. The inverter structure can be clearly recognized in the schematic. Switches are used to set the biasing such that the actual current is much lower than in a conventional CMOS inverter. Cross-coupling is used to enhance the gain. A FOM of 72 fJ/conv. was reached in 90 nm CMOS.

4 SAR ADCs

Successive-approximation ADCs are ideally suited for low power consumption as they consist only of a comparator, switches and an array of capacitors (see Fig. 1.7) [12]. It is obvious that a dynamic comparator must be used and that the unit capacitance must be minimized. The one in [12] uses 12 fF but in [15] only 0.5 fF is used (see Fig. 1.8). The resulting SNDR can never be high because of the lack of good matching. As a result such low-power ADCs are only used for very low-power applications (portable electronics) and medium SNDR (8 to 10 bit). On the other hand, their FOMs are impressive. It was 2.6 pJ for the one of [12], but 10 fJ/conv. for [13], 4.4 fJ/conv. for [14] and 12 fJ/conv. for [15].

Fig. 1.6 Pipeline ADC with inverters as class-AB amplifiers

Fig. 1.7 Low-power SAR ADC

Fig. 1.8 Eight-bit 10 MS/s SAR in 90 nm CMOS [15]
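The reason a SAR ADC needs only one comparator and a capacitor array is that it performs a binary search, one bit per clock cycle. The following behavioural sketch assumes an ideal DAC and an ideal comparator. It models the search only, not the charge redistribution of any specific design cited here, and the function name is hypothetical.

```python
def sar_convert(vin: float, vref: float, bits: int = 8) -> int:
    """Binary-search conversion of vin in [0, vref) with an ideal DAC and comparator."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)          # tentatively set this bit
        vdac = vref * trial / (1 << bits)  # ideal capacitive-DAC output
        if vin >= vdac:                    # comparator decision
            code = trial                   # keep the bit
    return code


print(sar_convert(0.30, 1.0))  # 8-bit code for 0.30 V with a 1 V reference
```

Each iteration uses the single comparator once, so an N-bit conversion takes N comparisons. Any mismatch in the real capacitor array shows up as an error in `vdac`, which is why the small unit capacitors mentioned above limit the achievable SNDR.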

5 Switched-Capacitor ΣΔ ADCs

Oversampling techniques give rise to higher resolutions than what is possible with matching. Three parameters determine the maximum SNDR: the oversampling ratio, the resolution of the (multi- or single-bit) quantizer and the order of the noise-shaping filter [16].

At low supply voltages the switches become a problem. It is then easier to switch the amplifiers rather than the switches, as shown in Fig. 1.9 [17]. If class-AB amplifiers are used, then excellent FOMs can be obtained. The amplifier used in [17] is a single VGS + VDSsat amplifier, which allows supply voltages down to 0.7 V. It is shown in Fig. 1.10. Transistor M2 acts as a source follower. It imposes vin2 on the source of transistor M1, which thus acts as a single-transistor differential amplifier. It has a class-AB output as well, which gives rise to low quiescent current consumption. The output current can thus be much larger than the biasing current IB.

Full feedforward is now used in all Sigma-delta modulators to avoid large signals at the input of the first integrator [18]. An example of a 4th-order single-bit converter is shown in Fig. 1.11 [19]. It provides 1 MS/s and its FOM is 0.35 pJ/conv.step. Another low-power realization uses full feedforward as well, with cascaded CMOS inverters as amplifiers. The minimum supply voltage is 0.7 V and its FOM is 0.1 pJ/conv.step [20]. Its schematic is shown in Fig. 1.12. The bandwidth is only 20 kHz but its power consumption is correspondingly small.
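How the three parameters trade off can be illustrated with the idealized textbook estimate of the peak signal-to-quantization-noise ratio of a modulator with pure differentiator noise shaping. This formula is an assumption brought in for illustration; it is not given in the text, and real modulators fall short of it because of stability limits, thermal noise and circuit non-idealities. The function name is hypothetical.

```python
import math


def ideal_sqnr_db(order: int, osr: float, quantizer_bits: int = 1) -> float:
    """Idealized peak SQNR in dB for noise-shaping order L, oversampling ratio OSR
    and a B-bit quantizer:
    6.02*B + 1.76 - 10*log10(pi**(2L) / (2L + 1)) + (20L + 10)*log10(OSR)
    """
    shaping_penalty = 10.0 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    oversampling_gain = (20 * order + 10) * math.log10(osr)
    return 6.02 * quantizer_bits + 1.76 - shaping_penalty + oversampling_gain


for order in (1, 2, 3):
    print(f"order {order}, OSR 64, 1 bit: {ideal_sqnr_db(order, 64):.1f} dB")
```

In this model every doubling of the oversampling ratio adds (20L + 10)·log10(2) dB, about 15 dB for a second-order loop, while every extra quantizer bit adds about 6 dB. This is why low-bandwidth designs can rely on a single-bit quantizer and a high OSR, whereas wider bandwidths push towards multibit quantizers.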

Fig. 1.9 Switched-opamp sigma-delta converter section

Fig. 1.10 Class-AB differential V-I converter

Fig. 1.11 Full-feedforward 4th order single-bit ΣΔ ADC

Fig. 1.12 20 kHz ΣΔ modulator on 0.7 V

Fig. 1.13 Four-bit ΣΔ modulator architecture

Fig. 1.14 0.1–20 MHz SC ΣΔ modulator

For higher speed and higher resolution, multibit quantizers are required, which puts severe constraints on the linearity of the multibit DAC in the feedback loop. Many techniques have been developed to relax them, such as DEM, DWA, etc. An example is shown in Fig. 1.13 for a 1.2 MHz input signal with more than 16 bit resolution [21]. Better performance has been reached recently [22] with a programmable modulator shown in Fig. 1.14. Its FOM is only 0.62 pJ/conv.
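DWA is only named in the text, so the following is a minimal sketch of the usual rotation scheme (data-weighted averaging) and not necessarily the variant used in the cited designs. It assumes a thermometer-coded DAC with `n_elements` equal unit elements, and the function name is hypothetical.

```python
def dwa_select(codes, n_elements):
    """Yield the unit-element indices used for each DAC input code.

    Elements are taken in rotation, starting where the previous code stopped,
    so that all elements are used equally often over time.
    """
    pointer = 0
    for code in codes:
        yield [(pointer + i) % n_elements for i in range(code)]
        pointer = (pointer + code) % n_elements


for used in dwa_select([3, 5, 2, 6], n_elements=8):
    print(used)
```

Because every element is cycled through at a rate set by the data, the error caused by element mismatch is averaged out quickly and, in the usual analysis, pushed towards high frequencies instead of appearing as distortion in the signal band.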


Fig. 1.15 Ultra-low power amplifier based on a CMOS inverter

Fig. 1.16 SC modulator (left) and CT modulator (right)

The one with the lowest supply voltage (0.25 V) derives an internal supply voltage of 0.5 V. It reaches 10 kHz with 0.4 pJ/conv. CMOS inverters are used in weak inversion to reach a power consumption as low as 7.4 µW [23] (Fig. 1.15).

For higher frequencies, SC filters can no longer be used as they suffer from insufficient settling times and charge distribution effects. Continuous-time filters must then be used.

6 Continuous-Time ΣΔ ADCs

In such converters, the sampling is no longer done at the input but in the quantizer (see Fig. 1.16). As a result no anti-aliasing filter is required. Moreover, the speed of the amplifiers can be lower, saving a factor of 2 to 3 in power consumption. On the other hand, jitter occurs between the clocks of the quantizer and the DAC. Also, differences between the time constants in the filters must be tuned out. Finally, any delay around the feedback loop may cause stability problems [24, 25].

Two important discussions seem to continue. The first one concerns the kind of filter to be used: an opamp-based filter or a GmC filter. The other is feedforward versus feedback.

Fig. 1.17 Comparison of filters [5]

Fig. 1.18 Architecture of 3rd-order single-bit ΣΔ modulator

The following realizations will illustrate these points. It is clear from a first-order comparison (in Fig. 1.17) that opamp-based filters excel in linearity at low frequencies. GmC filters, on the other hand, can reach higher frequencies but suffer from distortion. Both are used, sometimes even in the same ΣΔ modulator! An example of such a CT ΣΔ modulator using linearized GmC blocks is shown in Fig. 1.18 [26]. It only reaches 10 bit up to 10 MHz (0.22 pJ/conv.step). Local feedback is always used to sharpen the filter response.

The CT Sigma-Delta modulator with the lowest supply voltage, of only 0.5 V, is shown in Fig. 1.19 [27]. Its BW is 25 kHz and its FOM 1.5 pJ/conv.step. Such a low voltage is only possible if either the bulk is used as an input or the bulks are slightly forward biased to reduce the threshold voltages.

Fig. 1.19 Third-order single-bit ΣΔ modulator on 0.5 V

Fig. 1.20 Third-order four-bit ΣΔ modulator up to 20 MHz

Amplifiers biased in weak inversion can be used provided the bandwidth is small, as in [28], which reaches 24 kHz with 0.11 pJ/conv. For higher frequencies, CMOS inverters are used, as in [29] with 0.4 pJ/conv. The lowest FOM hitherto has been reached in [30] with 0.12 pJ/conv. It is shown in Fig. 1.20. Three-stage class-AB amplifiers are used! Similar frequencies are obtained in [31, 32], the latter of which uses an opamp in the input stage, followed by two GmC blocks. The DACs also use SC techniques to save power. Its FOM is 0.23 pJ/conv.step. The last example reaches 125 MHz and is realized in 45 nm CMOS [33]. Here two operational amplifiers are used, followed by a single GmC block, as shown in Fig. 1.21. Its FOM is only 0.65 pJ/conv.step.

Fig. 1.21 A 125 MHz CT ΣΔ modulator in 45 nm CMOS

7 TDC Based ADCs

Nanometer CMOS technologies require lower supply voltages, although 0.9 V seems to be becoming the new standard. As a result, it may be better to convert the input signal directly into a frequency, rather than to convert the input signal amplitude into a quantized equivalent, which suffers from offset and noise. Such ADCs use a pulse-width modulator (PWM) or an oscillator (VCO or ring oscillator). Their most important problem is the linearity of the conversion of the voltage into a frequency [34–36].

An early version of such a VCO-based ADC is shown in Fig. 1.22 [34]. The supply voltage of the ring oscillator is driven by the input signal, after buffering. Logic is required to generate the output bits. As a result of the architecture, first-order noise shaping is realized, leading to excellent results. A bandwidth of 100 kHz was reached with 10 bit resolution, giving rise to a FOM of 3 pJ/conv.step. The same authors realized a version [34, ICECS 2009] with 50 MHz and 9.5 bit in 65 nm CMOS, yielding 20 fJ/conv.step, which is impressive indeed. A similar VCO-based ADC was realized for low frequencies (20 kHz) operating at 0.2 V supply voltage [36], which gave 82 fJ/conv.step.

As the linearity of such VCOs is a problem, a linearization technique is used in Fig. 1.23 [37]. Two oscillators are connected in a differential configuration. The FOM is 25 fJ/conv.step for a 40 nm CMOS technology! The maximum frequency is 10 MHz.
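The first-order noise shaping of a VCO-based ADC comes from the fact that the oscillator integrates its input into phase, and the digital logic takes the first difference of the quantized phase. The behavioural sketch below assumes a perfectly linear VCO, f = f0 + kvco·v, and a counter that resolves whole cycles only. The parameter names and values are hypothetical and do not describe any cited circuit.

```python
import math


def vco_adc(samples, f0, kvco, fs):
    """Count VCO cycles per clock period.

    The output word is the first difference of the quantized phase.
    """
    phase = 0.0      # accumulated VCO phase, in cycles
    prev_count = 0
    out = []
    for v in samples:
        phase += (f0 + kvco * v) / fs   # the VCO integrates its input
        count = math.floor(phase)       # phase quantizer (edge counter)
        out.append(count - prev_count)  # digital 1 - z^-1
        prev_count = count
    return out


fs = 1e6
codes = vco_adc([0.3] * 1000, f0=10e6, kvco=4e6, fs=fs)
print(sorted(set(codes)), sum(codes) / len(codes))
```

For this constant input the oscillator runs at about 11.2 cycles per clock period. Each output word is 11 or 12, but their sum always stays within one count of the true accumulated phase, so the quantization error does not build up. That bounded, differentiated error is the first-order shaping referred to above, and a nonlinear voltage-to-frequency characteristic would appear directly as distortion in `codes`.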


Fig. 1.22 VCO based ADC [34]

Fig. 1.23 Linearized VCO based ADC [37]

A higher frequency of 30 MHz is obtained in [38] with a FOM of 143 fJ/conv. A bandwidth of 20 MHz is obtained in [39] with 0.33 pJ/conv. Its configuration is shown in Fig. 1.24. A TEQ (Time Encoding Quantizer) replaces the flash quantizer in Fig. 1.25 [40]. A clock of 2.56 GHz is used for high resolution.

Another example of a VCO-based CT ΣΔ modulator is shown in Fig. 1.26 [41]. Current-starved pseudo-differential CMOS inverters are used in the current-controlled ring oscillators (ICRO). Nonlinearity correction improves the linearity. It reaches 18 MHz and its FOM is 0.25 pJ/conv.step. The final one [42] reaches 20 MHz and its FOM is 0.32 pJ/conv.step, in 65 nm CMOS.

Fig. 1.24 CT ΣΔ with VCO quantizer [39]

Fig. 1.25 CT ΣΔ with TEQ [40]

8 Conclusions

An overview is given of the state of the art of ADCs. Most attention is paid to oversampling converters with SC or continuous-time filters. Finally, TDC-based ADCs are discussed as they show great promise for nanometer CMOS technologies. They can all be mapped onto the P/fs (in pJ) versus SNDR (in dB) curve [6], which shows which ADC to use for which performance.

Fig. 1.26 Variable-rate CT ΣΔ with ICROs [41]

References

1. R. Van de Plassche, Integrated Analog-to-Digital and Digital-to-Analog Converters (Kluwer Academic Press, Boston, 1994)
2. B. Razavi, Principles of Data Conversion System Design (IEEE Press, New York, 1995)
3. F. Maloberti, Data Converters (Springer, Dordrecht, 2007)
4. D. Johns, K. Martin, Analog Integrated Circuit Design (Wiley, New York, 1997)
5. W. Sansen, Analog Design Essentials (Springer, Dordrecht, 2006)
6. B. Murmann, http://www.stanford.edu/~murmann/adcsurvey.html
7. B. Verbruggen, JSSC 44(3), 874–882 (2009)
8. P. Veldhorst, ESSCIRC, Athens, 2009, pp. 464–467
9. D.C. Daly, JSSC 44(11), 3030–3038 (2009)
10. J.W. Nam, ESSCIRC, Athens, 2009, pp. 468–471
11. J. Kim, B. Murmann, ESSCIRC, Sevilla, 2010, pp. 378–380
12. M. Scott, JSSC 38(7), 1123–1129 (2003)
13. G. Van der Plas, JSSC 43(12), 2631–2640 (2008)
14. M. Van Elzakker, ISSCC, San Francisco, 2008, pp. 244–245
15. P. Harpe, ESSCIRC, Sevilla, 2010, pp. 214–217
16. R. Schreier, G. Temes, Understanding Delta-Sigma Data Converters (Wiley, Chichester, 2004)
17. V. Peluso, JSSC 32, 1887–1896 (1998)
18. J. Silva, Electron Lett 37(12), 737–738 (2001)
19. L.B. Yao et al., Low-Power Low-Voltage ΣΔ Modulators in Nanometer CMOS (Springer, Dordrecht, 2006)
20. M. Chae, ISSCC, San Francisco, 2008, p. 27.2
21. Y. Geerts et al., JSSC 35(12), 1829–1840 (2000)
22. T. Christen, ESSCIRC, Sevilla, 2010, pp. 414–417
23. F. Michel, ISSCC, San Francisco, 2011, pp. 476–477
24. J.A. Cherry, W.M. Snelgrove, CT SD Modulators for High-Speed ADCs (Kluwer Academic Press, Norwell, 2000)
25. S. Paton, JSSC 39(7), 1056–1063 (2004)
26. R. Schoofs, CAS 54(1), 209–217 (2007)
27. K.P. Pun, JSSC 42(3), 496–507 (2007)
28. S. Pavan, JSSC 45(7), 1365–1379 (2010)
29. R.H.M. van Veldhoven, ISSCC, San Francisco, 2008, pp. 492–493
30. G. Mitteregger, JSSC 41(12), 2641–2649 (2006)
31. L.J. Breems, JSSC 42(12), 2696–2705 (2007)
32. P. Crombez, JSSC 45(6), 1159–1171 (2010)
33. M. Bolatkale, ISSCC, San Francisco, 2011, pp. 470–471
34. T. Watanabe, JSSC 38(1), 120–125 (2003); ICECS, Hammamet, 2009, pp. 271–274
35. D. Hovin, JSSC 32(1), 13–22 (1997)
36. U. Wismar, ESSCIRC, Montreux, 2006, pp. 187–190
37. Op’t Eynde, ISSCC, San Francisco, 2010, pp. 450–451
38. J. Daniels, VLSI Circuits, Honolulu, 2010, pp. 155–156
39. M. Park, M.H. Perrott, JSSC 44(12), 3344–3358 (2009)
40. E. Prefasi, ESSCIRC, Sevilla, 2010, pp. 430–433
41. G. Taylor, I. Galton, JSSC 45(12), 2634–2646 (2010)
42. V. Dhanasekaran, JSSC 46(3), 639–650 (2011)

Chapter 2

Low-Power Pipelined A/D Conversion Boris Murmann

Abstract This paper reviews recent developments and low-power design techniques for high-speed pipelined A/D converters. The discussion spans a review of the fundamental operation principles, a summary of widely used low-power techniques, and an examination of ideas that have been proposed in recent research publications. As we will show, the best research-level designs reach a power efficiency that lies within an order of magnitude of practically achievable limits in today’s architectures. This corresponds to a 2–3 order of magnitude improvement relative to the first pipelined ADCs designed in the late 1980s and early 1990s.

B. Murmann, Stanford University, 420 Via Palou Mall, Stanford, CA 94305-4070, USA, e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_2, © Springer Science+Business Media B.V. 2012

1 Introduction

Pipelined ADCs have been investigated since the late 1980s [1–3] to enable high-speed conversion at moderate-to-high resolutions of approximately 8–14 bits. In the early days of its commercial adoption, the pipelined architecture was used to digitize video signals at approximately 20 MS/s and 10 bits of resolution [4]. At that time, with CMOS feature sizes of several microns, this level of performance was difficult to achieve with competing architectures. Even though more than two decades have passed, the situation has not changed much. Although competing architectures (such as successive approximation and oversampling ADCs) have substantially widened their performance space, pipelined converters still enjoy great popularity in high-speed applications. This can be seen from Fig. 2.1, which plots experimental ADC data presented at the IEEE International Solid-State Circuits Conference (ISSCC) and the VLSI Circuit Symposium from 1997 until 2011 [5]. For the range of 50–80 dB signal-to-noise-and-distortion ratio (SNDR), the pipelined architecture clearly dominates the architectural landscape for conversion bandwidths above 10 MHz. Within this range, popular applications include wireless base stations [6], wireless LAN [7, 8], Ethernet transceivers [9], and medical ultrasound imagers [10].

Fig. 2.1 Experimental data for ADC bandwidth (BW) versus SNDR. Pipelined ADCs dominate the 50–80 dB SNDR range for BW > 10 MHz (legend: Flash, Folding, Two-Step, Pipeline (Interleaved), Pipeline (1 Channel), SAR, ΔΣ, Other; reference line: 100 fs rms jitter)

Regardless of the underlying architecture, a general application-driven necessity in the design of modern data converters is the aggressive reduction of power dissipation. According to the analysis described in [11], A/D conversion energy has halved every 2 years over the past decade (on average, across all architectures and performance regimes). While some of these improvements are undoubtedly due to technology scaling, refinements in ADC architectures and circuit design have carried a significant weight in this trajectory. Thus, the purpose of this paper is to review the progression of relevant design techniques for the specific case of high-speed pipelined A/D converters.

As a basis for this discussion, Sect. 2 will review pipelined conversion at the conceptual block diagram level. In Sect. 3, common-practice and widely productized techniques applicable to low-power design are reviewed. These include, for instance, stage scaling, comparator redundancy, and amplifier sharing. In Sect. 4, we will then present ideas that have been evaluated mostly at the level of university research. The topics covered include OpAmp-less stage implementations as well as various digital enhancement techniques. In light of this discussion, Sect. 5 compares the state of the art with practical limits on power dissipation and Sect. 6 ends this chapter with a summary.


2 Review of Pipelined A/D Conversion

Figure 2.2 shows a conceptual block diagram of the pipelined A/D converter topology. Several converter stages are cascaded and process the analog input sequentially, similar to flip-flops propagating a bit stream in a digital shift register. Each stage samples and holds its analog input (through S/H circuits implicitly contained in the local A/D block and the summing node) and performs a coarse A/D conversion. The error of this conversion result, often called residue, is computed with the help of a local D/A converter, amplified and fed to the next stage in order to extract additional bits. The last stage contains only a quantizer, since there is no further need to compute a residue. The sub-ADC results are aligned in time (using shift registers) and combined to yield the final digital output word.

The principal advantage of this ADC architecture is that due to stage pipelining, its throughput rate is set by the time needed to perform a single sub-A/D and D/A conversion. Similar to workers on a manufacturing line, the stages operate concurrently, i.e. while stage i acquires a new input, stage i + 1 operates on the previous output of stage i. The propagation time through the cascade of pipeline stages merely results in conversion latency, which is tolerable in many applications.

Fig. 2.2 Block diagram of a pipelined ADC

In order to understand the operating principle of a pipelined converter in more detail, it is useful to look at a simple two-stage pipeline example as shown in Fig. 2.3. In this diagram, the sub-DAC is ideal and the sub-ADCs are modeled as unity gain elements that introduce an additive quantization error.

Fig. 2.3 Two-stage pipeline ADC example (stage 1 residue: Vres1 = −G1·εq1; the stage 2 output D2 is scaled by the digital gain 1/Gd1 and added to D1 to form Dout)

For an ideal B-bit quantizer, the quantization error (εq) is bounded by ±Δ/2, where Δ is the quantizer’s step size. From this model, it is straightforward to show that

\[ D_{out} = V_{in} + \varepsilon_{q1}\left(1 - \frac{G_1}{G_{d1}}\right) + \frac{\varepsilon_{q2}}{G_{d1}} \tag{2.1} \]

and thus, for Gd1 = G1,

\[ D_{out} = V_{in} + \frac{\varepsilon_{q2}}{G_{d1}} \tag{2.2} \]

From this result, we see that the “digital gain” term that combines the two sub-conversion results should ideally be set to the reciprocal of the gain used in the analog signal path (see footnote 1). Under this condition, and assuming that the sub-DAC is ideal and none of the quantizers are driven into overload, it follows that the quantization error of the overall pipeline in this example is equal to the quantization error of the last stage, divided by the gain of stage 1. It is straightforward to show that this result extends to an n-stage pipeline, in which case the last stage’s quantization error is divided by the aggregate gain in the analog path. We can therefore summarize the following key results:
• In an ideal n-stage pipeline ADC (ideal DACs, quantizers that do not overload, and ideal matching of analog and digital gain terms), the quantization errors of stages 1 through n − 1 do not appear in the output.
• The aggregate bit resolution of a pipeline is given by the resolution of the last quantizer plus the base-2 logarithm of the aggregate amplifier gain. That is, for each factor of two in gain, the last stage’s quantization error is cut in half, corresponding to a one-bit improvement in resolution.

Footnote 1: The reader may notice a similarity to cascade delta-sigma converters: perfect cancellation of the first stage quantization noise requires perfect coefficient matching between the analog and digital domains; any mismatch will “leak” a portion of the coarse quantization error into the output.
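The two-stage result of Eqs. (2.1) and (2.2) can be checked numerically. The following is a minimal behavioral sketch, not a circuit model: it assumes a full-scale range of [−1, 1), ideal mid-point quantizers, and the illustrative resolutions B1 = 2 and B2 = 3 with G1 = 2^B1. The function names and default values are hypothetical choices made for this example.

```python
import math


def quantize(v, bits):
    """Ideal quantizer on [-1, 1): returns the midpoint of the bin containing v."""
    step = 2.0 / 2 ** bits
    code = min(max(math.floor((v + 1.0) / step), 0), 2 ** bits - 1)
    return -1.0 + (code + 0.5) * step


def two_stage_pipeline(vin, b1=2, b2=3, g1=None, gd1=None):
    """Behavioral model of Fig. 2.3: Dout = D1 + D2 / Gd1."""
    g1 = 2 ** b1 if g1 is None else g1   # analog residue gain
    gd1 = g1 if gd1 is None else gd1     # digital gain term
    d1 = quantize(vin, b1)               # D1 = Vin + eps_q1
    vres1 = g1 * (vin - d1)              # Vres1 = -G1 * eps_q1
    d2 = quantize(vres1, b2)             # D2 = Vres1 + eps_q2
    return d1 + d2 / gd1


def max_error(points=4096, **kwargs):
    """Largest |Dout - Vin| over a uniform sweep of the input range."""
    worst = 0.0
    for k in range(points):
        vin = -1.0 + 2.0 * k / points
        worst = max(worst, abs(two_stage_pipeline(vin, **kwargs) - vin))
    return worst


if __name__ == "__main__":
    # Matched gains: only eps_q2 / G1 remains, i.e. a 5-bit quantizer.
    print("matched   :", max_error())
    # A 5% digital gain mismatch leaks part of eps_q1 into the output.
    print("mismatched:", max_error(gd1=4.2))
```

With matched gains the worst-case error equals half a step of a (B1 + B2)-bit quantizer, as the second key result predicts; with a mismatched digital gain the error grows because the stage 1 quantization error no longer cancels.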


3 Basic Low-Power Design Techniques

Given the structural features of a pipelined converter and its underlying circuits, there exist a number of basic opportunities for translating relaxed precision requirements into power savings. We will review the most widely used concepts in this section using a prototypical circuit realization described next.

Figure 2.4 shows a conceptual single-ended representation of a conventional pipeline stage. This circuit consists of a flash-type sub-ADC, a capacitive charge redistribution network, and an operational transconductance amplifier (OTA). The switched capacitor network, in combination with the OTA, is often called MDAC (multiplying D/A converter). Like most switched capacitor circuits, the stage operates in two main clock phases. During the sampling phase (φ1), the stage input signal is acquired. In the redistribution phase (φ2), a residual charge packet, controlled by the local conversion result D, is redistributed onto the feedback capacitor CF to produce the amplified stage residue, Vres. In this scheme, the precise matching of capacitors in modern technologies and electronic feedback are leveraged to achieve an accurate realization of the required sub-DAC, subtraction and gain functions.

3.1 Comparator Redundancy

The most widely used technique employed to save power in the stage’s flash sub-ADC is comparator redundancy. As we have seen from the analysis of Sect. 2, the errors from the flash ADCs of stages 1 through n − 1 do not appear in the converter’s output. Consequently, we should be able to get by with rather imprecise sub-ADCs that utilize near-minimum size transistors and run at very low levels of power. This turns out to be true in practice, provided that proper care is taken to prevent quantizer over-ranging.

Fig. 2.4 Generic pipeline stage implementation

Fig. 2.5 Overranging due to sub-ADC decision level error

As we see from the example in Fig. 2.3, the residue of the first stage is simply a gained-up version of the local quantization error. For the pipeline to work properly, we must ensure that the residue does not saturate the following quantizer, i.e. it must not exceed the next quantizer’s full-scale range (VFS). Assuming that the local sub-ADC is free of errors in its decision levels, this condition is ensured as long as G1 ≤ 2^B, since the quantization error of a B-bit ADC spans a range of at most VFS/2^B and thus G1·VFS/2^B ≤ VFS ⟺ G1 ≤ 2^B. The limit case of G1 = 2^B with B = 2 is illustrated in Fig. 2.5. From this example, it is clear that any error in the sub-ADC decision levels will lead to over-ranging. Thus, it is impractical to design a stage for this limit case; it would require very high precision (and thus high power) in the sub-ADC comparators.

There are three ways to ensure that over-ranging is avoided despite large errors in the sub-ADC’s decision levels. The first (and probably most obvious) is to design for a reduced gain of less than 2^B. This idea, sometimes called “radix <2,” is utilized e.g. in [12], which uses B = 1 and G = 1.93. In order to tolerate relatively large errors and to maintain a radix of 2 (which simplifies the digital logic that combines the bits), another option is to choose G = 2^(B−1), which was done e.g. in the design of [13]. A particularly elegant and popular variant for a stage that has G = 2 is to employ a sub-ADC with only two decision levels (instead of three), i.e. B = log2(2 + 1) = 1.585 bits. Such a stage is typically called “1.5-bit” in the literature [4]. A third option is to extend the full-scale range of the quantizer that takes the over-ranging residue voltage [14].

With proper redundancy in place, large comparator offsets can be tolerated, often allowing the designer to employ basic dynamic latch comparators without preamplification (unless charge kick-back is a concern).
Typically, the power required for such comparators in modern technologies is on the order of 100 µW/GHz (100 fJ per comparison) or less [15, 16]. Thus, the overall comparator power in a pipelined ADC tends to be small compared to the power dissipated in the MDAC circuit.
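The tolerance of a “1.5-bit” stage to comparator offsets can be illustrated with a short behavioral sketch. The decision levels at ±Vref/4, the residue law Vres = 2·Vin − D·Vref with D ∈ {−1, 0, +1}, and the resulting offset tolerance of ±Vref/4 are the textbook choices for this stage and are assumptions here; the text above does not specify them. All names and numbers are illustrative.

```python
def stage_1p5bit(vin, vref=1.0, offsets=(0.0, 0.0)):
    """One 1.5-bit stage: two comparators, gain of 2, D in {-1, 0, +1}."""
    low = -vref / 4 + offsets[0]    # lower decision level, with offset
    high = vref / 4 + offsets[1]    # upper decision level, with offset
    d = -1 if vin < low else (1 if vin > high else 0)
    return d, 2 * vin - d * vref    # stage decision and amplified residue


def convert(vin, n_stages=8, vref=1.0, offsets=(0.0, 0.0)):
    """Cascade identical stages; return the digital estimate and the
    largest residue magnitude seen anywhere along the pipeline."""
    estimate, residue, worst_residue = 0.0, vin, 0.0
    for i in range(1, n_stages + 1):
        d, residue = stage_1p5bit(residue, vref, offsets)
        estimate += d * vref / 2 ** i   # radix-2 combination of decisions
        worst_residue = max(worst_residue, abs(residue))
    return estimate, worst_residue


def sweep(offsets, n_stages=8, points=2001):
    """Worst-case conversion error and residue over the input range [-1, 1]."""
    worst_err, worst_res = 0.0, 0.0
    for k in range(points):
        vin = -1.0 + 2.0 * k / (points - 1)
        estimate, res = convert(vin, n_stages, 1.0, offsets)
        worst_err = max(worst_err, abs(vin - estimate))
        worst_res = max(worst_res, res)
    return worst_err, worst_res


if __name__ == "__main__":
    print("offsets within Vref/4 :", sweep((0.2, -0.2)))
    print("offset beyond Vref/4  :", sweep((-0.3, 0.0)))
```

As long as each offset stays within ±Vref/4, the residue remains inside the full-scale range and the conversion error is bounded by the final residue divided by the aggregate gain; a larger offset drives the residue out of range, which is the over-ranging condition of Fig. 2.5.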

Fig. 2.6 Pipelined ADC model for noise considerations

3.2 Stage Scaling, Per-Stage Resolution and Noise Budgeting

Unlike comparator nonidealities, errors in the MDAC circuitry refer directly to the input of the pipeline and there is no inherent architectural suppression mechanism. The most critical implication of this is that the electronic noise of the MDAC circuits must be minimized commensurate with the target resolution of the ADC. Since many stages contribute to the overall input referred noise, a key question is how the input referred noise budget should be distributed across the pipeline to minimize power dissipation. This question was investigated in detail by Cline [17] and later refined by Chiu [18]. In order to understand the first order result, consider the simplified ADC model in Fig. 2.6, using a stage gain of two as an example. Without worrying about the absolute value of the total input referred noise (which would require a more detailed analysis), we begin by expressing the rough proportionalities of the input referred noise, given by

\[ N_{tot} \propto kT\left(\frac{1}{C_1} + \frac{1}{C_2}\cdot\frac{1}{2^2} + \frac{1}{C_3}\cdot\frac{1}{2^2 \cdot 2^2} + \cdots\right) \tag{2.3} \]

In words, the noise power of the second stage is divided by 4 when referred to the input, and the third stage noise is divided by 16. If we choose all capacitors the same size, then the first stage will clearly dominate the noise. This is wasteful, since the second and third stage amplifiers are driving large capacitors that are essentially irrelevant to the circuit’s noise performance. Consider now the option of scaling the capacitors such that C2 = C1/4 and C3 = C1/16. In this case, each stage will contribute equally to the overall input referred noise. Unfortunately, this choice is also wasteful, since the first stage has a very small noise budget, requiring a very large input sampling capacitance.

Further analysis shows that the optimum capacitance scale factor lies approximately midway between the two extremes considered above, i.e. the capacitance should be scaled by the aggregate gain (not gain squared) that precedes each stage. For instance, in the above example, C2 = C1/2 and C3 = C1/4 are near-optimum choices for a practical design. The exact optimum can deviate from these values depending on the speed of the converter and the per-stage resolution. Fortunately, however, the optimum is well-behaved and shallow, as can be seen from Fig. 2.7. For this illustration, the taper factor x is defined such that the capacitor scaling factor is equal to G^x, where G is the stage gain. Thus, x = 1 corresponds to a capacitor scaling by exactly the stage gain. Clearly, deviating by some fraction from x = 1 does not result in a significant change in power dissipation, as long as one does not approach x = 0 or x = 2, which correspond to the extreme (and undesired) scaling scenarios discussed above.

Fig. 2.7 Pipelined ADC power as a function of stage scaling (from [18]) and for various per-stage resolutions

In addition to properly distributing the electronic noise budget across stages, the designer must decide upon a split of the overall input noise budget into electronic noise and quantization noise. For an ideal ADC with quantization step Δ, the input referred quantization noise is Δ²/12. In general, it is wasteful to make the input referred electronic noise much smaller than the quantization noise, because extra amplifier power is then invested for diminishing gains in signal-to-noise ratio. For this reason, many designs make the input referred thermal noise at least comparable to the quantization noise. For embedded converters, one can argue that designing toward the other extreme of dominant electronic noise can be beneficial [19]. In an embedded converter, the cost of generating additional bits to reduce quantization noise (and thus allow for a larger electronic noise budget) is relatively low, since there is no I/O power associated with these bits. A quantitative evaluation of the proper sampling capacitor size versus number of bits and overall SNR is plotted in Fig. 9 of reference [20].

A final and equally important architectural parameter that must be considered in the power optimization of a pipelined converter is the per-stage resolution.
Qualitatively speaking, the main result here is that low-to-moderate speed pipelines can benefit from larger stage resolutions of up to 4–5 bits (especially in the first stage). This is because large per-stage resolutions minimize the OTA count at the expense of extra loading from the flash sub-ADCs. At moderate speeds, this extra loading is not detrimental and can be easily handled. For example, the designs of [21, 22] use only 3 and 5 stages to implement 10- and 14-bit converters, respectively. Recently, very low-power two-stage multi-bit designs were presented in [23, 24]. These ADCs use successive approximation to realize efficient sub-ADCs, and rely on a single residue amplifier. Unfortunately, multi-bit designs lose their advantage when the goal is to extract the maximum possible speed from a pipelined converter in a given technology. At the highest possible speeds, the designer will typically opt for a single bit of effective stage resolution; see e.g. the 500-MS/s design of [25].
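The stage-scaling argument built on Eq. (2.3) can be reproduced with a few lines of arithmetic. The sketch below uses a deliberately crude cost model that is an assumption of this example and not the analysis of [17, 18]: stage power is taken to be proportional to the stage capacitance, so the product of input referred noise and total capacitance measures the capacitance (and hence power) needed to meet a fixed noise budget. Function names are hypothetical.

```python
def scaled_caps(n_stages, gain, x, c1=1.0):
    """Stage capacitors tapered by gain**x per stage (x is the taper factor)."""
    return [c1 / gain ** (x * i) for i in range(n_stages)]


def input_referred_noise(caps, gain):
    """Relative noise per Eq. (2.3): stage i is divided by gain**(2*i)."""
    return sum(1.0 / (c * gain ** (2 * i)) for i, c in enumerate(caps))


def cost(x, n_stages=3, gain=2.0):
    """Noise times total capacitance. The product is independent of c1,
    so it compares taper factors at an equal input referred noise budget."""
    caps = scaled_caps(n_stages, gain, x)
    return input_referred_noise(caps, gain) * sum(caps)


if __name__ == "__main__":
    for x in (0.0, 0.5, 0.8, 1.0, 1.2, 1.5, 2.0):
        print(f"x = {x:3.1f}  relative cost = {cost(x):.4f}")
```

Under this simplified model the minimum falls at x = 1 (scaling by the stage gain), the two extremes x = 0 and x = 2 are equally wasteful, and the optimum is shallow, which agrees qualitatively with the discussion of Fig. 2.7.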

3.3 Mitigation of Capacitor Mismatch

Since any errors in the first stage’s MDAC refer directly to the input of the converter, the matching precision of the employed capacitors must be commensurate with the overall target resolution of the design. In some cases, this leads to an outcome where the capacitor size is set by matching constraints rather than electronic noise requirements (see e.g. [7]). Thus, one can argue that any technique that addresses capacitor mismatch can translate into power savings.

One popular architectural option is to increase the resolution of the pipeline’s first stage. One can show that for each extra bit resolved in the first stage, the converter DNL improves by 1 bit, while the INL improves by 0.5 bits. This was leveraged in the design of [26], which uses a 4-bit (3-bit effective) first stage to achieve no missing codes at 14 bits of resolution without any form of calibration and with capacitor sizes set by electronic noise. Due to the limited benefit of increased stage resolution on INL, however, the integral nonlinearity in this design is still several LSBs at the 14-bit level (see footnote 2).

If better INL is desired, a variety of correction techniques are available in the analog or digital domain. In the analog domain, it is for instance possible to employ passive error averaging to effectively cancel capacitor mismatch [18]. In the digital domain, there exists a wide range of schemes that are essentially based on adjusting the “digital gain terms” in the logic that combines the stage outputs (e.g. Gd1 in Fig. 2.3). The coefficients are typically measured iteratively, using the existing hardware within the pipeline. For instance, mismatch in stage i can be measured using the backend converter comprising stages i + 1 … n. Self-calibration techniques of this kind are described for instance in [12, 27–29]. As far as capacitor mismatch is concerned, it is important to note that the resulting errors do not drift.
Consequently, it is sufficient to measure them once, e.g. during production test (or power up), and store the coefficients in fuses (or SRAM).

Footnote 2: This is tolerable, e.g. in imaging applications.


3.4 Amplifier Sharing and SHA-Less Frontends

Beyond proper budgeting and distribution of electronic noise, trying to reduce the number of OTAs used in the pipeline is the designer’s next best architectural option for additional power savings. In this spirit, the concept of amplifier sharing has been evaluated. The main idea is to leverage the fact that the OTA in Fig. 2.4 is only used during charge redistribution; it is idle (reset) in the phase where the stage acquires its input. During that time, the OTA can potentially be used in the following stage, which performs charge redistribution in this time slot. This idea is used for example in the designs of [21, 30–32]. The key issues that limit the efficacy of inter-stage OTA sharing are: (1) the performance requirements (gain, bandwidth, load capacitance, etc.) are not equal in the two stages that share the same OTA, which leads to overdesign; (2) sharing the OTA requires additional switches that add load capacitance at critical nodes and thus increase the power dissipation of the shared amplifier. The first of these issues can be overcome if the sharing occurs between two identical pipelines that run in parallel, as for instance in the I/Q channels of a radio receiver. An implementation of such an approach is described in [33]. In addition to the switch parasitics, a remaining problem in this approach is the channel-to-channel cross-talk that is introduced from switching a non-reset OTA between the two ideally independent signal paths.

A more popular option for reducing the number of high-power OTAs in the pipeline is to eliminate the dedicated sample-and-hold amplifier (“SHA” in Fig. 2.2) from the pipeline. This option is attractive since this stage tends to dissipate up to one third of the total power in a high-performance design. One of the first high-performance designs that uses a SHA-less front-end was described by Mehr in 2001 [13].
Today, SHA-less operation is widely used across a wide range of performance specs, even in designs that sample at IF frequencies [34]. The primary issue that must be overcome in a SHA-less design is the clock timing and bandwidth mismatch between the signals acquired by the sub-ADC and the main MDAC path. For relatively low-speed designs with a high degree of comparator redundancy in the first stage (e.g. 1.5-bit architecture), the designer can typically rely on the component matching of modern technologies (see e.g. [35]). For designs with wideband inputs, it is conceivable to measure and calibrate the mismatch [36]. Alternatively, it is possible to merge the sampling operation for the two paths into a single circuit [37].

3.5 Flip-Around Charge Redistribution

Beyond reducing the number of amplifiers in the pipeline, the next step is to improve the efficiency of the circuit at the transistor level. A widely used idea is to employ “flip-around” charge redistribution [38]. A complete 1.5-bit pipeline stage that uses this technique is shown in Fig. 2.8. The input is sampled on Cs and Cf (nominally the same size), and Cf is flipped around the amplifier as a feedback capacitor during redistribution. This scheme achieves a closed loop gain of 2 with a feedback factor of 1/2 (neglecting OTA input capacitance). If the generic circuit of Fig. 2.4 were used instead, the feedback factor would be only Cf/(Cf + 2Cs) = 1/3. Since both the speed and noise performance of the circuit benefit from larger feedback factors, substantial power savings are enabled by flip-around charge redistribution.

Fig. 2.8 1.5-bit pipeline stage implementation using flip-around charge redistribution [39]
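The feedback-factor comparison above is simple capacitor arithmetic, worked through in the sketch below with unit capacitors. As in the text, the OTA input capacitance is neglected. The final ratio of settling time constants assumes a single-pole amplifier whose closed-loop bandwidth is proportional to the feedback factor; that is a common first-order model and is not stated in the source. Function names are illustrative.

```python
from fractions import Fraction


def flip_around(cs, cf):
    """Input sampled on Cs and Cf; Cf becomes the feedback capacitor."""
    gain = Fraction(cs + cf, cf)     # charge on Cs + Cf ends up on Cf
    beta = Fraction(cf, cs + cf)     # feedback factor during redistribution
    return gain, beta


def conventional(cs_total, cf):
    """Input sampled on cs_total only; a separate Cf closes the loop (Fig. 2.4)."""
    gain = Fraction(cs_total, cf)
    beta = Fraction(cf, cf + cs_total)
    return gain, beta


if __name__ == "__main__":
    g_flip, b_flip = flip_around(cs=1, cf=1)
    g_conv, b_conv = conventional(cs_total=2, cf=1)
    print("flip-around :", g_flip, b_flip)
    print("conventional:", g_conv, b_conv)
    # First-order assumption: settling time constant scales as 1 / beta.
    print("time-constant ratio (conventional / flip-around):", b_flip / b_conv)
```

Both configurations realize a gain of 2, but under the stated assumption the conventional stage settles with a time constant that is 1.5 times longer for the same amplifier.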

3.6 Time Interleaving

Time-interleaved architectures [40] exploit time parallelism and trade hardware complexity for an increased aggregate sampling rate. Once a single-channel pipeline has been optimized using the above-discussed techniques, time-interleaving can be employed to increase the achievable sampling rate in a given technology. In addition, interleaving has benefits for power efficiency, since the individual channels can be designed for lower speeds and thus achieve improved power efficiency. Time-interleaved pipelined ADCs are currently popular in the 10GBASE-T Ethernet application, which demands sampling rates on the order of 1 GHz and 11 bits of resolution. While it may be possible to achieve this sampling rate by interleaving two channels, it is typically more efficient to work with four or more channels [9, 41]. A numerical justification of this argument is found in [20].
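The sample routing behind time interleaving can be sketched as follows. This is an idealized illustration with hypothetical function names: each channel is treated as a perfect converter, and channel mismatch (offset, gain, timing skew) is ignored entirely.

```python
def interleave_sample(signal, n_channels):
    """Round-robin routing: channel ch receives samples ch, ch + M, ch + 2M, ..."""
    return [signal[ch::n_channels] for ch in range(n_channels)]


def recombine(channels):
    """Merge the per-channel outputs back into the original sample order."""
    m = len(channels)
    total = sum(len(c) for c in channels)
    return [channels[i % m][i // m] for i in range(total)]


def channel_rate(aggregate_rate_hz, n_channels):
    """Each channel only has to run at 1/M of the aggregate sampling rate."""
    return aggregate_rate_hz / n_channels


if __name__ == "__main__":
    samples = list(range(10))
    channels = interleave_sample(samples, 4)
    print(channels)
    print(recombine(channels))
    print(channel_rate(1e9, 4))   # per-channel rate for a 1 GS/s aggregate
```

For an aggregate rate on the order of 1 GS/s, four channels each run at roughly 250 MS/s, which is what allows the individual pipelines to be designed at a more power-efficient operating point.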

4 Exploratory Low-Power Design Techniques

The techniques described in the previous section are widely used and have been in mass production for many years. In this section, we will consider exploratory ideas that have not yet found widespread industry adoption, and are mostly used in experimental converters. Since approximately 50–70% of all power in a pipelined ADC is dissipated in the residue amplifiers, research has focused mainly on this particular building block within the converter.

4.1 Residue Amplification with Low Loop Gain

A first and most basic opportunity for power reduction lies in simplifying the OTA circuit shown in Fig. 2.4. Instead of using multiple gain stages to achieve high loop gain (to ensure precise amplification), it has been proposed to work with simplified single-stage amplifiers [42, 43], and to use calibration to overcome the resulting errors. Since the open-loop gain of an OTA varies with temperature, the calibration must run continuously and track changes in the converter’s operating conditions. Unfortunately, this creates overhead and complexity that take away from the benefits of the (sometimes minor) OTA simplification. In the realization of [6], a 10-mW auxiliary ADC is used to monitor the gain of the first residue amplifier, helping to cut its power in half. This overhead is justifiable in this high-performance ADC, which dissipates a total power of 850 mW. In general, when applying this particular or a similar technique, the benefits must be carefully evaluated against the cost of the added calibration.

An alternative to handling the low amplifier gain through calibration is to apply some form of double sampling to boost the effective gain of the amplifier. The correlated level shifting technique (CLS) proposed recently [44] shows promise in this direction, as it also increases the available swing, which is highly desired in low-voltage technologies.

In the context of designs that use simplified amplifiers, it should also be mentioned that there exist options in the optimization of the process technology. For instance, the design of [45] uses customized “high performance analog transistors” with increased intrinsic gain to achieve sufficient gain for 10-bit precision with a single-stage OTA.

4.2 Class-AB and Comparator-Based Residue Amplification

Even in simplified, single-stage OTA realizations, the charge transfer from the power supply to the capacitive load is inherently inefficient, since the amplifier draws a constant current, while it delivers on average only a small fraction of this current to the load. In [46], it was found that the efficiency of a class-A OTA in a switched capacitor circuit is inversely proportional to the number of settling time constants. For the typical case of settling for approximately ten or more time constants, the overall efficiency, i.e. charge delivered to the load versus charge drawn from the supply, is only a few percent.

One step toward improving the amplifier efficiency is to employ class-AB amplification [35, 47]. So far, class-AB amplification has not been very popular for several reasons. First, a typical class-AB OTA still requires a class-A input stage, which upper bounds the possible savings, especially considering the relatively small load capacitances seen in pipelined ADCs. Furthermore, class-AB operation tends to reduce the achievable bandwidth when compared to a class-A design in the same technology. In [35], a single-stage inverter-based class-AB design was used to achieve good power efficiency at a sampling rate of 30 MS/s.

A recent concept that has gained momentum is to replace the OpAmp by a comparator [48, 49], yielding a so-called comparator-based switched capacitor circuit (see Fig. 2.9). In this scheme, the comparator shuts off the capacitor charging current when the final signal value is reached. Thus, no charge is wasted and the efficiency of this topology can approach that of a class-B amplifier.

Fig. 2.9 Comparator-based switched capacitor circuit
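The inverse dependence of class-A efficiency on the number of settling time constants can be motivated with a first-order estimate. The sketch below is not the expression derived in [46]; it is an idealized upper bound under assumptions made for this example: single-pole exponential settling, a constant bias current just large enough to supply the peak load current (no slewing), and no other current branches. The settling requirement of half an LSB is likewise an illustrative choice.

```python
import math


def class_a_efficiency_bound(n_tau):
    """Charge delivered to the load divided by charge drawn from the supply.

    Load voltage:  v(t) = dV * (1 - exp(-t / tau))
    Peak current:  C * dV / tau at t = 0, so the bias current I >= C * dV / tau
    Supply charge: I * n_tau * tau >= n_tau * C * dV
    Load charge:   C * dV * (1 - exp(-n_tau))
    """
    return (1.0 - math.exp(-n_tau)) / n_tau


def settling_time_constants(bits):
    """Time constants needed to settle within half an LSB at the given
    resolution: exp(-N) <= 2**-(bits + 1)."""
    return (bits + 1) * math.log(2.0)


if __name__ == "__main__":
    for bits in (8, 10, 12, 14):
        n = settling_time_constants(bits)
        bound = class_a_efficiency_bound(n)
        print(f"{bits:2d} bits: N = {n:5.2f} tau, efficiency bound = {bound:.1%}")
```

Even this optimistic bound is only about 10% at ten time constants; practical amplifiers have additional bias branches and margins, which is consistent with the figure of a few percent quoted above.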

4.3 OpAmp-Less Residue Amplification

A more aggressive approach to reduce the power spent on residue amplification is to eliminate the OTA (or comparator) in the stage entirely, and work with a simpler configuration that does not rely on OpAmp-like circuitry. Figure 2.10 shows examples of such circuits. The open-loop circuit of Fig. 2.10a [50] is similar to that of Fig. 2.4, except for the charge redistribution phase. The charge packet on the capacitive array is not redistributed onto a feedback capacitor, but remains in place to produce a small voltage at node Vx. This residue is amplified by a resistively loaded transconductance stage to produce the desired full-swing residue voltage Vres. Since the high gain requirement in the transconductor is now dropped, a basic differential pair-type stage can be used to replace the complex amplifier in Fig. 2.4. Other variants of this open-loop residue amplification approach have been published; see e.g. [51, 52], and the current mode topology of [53].

The circuit in Fig. 2.10b [54] samples the input on two capacitors connected in parallel. Next, it stacks the capacitors in series to achieve a voltage gain of two. The ×1 buffer can be a low-power source follower circuit. In scheme (c), a “bucket brigade” pass transistor is used to move a sampled charge packet (q) from a large sampling capacitor (CS) to a smaller load capacitor (CL), thereby achieving voltage gain without drawing a significant amount of energy from the supply [55]. The circuit in Fig. 2.10d uses the gate capacitance of a transistor to acquire a charge sample (q1, q2). The transistor is then switched into a source-follower configuration, moving all signal charge (q1 + q2) to the small capacitance from gate to drain, which also results in voltage amplification [56]. Much like the comparator-based circuit of Fig. 2.9, all of the charge drawn from the supply is used to drive the load, thereby enabling high power efficiency. In this same spirit, the dynamic amplifier used in [57] (Fig. 2.10e) turns off once the amplification is complete and thereby achieves very low power dissipation. In this circuit, all nodes are first reset and the load capacitors are charged at a rate that depends on the input (in+, in−). The current is shut off via transistors P1 once the stage’s comparator has made a decision.

Fig. 2.10 Examples of OpAmp-less residue amplification schemes (panels a–e)

A general concern with most minimalistic design approaches is that they tend to sacrifice robustness, e.g. in terms of power supply rejection, common mode rejection and temperature stability. It remains to be seen if these issues can be handled efficiently in practice. Improving supply rejection, for instance, could be achieved using voltage regulators. This is common practice in other areas of mixed-signal design, as for example in clock generation circuits [58]. Especially when the power of the ADC’s critical core circuitry is lowered significantly, implementing supply regulation should be justifiable.
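The voltage gain of the passive schemes of Fig. 2.10b and c follows directly from charge conservation, as the idealized sketch below shows. Parasitic capacitances, incomplete charge transfer and buffer gain error are all ignored, and the function names and component values are hypothetical.

```python
def series_stack_output(vin, n_caps=2):
    """Fig. 2.10b idea: n capacitors are each charged to vin in parallel and
    then stacked in series, so their voltages add."""
    return n_caps * vin


def bucket_brigade_output(vin, c_sample, c_load):
    """Fig. 2.10c idea: the charge q = CS * vin sampled on the large
    capacitor is moved entirely onto the smaller load capacitor CL."""
    q = c_sample * vin
    return q / c_load


if __name__ == "__main__":
    print(series_stack_output(0.1))                   # gain of two
    print(bucket_brigade_output(0.1, 4e-12, 1e-12))   # gain of CS / CL = 4
```

In both cases the gain is set by a capacitor arrangement or ratio rather than by amplifier loop gain, which is why these circuits can avoid static bias current in the residue path.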

Fig. 2.11 Digital residue amplifier linearization (blocks: analog nonlinearity g, backend A/D, digital inverse g⁻¹, coefficient estimation)

A second issue with minimalistic designs is the achievable resolution and linearity. OpAmp circuits with large loop gain help linearize transfer functions; this feature is often removed when migrating to simplified circuits. For instance, the amplifier scheme of Fig. 2.10d is linear only to approximately 9-bit resolution. In cases where simplicity sacrifices linearity, it is attractive to consider digital linearization techniques (see Fig. 2.11). In [50], it was demonstrated that a basic open-loop differential pair can be digitally linearized (to 12-bit precision) using only a few thousand logic gates. Subsequently, this idea has been refined e.g. in the works of [59–62]. One key issue in most digital linearization schemes is that the correction parameters must track changes in operating conditions relatively quickly, preferably with time constants no larger than 1–10 ms. Unfortunately, most of the basic statistics-based algorithms for coefficient adaptation require much longer time constants at high target resolutions [60, 63]. Additional research is needed to extend the recently proposed “split-ADC” [64, 65] and feedforward noise cancellation techniques [66] for use in nonlinear calibration schemes.
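The principle of Fig. 2.11 can be illustrated with a small numerical sketch. It is not the algorithm of [50] or of the later works cited above: the residue amplifier is modeled here by an assumed compressive cubic, y = a1·x + a3·x³, the coefficients are taken as already known (coefficient estimation is outside the scope of this example), and the digital inverse is computed with a few Newton iterations. Quantization in the backend converter is also ignored.

```python
def amplifier(x, a1=1.0, a3=-0.1):
    """Assumed open-loop amplifier model with a compressive cubic term."""
    return a1 * x + a3 * x ** 3


def digital_inverse(y, a1=1.0, a3=-0.1, iterations=3):
    """Solve a1*x + a3*x**3 = y for x by Newton's method, starting from
    the linear estimate x = y / a1."""
    x = y / a1
    for _ in range(iterations):
        f = a1 * x + a3 * x ** 3 - y
        x -= f / (a1 + 3.0 * a3 * x ** 2)
    return x


def worst_errors(points=1001, span=0.5):
    """Worst-case error over [-span, span] without and with correction."""
    raw, corrected = 0.0, 0.0
    for k in range(points):
        x = -span + 2.0 * span * k / (points - 1)
        y = amplifier(x)
        raw = max(raw, abs(y - x))
        corrected = max(corrected, abs(digital_inverse(y) - x))
    return raw, corrected


if __name__ == "__main__":
    raw, corrected = worst_errors()
    print("uncorrected error:", raw)
    print("corrected error  :", corrected)
```

In this toy model the uncorrected error is on the order of 1% of the signal range, far above a 12-bit LSB, while the digitally inverted result is limited only by numerical precision. In a real converter the achievable accuracy is set by how well and how quickly the coefficients can be estimated and tracked, which is the key issue raised above.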

4.4 Dynamic Error Correction

Most of the digital correction methods developed in recent years have targeted the compensation of static circuit errors. However, it is conceivable that in the future the correction of dynamic errors will become attractive as well. One opportunity is the digital correction of errors due to finite slew rate, incomplete settling or incomplete reset of amplifiers [59, 67–69]. With such corrections in place, the speed of the amplifier can be reduced and potentially translated into significant power savings.

Fig. 2.12 Solid line: predicted power limits for pipelined ADCs using class-A residue amplification (eq. (27) of [70] for 90 nm technology). The dashed lines represent the figure of merit $P/(f_s \cdot 2^{ENOB})$ for 10 and 100 fJ/conv-step [Plot of P/fs in joules (10⁻¹⁴ to 10⁻⁶) versus SNDR in dB (20 to 90), with data points [4], [23], [24], [34], [49], [56] and [57]]

5 Limits on Pipelined ADC Power Dissipation

Even with the most innovative power saving techniques applied, there exists a practical minimum level of power that must be invested. Such a bound was developed specifically for pipelined ADCs in [70], assuming noise-limited class-A residue amplification. The corresponding equation is plotted in Fig. 2.12 along with experimental pipeline ADC data points [5]. As this plot shows, the dynamic pipeline by Verbruggen [57] and the comparator-based design by Chu and Lee [49] define the state of the art in power efficiency and lie closest to the predicted limit line. Relative to Lewis' pipeline from 1992 [4], data point [49] achieves about 2.5 orders of magnitude improvement in power efficiency.
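For readers who want to place their own design on a plot of this kind, the figure of merit P/(fs · 2^ENOB) can be evaluated as in the sketch below. The SNDR-to-ENOB conversion is the standard (SNDR − 1.76 dB)/6.02 dB relation; the function names and the example numbers are hypothetical and do not correspond to any of the plotted data points.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Standard conversion from SNDR in dB to effective number of bits."""
    return (sndr_db - 1.76) / 6.02


def fom_joules_per_step(power_w: float, fs_hz: float, enob: float) -> float:
    """Figure of merit P / (fs * 2**ENOB) in joules per conversion step."""
    return power_w / (fs_hz * 2.0**enob)


def energy_per_sample_on_fom_line(fom_j: float, sndr_db: float) -> float:
    """P/fs for a given SNDR on a line of constant figure of merit."""
    return fom_j * 2.0 ** enob_from_sndr(sndr_db)


if __name__ == "__main__":
    # Hypothetical converter: 10 mW, 100 MS/s, 10 effective bits.
    fom = fom_joules_per_step(10e-3, 100e6, 10.0)
    print(f"FoM = {fom * 1e15:.1f} fJ/conv-step")
```

The last helper shows why lines of constant figure of merit appear straight on a logarithmic P/fs axis: the energy per sample doubles for every additional effective bit, i.e. for every 6.02 dB of SNDR.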


6 Summary

The power efficiency of pipelined A/D converters has seen tremendous improvements over the past two decades. While a good portion of the progress can be attributed to process scaling, the design techniques reviewed in this paper have played an equally important role, especially as far as adjusting the architecture for optimization in fine-line technology is concerned. Relative to Lewis' pipeline from 1992, today's research prototypes boast 2–3 orders of magnitude improvement in power efficiency. However, whether the most aggressive innovations will find their way into products is yet to be determined. Many of the ultra-low power architectures proposed recently tend to sacrifice robustness, either in terms of supply rejection or tolerance to process, voltage and temperature variations. Some of these issues can be addressed using calibration techniques, supply regulation and innovative transistor-level design. With the continuing application-driven need for high-speed and power efficient pipeline ADCs, it is to be expected that many interesting and transformative answers to these challenges will emerge over the next decade.

References

1. S. Masuda et al., A CMOS pipeline algorithmic A/D converter, in Proceedings of the IEEE Custom Integrated Circuits Conference, Rochester, Sept 1984, pp. 559–562
2. S.H. Lewis, P.R. Gray, A pipelined 5-Msample/s 9-bit analog-to-digital converter. IEEE J. Solid-State Circuits 22(6), 954–961 (1987)
3. S. Sutarja, P.R. Gray, A pipelined 13-bit 250-ks/s 5-V analog-to-digital converter. IEEE J. Solid-State Circuits 23(6), 1316–1323 (1988)
4. S.H. Lewis et al., A 10-b 20-Msample/s analog-to-digital converter. IEEE J. Solid-State Circuits 27(3), 351–358 (1992)
5. B. Murmann, ADC performance survey 1997–2011, [Online]. Available: http://www.stanford.edu/murmann/adcsurvey.html
6. A.M.A. Ali et al., A 16-bit 250-MS/s IF sampling pipelined ADC with background calibration. IEEE J. Solid-State Circuits 45(12), 2602–2612 (2010)
7. J. Arias et al., Low-power pipeline ADC for wireless LANs. IEEE J. Solid-State Circuits 39(8), 1338–1340 (2004)
8. S. Mehta et al., An 802.11g WLAN SoC, in ISSCC Digest of Technical Papers, San Francisco, Feb 2005, pp. 94–95
9. C.-C. Hsu et al., An 11b 800 MS/s time-interleaved ADC with digital background calibration, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2007, pp. 464–615
10. Texas Instruments, AFE5805 Datasheet. [Online]. Available: http://focus.ti.com/lit/ds/symlink/afe5805.pdf
11. B. Murmann, A/D converter trends: Power dissipation, scaling and digitally assisted architectures, in Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, Sept 2008, pp. 105–112
12. A.N. Karanicolas et al., A 15-b 1-Msample/s digitally self-calibrated pipeline ADC. IEEE J. Solid-State Circuits 28(12), 1207–1215 (1993)
13. I. Mehr, L. Singer, A 55-mW, 10-bit, 40-Msample/s Nyquist-rate CMOS ADC. IEEE J. Solid-State Circuits 35(3), 318–325 (2000)


14. I.E. Opris et al., A single-ended 12-bit 20 Msample/s self-calibrating pipeline A/D converter. IEEE J. Solid-State Circuits 33(12), 1898–1903 (1998)
15. D. Schinkel et al., A double-tail latch-type voltage sense amplifier with 18 ps setup + hold time, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2007, pp. 314–605
16. T. Sundstrom, A. Alvandpour, A kick-back reduced comparator for a 4-6-Bit 3-GS/s flash ADC in a 90 nm CMOS process, in Proceedings of the MIXDES, Ciechocinek, Poland, 2007, pp. 195–198
17. D.W. Cline, P.R. Gray, A power optimized 13-b 5 Msamples/s pipelined analog-to-digital converter in 1.2 μm CMOS. IEEE J. Solid-State Circuits 31(3), 294–303 (1996)
18. Y. Chiu et al., A 14-b 12-MS/s CMOS pipeline ADC with over 100-dB SFDR. IEEE J. Solid-State Circuits 39(12), 2139–2151 (2004)
19. K. Bult, Embedded analog-to-digital converters, in Proceedings of the ESSCIRC, Athens, 2009, pp. 52–64
20. S. Kawahito, Low-power design of pipeline A/D converters, in Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 2006, pp. 505–512
21. Y.-C. Huang, T.-C. Lee, A 10b 100MS/s 4.5 mW pipelined ADC with a time sharing technique, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2010, pp. 300–301
22. P. Bogner et al., A 14b 100 MS/s digitally self-calibrated pipelined ADC in 0.13 μm CMOS, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Dec 2006, pp. 832–841
23. M. Furuta et al., A 0.06 mm² 8.9b ENOB 40MS/s pipelined SAR ADC in 65 nm CMOS, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2010, pp. 382–383
24. C.C. Lee, M.P. Flynn, A 12b 50MS/s 3.5 mW SAR assisted 2-stage pipeline ADC, in Symposium on VLSI Circuits Digest, Honolulu, HI, USA, June 2010, pp. 239–240
25. A. Verma, B. Razavi, A 10-bit 500-MS/s 55-mW CMOS ADC. IEEE J. Solid-State Circuits 44(11), 3039–3050 (2009)
26. W. Yang et al., A 3-V 340-mW 14-b 75-Msample/s CMOS ADC with 85-dB SFDR at Nyquist input. IEEE J. Solid-State Circuits 36(12), 1931–1936 (2001)
27. E.G. Soenen, R.L. Geiger, An architecture and an algorithm for fully digital correction of monolithic pipelined ADCs. IEEE Trans. Circuits Syst. II 42(3), 143–153 (1995)
28. L. Singer et al., A 12 b 65 MSample/s CMOS ADC with 82 dB SFDR at 120 MHz, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2000, pp. 38–39
29. S.-Y. Chuang, T.L. Sculley, A digitally self-calibrating 14-bit 10-MHz CMOS pipelined A/D converter. IEEE J. Solid-State Circuits 37(6), 674–683 (2002)
30. Min Byung-Moo et al., A 69-mW 10-bit 80-MSample/s pipelined CMOS ADC. IEEE J. Solid-State Circuits 38(12), 2031–2039 (2003)
31. K. Gulati, Lee Hae-Seung, A low-power reconfigurable analog-to-digital converter. IEEE J. Solid-State Circuits 36(12), 1900–1911 (2001)
32. P.C. Yu, Hae-Seung Lee, A 2.5 V 12 b 5 MSample/s pipelined CMOS ADC, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 1996, pp. 314–315
33. D. Kurose et al., 55-mW 200-MSPS 10-bit pipeline ADCs for wireless receivers. IEEE J. Solid-State Circuits 41(7), 1589–1595 (2006)
34. S. Devarajan et al., A 16-bit, 125 MS/s, 385 mW, 78.7 dB SNR CMOS pipeline ADC. IEEE J. Solid-State Circuits 44(12), 3305–3313 (2009)
35. J.K.-R. Kim, B. Murmann, A 12-bit, 30-MS/s, 2.95-mW pipelined ADC using single-stage class-AB amplifiers and deterministic background calibration, in Proceedings of the ESSCIRC, Sevilla, Sept 2010, pp. 378–381
36. P. Huang et al., SHA-less pipelined ADC converting 10th Nyquist band with in-situ clock-skew calibration, in Custom Integrated Circuits Conference (CICC), 2010 IEEE, San Jose, CA, USA, 2010, pp. 1–4
37. Byung-Geun Lee et al., A 14-b 100-MS/s pipelined ADC with a merged SHA and first MDAC. IEEE J. Solid-State Circuits 43(12), 2613–2619 (2008)
38. Bang-Sup Song et al., A 12-bit 1-Msample/s capacitor error-averaging pipelined A/D converter. IEEE J. Solid-State Circuits 23(6), 1324–1333 (1988)


39. A.M. Abo, P.R. Gray, A 1.5 V, 10-bit, 14 MS/s CMOS pipeline analog-to-digital converter, in Symposium on VLSI Circuits Digest, Honolulu, HI, USA, 1998, pp. 166–169
40. W.C. Black, D.A. Hodges, Time interleaved converter arrays. IEEE J. Solid-State Circuits 15(6), 1022–1029 (1980)
41. S.K. Gupta et al., A 1-GS/s 11-bit ADC with 55-dB SNDR, 250-mW power realized by a high bandwidth scalable time-interleaved architecture. IEEE J. Solid-State Circuits 41(12), 2650–2657 (2006)
42. J. Ming, S.H. Lewis, An 8b 80MSample/s pipelined ADC with background calibration, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2000, pp. 42–43
43. X. Wang et al., A 12-bit 20-Msample/s pipelined analog-to-digital converter with nested digital background calibration. IEEE J. Solid-State Circuits 39(11), 1799–1808 (2004)
44. B.R. Gregoire, Moon Un-Ku, An over-60 dB true rail-to-rail performance using correlated level shifting and an Opamp with only 30 dB loop gain. IEEE J. Solid-State Circuits 43(12), 2620–2630 (2008)
45. M. Boulemnakher et al., A 1.2 V 4.5 mW 10b 100MS/s pipeline ADC in a 65 nm CMOS, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, 2008, pp. 250–611
46. B. Murmann, Limits on ADC Power Dissipation, in Analog Circuit Design, ed. by M. Steyaert et al. (Springer, Dordrecht, 2006), pp. 351–368
47. H.-C. Choi et al., A 12b 50MS/s 10.2 mA 0.18 μm CMOS Nyquist ADC with a fully differential class-AB switched OP-AMP, in Symposium on VLSI Circuits Digest, Honolulu, HI, USA, June 2008, pp. 220–221
48. J.K. Fiorenza et al., Comparator-based switched-capacitor circuits for scaled CMOS technologies. IEEE J. Solid-State Circuits 41(12), 2658–2668 (2006)
49. J. Chu et al., A zero-crossing based 12b 100MS/s pipelined ADC with decision boundary gap estimation calibration, in Symposium on VLSI Circuits Digest, Honolulu, HI, USA, June 2010, pp. 237–238
50. B. Murmann, B.E. Boser, A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification. IEEE J. Solid-State Circuits 38(12), 2040–2050 (2003)
51. Ding-Lan Shen, Tai-Cheng Lee, A 6-Bit 800-MS/s pipelined A/D converter with open-loop amplifiers, in Symposium on VLSI Circuits Digest, June 2006, pp. 134–135
52. A. Nazemi et al., A 10.3GS/s 6bit (5.1 ENOB at Nyquist) time-interleaved/pipelined ADC using open-loop amplifiers and digital calibration in 90 nm CMOS, in Symposium on VLSI Circuits Digest, 2008, pp. 18–19
53. K. Poulton et al., A 20 GS/s 8 b ADC with a 1 MB memory in 0.18 μm CMOS, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, Feb 2003, pp. 318–496
54. I. Ahmed et al., A low-power capacitive charge pump based pipelined ADC. IEEE J. Solid-State Circuits 45(5), 1016–1027 (2010)
55. M. Anthony et al., A process-scalable low-power charge-domain 13-bit pipeline ADC, in Symposium on VLSI Circuits Digest, Honolulu, HI, USA, 2008, pp. 222–223
56. J. Hu et al., A 9.4-bit, 50-MS/s, 1.44-mW pipelined ADC using dynamic source follower residue amplification. IEEE J. Solid-State Circuits 44(4), 1057–1066 (2009)
57. B. Verbruggen et al., A 2.6 mW 6 bit 2.2 GS/s fully dynamic pipeline ADC in 40 nm digital CMOS. IEEE J. Solid-State Circuits 45(10), 2080–2090 (2010)
58. T. Toifl et al., A 1.25–5 GHz clock generator with high-bandwidth supply-rejection using a regulated-replica regulator in 45-nm CMOS. IEEE J. Solid-State Circuits 44(11), 2901–2910 (2009)
59. C.R. Grace et al., A 12-bit 80-MSample/s pipelined ADC with bootstrapped digital calibration. IEEE J. Solid-State Circuits 40(5), 1038–1046 (2005)
60. A. Panigada, I. Galton, A 130 mW 100 MS/s pipelined ADC with 69 dB SNDR enabled by digital harmonic distortion correction. IEEE J. Solid-State Circuits 44(12), 3314–3328 (2009)
61. B.D. Sahoo, B. Razavi, A 12-bit 200-MHz CMOS ADC. IEEE J. Solid-State Circuits 44(9), 2366–2380 (2009)
62. M. Daito et al., A 14-bit 20-MS/s pipelined ADC with digital distortion calibration. IEEE J. Solid-State Circuits 41(11), 2417–2423 (2006)


63. B. Murmann, B.E. Boser, Digital domain measurement and cancellation of residue amplifier nonlinearity in pipelined ADCs. IEEE Trans. Instrum. Meas. 56(6), 2504–2514 (2007)
64. J. Li, U.-K. Moon, Background calibration techniques for multistage pipelined ADCs with digital redundancy. IEEE Trans. Circuits Syst. II 50(9), 531–538 (2003)
65. J. McNeill et al., 'Split ADC' architecture for deterministic digital background calibration of a 16-bit 1-MS/s ADC. IEEE J. Solid-State Circuits 40(12), 2437–2445 (2005)
66. K.-W. Hsueh et al., A 1 V 11b 200MS/s Pipelined ADC with digital background calibration in 65 nm CMOS, in ISSCC Digest of Technical Papers, San Francisco, CA, USA, 2008, pp. 546–634
67. J.P. Keane et al., Digital background calibration for memory effects in pipelined analog-to-digital converters. IEEE Trans. Circuits Syst. I 53(3), 511–525 (2006)
68. E. Iroaga, B. Murmann, A 12-Bit 75-MS/s pipelined ADC using incomplete settling. IEEE J. Solid-State Circuits 42(4), 748–756 (2007)
69. S. Kawahito et al., A 15b power-efficient pipeline A/D converter using non-slewing closed-loop amplifiers, in Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 2008, pp. 117–120
70. T. Sundstrom et al., Power dissipation bounds for high-speed Nyquist analog-to-digital converters. IEEE Trans. Circuits Syst. I 56(3), 509–518 (2009)

Chapter 3

Low-Power Successive Approximation ADCs for Wireless Applications

Jan Craninckx

Abstract This chapter discusses the advancements made in SAR ADCs for wireless applications, which require accuracies in the range of 8–10 bit and sampling speeds of a few tens of MHz. An overview is given of recent techniques that reduce the switching power in the capacitive DAC and thereby improve the power efficiency of the ADC to levels that are out of reach of the commonly used pipeline architecture. The second part of this paper discusses the charge-sharing SAR ADC architecture, which introduces a signal processing method in the charge domain that removes the often-neglected tough requirements on the reference buffer. An implementation in 40 nm CMOS achieves 9.3 ENOB at 60 MS/s with a figure of merit of 34 fJ per conversion step.

J. Craninckx
Imec, Kapeldreef 75, 3000 Leuven, Belgium
e-mail: [email protected]

M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_3, © Springer Science+Business Media B.V. 2012

1 Introduction

When it comes to achieving low power consumption in A-to-D conversion, the successive approximation (SAR) principle is a very attractive candidate. For an N-bit ADC, only N operations are needed to determine the output word, a process that can be done with little overhead.

The SAR architecture has been widely adopted since the original introduction of the charge-redistribution ADC in 1975 [1]. In this technique, illustrated in Fig. 3.1, the input voltage is first sampled on an array of binary scaled capacitors. Then, by sequentially switching the bottom plates of the capacitors to the reference voltage or to ground, this input voltage can be compared with different levels, and the exact input level is determined in a binary search. During this process, charge is indeed 'redistributed' between the two sides of a voltage divider constructed between the capacitors. A full conversion cycle thus consists of 1 + N events (1 input sampling and N comparisons).

Fig. 3.1 Conceptual 5-bit charge-redistribution SAR ADC [1] [Schematic labels: comparator node VX with VX = 0 and QX = −2C·VIN during sampling; capacitors C, C/2, C/4, C/8, C/16, C/16; VC; VIN; VREF]

This conceptual ADC schematic is deceptively simple: there is only one active element (a comparator), and it further contains just some passive capacitors and a set of switches. An easy conclusion would be that a really low power consumption can be obtained. Still, some techniques have been proposed recently to make this basic structure even more efficient, focusing mostly on reducing the switching energy in the feedback DAC. They will be discussed in the state-of-the-art overview in Sect. 2. An aspect that is often neglected, however, is the set of requirements placed on the reference buffers, and Sect. 3 will cover the design of an energy-efficient charge-sharing SAR architecture. Finally, some conclusions are presented.
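The binary search performed by the conceptual converter of Fig. 3.1 can be written down in a few lines. The following is only an idealized behavioral sketch: the function name, the single-ended input range [0, VREF) and the comparator polarity are assumptions made for illustration, and all circuit non-idealities are ignored.

```python
def sar_convert(v_in: float, v_ref: float = 1.0, n_bits: int = 5) -> int:
    """Return the N-bit code of an ideal charge-redistribution SAR ADC.

    v_in is assumed to lie in [0, v_ref). One sampling event is followed
    by n_bits comparisons, i.e. 1 + N events per conversion.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)              # switch this capacitor to VREF
        v_dac = v_ref * trial / (1 << n_bits)  # level set by the capacitive divider
        v_x = v_dac - v_in                     # comparator node after redistribution
        if v_x <= 0:                           # trial level does not exceed the input
            code = trial                       # keep the bit
    return code


if __name__ == "__main__":
    print(sar_convert(0.3))  # 5-bit code for an input of 0.3 * VREF
```

Each loop iteration corresponds to one bottom-plate switching event plus one comparison, which is why the energy spent in the capacitor array depends on the sequence of decisions, as the next section discusses.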

2 State-of-the-Art SAR Techniques

2.1 Split-Capacitor Array

When analyzing the energy required to switch the capacitors in the typical binary weighted array, it becomes apparent that during a conversion the charge in the capacitors is not used efficiently [2, 3]. During the first bit decision after sampling, the MSB capacitor is connected to $V_{REF}$ with the remaining capacitors connected to ground. Depending on the output of the first comparison, the SAR will do one of two transitions. For a negative value of $V_X$, an "up" transition must be performed for the MSB/2 capacitor, i.e. it must be switched from ground up to $V_{REF}$, drawing an energy of $E_{UP} = \frac{1}{8} C V_{REF}^2$ from the supply. However, for a positive value of $V_X$, more actions must be performed: besides the switching of the MSB/2 capacitor, the MSB capacitor must also be switched from $V_{REF}$ to ground. In fact they 'trade places'. Doing these two actions independently requires a much larger energy of $E_{DOWN} = \frac{5}{8} C V_{REF}^2$. It takes five times more energy to lower $V_X$ than to raise it; this occurs because all of the charge initially on the MSB capacitor is discharged to ground, and all the charge that ends up on the MSB/2 capacitor must be delivered from the reference voltage supply.

Fig. 3.2 Split capacitor array for a b-bit SAR, with the main subarray on top and the MSB subarray below [3] [Schematic labels: main subarray capacitors $C_{b-1} = 2^{b-2}C_0$, …, $C_1 = C_0$, $C_0$; MSB subarray capacitors $C_{b,b-1} = 2^{b-2}C_0$, …, $C_{b,1} = C_0$, $C_{b,0} = C_0$; switches S+, S− and Ssample; VX, VREF, VIN]

From this analysis, it is apparent that energy savings in a SAR ADC are possible if a more efficient switching scheme is devised. The most efficient scheme from [2] splits the MSB capacitor into a number of units as shown in Fig. 3.2, and switches down only the required number of them during the sequence of the SAR algorithm. This technique is expected to have a 37% lower switching energy than the conventional one [3].
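The two energy values quoted above can be checked with a short charge-conservation calculation. The sketch below rests on assumptions that are not spelled out in the text: ideal switches, a floating top plate, C taken as the MSB capacitor so that the whole array totals 2C (as the labels of Fig. 3.1 suggest), and all capacitors below MSB/2 lumped into a single C/2 that stays on ground. The function name and argument encoding are hypothetical.

```python
from fractions import Fraction


def supply_energy(caps, before, after, v_ref=Fraction(1)):
    """Energy drawn from VREF when bottom plates switch from `before` to `after`.

    caps   : capacitor values (top plates share one floating node)
    before : 0/1 per capacitor, 1 = bottom plate on VREF, 0 = on ground
    after  : same encoding after the switching event
    """
    c_total = sum(caps)
    # Charge conservation on the floating top plate gives its voltage step.
    dv_top = v_ref * sum(c * (a - b) for c, b, a in zip(caps, before, after)) / c_total
    # Only capacitors tied to VREF after the event exchange charge with it.
    charge = sum(
        c * (v_ref * (a - b) - dv_top)
        for c, b, a in zip(caps, before, after)
        if a == 1
    )
    return v_ref * charge


if __name__ == "__main__":
    # Units of C (the MSB capacitor): MSB, MSB/2, and the lumped remainder.
    caps = [Fraction(1), Fraction(1, 2), Fraction(1, 2)]
    e_up = supply_energy(caps, [1, 0, 0], [1, 1, 0])    # MSB/2 switched up
    e_down = supply_energy(caps, [1, 0, 0], [0, 1, 0])  # MSB and MSB/2 trade places
    print(e_up, e_down)  # in units of C * VREF**2
```

Under these assumptions the "up" transition yields 1/8 and the "down" transition 5/8 in units of C·VREF², reproducing the factor of five that motivates the split-capacitor scheme.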

2.2 Step-Wise Charging

Another method to reduce the energy needed to charge a capacitor from 0 to V is to perform that operation not in one but in several sequential steps [4]. Switching in a single step takes an energy of $E_{1step} = \frac{C V^2}{2}$. But because of the quadratic relation to the voltage, switching in N steps requires N units of energy that are each $N^2$ times smaller than that:

$$E_{Nstep} = N \cdot \frac{C}{2} \left(\frac{V}{N}\right)^2 = \frac{C V^2}{2N} \qquad (3.1)$$

Fig. 3.3 (a) Step-wise charging of a capacitive load; (b) using large 'tank' capacitors for the intermediate voltages [4] [Schematic labels: supply levels V1 … VN, switches 0 … N with (R0, C0) … (RN, CN), tank capacitors CT, load CL]

This configuration is shown in Fig. 3.3, where it is also shown that there is no need for a separate circuit to generate the intermediate voltages. Since during a full up/down switching cycle no supply except the top one provides any net charge to the circuit, all intermediate voltages can be sustained by large tank capacitors. This configuration is even self-stabilizing. The step-wise charging method thus offers the possibility of almost adiabatic switching without the need for inductors, but of course at the expense of longer switching times, and neglecting the power needed to drive the switches themselves. This technique has been used in a low-speed but ultra-low power 10-bit SAR ADC [5], in combination with the split-capacitor array, to reduce the switching energy in the DAC portion of the converter even further. That design also uses a delay-line based controller and a low-power comparator to achieve a power consumption of only 1.9 μW at 1 MS/s.
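The 1/N scaling of the step-wise charging energy can be verified by book-keeping the energy of every step. In the sketch below the intermediate levels are treated as ideal voltage sources at kV/N, which is an idealization of the tank capacitors; the function name is hypothetical, and the loss is taken as the energy delivered by each source minus the increase in energy stored on the load.

```python
from fractions import Fraction


def stepwise_dissipation(c, v, n_steps):
    """Energy dissipated when charging c from 0 to v in n_steps equal steps."""
    dv = Fraction(v) / n_steps
    dissipated = Fraction(0)
    v_cap = Fraction(0)
    for k in range(1, n_steps + 1):
        v_src = k * dv                       # intermediate supply (tank) voltage
        drawn = c * (v_src - v_cap) * v_src  # energy delivered by that supply
        stored = Fraction(1, 2) * c * (v_src**2 - v_cap**2)  # gain in stored energy
        dissipated += drawn - stored         # remainder is lost in the switches
        v_cap = v_src
    return dissipated


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, stepwise_dissipation(1, 1, n))  # in units of C * V**2
```

Each step loses C(V/N)²/2 regardless of its position in the staircase, so the total follows Eq. (3.1); the sketch ignores the energy needed to drive the additional switches, which the text identifies as the practical limit.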

2.3 Monotonic Capacitor Switching

Another DAC switching method that allows even further savings was presented in [6, 7]. The architecture depicted in Fig. 3.4 looks very similar to the original structure, but in fact operates rather differently because the switching only happens in one direction, e.g. downward.


Fig. 3.4 (a) Monotonic capacitor switching architecture; (b) typical waveforms [7]

The input is sampled on the top plates, while at the same time the bottom plates of the capacitors are reset to VREF. Next, the comparator directly performs the first comparison without switching any capacitor. According to the comparator output, the largest capacitor C1 on the higher voltage potential side is switched to ground and the other one (on the lower side) remains unchanged. The ADC repeats the procedure until the LSB is decided. For each bit cycle, only one capacitor is switched, which reduces both the charge transfer in the capacitive DAC network and the transitions of the control circuit and switch buffer, resulting in smaller power dissipation. This results in an 81% reduction of the total switching energy compared with the conventional case [7]. On top of that, since the switching only goes in the downward direction and fast-settling NMOS transistors can be used for this, the ADC has a potentially high maximum conversion rate. A disadvantage of this technique is that the common-mode voltage at the input of the comparator drops during the successive SAR cycles, as shown in Fig. 3.4b, so that the signal-dependent comparator offset can degrade the ADC linearity.
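The monotonic procedure and the falling comparator common mode can be reproduced with an idealized behavioral model. The sketch below assumes binary-weighted capacitors so that switching the i-th capacitor lowers one comparator input by VREF/2^i; the function name, the default values and the absence of any parasitics or offset are simplifications chosen for illustration.

```python
def monotonic_sar(v_p, v_n, v_ref=1.0, n_bits=4):
    """Ideal monotonic-switching SAR; returns (code, common-mode trace)."""
    code = 0
    common_mode = [(v_p + v_n) / 2]
    for i in range(1, n_bits + 1):
        bit = 1 if v_p > v_n else 0   # compare before any switching
        code = (code << 1) | bit
        if i < n_bits:                # no capacitor is switched after the LSB
            step = v_ref / 2**i       # assumed step: C_i / C_total = 1 / 2**i
            if bit:
                v_p -= step           # lower the higher-voltage side
            else:
                v_n -= step
            common_mode.append((v_p + v_n) / 2)
    return code, common_mode


if __name__ == "__main__":
    code, cm = monotonic_sar(0.65, 0.35)
    print(format(code, "04b"), cm)
```

Only one side is ever moved, and always downward, so the common-mode trace returned by the model decreases monotonically; this is the behavior that makes a signal-dependent comparator offset a linearity concern.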

2.4 Discussion

It is interesting to note that a lot of recent SAR improvements have focused on this reduction of capacitor switching energy, although in fact it is often not the most power-consuming block in the ADC. The implementation in [5] uses only 20% of the total power budget for this. The most efficient monotonic technique of [7] would theoretically have needed only about 7.5% of the total power, but the digital buffers driving the switches increased this to 16%. Designs targeting 8-bit resolution have relaxed matching requirements and can use extremely small capacitor sizes to lower the switching energy [8, 9]. These numbers show that capacitor switching energy is not dominant and that a shift in focus is needed for further efficiency improvements.


An aspect that in reality leads to a larger power consumption in other blocks is the fact that the total SAR conversion takes 1 + N events, so certainly in high-speed designs only a limited amount of time can be attributed to each action. First, a powerful input voltage buffer is needed to make sure that during the (short) input sampling phase the voltage on the (large) capacitor array can settle accurately to the actual value. For the same reason, an accurate and high-speed reference buffer is needed to make sure that a stable reference voltage can be maintained during the charge-redistribution process. Second, the clock generation must be taken into account. Since the SAR algorithm runs at a speed higher than the actual conversion rate, a high-frequency clock is needed. This aspect was already discussed in [5, 7], where asynchronous controllers are used. Third, the comparator power must be taken into account. This part is not easily quantified, as it heavily depends on the available or tolerable signal swing. The ADC speed requirements also play an important role, as the comparator noise/power trade-off becomes more and more disadvantageous when very short evaluation times are needed. Of course, depending on the actual number of bits or conversion rate needed in a specific application, all these effects become more or less important. But they are not always taken into consideration when power consumption numbers are reported, so care must be taken. In the remainder of this paper, a novel charge-sharing SAR architecture will be presented that focuses on these aspects for the realization of low-power ADCs.

3 Charge-Sharing Successive Approximation ADC Architecture

The SAR ADC topology proposed does not need any high-speed buffers. Instead of operating in the voltage domain, the useful signal is represented by charge. Its operation is explained in the following steps [10, 11].

3.1 Input Sampling

An accurate charge proportional to the input signal must be created, and the most obvious way is of course to sample the input voltage on a capacitor. However, to avoid the problems of high-speed settling, a full conversion period must be available for this. If the input does not have to be sampled on the binary capacitor array but can instead be sampled on a separate capacitor, a time-interleaved Sample & Hold (S/H) can be used.


Fig. 3.5 Single-ended schematic of the time-interleaved S/H and relevant waveforms [Schematic labels: INP, BM1, TG1, CTL1, CTH1, CTL2, CTH2, VQP, VCM, rst; waveforms: CLK, Φ1, Φ2, Ψ1, Ψ2]

The time-interleaved sampling circuit, shown in Fig. 3.5, processes a large-swing single-ended input signal with common mode level VCM, typically around half of the supply voltage. The circuit is based on bootstrapped NMOS switches at the input (BM1) [12], offering a low, signal-independent on-resistance, and pass-gates at the output (TG1). When phase Φ2 is active, the input signal is tracked on capacitors $C_{TL1} + C_{TH1}$ while the charge previously sampled on $C_{TL2} + C_{TH2}$ is already available for conversion. So after sampling, the capacitors $C_T$ hold a charge proportional to the input:

$$Q_{in} = C_T V_{in} \qquad (3.2)$$

An extension to the basic operation is the fact that, after the MSB has been determined and the maximum differential voltage on the nodes $V_{QP,N}$ has become smaller, the common mode voltage VCM can be lowered without the risk of clipping the signal to ground. To create this CM shift, the connection of the capacitors $C_{TH}$ is switched from VDD to ground by the signal $\Psi_{1,2}$. With this lower common mode, the on-resistance of the NMOS switches that will be connected to the sampling nodes decreases significantly, an effect that will show its benefit once the core of the SAR operation is explained.

3.2 Successive Approximation Operation

After sampling, the analog-to-digital conversion can start. The most significant bit is determined first by using the comparator to check if the differential voltage on $C_T$ is positive or negative. Depending on that decision, a signal proportional to half of the input range must be subtracted from or added to the input signal. In order to avoid tough specs on the voltage settling time with opamps, we will not add a certain voltage to the signal, but instead add a charge. This operation can be done by passive charge sharing and happens very fast without taking any power. As shown in Fig. 3.6, this charge is taken from a capacitor of a certain size that has previously been pre-charged to a fixed reference voltage. If the comparator input voltage is positive ($V_{QP} > V_{QN}$), the switches controlled by c0p are closed, after which the charge on the three capacitors ($C_{TP}$, $C_{TN}$ and $C_{MSB}$) equalizes. The voltage $V_{QN}$ will rise and $V_{QP}$ will fall. Likewise, if the comparator input voltage is negative ($V_{QP} < V_{QN}$) as in the figure, the switches controlled by c0n are closed, after which voltage $V_{QN}$ will fall and $V_{QP}$ will rise. The total charge on the capacitor set is now given by

$$Q_0 = C_S V_{in} + S_0 C_0 V_{ref} \qquad (3.3)$$

Fig. 3.6 Comparison and charge sharing action for MSB detection [Schematic labels: CTP, CTN, CMSB, VQP, VQN, switches c0p and c0n; phases: S/H, Precharge, Compare]

where $S_0$ represents the output of the comparator (+1 for a positive, −1 for a negative signal). The time needed for this charge-sharing process is of course proportional to the time constant given by the product of the switch on-resistance and the capacitor size. Since NMOS switches are used for this, the on-resistance is improved by lowering the common-mode voltage as described earlier, and hence the ADC speed increases.

It is crucial to note here that the reference voltage itself is not loaded by this action. During the pre-charge phase, that reference voltage was sampled on the capacitor $C_{MSB}$, and it is that reference charge that is now used to provide the feedback DAC action required in the SAR ADC. The reference is thus independent of the input voltage, and the constraints posed on the reference buffer are almost negligible. Even if an error were made during the reference sampling, its independence of the input means that it would only result in a gain error of the ADC, which is not important.

The following bit is determined in the same way, but now a precharged capacitor of size $C_{MSB-1} = C_{MSB}/2$ is used, since in the binary search for the correct digital output code the range is now reduced by a factor of 2. The sign of the voltage on the capacitor set formed by $C_T$ and $C_{MSB}$ represents the sign of the current signal $Q_0$, so the comparator can be used again to determine it. Depending on the comparator output, the switches c1p or c1n are closed, and the following charge sharing action between $C_{TP}$, $C_{TN}$, $C_{MSB}$ and the newly connected $C_{MSB-1}$ will cause the voltages $V_{QP}$ and $V_{QN}$ to rise or fall.

Intuitively one can see that the SAR algorithm at each step uses these pre-charged capacitors to add or subtract a binary scaled-down charge to the initial charge (that represented the sampled input voltage) until the result converges to zero. If too much charge was added during a certain step, the next comparison returns the opposite sign, and in the next step the charge will be subtracted. The actual value of the voltage on the nodes ($V_{QP}$, $V_{QN}$) is not needed; just the sign is used to determine if the next binary scaled-down capacitor (and hence, charge) must be connected positively or negatively.
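The charge-domain binary search described in this section can be captured in a small behavioral model. The sketch below is an abstraction under several assumptions: the differential structure is collapsed into a single signed charge, the default capacitor ratio is chosen so that the full-scale input equals twice the MSB reference charge, and the polarity with which each reference charge is applied is picked so that the residue shrinks. How that polarity maps onto the sign of $S_0$ in Eq. (3.3) depends on the switch convention and is not asserted here; all names are hypothetical.

```python
def charge_sharing_sar(v_in, n_bits=4, c_s=1.0, c_msb=0.5, v_ref=1.0):
    """Behavioral charge-domain SAR; returns (bits, residual charges).

    Assumes |v_in| < 2 * c_msb * v_ref / c_s so the binary search can converge.
    """
    q = c_s * v_in   # signal charge held after sampling
    c_total = c_s    # capacitance currently sharing that charge
    c_k = c_msb      # next pre-charged reference capacitor
    bits, residues = [], []
    for i in range(n_bits):
        v = q / c_total              # node voltage; only its sign is used
        sign = 1 if v > 0 else -1
        bits.append(1 if sign > 0 else 0)
        if i < n_bits - 1:           # N comparisons need only N-1 capacitors
            q -= sign * c_k * v_ref  # passive charge sharing, assumed polarity
            c_total += c_k           # the capacitor stays connected
            c_k /= 2                 # next capacitor is binary scaled
            residues.append(q)
    return bits, residues


if __name__ == "__main__":
    bits, residues = charge_sharing_sar(0.3)
    print(bits, residues)
```

Because every connected capacitor stays on the node, the voltage seen by the comparator is attenuated from step to step while the sign of the residual charge is preserved, which is consistent with the remark that only the sign of the node voltage matters.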

3.3 First Block Diagram

The block diagram of this initial charge-sharing SAR ADC architecture is shown in Fig. 3.7. As already explained, it consists of

• a passive time-interleaved Sample & Hold with capacitors and switches
• a binary scaled array of unit capacitors that are pre-charged to the reference voltage (e.g. the power supply) and afterwards connected positively or negatively to the sampling capacitor, depending on the outcome of the comparator
• a comparator that returns the sign of the differential voltage on the sampling capacitors ($V_{QP} - V_{QN}$)
• a control block that implements the SAR algorithm, i.e.
  – generates the control signals for the S&H switches
  – generates the precharge signal
  – goes through a loop that, for every bit of the ADC,
    • activates the comparator
    • interprets the result and closes one of the switches cp or cn
  – outputs the digital code that represents the digitized value of the input voltage.

Fig. 3.7 Block diagram of initial charge-sharing SAR architecture [Block diagram labels: INP, INN, Sample & Hold, VQP, VQN, unit capacitor CU in multiples 1, 2, 4, …, M = 2^(N−1), Precharge, switches cp[0..N–2] and cn[0..N–2], Comp, Control block, CLK @FS, Result, B[0..N–1]]

3.4 Asynchronous Operation

The synchronous operation of the control block described above would need a high-frequency clock, which has to be generated externally and hence also results in a power consumption penalty. Moreover, the maximum speed possible with the circuit is not exploited, as this way of working requires the control block to wait, e.g., until the falling edge of the clock to close one of the switches cp or cn, although the comparator result S is already available earlier. All blocks (comparator speed, settling time for charge sharing) must be designed fast enough to be certain to finish within the available clock period.

Fig. 3.8 Comparator for asynchronous operation

An asynchronous operation must be implemented that removes the need for an extra high-frequency clock and allows analog-to-digital conversion at the highest possible speed. The timing of this asynchronous controller is fairly simple, as a straightforward sequential list of actions must be taken during the binary search algorithm. It is further aided by the use of a comparator that also provides a ‘valid’ signal, as shown in Fig. 3.8. The comparator is based on [13]. The ‘valid’ signal goes high after the two cross-coupled inverters leave their metastable operating point, meaning one of the two sides has gone high. This means the comparison result is ready. This comparator circuit also has the nice feature that it is a fully dynamic implementation that does not consume any power when inactive, so that the power consumption of the whole ADC scales linearly with the sampling frequency.
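The speed argument for asynchronous operation can be made concrete with a rough timing sketch. Everything numeric here is an assumption: the comparator is modeled as a regenerative latch whose decision time grows logarithmically as its input shrinks, and the time constant, reset time, and charge-sharing time are placeholder values rather than data from the design.

```python
import math

def decision_time(v_in, tau=20e-12, v_valid=0.5, t_reset=50e-12):
    """Assumed latch model: |v_in| regenerates exponentially until 'valid' fires."""
    regen = tau * math.log(v_valid / max(abs(v_in), 1e-9))
    return t_reset + max(regen, 0.0)

def conversion_times(comparator_inputs, t_share=100e-12, v_worst=1e-4):
    """Return (asynchronous, synchronous) conversion time for one sample."""
    # Asynchronous: each step starts as soon as 'valid' and charge sharing are done.
    t_async = sum(decision_time(v) + t_share for v in comparator_inputs)
    # Synchronous: every clock period must cover the worst-case decision.
    t_clk = decision_time(v_worst) + t_share
    t_sync = len(comparator_inputs) * t_clk
    return t_async, t_sync
```

Under this model only the few comparisons with a very small input take long, so the asynchronous total stays below the synchronous one, which has to budget the worst case in every cycle.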

3.5 Binary Scaled Capacitor Array

The linearity (INL/DNL) performance of a SAR ADC is determined by the INL/DNL performance of the feedback DAC, and thus by the matching of the capacitors in the binary scaled reference array. From [14] it can be derived that, e.g., for 99.7% yield, the units of a 9-bit DAC need a standard deviation of less than 0.7%, which is the key number in determining the size of the reference capacitor array. With some margin, the total size is set at 2 pF, such that CMSB = 1 pF, CMSB−1 = 0.5 pF, etc.

Fig. 3.9 Example of the capacitor array for a 9-bit ADC with a 3× upscaled unit capacitor

The resulting LSB capacitor size now equals about 8 fF, which is obviously too small to be used. The parasitics from the connections will be too large with respect to the actual capacitance, and since the units can be positively or negatively connected to the sampling capacitor, any difference or mismatch in these parasitics deteriorates the INL/DNL behavior. An alternative for the most significant bits is to use a bigger unit, e.g. eight times bigger. The capacitor controlled by c0p,n can now consist of 16 units of 60 fF, which is a value that can be practically used. The following one has 8 units, and so on until the one controlled by c4p,n, which has 1 unit. For the next charge sharing, a 30 fF unit could be used, but this will not match correctly with a ‘half’ unit of 60 fF. Instead, since we only care about the amount of charge that we will connect to the sampling capacitors CT, a 60 fF capacitor can also be charged to half of the reference voltage. This is done by taking 2 units, keeping one of them empty and charging the other one to VREF. If a switch between them is then closed, the charge redistributes evenly and on each one a charge of 60 fF × VREF/2 remains. One of them can be used for the charge sharing by switches c5p,n. The next bits can be handled similarly. If we close a switch between the other unit holding half of the charge and an empty one, each of them holds one quarter of the charge for the next bit, and so on. With this structure a unit capacitor of practical size can be used, and an example of this capacitor array is depicted in Fig. 3.9.

The effect of parasitics in the capacitor array must of course be evaluated carefully. To first order, the parasitics to the substrate of the unit capacitors do not pose a problem. They behave just as useful capacitors, i.e. they have a certain charge on them which will be connected to the positive or negative sampling capacitor and in this way help to perform the DAC operation of the SAR ADC.
Some constraints do apply, however. First of all, the parasitics on both sides of the capacitor must be balanced. If not, connecting the units positively or negatively to the sampling capacitor will have a different result. Therefore, the use of symmetric MoM capacitors that use lateral capacitance between a large number of closely spaced metal fingers is necessary. Capacitor structures that use vertical capacitance obviously have different parasitics on the top and the bottom plate. Mismatch of the parasitics will also result in missing the INL spec of the ADC, but since often no process data is available to estimate these mismatches, the only solution is to keep the parasitics small.

An important part of the parasitics is caused by the switching transistors as well. Their drain-bulk and source-bulk junctions constitute a non-linear capacitance. Their gate capacitance varies from very large to very small when the transistor switches from on to off. At first sight, this could be a performance-degrading effect, but to first order this nonlinearity is not important. The signal is represented by charge, and the fact that this charge is present on a nonlinear capacitance is of no importance.

Besides matching, sampling noise also determines the minimum input capacitance of the ADC, and hence also the size of the reference array, as these two are proportional. Each time a sampling switch closes, a noise power (given by the integrated noise voltage kT/C) remains on the capacitor. However, with capacitor sizes in the picofarad range as dictated by matching requirements, kT/C noise is often negligible.
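The half-charge trick used for the smallest bits of the array can be checked with a short charge-conservation sketch. The helper names and the ideal (parasitic-free, lossless) switch model are assumptions; the 60 fF unit and the 16/8/4/2/1-unit grouping follow the text.

```python
def share(q_a, q_b):
    """Close a switch between two equal unit capacitors: charge splits evenly."""
    q = (q_a + q_b) / 2
    return q, q

def reference_charges(c_unit=60e-15, vref=1.0, n_full=5, n_shared=3):
    """Charges (in coulombs) offered to the sampling capacitor, MSB first."""
    q_unit = c_unit * vref
    # Most significant bits: 16, 8, 4, 2, 1 parallel units charged to vref.
    charges = [2**k * q_unit for k in range(n_full - 1, -1, -1)]
    # Remaining bits: one charged unit is repeatedly shared with an empty unit.
    q_left = q_unit
    for _ in range(n_shared):
        q_used, q_left = share(q_left, 0.0)
        charges.append(q_used)
    return charges
```

The eight charges returned form an exact binary sequence even though every physical capacitor is a full 60 fF unit. The smallest charge equals that of a 7.5 fF capacitor charged to VREF, close to the 8 fF LSB size mentioned above.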

3.6 Comparator Noise

Although also not obvious from the basic architecture of Fig. 3.8, care must be taken when designing the comparator. During the SAR algorithm, the input signal to the comparator is sometimes very small, and when this value becomes comparable to the inherent noise of the comparator, an error can be made. An elaborate analysis of the noise sources in a comparator falls outside the scope of this text; more details can be found in [15]. Comparator noise was the reason why the first prototype of the charge-sharing SAR that was implemented [10] had a measured performance that was worse than originally estimated. Reducing comparator noise by increasing its power consumption quickly has a detrimental effect on the overall energy efficiency, since for a 6 dB (2×) noise reduction, a quadratic increase (4×) in power is needed.

To resolve this issue, a noise-robust design approach to fully dynamic SAR ADCs was developed by leveraging redundancy in the search algorithm [11]. The strategy behind the proposed correction technique is the fact that during the SAR operation, at most two out of N comparisons are critical, i.e. the one when the signal is right below the threshold and the one when it is right above. All other comparisons will be done on a relatively large input signal, and hence can use a low-power but noisy comparator. A low-noise (higher-power) comparator is only needed for the critical decisions, but of course it is unknown when they will appear. However, one of them will certainly be the last one: an error in this decision can be avoided by using the comparator in its low-noise/high-power state. The other one can be any of the previous (N−1) comparisons, and avoiding it by always employing a low-noise comparator is not power efficient.

As shown in Fig. 3.10 for a 5b example, the proposed redundant search algorithm uses the comparator in its high-noise state during the first (N−1) iterations, thus allowing errors in these cycles. However, if σH, the input-referred RMS noise of the comparator in that mode, is less than one half of the LSB value, only one error can be made. The ADC then switches into its low-noise mode (with comparator input noise σL ≈ σH/2) to avoid errors for the Nth comparison, and an extra (N+1)th iteration is added to correct for the error possibly made in the first phase. As shown in Fig. 3.10a, if the last two comparisons give different results, no error was made and no action has to be taken. On the other hand, if the last two bits are equal, a digital addition or subtraction needs to be performed on the final N-bit result. Because it is pipelined with the SAR conversion, the simple digital adder needed for the correction does not work at the internal SAR frequency (N times the sampling frequency), which limits its power consumption. Importantly, the correction is effective not only for thermal noise, but also for other error sources, including static non-linearities, as long as they are not bigger than one LSB.

Fig. 3.10 Correction algorithm for comparator noise (5b example) in case no error is made (a) and in case of error (b). Truth table (Nth, (N+1)th comparison → correction): 0, 1 → none; 1, 0 → none; 1, 1 → +1; 0, 0 → −1
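The correction rule of Fig. 3.10 reduces to a two-bit lookup, which the sketch below combines with a normalized SAR search. This is an illustrative model only: residues are expressed in LSBs, comparator noise is injected as explicit per-step offsets, and the extra (N+1)th cycle is assumed to share one more LSB-sized charge, a detail the text does not spell out.

```python
def correction(b_n, b_plus):
    """Truth table of Fig. 3.10: equal last bits signal an earlier error."""
    if b_n != b_plus:
        return 0
    return 1 if b_n == 1 else -1

def redundant_sar(x_lsb, n_bits, noise_lsb):
    """x_lsb: input in LSBs within (-2**(n_bits-1), 2**(n_bits-1)).
    noise_lsb: comparator error (in LSBs) for each of the first n_bits-1
    high-noise comparisons; the last two comparisons are taken as noiseless.
    """
    r = x_lsb                                   # residue, in LSBs
    code = 0
    for k in range(n_bits - 1):                 # high-noise phase
        bit = 1 if r + noise_lsb[k] >= 0 else 0
        code = (code << 1) | bit
        step = 2 ** (n_bits - 2 - k)            # binary-scaled charge, in LSBs
        r += -step if bit else step
    b_n = 1 if r >= 0 else 0                    # Nth comparison, low-noise mode
    code = (code << 1) | b_n
    r += -1 if b_n else 1                       # assumed extra 1-LSB step
    b_plus = 1 if r >= 0 else 0                 # (N+1)th comparison
    return code + correction(b_n, b_plus)
```

For example, with `n_bits=5` and an input of 0.3 LSB (ideal code 16), a −0.4 LSB noise sample on the first comparison flips the MSB and the raw search ends at 15. The last two comparisons then both return 1, and the +1 correction restores 16.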


3.7 Comparator Offset

In contrast with classical charge-redistribution SARs, in the charge-sharing SAR the offset of the comparator does have an effect on the INL/DNL performance. The reason for this is that the signal is represented by charge, whereas the offset of the comparator is always a certain voltage. During the successive approximation process, the capacitance size is changing and hence the relationship between charge and voltage is not fixed. To make the comparator offset small enough not to have an effect, the same offset calibration technique as in [13] was again used, as already indicated by the varicaps shown in Fig. 3.8. At startup or at regular time intervals, the two inputs of the ADC must be shorted and the correct digital value must be searched that results in equal probability of a positive/negative comparator decision.

4 Implementation

The ADC prototype that was designed to show the performance of the proposed techniques was implemented in a 90 nm 1 V 1P9M digital CMOS process [11]. The die photo is shown in Fig. 3.11. Only regular transistors and MoM-caps are used in the whole design, making it ideally suited for implementation in digital CMOS.

Fig. 3.11 Nine-bit 40MS/s charge-sharing SAR ADC photograph

Figure 3.12a shows the static INL/DNL performance when the correction is active. The peaks in DNL and INL are due to incomplete settling during the common-mode switching, a small design mistake that could easily be corrected. The actual error caused by this effect is even worse (uncorrected INL/DNL is worse than +/−1 LSB), but as already stated the correction algorithm does not only detect errors caused by noise, but also this static nonlinearity. The resulting peak DNL and INL are 0.7/−0.45 and 0.56/−0.65 LSB, respectively.

Fig. 3.12 (a) Measured INL/DNL plot; (b) ENOB vs. frequency and near-Nyquist FFT at 40MS/s (FFT annotation: ENOB = 8.23 bit, SNDR = 51.31 dB, THD = 52.76 dB, fin = 18.8821 MHz, N = 16384)

As shown in Fig. 3.12b, when the input signal is sampled at 40 MS/s, the measured ENOB is 8.56 (53.3 dB SNDR) at low frequencies, mainly limited by static distortion, and drops to 8.23 at Nyquist. The effective resolution bandwidth extends up to 32 MHz. At 40 MS/s the ADC consumes 820 µA from a 1 V supply voltage, of which 290 µA are drawn by the asynchronous controller and 530 µA are shared between the S/H, the pre-charging phase of the capacitor array, and the flexible comparator. Because of the dynamic architecture, power scales linearly with the sampling frequency. The resulting FoM is only 54 fJ per conversion step.

Since this SAR ADC only contains digital gates, capacitors, switches and one comparator, it lends itself easily to further scaling in more advanced technology nodes. With decreasing CMOS feature sizes, the performance is expected to improve, since the controller will become more power efficient and, on top of that, the capacitor sizes can be chosen smaller because of the improved matching characteristics.

A new implementation of the charge-sharing SAR ADC in 40 nm CMOS as part of a full SDR transceiver [16] targets a resolution of 10 bit. This extra bit would normally require a 4× increase in the capacitors for matching, but because of the more advanced process node the total reference array size could be kept at 2 pF. Most of the building blocks (boosted input switch, capacitor array with 3× upscaled unit, controller, ...) have simply been ported to the new technology; the only serious effort needed for the new design was on the only analog block of the converter, i.e. the comparator. As the comparator noise becomes the limiting factor in the achievable ADC resolution, the new design uses a dynamic preamp and an improved latch timing of the second stage that uses internal signals instead of the external clock [17] (Fig. 3.13).

Fig. 3.13 Low-noise comparator schematic

A die photo of a separate test chip to evaluate the ADC performance is shown in Fig. 3.14 and the measurements are reported in Fig. 3.15. The maximal sampling speed is 60 MS/s, with a power consumption of 1.2 mW. At lower speeds, the power decreases proportionally. The maximum DNL and INL are 1.4 and 0.8 LSB, respectively. An SNDR of 54 dB (9.3 ENOB) is obtained, which results in a figure-of-merit of 34 fJ per conversion step.

Fig. 3.14 Ten-bit 60MS/s SAR ADC in 40 nm CMOS
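The 54 fJ figure quoted for the 90 nm prototype can be reproduced from the reported numbers, assuming the commonly used figure of merit FoM = P / (2^ENOB · fs) and the relation ENOB = (SNDR − 1.76 dB) / 6.02 dB. The text does not state which definitions were used, so both formulas are assumptions of this sketch.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02

def fom_joule_per_step(power_w, enob, fs_hz):
    """Energy per conversion step: P / (2**ENOB * fs)."""
    return power_w / (2**enob * fs_hz)

# 90 nm prototype: 820 uA from a 1 V supply, 40 MS/s, low-frequency ENOB 8.56
power = 820e-6 * 1.0
fom = fom_joule_per_step(power, 8.56, 40e6)
print(f"ENOB from 53.3 dB SNDR: {enob_from_sndr(53.3):.2f}")
print(f"FoM: {fom * 1e15:.1f} fJ per conversion step")
```

With these inputs the sketch lands at about 54 fJ per conversion step, and the 53.3 dB SNDR maps back to the quoted 8.56 ENOB.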

Fig. 3.15 Measured ADC performance. (a) INL/DNL; (b) near-Nyquist FFT; (c) SNDR vs input frequency

5 Conclusions

As has been shown by many publications in recent years, SAR ADCs have replaced the typical pipeline design in the application range of 8–10 bit accuracy and sampling speeds up to several tens of MHz. The simplicity of the SAR architecture makes it very well suited for implementation in nanoscale CMOS, and several improvements in capacitor switching strategy, asynchronous controller implementation and low-power comparator design have improved the power efficiency by an order of magnitude. Furthermore, the charge-sharing architecture is proposed, which makes the load of the reference buffer signal-independent, thereby removing all constraints posed on the reference buffer by a high-speed ADC. With continued scaling of CMOS technology nodes, and probably with some more architectural improvements, ADCs with a figure of merit of a few femtojoules per conversion step will be just around the corner!

Acknowledgment The work presented here is the result of the research on ADCs performed in imec’s wireless research group in the past years, and the author would like to acknowledge the contributions of all team members, and especially Vito Giannini, Geert van der Plas, Bob Verbruggen, and Takaya Yamamoto.

References

1. J. McCreary, P. Gray, All-MOS charge redistribution analog-to-digital conversion techniques – Part I. IEEE J. Solid-State Circuits 10(6), 371–379 (1975)
2. B. Ginsburg, A. Chandrakasan, An energy-efficient charge recycling approach for a SAR converter with capacitive DAC, in Proceedings of IEEE International Symposium Circuits and Systems, 2005, pp. 184–187
3. B. Ginsburg, A. Chandrakasan, 500-MS/s 5-bit ADC in 65-nm CMOS with split capacitor array DAC. IEEE J. Solid-State Circuits 42(4), 739–747 (2007)
4. L.J. Svensson, J.G. Koller, Driving a capacitive load without dissipating fCV², in IEEE Symposium on Low Power Electronics, 1994, pp. 100–101
5. M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E.A.M. Klumperink, B. Nauta, A 10-bit charge-redistribution ADC consuming 1.9 µW at 1 MS/s. IEEE J. Solid-State Circuits 45(5), 1007–1015 (2010)
6. C.-C. Liu, S.-J. Chang, G.-Y. Huang, Y.-Z. Lin, A 0.92 mW 10-bit 50-MS/s SAR ADC in 0.13 µm CMOS process, in IEEE Symposium on VLSI Circuits Digest, June 2009, pp. 236–237
7. C.-C. Liu, S.-J. Chang, G.-Y. Huang, Y.-Z. Lin, A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure. IEEE J. Solid-State Circuits 45(4), 731–740 (2010)
8. P. Harpe, C. Zhou, X. Wang, G. Dolmans, H. de Groot, A 30fJ/conversion-step 8b 0-to-10MS/s asynchronous SAR ADC in 90 nm CMOS, in ISSCC Digest of Technical Papers, Feb 2010, pp. 388–389
9. P. Harpe, C. Zhou, X. Wang, G. Dolmans, H. de Groot, A 12fJ/conversion-step 8bit 10MS/s asynchronous SAR ADC for low energy radios, in Proceedings of European Solid-State Circuits Conference, Sept 2010, pp. 214–217
10. J. Craninckx, G. Van der Plas, A 65fJ/conversion-step 0-to-50Ms/s 0-to-0.7 mW 9b charge sharing SAR ADC in 90 nm digital CMOS, in ISSCC Digest of Technical Papers, Feb 2007, pp. 246–247
11. V. Giannini, P. Nuzzo, V. Chironi, A. Baschirotto, G. Van der Plas, J. Craninckx, A 820 µW 9b 40MS/s noise tolerant dynamic SAR ADC in 90 nm digital CMOS, in ISSCC Digest of Technical Papers, Feb 2008, pp. 238–239
12. M. Abo, P. Gray, A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(5), 599–606 (1999)
13. G. Van der Plas, S. Decoutere, S. Donnay, A 0.16 pJ/conversion-step 2.5 mW 1.25GS/s 4b ADC in a 90 nm digital CMOS process, in ISSCC Digest of Technical Papers, Feb 2006, pp. 566–567
14. A. Van den Bosch, Static and Dynamic Performance Limitations for High Speed D/A Converters (Springer, New York, 2004). ISBN 9781402077616
15. P. Nuzzo et al., Noise analysis of regenerative comparators for reconfigurable ADC architectures. IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 55(6), 1441–1454 (2008)
16. M. Ingels et al., A 5 mm² 40 nm LP CMOS transceiver for a software-defined radio platform. IEEE J. Solid-State Circuits 45(12), 2794–2806 (2010)
17. M. Miyahara et al., A low-noise self-calibrating dynamic comparator for high-speed ADCs, in Proceedings of IEEE Asian Solid-State Circuits Conference, Nov 2008, pp. 269–272

Chapter 4

Oversampling Converters Beyond Continuous-Time Sigma-Delta for Nanometer CMOS Technologies

A. Di Giandomenico, L. Hernandez, E. Prefasi, S. Paton, A. Wiesbauer, R. Gaggl, and J. Hauptmann

Abstract This paper first describes the properties of Continuous-Time Sigma-Delta ADCs that make this type of converter attractive for low-power and high-bandwidth applications. Cascaded architectures are analyzed as a possible way to further improve the analog bandwidth. The limits towards nanometer technology integration are then described, showing how the time-encoding theory can be successfully applied to overcome them. Two different implementations are introduced (PWM-based and VCO-based), and some case studies are given to support the theories. Conclusions are drawn, with emphasis on possible future development steps.

A. Di Giandomenico • R. Gaggl • J. Hauptmann: Lantiq, Villach, Austria
L. Hernandez • E. Prefasi • S. Paton: UCIIIM, Madrid, Spain
A. Wiesbauer: Infineon Technologies, Villach, Austria
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_4, © Springer Science+Business Media B.V. 2012

1 Introduction

Sigma-Delta ADCs are a valid and attractive solution for building converters with either a very high resolution (above 20 bits [8]) or a very high analog bandwidth (above 100 MHz [14]) while consuming less power than other A/D types, such as SAR (for high resolution) or pipeline (for high bandwidth). Discrete-Time (DT) types are preferred when a high resolution in a narrow band is required, due to the very good matching achievable between integrator gains and system coefficients. On the other hand, a Continuous-Time (CT) implementation presents some advantages when the target resolution is not very high (below 14 bits):
• Built-in anti-alias filter: since the sampler is moved just before the internal ADC (see Fig. 4.1), the loop filter H(s) can be designed such that the signal transfer function (STF) has a low-pass characteristic. This can be realized efficiently by using multi-feedback architectures, although an interesting structure was proposed which uses explicit filtering to provide high immunity to interferers [6]. This property can be used in the system where the CTSD is embedded to simplify the receive chain, saving area and power.
• High-impedance input stage: the input network of a CTSD is usually a simple resistor connected to the virtual ground of the first integrator (OpAmp-RC filter implementation) or a simple differential pair of a transconductor (Gm-C filter implementation); this simplifies the design of the preceding filter, since it doesn’t need to drive a big switching load (like the one presented by other types of converters such as DT-SD, SAR or pipeline). Moreover, the input network can be made programmable quite easily (for instance with a programmable resistor), realizing in this way also a built-in PGA functionality [6].

Fig. 4.1 (a) A classical CTSD converter and (b) Amplitude and phase of the open-loop transfer function

The system design of the loop filter is done by first placing the poles to optimize the bandwidth and then finding the zeros to recover stability. For a given loop filter, the architecture of the CTSD can be chosen to target high analog bandwidth (BW) and low power consumption by tuning some system parameters such as the oversampling ratio (OSR), the loop-filter order or the number of levels of the internal quantizer [3]. The physical limit towards BW increase is the maximum sampling frequency achievable in a given technology, while the limit towards power reduction is the current consumption of the active elements used.
This paper aims to present two architectural strategies to push CTSD converters beyond their limits:
• Cascaded (MASH) architectures: increasing the clock frequency allows increasing the analog BW. When the clock frequency can’t be made higher, the only option to further increase the analog BW is reducing the OSR. Since high-order loops become unstable for low OSR, the architecture must be split into a cascade of low-order single-loop SD modulators, while still maintaining the overall high-order noise shaping.
• Time-encoded quantizers: exchanging amplitude quantization for time quantization allows replacing the power-hungry quantizer with simpler structures which consume less current and occupy less area. This exchange comes at the price of a higher time resolution (which often also means a higher sampling rate), making these architectures particularly attractive for nanometer technologies.

This paper is organized as follows: Sect. 2 gives an overview of the most important challenges for designing efficient CTSD converters; Sect. 3 describes how to extend the analog BW by using MASH topologies; Sect. 4 shows how the time-encoding theory can be effectively applied to the design for low-power and low-voltage applications; in Sect. 5 the conclusions are presented.

2 Challenges and Limitations of Classical CTSD

One of the most important challenges in designing an efficient CTSD is to keep the modulator stable even with all the impairments imposed by the non-idealities of the analog and digital building blocks. Another limitation of this type of converter, which must be considered in the noise budget from the beginning of the system design phase, is the high sensitivity to clock jitter. This section describes these challenges and shows the most common ways to reduce their undesired effects.

2.1 Modulator Stability and Excess Loop Delay

As described in Fig. 4.1, the phase margin of the open-loop transfer function mainly depends on the location of the compensation zeros with respect to the in-band poles. If the OSR of the modulator is high enough, then the compensation is easy and a high phase margin can be achieved. On the other hand, when the OSR becomes too low (typically below 16), the order of the modulator must be increased to still achieve the target SNR: this leads to a very aggressive loop filter design, with reduced phase margin and high sensitivity to coefficient deviation. Hence, when trying to increase the BW by reducing the OSR, all non-idealities leading to a phase loss in the loop must be considered, primarily two: the parasitic effects in the integrators and the digital latency in the feedback path.

2.1.1 Parasitic Effects in a Real Integrator

The most severe non-idealities in a real integrator are the finite GBW of the OpAmp and the parasitic capacitance at the virtual-ground node, as shown in Fig. 4.2.

Fig. 4.2 A real integrator without (a) and with (b) zero-compensation

Given a first-order approximation of the OpAmp (considering the finite DC gain $A_v \gg 1$), the real transfer function of an ideal integrator with angular frequency $\omega_{int} = 1/(R_{int} C_{int})$ can be written as:

$$H_{int,real}(s) = GE \cdot H_{int,ideal}(s) \cdot H_{lp}(s) = GE \cdot \frac{T_{clk}}{s R_{int} C_{int}} \cdot \frac{1}{1 + \frac{s}{2\pi F_{LP}}} \qquad (4.1)$$

The real transfer function is therefore affected by a gain error GE and by a phase loss equivalent to having a high-frequency pole $F_{LP}$:

$$GE = \frac{1}{1 + \frac{\omega_{int}}{2\pi\,GBW}}, \qquad F_{LP} = \left(\frac{\omega_{int}}{2\pi} + GBW\right)\frac{C_{int}}{C_{int} + C_{PAR}} \qquad (4.2)$$

From (4.2), one can see that both effects depend on the GBW of the OpAmp (which changes under different PVT conditions); the gain error mainly depends on how far the GBW is from the integration frequency, while the parasitic pole mainly depends on how big the parasitic capacitance is with respect to the integrating capacitance. The gain error can only be compensated by re-tuning the integration frequency, though particular care must be taken to ensure that the GBW doesn’t change too much with the operating conditions. On the other hand, the phase loss due to the parasitic pole can be compensated by adding one resistor in the feedback network to realize a pole-zero cancellation [46], such that:

$$\frac{1}{R_Z} = \frac{1}{R_{INT} + R_Z}\left(1 + \frac{2\pi\,GBW}{\omega_{int}}\right) \qquad (4.3)$$

The compensating resistor RZ is usually very small (in the range of some hundreds of ohms) and hence can’t be tuned. As a result, from (4.3), one can see that the cancellation works well only if the gain-bandwidth product of the OpAmp doesn’t vary too much with the operating conditions.

Fig. 4.3 (a) Excess loop delay and its effect in a CTSD converter
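As a numerical illustration of (4.2) and (4.3), the sketch below evaluates the gain error, the parasitic pole and the compensating resistor for one set of component values. The expressions coded here follow a reading of (4.2) and (4.3) in which F_LP = (ω_int/2π + GBW) · C_int/(C_int + C_PAR) and in which (4.3) solves to R_Z = R_INT · ω_int/(2π · GBW). The function name and all component values are assumptions chosen only for the example.

```python
import math

def integrator_nonidealities(r_int, c_int, c_par, gbw_hz):
    """Gain error, parasitic pole (Hz) and compensating resistor (ohm)."""
    w_int = 1.0 / (r_int * c_int)            # integration frequency, rad/s
    f_int = w_int / (2 * math.pi)
    ge = 1.0 / (1.0 + f_int / gbw_hz)        # gain error, as read from (4.2)
    f_lp = (f_int + gbw_hz) * c_int / (c_int + c_par)   # parasitic pole, as read from (4.2)
    # (4.3): 1/Rz = (1 + GBW/f_int) / (R_int + Rz)  =>  Rz = R_int * f_int / GBW
    r_z = r_int * f_int / gbw_hz
    return ge, f_lp, r_z

# Hypothetical values: 10 kohm, 318 fF (f_int close to 50 MHz), 100 fF parasitic, 1 GHz GBW
ge, f_lp, r_z = integrator_nonidealities(10e3, 318e-15, 100e-15, 1e9)
print(f"GE = {ge:.3f}, F_LP = {f_lp / 1e6:.0f} MHz, R_Z = {r_z:.0f} ohm")
```

For these example values R_Z comes out at roughly 500 Ω, in the range of some hundreds of ohms mentioned in the text. Because R_Z scales inversely with GBW, any drift of the GBW directly detunes the cancellation.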

2.1.2 Analog and Digital Latencies in the Feedback Path

In an ideal CT sigma-delta converter, the digital data in the feedback loop is generated instantaneously in the quantizer when the clock edge arrives, is then propagated instantaneously via the feedback network down to the main DAC, and is finally converted instantaneously into an analog signal at the summing node of the loop filter. In real life, however, all these processes require some finite time (see Fig. 4.3):
1. The comparators need some time to decide, depending on the input level
2. The digital buffers needed to drive the long feedback lines introduce a digital latency
3. If a DEM block is necessary to linearize the main DAC, then its latency must also be taken into account (this is often the dominant delay source)
4. The response time of the main DAC is always finite

Fig. 4.4 (a, b) Excess loop delay compensation at system level and (c, d) some implementations

The latency introduced by the DEM block can be avoided by using different techniques to linearize the main DAC, such as background self-calibration [47]. To understand the effects of all these delays in the loop, one can build a simple model, using a single delay element in the feedback path to take into account the four non-idealities listed above. The phase loss introduced by this delay element is then proportional to the relative delay Td (compared to the clock period) and to the analog bandwidth (compared to the clock frequency):

$$PM_{LOSS} \propto \frac{1}{2\,OSR} \cdot 2\pi \frac{T_d}{T_{clk}} \qquad (4.4)$$

Expression (4.4) shows that the effect is stronger for low-oversampling converters, while it can be negligible if the OSR is high enough. Since in low-OSR converters the loop filter is usually designed to be quite aggressive, any additional phase loss can’t be tolerated and compensation techniques become unavoidable. One of the most widely used ELD compensation techniques consists in adding a high-frequency zero in the open-loop transfer function by means of a dedicated feedback loop just before the quantizer (see Fig. 4.4b) [1, 5]. One way to realize this loop is to build an additional D/A converter and an analog adder (see Fig. 4.4c); however, this solution is not really efficient, since it requires extra hardware which costs area and power (the analog adder should also have a very high bandwidth in order to maintain the high-frequency zero). An alternative and more efficient solution [9, 14] is to make use of the last integrator of the loop filter (see Fig. 4.4d): in this case, the digital signal must be differentiated before the inner D/A converter, in order to compensate for the following integration. If the architecture of the modulator is multi-feedback, then the inner D/A converter can also be re-used by taking a digital adder to combine the normal digital signal with the ELD-compensation signal coming from the digital differentiator.

Fig. 4.5 A sensitive (a) and a less sensitive (b) Noise transfer function
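To get a feeling for the numbers behind (4.4), the sketch below evaluates the phase of an ideal delay e^(−sTd) at the edge of the signal band, fs/(2·OSR). Treating the proportionality in (4.4) as an equality at the band edge is an assumption of this sketch, as are the function name and the example values.

```python
import math

def phase_loss_deg(osr, td_over_tclk):
    """Phase lag (degrees) of a pure delay Td at the band edge fs/(2*OSR)."""
    band_edge_over_fs = 1.0 / (2.0 * osr)
    return math.degrees(2 * math.pi * band_edge_over_fs * td_over_tclk)

for osr in (64, 16, 8):
    print(osr, round(phase_loss_deg(osr, 0.5), 2))
```

Under this assumption a half-period delay costs 11.25° at OSR = 8 but only about 1.4° at OSR = 64, which is why excess loop delay matters mostly in low-OSR designs.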

2.1.3 Effects of Clock Jitter

The biggest drawback of CT sigma-delta modulation is the high sensitivity to clock jitter [1]. This is basically due to the fact that, if the width of the pulse sent back into the loop by the main DAC varies randomly at each sampling instant, it generates an error which is integrated at least once in the loop filter. If the spectral density of the clock jitter (assumed to be a random process uncorrelated with the input signal) surpasses the quantization noise density, it can limit the overall SNR. The only way to reduce the effects of clock jitter is to reduce the average-step-size (ASZ) of the modulator, and this can be achieved either by increasing the number of levels of the quantizer (using multi-bit D/A converters) or by reducing the oversampling ratio. However, in [4] a jitter model was presented which can be used to link the jitter sensitivity to the spectral shape of the noise transfer function (NTF), hence providing a design criterion to minimize the variance $\sigma_{dy}^2$ of the ASZ:

$$\sigma_{dy}^2 \cong \frac{\sigma_Q^2}{2\pi} \int_0^{2\pi} \left| \left(1 - e^{-j\omega}\right) \mathrm{NTF}\left(e^{j\omega}\right) \right|^2 d\omega \qquad (4.5)$$

where $\sigma_Q^2$ is the variance of the jitter noise introduced at the quantizer, assumed white. From (4.5), one can see that the effect of the NTF is shaped by the weighting function $H_w(\omega) = (1 - e^{-j\omega})$, which gives more emphasis to the higher frequencies of the NTF (see Fig. 4.5).
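To get a feeling for how the weighting in (4.5) penalizes aggressive noise shaping, the normalized integral can be evaluated numerically. This is a sketch under stated assumptions: the helper name `jitter_weight` is ours, the NTFs are simple textbook FIR examples (not the ones of Fig. 4.5), and the integral is normalized by $2\pi$ as in the form of (4.5) given above.

```python
import numpy as np

def jitter_weight(ntf_num, ntf_den=(1.0,), n_points=8192):
    """Return (1/2pi) * integral over [0, 2pi) of |(1 - e^{-jw}) NTF(e^{jw})|^2.

    ntf_num and ntf_den hold coefficients in ascending powers of z^-1.
    The mean over a uniform grid equals the normalized integral for
    trigonometric polynomials of degree below n_points.
    """
    w = 2.0 * np.pi * np.arange(n_points) / n_points
    z_inv = np.exp(-1j * w)
    # np.polyval expects the highest power first, hence the reversal.
    num = np.polyval(np.asarray(ntf_num, dtype=float)[::-1], z_inv)
    den = np.polyval(np.asarray(ntf_den, dtype=float)[::-1], z_inv)
    weighted = (1.0 - z_inv) * num / den
    return float(np.mean(np.abs(weighted) ** 2))

no_shaping = jitter_weight([1.0])               # NTF = 1
first_order = jitter_weight([1.0, -1.0])        # NTF = 1 - z^-1
second_order = jitter_weight([1.0, -2.0, 1.0])  # NTF = (1 - z^-1)^2
print(no_shaping, first_order, second_order)
```

For these pure-differentiator NTFs the result equals the sum of the squared coefficients of $(1 - z^{-1})\,\mathrm{NTF}(z)$, so the figure grows quickly with the order of the shaping. An NTF with less out-of-band gain yields a smaller value for the same quantizer.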


3 MASH Topologies

One way to look at a cascaded CTSD architecture is to describe the system as a 2-step sigma-delta, where (see Fig. 4.6):

• The first stage produces a quantization error $\varepsilon_Q$
• The quantization error is sensed and sent to the second stage
• The second stage amplifies the error to fit into the full-scale range and quantizes it again
• A digital filter takes the two digital signals from the single stages and combines them to cancel out the quantization error of the first stage

This concept can of course be extended to N-step converters, with more than two stages. The biggest advantage of this architecture is that the order of the overall modulator is given by the sum of the orders of the single stages, allowing for higher stability and lower achievable OSR. On the other hand, the digital coefficients must match the analog coefficients perfectly in order to guarantee a perfect cancellation of the quantization noise of the first stage, avoiding noise leakage which could degrade the overall SNR.

The biggest challenge in CT MASH topologies is that the quantization error is usually not available in the circuit as it is in Discrete-Time SD-ADCs, because most of the designs use a continuous-time quantizer which does not provide the sampled information before quantization [11, 12]. In such architectures, the estimated quantization error transferred from one SD-stage to the next has a peak-to-peak value much higher than the actual sampled quantization error. The inter-stage gain must thus be reduced to avoid overloading the succeeding stages, reducing the efficiency of the cascaded principle by the same factor.

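[Fig. 4.6 block diagram labels: input x(t); 1st Stage Sigma-Delta (quantization error Q1, output y1(n)); inter-stage gain G; 2nd Stage Sigma-Delta (quantization error Q2, output y2(n)); Digital Cancelation Filters NCF1 = STF2 and NCF2 = NTF1 with gain 1/G; output y(n), with Y = STF1·X − (1/G)·NTF1·NTF2·Q2]

Fig. 4.6 A cascaded CTSD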
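The cancellation principle of Fig. 4.6 can be reproduced with a small discrete-time experiment. This is only a sketch, not the continuous-time design discussed in this chapter: both stages are idealized first-order error-feedback modulators (STF = 1, NTF = 1 − z^-1), the quantization error of the first stage is assumed to be directly available, and the names `first_order_sdm`, `diff1` and the value of `G` are ours.

```python
import numpy as np

def first_order_sdm(u, step):
    """Error-feedback first-order modulator: v[n] = u[n] + e[n] - e[n-1]."""
    v = np.empty_like(u)
    e = np.empty_like(u)
    e_prev = 0.0
    for n, u_n in enumerate(u):
        s = u_n - e_prev                    # quantizer input
        v[n] = step * np.round(s / step)    # uniform quantizer, no saturation
        e[n] = v[n] - s                     # quantization error of this sample
        e_prev = e[n]
    return v, e

def diff1(x):
    """Apply (1 - z^-1) with zero initial state."""
    return np.diff(x, prepend=0.0)

n = np.arange(2048)
x = 0.5 * np.sin(2.0 * np.pi * n / 256.0)   # slow test tone (assumed input)
G = 4.0                                     # assumed inter-stage gain

y1, q1 = first_order_sdm(x, step=0.5)       # first stage
y2, q2 = first_order_sdm(G * q1, step=0.5)  # second stage digitizes the scaled error

# NCF1 = STF2 = 1 and NCF2 = NTF1 = (1 - z^-1) for these idealized stages.
y = y1 - diff1(y2) / G

residual = y - x
expected = -diff1(diff1(q2)) / G            # -(1/G) * NTF1 * NTF2 * Q2
print(np.max(np.abs(residual - expected)))
```

With perfectly matched digital filters the first-stage error disappears and only the second-stage error remains, shaped to second order and divided by G. Replacing `diff1(y2)` by a slightly mismatched filter is a quick way to observe the noise leakage mentioned above.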


Fig. 4.7 Block diagram of the 2–2 MASH converter

3.1 A Design Example

In this chapter we present a MASH converter using a novel and simple inter-stage network, which does not overload the succeeding stages, thereby increasing the maximum stable amplitude and the dynamic range at the same time. The proposed architecture is shown in Fig. 4.7. The quantizers have been replaced with linear models. Each stage is implemented as a second-order multiple-feedback CTSD, so that the implicit anti-alias filter has a second-order slope in each stage and a fourth-order slope at the output of the cascaded CTSD. The quantization error is estimated by a simple linear combination of the two state variables of the first stage, removing the need for the extra DAC that is usually employed [11, 12]. The values of the two coefficients K1 and K2 are chosen by imposing that (1) the input of the second stage is a band-limited signal, (2) the transfer function from the first quantization noise to the output of the second stage is all-pass in-band, and (3) the transfer function from the input signal to the input of the second stage is high-pass in-band. The removal of the extra DAC in the inter-stage network saves power and area.

The integrators are realized with OpAmp-RC cells, where the capacitors are tuned to compensate for process deviation. The feedback DAC connected to the first integrator of the first MASH stage is self-calibrated. No mismatch-shaping logic is therefore needed in the digital feedback loop. This reduces the Excess-Loop-Delay of the modulator, which is fixed in this converter to ½ of the clock period. The loop filter of the second stage is designed to be equal to the one used in the first stage.


Parameter           40-MHz mode       30-MHz mode
Analog Bandwidth    40 MHz            30 MHz
Clock-Frequency     800 MHz           600 MHz
Dynamic-Range       63 dB             70 dB
Peak-SNR            60 dB             68 dB
Peak-SNDR           60 dB             68 dB
SFDR                79 dB             85 dB
Total Power         110 mW, 1.5 V     100 mW, 1.5 V
Area                1.2 mm² (both modes)
Technology          0.13 μm CMOS (both modes)

[Measurement plots: SNDR [dB] versus AIN [dBFS], with curves SNDR (40MHz) and SNDR (30MHz); FFT @ VIN = –2 dBFS / 2.8320 MHz, PSD [dB] versus freq [MHz] from 0 to 40 MHz, annotated with SNR = 58.8 dB, SNDR = 58.8 dB, DR = 60.6 dB, SFDR = 79 dB, HD2 = 92.2 dBc, HD3 = 78.9 dBc, HD4 = 90.4 dBc, HD5 = 83.3 dBc]

Fig. 4.8 Layout of the MASH prototype and measurement results

This prototype CTSD modulator has been fabricated in a digital 0.13 μm CMOS process. The two digital cancellation filters are designed as FIR types (2 taps and 5 taps respectively) and emulated off-line in a software platform. The loop-filter coefficients and the sampling frequency are programmable to work in two modes. In the 40-MHz analog bandwidth (ABW) mode, the modulator is clocked at 800 MHz; it achieves a DR of 63 dB and a peak SNDR of 60 dB while consuming 110 mW from a single 1.5 V supply. In the 30-MHz ABW mode, the modulator is clocked at 600 MHz and achieves a peak SNDR of 68 dB. Figure 4.8 shows a measured output spectrum and the SNR as a function of the input signal. An FFT plot of the output is also included for the 40-MHz mode, and the in-band zoom reveals a 79-dB SFDR for a –2 dBFS sine (FS = 1.5 Vpp-diff).
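As a quick cross-check of the two operating modes, the oversampling ratio can be computed from the figures quoted above. The sketch assumes the usual definition OSR = Fclk / (2·BW), which is not restated in this section; the function and dictionary names are ours.

```python
def oversampling_ratio(f_clock_hz, bandwidth_hz):
    """OSR under the assumed definition Fclk / (2 * BW)."""
    return f_clock_hz / (2.0 * bandwidth_hz)

# (clock frequency, analog bandwidth) of the two modes reported in the text.
modes = {
    "40-MHz mode": (800e6, 40e6),
    "30-MHz mode": (600e6, 30e6),
}
osr = {name: oversampling_ratio(*values) for name, values in modes.items()}
print(osr)
```

Under this definition both modes run at the same low OSR, which is consistent with the emphasis placed on low-OSR operation for MASH topologies.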

4 New CTSD Architectures Towards Nanometer Technologies

Integrating CTSD converters in nanometer technologies would allow higher bandwidths to be achieved without a power-consumption penalty; however, the integration of these converters is limited by the growing difficulty of implementing circuits with high amplitude accuracy. Several problems concur to make this difficult, in particular the implementation of the embedded multi-bit A/D and D/A converters. The main problem is the limited dynamic range of the comparators in a low-voltage technology, which degrades the linearity of the quantizer.


As CMOS technology scales towards smaller feature sizes, it is getting more difficult to design circuits with high amplitude accuracy. This is due to several problems: the threshold voltages of the transistors stay almost constant while the supply voltages decrease; the intrinsic gain of the transistors reduces, requiring more and more cascaded topologies to achieve the same linearity; current sources and current mirrors have lower output impedance and higher leakage current, requiring trimming and calibration procedures. On the other hand, nanometer technologies offer digital operations at ever higher speed and ever lower area cost. It then becomes advantageous to port some power-hungry operations of the CTSD modulator from the analog domain to the digital domain, and the most interesting area for improvement seems to be the embedded Flash A/D in the CTSD loop: here the main problem is the limited dynamic range of the comparators in a low-voltage technology, which degrades the linearity of the quantizer. The “tracking-quantizer” [7] has been one of the first attempts to reduce the analog complexity of this block at the cost of more sophisticated digital circuits. However, it suffers from a limited slew rate, which can become a problem if high immunity to out-of-band interferers must be guaranteed (especially in feed-forward CTSD structures with no built-in anti-alias function). To leverage the hardware possibilities opened up by the increased maximum clock frequency of digital circuits, a more efficient solution has been explored in the recent past: turning the amplitude quantization into time quantization, thereby focusing on time-based analog signal processing instead of voltage- (or current-) based analog signal processing, as explained in Fig. 4.9.
The background of this idea is that the analog signal v(t) coming from the feedback path is processed by a continuous-time integrator (the first CTSD stage); hence, the most important information which v(t) brings into the loop is the area of the DAC pulse, which is proportional to the quantized amplitude of the DAC output. The same information (i.e. the same area of the DAC pulse) can indeed be propagated if the amplitude of the DAC output is kept constant while the pulse duration becomes quantized in time. To make this idea possible, two building blocks can be defined to realize a Single-Bit Time-Encoded ADC equivalent to the standard multi-bit FLASH ADC: a Time-Encoder and a high-speed sampler. The Time-Encoder converts the amplitude of the input signal into a sequence of continuous-time pulses having the same amplitude but different duration and/or period. Such a block has the following characteristics:

• It oscillates at rest (i.e. when the input signal u(t) is null) at a free-running frequency Fo, providing a well-defined pulse width and period.
• The duration of the pulse (referred to the instantaneous oscillation period) is modulated by the amplitude of the input signal u(t), providing a phase and/or frequency variation of the generated signal p(t).
• The free-running oscillation Fo is of the same order as the sampling frequency Fs of the equivalent multi-bit FLASH ADC.
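The equal-area argument can be made concrete with a few lines of code. This is a minimal sketch with hypothetical helper names (`pulse_high_time`, `pulse_area`): a multi-level DAC value held for one period is replaced by a two-level pulse of fixed amplitude whose high time is chosen to give the same area, optionally rounded to a time grid with `rosr` slots per period.

```python
def pulse_high_time(level, period, amplitude=1.0, rosr=None):
    """High time of a +/-amplitude pulse whose area over one period is level * period."""
    t_high = 0.5 * period * (1.0 + level / amplitude)
    if rosr is not None:
        slot = period / rosr                 # time resolution of the sampler
        t_high = round(t_high / slot) * slot # time quantization
    return t_high

def pulse_area(t_high, period, amplitude=1.0):
    """Area of a pulse that is +amplitude for t_high and -amplitude for the rest."""
    return amplitude * t_high - amplitude * (period - t_high)

period = 1.0
exact = pulse_area(pulse_high_time(0.25, period, rosr=8), period)
rounded = pulse_area(pulse_high_time(0.30, period, rosr=8), period)
print(exact, rounded)
```

A level that falls on the time grid (0.25 with 8 slots) is reproduced exactly, while any other level is rounded to the nearest slot, with an area error bounded by amplitude × period / rosr. This is the time-domain counterpart of the amplitude quantization error of a multi-bit quantizer.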

[Fig. 4.9 block diagram labels: (a) FLASH ADC sampled at Fs with N-bit D/A @ Fs, Ts = 1 / Fs; (b) Single-Bit TE-ADC made of a Time Encoder, a Sampler at Fc and a Time Decoder (TDC), with 1-bit D/A @ Fc; To = 1 / Fo, Tc = 1 / Fc, ROSR = To / Tc, COSR = Ts / To]

Fig. 4.9 (a) Classical multi-bit quantizer within a CTSD loop and (b) an alternative equivalent Single-Bit Time-Encoding-ADC

The high-speed sampler must run at a frequency Fc much higher than the free-running oscillation Fo of the Time-Encoder; the Ratio-Over-Sampling-Ratio (ROSR = Fc/Fo) is a measure of the ability of the sampler to digitize the time-domain information contained in the time-encoded analog signal (it is equivalent to the number of digital levels that a quantizer is able to resolve in a FLASH ADC). In some implementations there is no physical high-speed sampler, as it is embedded in a Time-to-Digital-Converter (TDC) block, where multiple phases of the low-frequency Fs clock are derived and distributed to a bank of latches, which sample the time-encoded continuous-time signal on different time edges.

Although this idea looks simple, many system aspects must be considered, such as the relative position of the free-running oscillation frequency of the Time-Encoder with respect to the system clock Fs of the equivalent CTSD (defined by the ratio COSR = Fo/Fs) and the stability of the main CTSD loop. The Time-Encoder type also plays a significant role, and in some cases additional analog or digital filtering is required in the feedback loop, as explained in the next section. In general, the implementation of the architecture proposed in Fig. 4.9b looks quite attractive for many reasons:

• The multi-bit FLASH ADC can be replaced by a simpler single-bit structure, with a big area and power saving potential: the simpler structure can be either a single comparator or a VCO, as explained in the next section.


• The multi-bit DAC is replaced by a single-bit DAC, which is an inherently linear block: no DEM block is needed any more in the feedback path, which reduces the ELD and makes the loop-filter design much easier.

In contrast to these positive aspects, the biggest drawbacks of the Single-Bit Time-Encoding CT sigma-delta are the tougher requirements on the clock generation and distribution to the modulator. This architecture indeed suffers from:

• An increased sensitivity to clock jitter, which can be considered to be half-way between a standard multi-bit CTSD and a standard single-bit one.
• A high sensitivity to deterministic mismatch between time edges (for those implementations using a multi-phase TDC), which can be turned into signal distortion.

Indeed, the architecture proposed in Fig. 4.9b is completely different from a standard single-bit CTSD running at clock frequency Fc (i.e. the same structure shown, where only the Time-Encoder is removed). In the latter case, the loop filter would be designed for a much higher OSR, and the average-step-size [4] during normal operation would also be much higher. For this reason the sensitivity to clock jitter of the single-bit Time-Encoded CTSD is much lower than that of the standard single-bit CTSD. However, since the height of the pulse at the output of the feedback DAC is always higher than that of the equivalent multi-bit CTSD in Fig. 4.9a, this latter architecture will still show a lower sensitivity to clock jitter.
The exact influence of clock jitter on the modulator noise strictly depends on the way the high-speed clock Fc is generated:

• In some implementations, this high-frequency clock is directly derived from the main clock source (PLL): in this case the jitter requirement is directly

One way to overcome this limitation and to reduce the sensitivity to clock jitter of TE-based CTSD is to convert the single-bit stream back into a multi-bit stream by using an additional Time Decoder in the feedback loop, between the ADC and the DAC, as shown in Fig. 4.10. This alternative architecture has only half of the benefits of the Single-Bit one, since the DAC is again multi-bit and, for high-resolution converters, DEM logic is again necessary. However, the implementation of the Time-Decoder which follows the sampler depends on the particular type of Time-Encoder. Although in some cases it is not possible to achieve the low latency required by the SD loop to keep the ELD below the stability limit, in other cases it can be realized by means of compact Time-to-Digital-Converters (TDC), which are able to reconstruct the multi-bit stream at the original sampling rate Fs. As these TDC blocks are built with purely digital cells, their area and power penalty will become more and more negligible with the scaling of the technology. The implementation of the Time-Decoder which follows the sampler clearly depends on the particular type of Time-Encoder; in some cases it is not possible to achieve the low latency required by the SD loop to keep the ELD below the stability limit, making this multi-bit approach practically unfeasible. In some other

[Fig. 4.10 block diagram labels: Multi-Bit TE-ADC made of a Time Encoder, a Sampler at Fc and a Time Decoder (TDC) at Fs, with N-bit D/A @ Fs in the feedback path; signals u(t), p(t), x(n), y(n), v(t); Tc = 1 / Fc, Ts = 1 / Fs]

Fig. 4.10 A Multi-Bit Time-Encoding-ADC within a CTSD loop

cases, the sampler and the decoder can be realized by means of Time-to-Digital-Converters (TDC), which are able to reconstruct the multi-bit stream at a lower rate (usually the original sampling frequency Fs of the equivalent CTSD).

4.1 Basics of Time-Encoding ADC: Theory and Examples

For a structured analysis of the Time-Encoded CTSD systems described in Figs. 4.9b and 4.10, it helps to classify them according to the specific Time-Encoder which performs the amplitude-to-time mapping [22]:

1. PWM-based TE-ADC: a Pulse-Width-Modulator (PWM) is used as Time-Encoder, and the way the modulation is produced leads to the following two sub-categories:
(a) Synchronous PWM [24]: the pulse-width modulated waveform is generated by comparing the input signal with a periodic waveform (usually a saw-tooth or a triangular shape) synchronous with the master clock.
(b) Asynchronous PWM [18, 19, 26, 30]: the pulse-width modulated waveform is generated with a self-oscillating loop fulfilling the Barkhausen phase criterion.
The main difference between the two sub-categories defined above is that the self-oscillation frequency is fixed and constant in the synchronous


systems, while it depends on the input signal (amplitude and frequency) in the asynchronous ones.

2. VCO-based TE-ADC [21, 28]: in such systems a Voltage-Controlled-Oscillator is used as Time-Encoder to generate the oscillation at rest; the VCO output can then be perturbed (in frequency and/or phase) by the input signal to the TE-ADC, thereby mapping the amplitude information into time information.

4.2 PWM-Based CTSD ADCs

There are two types of Time-Encoded ADCs [22] (synchronous and asynchronous) based on Pulse Width Modulators (PWM), and their most common implementations are shown in Fig. 4.11. In the former category a fixed-frequency square pulse stream is produced, where the width of each pulse is proportional to the amplitude of the sampled continuous-time input signal. A saw-tooth signal generator and a comparator are used, and it has been shown that different types of triangular waveforms can lead to different results [20]. When closed in the Sigma-Delta loop, the non-linearity of the ramp generator is attenuated by the loop gain, so its implementation is normally easy and very efficient.

In an Asynchronous PWM Time-Encoded ADC, the oscillation is induced by a positive feedback fulfilling the Barkhausen phase criterion. The input signal is then added to the feedback signal, perturbing the phase and frequency of the oscillation itself. Figure 4.11b shows the block diagram of a typical implementation, composed of a comparator, a loop filter H(s) and a delay element. Since the loop filter (usually one integrator) is not sufficient to reach the 180° phase required to induce a stable oscillation, the phase of the loop is increased by a delay block in the feedback path or by introducing hysteresis in the comparator. The oscillation frequency at rest (the so-called limit cycle) and the dynamics of the system are nontrivial and need to be established approximately [19]. One of the biggest challenges of these systems is to control (via tuning or tracking) the limit-cycle frequency over PVT variations (process, voltage, temperature) and also over the whole variety of input signals (amplitudes and frequencies).

It can be proven [19] that both PWM time-encoding architectures realize ideal signal coders that do not introduce any error, provided that the input signal is band-limited and the self-oscillation frequency is sufficiently high.
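The role of the delay block in setting the limit cycle can be illustrated with a toy simulation. This is only a sketch under idealizing assumptions that are ours, not the source's: zero input, an ideal comparator without hysteresis, a unit-gain integrator and a pure delay of `delay_steps` samples. With an integrator providing 90° of phase, the delay has to provide the remaining 90° at the oscillation frequency, so the period is expected to be about four times the loop delay.

```python
def limit_cycle_period(delay_steps, n_steps=5000):
    """Period (in samples) of a comparator + integrator + delay loop at rest."""
    u = 0.5                       # integrator state, offset to avoid exact zeros
    p_hist = [1] * delay_steps    # comparator outputs before the start of the run
    flips = []                    # sample indices where the comparator toggles
    prev = 1
    for n in range(n_steps):
        p = 1 if u >= 0 else -1   # ideal comparator
        if p != prev:
            flips.append(n)
            prev = p
        p_hist.append(p)
        u -= p_hist[-1 - delay_steps]   # negative feedback of the delayed output
    return flips[2] - flips[0]          # two half-periods

for d in (1, 10, 100):
    print(d, limit_cycle_period(d))
```

In this discrete-time sketch the period comes out as 4·(delay_steps + 0.5) samples: the extra half sample is the average latency with which the sampled comparator detects a zero crossing. Any additional delay in the loop, including the one introduced by sampling, therefore lowers the limit-cycle frequency.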
In both cases, to implement a practical decoder with digital logic running with a synchronous clock, the PWM signal must be sampled first. The decoder performs the Time-to-Digital-Conversion, i.e. it measures the pulse width within a discrete set of values, thereby introducing a time-quantization noise. For this reason, the resolution of a TE-ADC built with this principle is directly proportional to the sampling frequency Fc, which determines the number of discrete-time values resolvable by the TDC (or, in turn, to the achievable time resolution of the TDC when implemented with multiple phases of the low-speed Fs clock). When used in combination with a CTSD, it is convenient to place the TDC decoder within the SD loop, such

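[Fig. 4.11 block diagram labels: (a) Time-Encoder made of a T&H clocked at Fs, a ramp generator and a comparator, followed by a Sampler at Fc and a Time Decoder (TDC); (b) Time-Encoder made of a comparator, a filter H(s) and a delay Td in a feedback loop, followed by a Sampler at Fc and a Time Decoder (TDC); in both cases u(t) comes from the loop filter, the PWM sampled signal goes to the main DAC and the decoder delivers y(n)]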

Fig. 4.11 (a) Synchronous PWM, (b) Asynchronous PWM

that the time-quantization error gets shaped by the NTF of the Sigma-Delta itself, allowing for an easier implementation. It is worth mentioning that for asynchronous modulators a perfect recovery algorithm also exists, although it is quite complicated [17] and would require a very sophisticated digital reconstruction filter.
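The time-quantization introduced by sampling a PWM waveform can be sketched for the synchronous case. The example below is a simplified illustration, not the circuit of any cited design: a constant input is compared with an ideal saw-tooth over one carrier period, the comparator output is sampled in `rosr` time slots, and the decoder simply counts the high samples. The function name `pwm_tdc_decode` is ours.

```python
import numpy as np

def pwm_tdc_decode(u, rosr):
    """Encode a constant input u in [-1, 1] as PWM and decode it by counting samples."""
    # Saw-tooth evaluated at the centre of each of the rosr time slots.
    ramp = -1.0 + (2.0 * np.arange(rosr) + 1.0) / rosr
    p = u > ramp                          # sampled single-bit PWM stream
    count = int(np.count_nonzero(p))      # time-to-digital conversion
    return 2.0 * count / rosr - 1.0       # decoded amplitude estimate

rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, 1000)
for rosr in (8, 50, 400):
    err = max(abs(pwm_tdc_decode(u, rosr) - u) for u in inputs)
    print(rosr, err)
```

The decoding error never exceeds 1/rosr in this model, so the resolution improves in proportion to the number of time slots per period, in line with the statement that the resolution is set by the sampling frequency Fc.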

4.2.1 Synchronous PWM CTSD – Case Studies

One recent example of a Single-Bit Synchronous PWM Sigma-Delta ADC is given in [24, 26, 29] and the block diagram of such a system is given in Fig. 4.12. The CTSD modulator implements a third-order feed-forward architecture, with an active-RC inverse-Chebyshev filter. An additional feedback loop realizes the ELD compensation by placing a high-frequency zero in the NTF [1]. The triangular waveform used in the PWM generator is realized by integrating the master clock square wave with a simple active integrator (switched current sources driving a capacitive load [24]). Double-sampled PWM [16] is used to eliminate the harmonics of the input signal (minimizing the distortion) as compared with single-sampled PWM. The high-speed sampler working at 12.5 GHz does not exist as a unique block in the circuit, since it is embedded in a compact time-to-digital converter. The TDC decoder is realized by using 50 different latches, each one triggered by a different phase of the master clock (Fs = 250 MHz), achieving an equivalent sampling of the

[Fig. 4.12 block diagram labels: Loop Filter H(s); Single-Bit TE-ADC made of the PWM-mod (Fs = 250 MHz), the Sampler (Fc = 12.5 GHz) and the Time Decoder (TDC, Fs = 250 MHz); output y(n) [50 lev @ 250 MS/s]; feedback signal pq(t) through two 1-bit D/A converters (main and fbe)]

Fig. 4.12 The Synchronous PWM CTSD presented in [24]

PWM signal of 12.5 GHz. The re-sampled single-bit feedback signal is then sent back into the CTSD loop, using different DAC types for the two loops:

• a differential pair as the “main-DAC”, for high PSRR requirement
• a CMOS digital driver (connected to the supplies) as the “fbe-DAC”, for power reduction

The sensitivity of this converter to the jitter of the master clock Fs is very low [24], since the TDC architecture generates a DAC pulse with one rising edge and one falling edge in each clock period Ts. The resulting waveform can therefore be considered a Return-to-Zero code with very low pulse-width jitter $\delta\tau^2$ and very high delay clock jitter $\delta t_d^2$, which makes it quite robust [33]. The pulse-width jitter depends mainly on the size of the delay line used for the multi-phase clock generation: this implies that jitter sensitivity can be traded against area and power of the TDC. Increasing the size of the unit elements composing the delay line also helps to reduce the distortion due to static mismatch between them.

The time-quantization noise generated in the TDC is similar to the amplitude-quantization noise of the equivalent quantizer, but for the same sampling frequency Fs and for the same number of levels, the noise floor of the time-quantizer is higher (by 8 dB in this example); this is due to the aliasing of the high-frequency tones of the PWM continuous-time signal (harmonics of the free-running frequency and intermodulation products with the input signal), which are folded in band.
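The relation between the number of clock phases and the equivalent sampling rate of this multi-phase TDC is simple arithmetic, sketched below with the numbers quoted in the text (50 latches, Fs = 250 MHz). The variable names are ours, and evenly spaced phases are assumed.

```python
n_phases = 50          # latches, each triggered by a different clock phase
fs_hz = 250e6          # master clock frequency

fc_equivalent_hz = n_phases * fs_hz           # equivalent sampling rate
time_resolution_s = 1.0 / fc_equivalent_hz    # spacing between adjacent phases

print(fc_equivalent_hz / 1e9, "GHz")
print(time_resolution_s * 1e12, "ps")
```

Under the even-spacing assumption the phases are 80 ps apart, which gives an idea of the timing accuracy that the delay line has to guarantee in order to avoid the mismatch-induced distortion mentioned above.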

[Fig. 4.13 plots: FFT spectrum of pq(t), magnitude (dBFS) versus frequency (Hz), with the NTF peak marked, Fc / 2 = 6.25 GHz; FFT spectrum of y(n), magnitude (dBFS) versus frequency (Hz), with the NTF peak marked, Fs = 250 MHz, Fs / 2 = 125 MHz; SNR and SNDR (dB) versus amplitude (dBFS), annotated “Limited DR”]

Analog Bandwidth      20 MHz
Clock-Frequency Fc    12.5 GHz (equivalent)
Clock-Frequency Fs    250 MHz
Dynamic Range         68 dB
Peak-SNR              62 dB
Peak-SNDR             60 dB
Total Power           10.5 mW @ 1.2 V
Technology            CMOS 65 nm

Fig. 4.13 Some measurement results of the Synchronous PWM CTSD presented in [24]

Figure 4.13 shows some measurement results, including the FFT of the signal before and after decoding. The FFT plot of the PWM signal pq(t) clearly shows a high out-of-band (OOB) energy due to the oscillation tone and its multiples. The NTF peak contributes to the OOB power which is sent back into the loop filter and reveals a design close to the instability margin, despite the help of the compensation fbe-DAC. The high OOB energy forces the first Op-Amp of the loop filter to have a high gain-bandwidth product and also limits the dynamic range of the modulator, as can be seen in the plot of SNR versus input amplitude.

4.2.2 Single-Bit Asynchronous PWM CTSD – Case Studies

When the asynchronous Time-Encoding ADC shown in Fig. 4.11b is embedded into a CT Sigma-Delta converter, the resulting structure is the one shown in Fig. 4.14a. The architecture can be further improved by moving the sampler inside the PWM loop (see Fig. 4.14b) and then by moving the PWM filter H(s) into the feedback path (see Fig. 4.14c), realizing in this way a Single-Bit Time-Encoding-Quantizer (TEQ, [25]). Due to the nature of the limit cycle generated by the asynchronous modulation (it is not constant, but depends on the amplitude and frequency of the input signal), the exact time-decoder filter would require a very high hardware complexity [17,

[Fig. 4.14 block diagram labels: (a) loop filter Hsd(s), Time-Encoder (H(s), delay Td), Sampler at Fc, Time Decoder, D/A; (b) the same structure with the sampler inside the PWM loop; (c) Time-Encoding-Quantizer with Loop-1 (H(s), delay Td, D/A) and Loop-2 (Hsd(s)), followed by a Time-Decoder made of an Equalizer H(z) and an Oscillation Removal filter S(z), labelled ROSR; signals x(t), v(t), w(t), u(t), p(t), p(m), u(m), y(m), y(n)]

Fig. 4.14 The Asynchronous PWM CTSD architecture

19]. Instead, a simple time-decoder can be built as a cascade of an equalization filter H(z) (the digital equivalent of the PWM filter H(s)) and an oscillation-removal filter S(z) (a sinc filter is sufficient for the purpose), which also decimates the digital stream down to the equivalent sampling frequency Fs. Since this modified system presents two loops, the dynamics of the signals involved depend on the design of the two filters H(s) and Hsd(s): by proper placement of poles and zeros, it can be guaranteed that at low frequencies the outer loop (Loop-2 in Fig. 4.14c) dominates, while at high frequencies the inner PWM loop (Loop-1 in Fig. 4.14c) becomes dominant. By doing so, one can virtually split the feedback signal path in such a way that most of the OOB energy circulates in the inner loop (providing the oscillation at the limit cycle) and most of the in-band power circulates in the outer loop (giving the SD the optimal signal to be cancelled at the input adder). As a result, this modified architecture presents multiple advantages:

• Since the PWM filter H(s) usually has a high-frequency low-pass characteristic (it can also be a high-frequency integrator), it attenuates the high OOB energy of the signal p(t) and helps to increase the dynamic range of the CTSD modulator, as more dynamic range is then available in the loop filter Hsd(s) for the input signal x(t).
• As the inner loop (Loop-1 in Fig. 4.14c) already provides a first-order shaping of the time-quantization error of the sampler in the TEQ, the intrinsic SNR of the stand-alone TEQ is increased (also thanks to the high OSR).
• If the clock Fc is affected by jitter, then the jitter-error signal at the output of the D/A (which is a stream of very narrow pulses produced every time p(t) changes

[Fig. 4.15 content: schematic of the modulator with input Vi, active-RC loop filter (R1–R3, Rx1–Rx3, Ci1–Ci3, Ra, Ca1, Ca2), passive adder, tunable low-pass, comparator, latch and delay, DAC 1, DAC 2 in the outer loop, CLK = 2.56 GHz, Data-Out; block diagram with Loop-1 (H1(s), delay Td, D/A) and Loop-2 (Hsd(s), H2(s), D/A); FFT plot of feedback signals in Loop-2, before and after H2(s), dBFS versus Freq (Hz)]

Fig. 4.15 The Asynchronous PWM CTSD presented in [30]

polarity and which has a high OOB frequency content) is also attenuated by the low-pass filter H(s) before it is subtracted from the input signal x(t); this clearly reduces the jitter sensitivity of the modulator.

As the derivation of the limit-cycle frequency is not easy, the design methodology of such converters is more complex and has been explained in detail in [19, 25]. In short, three important parameters must be determined to make the PWM-based oversampled converter shown in Fig. 4.14c equivalent to the standard CTSD shown in Fig. 4.9a:

1. PWM loop gain: although the oscillation can be easily guaranteed at rest, the dynamics of the loop may be perturbed when an input signal v(t) is applied to the TEQ. The filtered signal u(t) can be considered to be a triangular signal if the filter H(s) is an integrator or a low-pass filter with a very low cut-off frequency compared to the limit-cycle frequency. When the slope of the input signal v(t) becomes comparable with the slope of the ramp signal u(t), the oscillation could be lost, leading the modulator into an overload condition. To avoid this, a sufficiently high gain kg must be chosen in the feedback DAC of the inner loop. The minimum value of the gain kg depends mainly on the ratio between the oscillation tone $\omega_O$ and the maximum input frequency and amplitude, $\omega_B$ and A [19] (the maximum amplitude A can also be seen as the full-scale input voltage of the Time-Encoded-Quantizer shown in Fig. 4.14c):

$$k_g \ge A \left| 1 + H(j\omega_B) \right| \left| H(j\omega_O) \right| = A \left| 1 + H\!\left( j \frac{\omega_S}{2\,\mathrm{OSR}} \right) \right| \left| H\!\left( j\,\omega_S\,\mathrm{COSR} \right) \right| \qquad (4.6)$$

2. Relation between $\omega_S$ and $\omega_O$: due to the sampling error, the time when the digitized signal p(m) changes polarity will always be delayed compared to its analog representation w(t). This introduces a time-varying delay which, on average, will always be smaller than $T_O$ (the oscillation period itself). Since this delay may cause the sigma-delta loop to become unstable, it must always be kept below the maximum tolerable excess loop delay $E_{MAX}$ (expressed as a portion of the sampling period Ts). As a rule of thumb, one can assume that the oscillation period must always be smaller than twice the maximum tolerable ELD [19], yielding:

$$\mathrm{COSR} = \frac{\omega_O}{\omega_S} > \frac{T_S}{2\,(E_{MAX})\,T_S} = \frac{1}{2\,E_{MAX}} \qquad (4.7)$$

Table 4.1 System parameters of the PWM CTSD presented in [30]

Parameter         Value
Analog BW         20 MHz
CTSD OSR/Fs       16 / 640 MHz
Kg                6.7
COSR/Fosc         0.5 / 320 MHz
ROSR/Fc           8 / 2.56 GHz
TEQ resolution    4.7 bits

3. Relation between the quantizer resolution and the sampler frequency Fc: the time-quantization error introduced by the uniform sampler can be reduced by increasing the sampling frequency Fc. The minimum value of the Ratio-Oversampling-Ratio (ROSR) required to obtain an error lower than the quantization error of an equivalent classical amplitude quantizer with $N_{LEV}$ levels can be estimated to be [19]:

$$\mathrm{ROSR}_{min} = \frac{\omega_C}{\omega_O} > 2P\,\frac{2 N_{LEV}}{k_g} \qquad (4.8)$$

where P is the amplitude of the PWM signal before quantization (usually it is 1). Once these three parameters are computed, the Single-Bit Time-Encoded-Quantizer (TEQ) is ready for integration into a CT Sigma-Delta converter. Many designs have been proposed to implement the system in Fig. 4.14c [25, 27] and the most efficient one, presented in [30], is described in Fig. 4.15. The most important system parameters are listed in Table 4.1.
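The frequency ratios listed in Table 4.1 can be cross-checked against the definitions used in this section (ROSR = Fc/Fo, COSR = Fo/Fs). The short sketch below does so; the variable names are ours, OSR = Fs / (2·BW) is the assumed definition of the oversampling ratio, and the last line only evaluates what the rule of thumb "oscillation period smaller than twice the maximum tolerable ELD" implies for these numbers.

```python
# Values copied from Table 4.1 (variable names are illustrative).
analog_bw_hz = 20e6
fs_hz = 640e6        # equivalent CTSD sampling frequency
f_osc_hz = 320e6     # limit-cycle frequency at rest
fc_hz = 2.56e9       # sampler clock

osr = fs_hz / (2.0 * analog_bw_hz)   # assumed definition of OSR
cosr = f_osc_hz / fs_hz              # COSR = Fo / Fs
rosr = fc_hz / f_osc_hz              # ROSR = Fc / Fo

# To < 2 * E_MAX * Ts  is equivalent to  E_MAX > 1 / (2 * COSR), in units of Ts.
e_max_lower_bound = 1.0 / (2.0 * cosr)

print(osr, cosr, rosr, e_max_lower_bound)
```

The three ratios reproduce the table entries (16, 0.5 and 8). The implied bound says that, with the limit cycle at half the sampling frequency, the loop filter has to tolerate an excess loop delay of about one sampling period.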

[Fig. 4.16 plots: FFT spectrum of p(m) for a small signal and for a big signal, PSD [dBFS] versus frequency [Hz], annotated “NTF not peaking” and “Limit Cycle moving”, Fosc ~ 320 MHz, Fc / 2 = 1.28 GHz; SNR/SNDR [dB] versus relative input level [dBFS], annotated “Extended DR”]

Analog Bandwidth      20 MHz
Clock-Frequency Fc    2.56 GHz
Clock-Frequency Fs    640 MHz (equivalent)
Dynamic Range         63 dB
Peak-SNR              63 dB
Peak-SNDR             61 dB
Total Power           7.0 mW @ 1.0 V
Technology            CMOS 65 nm

Fig. 4.16 Some measurement results of the Asynchronous PWM CTSD presented in [30]

The loop filter realizes a multiple feed-forward architecture in which the last integrator is also used as a capacitive adder of all state variables. The two feedback loops are split to allow a different optimization of the D/A converters:
• DAC1 in the inner loop: uses the supply rails as reference voltages, with more noise and more gain
• DAC2 in the outer loop: uses low-noise reference buffers

The analog filters in the feedback path are realized with a passive implementation, boosting the efficiency of the overall modulator. A programmable high-frequency pole is used in the filter H1(s) of the inner loop to tune the limit-cycle frequency over process deviations. A low-frequency pole is used in the filter H2(s) of the outer loop to damp the OOB energy in the loop filter, thereby enhancing the dynamic range of the modulator. Figure 4.16 shows some measurement results, where the FFT of the single-bit output is compared for different values of the input amplitude. The peak of the limit cycle (and its harmonics) is dominant for low amplitudes, while it becomes flatter (and moves to lower frequency) for higher amplitudes. The NTF does not peak significantly, revealing a robust design with sufficient stability margin, even though no ELD compensation technique is used.

[Figure 4.17: block diagrams (a) and (b). Labels in (a) include Hsd(s), H(s), H(z) (Equalizer), S(z), Time-Encoding, Sampler (Fc), Td Delay, 1-bit D/A, Oscillation Removal (Sync), Time-Decoder, Quantizer, ROSR, M-bit D/A, Loop-1 and Loop-2, with signals x(t), v(t), w(t), u(t), p(t), p(m), u(m), y(m) and y(n). Labels in (b) include Time-Encoding-Quantizer, 1-bit D/A, three a1/s integrators, z^-d, Fc, Fs, Edge detect, Digital Integrator z^-1 / (1 - z^-1), Time Decoder, M-bit D/A, Loop-1 and Loop-2.]

Fig. 4.17 A multi-bit Asynchronous PWM CTSD architecture proposed in [17]

The high OOB energy is attenuated by the filter H2(s) (see Fig. 4.15); hence the design of the first op-amp of the loop filter is relaxed and the dynamic range of the modulator is extended, as can be seen in the SNR versus input amplitude plot.

4.2.3 Towards Multi-Bit Asynchronous PWM CTSD

If the clock speed is already at the limit of the technology, more resolution can be obtained from Asynchronous PWM CTSD converters by increasing the number of levels of the time-encoding quantizer [19], as shown in Fig. 4.17a. This is accomplished by moving the equalization filter H(z) inside the outer loop (Loop-2), where it acts as the analog filter H(s) previously did for the outer loop. The two loops are now independent of each other, and the inner one is still realized with a single-bit D/A converter. The design procedure of such a converter is the same as the one described in the next section. If the parameter ROSR has been chosen to be the minimum, according to (4.8), then the number of levels for the multi-bit DAC in Loop-2 will be the same as the number of levels of the quantizer of the equivalent standard CTSD system:

$$N_{DAC2} = \log_2\!\left(\frac{k_g\,\mathrm{ROSR}}{2}\right) \qquad (4.9)$$


One interesting design example is also presented in [19] (see Fig. 4.17b), with the following characteristics:
• The equalizer H(z) is realized with an integrator, which can be implemented as an up/down counter running at the high-frequency clock Fc (the same as the comparator)
• The sampling and interpolation operations can be implemented with a row of two M-bit registers running at the low-speed clock Fs
• One DEM module might be needed before the multi-bit DAC, in case its static non-linearity does not meet the distortion requirement
• The last integrator of the loop filter can also be reused within the inner loop to realize the pulse-width modulation

If the implementation of the digital integrator is unfeasible due to the tough latency requirement (to meet the maximum ELD target), a hybrid solution has also been proposed, which combines the digital filter in the feedback with the D/A converter by employing a single-bit FIR-DAC [23].

4.3 VCO-Based CTSD ADCs

Time encoders using a VCO can also be used to directly implement an ADC. The principle relies on a very simple fact: if we periodically sample the number of complete cycles of an oscillator that fit within a given sampling time, we obtain an estimate of the frequency of the oscillator. Because the oscillator phase generally does not advance by an integer number of periods within the sampling period, the residual phase is carried over to the next estimation, which results in first-order noise shaping of the quantization error. To implement a data converter, we only need to control the oscillator frequency with a voltage, such that the input voltage maps directly into a digital code. Strictly speaking, such a data converter is not a sigma-delta modulator, because there is no feedback loop. Instead, it may be seen as a quantizer whose quantization noise exhibits noise shaping. An in-depth mathematical analysis of VCO-based ADCs can be found, for example, in [34].

Figure 4.18a shows the basic building block of a VCO-based ADC, where the input is directly connected to the VCO frequency control input. The VCO signal clocks a digital counter that is dumped and reset periodically at a sampling frequency fs. This scheme can be refined by using several of the intermediate phases of a VCO implemented with a ring oscillator. The system of Fig. 4.18a produces first-order shaped quantization noise only and, as a consequence, requires oversampling to achieve a sufficient SNR. As there is no feedback loop, all distortion components introduced in the VCO are coupled to the output. For this reason, the system of Fig. 4.18 can only be used for limited resolutions, although it permits achieving a remarkable FoM and low area. For instance, [35] shows an exemplary circuit implemented with digital inverters which exhibits very low area and FoM, but whose linearity is strongly limited by the VCO.
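The count-and-dump principle described above can be sketched behaviourally as follows. This is a minimal model under stated assumptions: the VCO is ideal and linear, the input is held constant within each sampling period, and the names `vco_count_adc`, `f0` and `kvco` are hypothetical. It is not the circuit of [35].

```python
import numpy as np


def vco_count_adc(v_in, f0, kvco, fs):
    """Count-and-dump VCO quantizer (behavioural sketch).

    v_in : input voltage, assumed constant within each sampling period
    f0   : free-running VCO frequency [Hz]   (hypothetical parameter)
    kvco : VCO gain [Hz/V]                   (hypothetical parameter)
    fs   : sampling frequency [Hz]
    Returns the number of complete VCO cycles counted in each period.
    """
    freq = f0 + kvco * np.asarray(v_in, dtype=float)
    phase = np.cumsum(freq / fs)      # accumulated VCO phase, in cycles
    edges = np.floor(phase)           # complete cycles seen so far
    # The counter is reset every period, but the VCO phase is not, so the
    # fractional cycle left over is carried into the next count.
    return np.diff(edges, prepend=0.0).astype(int)


n = 4096
v = 0.5 * np.sin(2 * np.pi * 5 * np.arange(n) / n)
counts = vco_count_adc(v, f0=10.3e6, kvco=2e6, fs=1e6)
ideal = (10.3e6 + 2e6 * v) / 1e6      # unquantized cycles per period
error = counts - ideal                # first difference of the residual phase
print(counts[:8], np.abs(np.cumsum(error)).max())
```

Because each count differs from the ideal value only by the difference of two consecutive fractional phases, the running sum of the error stays below one cycle. This is the time-domain signature of first-order noise shaping.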

[Figure 4.18: (a) the input signal drives a VCO that clocks a counter, which is latched and reset at fs to produce the digital output; (b) the same structure with a look-up table after the latch.]

Fig. 4.18 A VCO based oversampling converter

Calibration has been proposed as an option to improve the linearity. In this case, the ADC core resorts to the building block of Fig. 4.18b and first-order noise shaping is maintained. However, the output data is linearized by means of a look-up table after the counter. The look-up table contents are calculated at power-up by a calibration system, which may require dedicated calibration hardware. In [37] this approach is employed to achieve a converter with a low FoM and very small size. Although this approach can be very adequate for low-resolution ADCs (below 11 bits), increasing the resolution above that value by means of calibration does not seem so efficient. Moreover, concerns have been raised about the possibility of maintaining the calibration under power supply and temperature variations. Apart from these two alternatives, some others have been developed [38], based on similar principles. VCO-based oversampled converters have been around for more than 15 years [39]; however, this topic cannot be considered closed, and more architectures based on oscillators will likely be seen in the future.
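The look-up table idea can be illustrated with a static toy model. The sketch below assumes a hypothetical VCO whose tuning law has a quadratic term, ignores noise shaping altogether, and fills the table from a known calibration ramp. The function names and all coefficients are illustrative assumptions, not the calibration scheme of [37].

```python
import numpy as np


def vco_code(v, fs=1e6, f0=200e6, k1=80e6, k2=-30e6):
    """Static counter output of a hypothetical VCO with a curved tuning law."""
    v = np.asarray(v, dtype=float)
    return np.floor((f0 + k1 * v + k2 * v**2) / fs).astype(int)


def build_lut(v_cal):
    """Calibration: store, for each raw code, the mean input that produced it."""
    codes = vco_code(v_cal)
    return {int(c): float(v_cal[codes == c].mean()) for c in np.unique(codes)}


def linearize(codes, lut):
    """Replace each raw code by its look-up table entry."""
    return np.array([lut[int(c)] for c in codes])


v_cal = np.linspace(-1.0, 1.0, 20001)          # known calibration ramp
lut = build_lut(v_cal)

v_test = np.linspace(-0.99, 0.99, 397)
raw = vco_code(v_test)
v_lut = linearize(raw, lut)
v_lin = (raw - 170.0) / 80.0                   # straight line through the end points
print(np.abs(v_lut - v_test).max(), np.abs(v_lin - v_test).max())
```

In this toy case the table removes the curvature error and leaves only the quantization step, whereas the straight-line mapping keeps an error of a sizeable fraction of the input range. A real implementation would also have to track supply and temperature drift, as noted above.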

4.3.1 Sigma-Delta Employing VCO-Based ADCs

VCO-based ADCs have been the subject of many research works because of their promising advantages, namely simplicity and low power. However, the hardware cost of the proposed solutions to their inherent drawbacks has placed them as an option similar to the PWM-based converters. Using the VCO as the quantizer of a higher-order conventional continuous-time sigma-delta modulator is an approach that combines several advantages:
• The order can be increased arbitrarily; it is just a matter of increasing the complexity of the analog filter. Moreover, one needs one integrator less than the modulator order, because the VCO already provides first-order noise shaping

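[Figure 4.19: the analog input passes through the loop filter Hsd to a ring oscillator whose phases Φ0 … ΦN are sampled at fs in a register; a 1 − z^-1 block then produces the digital multibit output, which also drives a 5b DAC that provides the analog feedback.]

Fig. 4.19 A VCO as time quantizer of a CT-SDM [33]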

• If the VCO is implemented with a particular ring oscillator topology, the feedback DAC may incorporate a Data Weighted Averaging (DWA) effect without the need for dedicated DWA hardware.

Finally, VCO nonlinearity is spectrally shaped by the analog loop filter. In [36], a polyphase VCO is implemented using a ring oscillator, as shown in Fig. 4.19. Each output phase (Φ0 … ΦN) is sampled in a register by a flip-flop and thus time-quantized with a single bit. In this way, a multilevel code is generated. After a digital differentiator, this code is used both as the digital output and to drive a conventional unit-element current DAC. The rotating effect of the ring oscillator eases the equiprobable use of the DAC elements, resulting in a built-in DWA kind of nonlinearity compensation. In spite of all these advantages, one still needs high-performance analog integrators and a multi-bit D/A converter, which results in a real improvement over a standard continuous-time sigma-delta but with similar limitations, especially the linearity of the first integrator, which dominates the overall performance.
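The rotating selection of DAC elements can be made visible with an abstract model of the sampled ring oscillator. The sketch below assumes a single edge travelling around a ring of `n_stages` stages, an advance of fewer than `n_stages` stage delays per sampling period, and a per-phase XOR as the digital differentiator. The names and the 31-stage ring length are assumptions chosen for illustration and do not describe the transistor-level circuit of [36].

```python
import numpy as np


def ring_vco_quantizer(stages_per_sample, n_stages):
    """Sampled ring oscillator followed by a per-phase 1 - z^-1 (XOR).

    stages_per_sample[n] is the (real-valued) number of stage delays the
    travelling edge advances in period n; it must stay below n_stages.
    Returns a (samples x n_stages) array of unit-element select bits.
    """
    pos = np.floor(np.cumsum(stages_per_sample)).astype(int)  # edge position
    idx = np.arange(n_stages)
    # Number of times the edge has passed stage idx after pos stage delays
    passes = (pos[:, None] + n_stages - 1 - idx[None, :]) // n_stages
    state = passes % 2                       # phase bits held in the register
    previous = np.vstack((np.zeros((1, n_stages), dtype=int), state[:-1]))
    return state ^ previous                  # stages toggled in this period


N_STAGES = 31                                # assumed ring length
n = np.arange(2000)
advance = 12.0 + 8.0 * np.sin(2 * np.pi * n / 200)   # stands in for the VCO input
bits = ring_vco_quantizer(advance, N_STAGES)
code = bits.sum(axis=1)                      # multi-level digital output
usage = bits.sum(axis=0)                     # how often each DAC element fires
print(code[:6], usage.max() - usage.min())
```

In each period the selected elements form a contiguous block that starts where the previous block ended, so the usage counts of the unit elements never differ by more than one. This is the same selection pattern that a dedicated DWA pointer would produce.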

5 Conclusions

Like many other research topics with an industrial application, Sigma Delta converters have been a fashion item that many semiconductor companies wanted to display in their portfolio. Recently their effectiveness has been questioned in favor of classical Nyquist architectures such as SAR and pipeline converters. Unlike Nyquist ADC architectures, they have been surrounded by a myth of complicated mathematics and connections with chaos theory and other exotic disciplines. While this is true, their design relies on deep analog circuit design knowledge (the same as Nyquist converters) combined with a clear understanding of classical linear systems and filter theory (the same as the anti-aliasing filters that precede a Nyquist converter). In spite of these arguments, Sigma Delta converters are typically the solution in the industry “after” pipelines and SAR fail to deliver the power or performance that differentiates a product from its competitors.


The historical steps in the evolution of sigma delta converters go from the ideal model of a sigma delta, embodied as a switched-capacitor circuit, to continuous-time sigma deltas and lately to time-encoding designs. This evolution has been forced by the starvation of analog performance as feature sizes decreased. The way to exploit the extra digital MHz allowed by shrinking technologies could not come from higher-bandwidth opamps or better-matched current sources. Hence, researchers have been forced to dig a bit deeper into unconventional engineering knowledge. Time encoding techniques will represent, in the near future, one of the main escapes from the low performance of analog electronics associated with nanometer CMOS. The attempts to implement time encoding converters with direct sampling in 65 nm and 45 nm have shown to date that the paradigm of excellence of digital versus analog circuitry has not yet been sufficiently accomplished. High-order continuous-time Sigma Delta converters implemented with time-encoded quantizers [30] or VCOs [36] are bridging the gap for the moment, but still require high-performance operational amplifiers. It can be envisioned that a definitive solution must benefit from the advantages of time encoding in the whole analog signal processing chain, and not only at the quantizer. In this way, time encoding could be applied to filtering, amplification and other building blocks made mostly of digital logic but realizing analog operations. Opamps and integrators implemented with charge pumps and logic inverters [40] are among the early examples of the changes that will arrive. Other uses of time encoding to implement analog-to-digital conversion are a mix of traditional frequency synthesis techniques with VCO oversampled converters [41, 42]. A classical time encoding converter was the dual-slope ADC. In [43, 44], it is shown how to implement a multi-bit oversampled converter using the dual-slope principle by means of time encoding. Continuous-time digital signal processing [45] shows that the real barrier is not the speed and performance of CMOS technology but the way engineers see real-world signals and signal processing, linked to the classical theory of sampled data systems with a fixed sampling rate.

References

CTSD Theory

1. J.A. Cherry, W.M. Snelgrove, Continuous-Time Delta-Sigma Modulators for High Speed A/D Conversion (Kluwer Academic, Boston, 2000)
2. L. Breems, J.H. Huijsing, Continuous-Time Sigma-Delta Modulation for A/D Conversion in Radio Receivers (Kluwer Academic, Boston, 2001)
3. R. Schreier, G. Temes, Understanding Delta-Sigma Converters (IEEE Press, Hoboken, 2005)
4. L. Hernandez, A. Wiesbauer, S. Paton, A. Di Giandomenico, Modelling and optimization of low-pass continuous-time sigma-delta modulators for clock-jitter noise reduction, in Proceedings of 2004 IEEE International Symposium on Circuits and Systems (ISCAS), Vancouver, May 2004, pp. 1072–1075


CTSD Examples

5. S. Paton, A. Di Giandomenico, L. Hernandez, A. Wiesbauer, T. Poetscher, M. Clara, A 70 mW 300 MHz CMOS continuous-time DS ADC with 15 MHz bandwidth and 11 bits of resolution. IEEE J. Solid-State Circuits 39(7), 1056–1063 (2004)
6. K. Philips, P.A.C.M. Nuijten, R.L.J. Roovers, A.H.M. van Roermund, F. Munoz Chavero, M. Tejero Pallares, A. Torralba, A continuous-time SD ADC with increased immunity to interferers. IEEE J. Solid-State Circuits 39(12), 2170–2178 (2004)
7. L. Doerrer, F. Kuttner, P. Greco, P. Torta, T. Hartig, A 3-mW 74-dB SNR 2-MHz continuous-time delta-sigma ADC with a tracking ADC quantizer in 0.13 μm CMOS. IEEE J. Solid-State Circuits 40(12), 2416–2427 (2005)
8. V. Quiquempoix, P. Deval, A. Barreto, G. Bellini, J. Márkus, J. Silva, G.C. Temes, A low-power 22-bit incremental ADC. IEEE J. Solid-State Circuits 41(7), 1562–1571 (2006)
9. G. Mitteregger, C. Ebner, S. Mechnig, T. Blon, C. Holuigue, E. Romani, A 20-mW 640-MHz CMOS continuous-time SD ADC with 20-MHz signal bandwidth, 80-dB dynamic range and 12-bit ENOB. IEEE J. Solid-State Circuits 41(12), 2641–2649 (2006)
10. S. Ouzonov, R. van Veldhoven, C. Bastianseen, K. Vongehr, R. van Wegberg, G. Geelen, L. Breems, A. van Roermund, A 1.2 V 121-mode CT SD modulator for wireless receivers in 90 nm CMOS, in Proceedings of ISSCC (2007), San Francisco, 2007, pp. 242–243
11. L. Breems, R. Rutten, R.H.M. van Veldhoven, G. van der Weide, A 56 mW CT quadrature cascaded SD modulator with 77-dB DR in a near zero-IF 20-MHz band. IEEE J. Solid-State Circuits 42(12), 2696–2705 (2007)
12. J. Sauerbrey, J. San Pablo Garcia, G. Panov, T. Piorek, X. Shen, M. Schimper, R. Koch, M. Keller, Y. Manoli, M. Ortmanns, A configurable cascaded continuous-time DS modulator with up to 15 MHz bandwidth, in Proceedings of the Custom Integrated Circuits Conference (CICC), IEEE, San Jose, May 2010, pp. 426–429
13. Y. Ke, P. Gao, J. Craninckx, G. Van der Plas, G. Gielen, A 2.8-to-8.5 mW GSM/Bluetooth/UMTS/DVB-H/WLAN fully reconfigurable CT DS with 200 kHz to 20 MHz BW for 4G radios in 90 nm digital CMOS, in Proceedings of the Symposium on VLSI Circuits Conference, IEEE, 2010, pp. 153–154
14. M. Bolatkale, L.J. Breems, R. Rutten, K.A.A. Makinwa, A 4 GHz CT ΔΣ ADC with 70 dB DR and -74 dBFS THD in 125 MHz BW, in Proceedings of ISSCC, San Francisco, Feb 2011, pp. 470–471

PWM-Based and VCO-Based Theory

15. E. Roza, Analog-to-digital conversion via duty-cycle modulation. IEEE Trans. Circuits Syst. II 44(11), 907–914 (1997)
16. D.G. Holmes, T.A. Lipo, Pulse Width Modulation for Power Converters: Principles and Practice (IEEE Press, Piscataway, 2003)
17. A.A. Lazar, L.T. Toth, Time encoding and perfect recovery of bandlimited signals, in International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP ’03), 2003 IEEE, vol. 6, Hong Kong, 6–10 Apr 2003, pp. VI709–712
18. F. Colodro, A. Torralba, M. Laguna, Continuous-time sigma-delta modulator with an embedded pulsewidth modulation. IEEE Trans. Circuits Syst. I 55(3), 775–785 (2008)
19. L. Hernandez, E. Prefasi, Analog-to-digital conversion using noise shaping and time encoding. IEEE Trans. Circuits Syst. I 55(7), 2026–2037 (2008)
20. F. Colodro, A. Torralba, New continuous-time multibit sigma-delta modulators with low sensitivity to clock jitter. IEEE Trans. Circuits Syst. I 56(1), 74–83 (2009)
21. M.H. Perrott, VCO-based wideband continuous-time sigma-delta analog-to-digital converters, in Proceedings of the 19th Workshop on Advances in Analog Circuit Design, Graz, Apr 2010, pp. 177–203
22. L. Hernandez, A. Wiesbauer, Exploiting time resolution in nanometer CMOS data converters, in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), Paris, May 2010


23. F. Colodro, A. Torralba, Pulse-width modulation in sigma-delta modulators, in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), Paris, May 2010, pp. 1081–1084

PWM-Based and VCO-Based Examples

24. V. Dhanasekaran, Baseband analog circuits in deep-submicron CMOS technologies targeted for mobile multimedia, PhD dissertation, Texas A&M University, College Station, Aug 2008
25. L. Hernandez, E. Prefasi, E. Pun, S. Paton, A 1.2 MHz 10-bit continuous-time sigma-delta ADC using a time encoding quantizer. IEEE Trans. Circuits Syst. II 56(1), 16–20 (2009)
26. V. Dhanasekaran, M. Gambhir, M.M. Elsayed, E. Sánchez-Sinencio, J. Silva-Martinez, C. Mishra, L. Chen, E. Pankratz, A 20 MHz BW 68 dB DR CT ΔΣ ADC based on a multibit time-domain quantizer and feedback element, in Proceedings of the Solid State Circuits Conference (ISSCC), San Francisco, IEEE, 2009, pp. 174–176
27. E. Prefasi, L. Hernandez, S. Paton, A. Wiesbauer, R. Gaggl, E. Pun, A 0.1 mm², wide bandwidth continuous-time SD ADC based on a time encoding quantizer in 0.13 μm CMOS. IEEE J. Solid-State Circuits 44(10), 2745–2754 (2009)
28. M. Park, M.H. Perrott, A 78 dB SNDR 87 mW 20 MHz bandwidth continuous-time DS ADC with VCO-based integrator and quantizer implemented in 0.13 μm CMOS. IEEE J. Solid-State Circuits 44(12), 3344–3358 (2009)
29. J. Silva-Martinez, C.Y. Lu, M. Onabajo, F. Silva-Rivas, V. Dhanasekaran, M. Gambhir, Wideband continuous-time multi-bit delta-sigma ADCs, in Proceedings of the 19th Workshop on Advances in Analog Circuit Design, Graz, Apr 2010, pp. 205–225
30. E. Prefasi, S. Paton, L. Hernandez, R. Gaggl, A. Wiesbauer, J. Hauptmann, A 0.08 mm², 7 mW time-encoding oversampling converter with 10 bits and 20 MHz BW in 65 nm CMOS, in Proceedings of ESSCIRC 2010, Sevilla, 2010

Others (Asynchronous, etc.)

31. S. Ouzonov, E. Roza, H. Hegt, G. van der Weide, A. van Roermund, An 8 MHz, 72 dB SFDR asynchronous sigma-delta modulator with 1.5 mW power dissipation, in Proceedings of the Symposium on VLSI Circuits Conference, Honolulu, IEEE, 2004, pp. 88–91
32. S. Ouzonov, E. Roza, H. Hegt, G. van der Weide, A. van Roermund, Design of high-performance asynchronous sigma delta modulators with a binary quantizer with hysteresis, in Proceedings of the Custom Integrated Circuits Conference (CICC), San Jose, IEEE, 2004, pp. 181–184
33. O. Oliaei, H. Aboushady, Jitter effects in continuous-time SD modulators with delayed return-to-zero feedback, in Proceedings of the International Conference on Electronics, Circuits and Systems, The Hague, IEEE, 1998, pp. 351–354
34. J. Kim, T.-K. Jang, Y.-G. Yoon, S.H. Cho, Analysis and design of voltage-controlled oscillator based analog-to-digital converter. IEEE Trans. Circuits Syst. I Regul. Pap. 57(1, January), 18–30 (2010)
35. U. Wismar, D. Wisland, P. Andreani, A 0.2 V 0.44 uW 20 kHz analog to digital sigma delta modulator with 57 fJ/conversion FoM, in Proceedings of the 32nd European Solid-State Circuits Conference, 2006, ESSCIRC 2006, Montreux, 19–21 Sept 2006, pp. 187–190
36. M.Z. Straayer, M.H. Perrott, A 12-Bit, 10-MHz bandwidth, continuous-time sigma-delta ADC with a 5-Bit, 950-MS/s VCO-based quantizer. IEEE J. Solid-State Circuits 43(4, April), 805–814 (2008)
37. J. Daniels, W. Dehaene, M. Steyaert, A. Wiesbauer, A 0.02 mm² 65 nm CMOS 30 MHz BW all-digital differential VCO-based ADC with 64 dB SNDR, in 2010 IEEE Symposium on VLSI Circuits (VLSIC), Honolulu, 16–18 June 2010, pp. 155–156
38. G. Taylor, I. Galton, A mostly-digital variable-rate continuous-time delta-sigma modulator ADC. IEEE J. Solid-State Circuits 45(12), 2634–2646 (2010)
39. M. Hovin, A. Olsen, T.S. Lande, C. Toumazou, Delta-sigma converters using frequency-modulated intermediate values, in 1995 IEEE International Symposium on Circuits and Systems, ISCAS ’95, vol. 1, Seattle, 30 Apr–3 May 1995, pp. 175–178


40. L. Brooks, H.-S. Lee, A zero-crossing-based 8b 200MS/s pipelined ADC, in IEEE International Solid-State Circuits Conference, 2007, ISSCC 2007, San Francisco, 11–15 Feb 2007, pp. 460–615
41. L. Hernandez, E. Prefasi, Continuous time ΣΔ modulator based on digital delay loop and time quantisation. Electron. Lett. 46(25), 1655–1656 (2010)
42. B. Young, P.K. Hanumolu, Phase-locked loop based Δ-Σ ADC. Electron. Lett. 46(6), 403–404 (2010)
43. E. Prefasi, E. Pun, L. Hernandez, S. Paton, Second-order multi-bit ΣΔ ADC using a pulsewidth modulated DAC and an integrating quantizer, in 16th IEEE International Conference on Electronics, Circuits, and Systems, 2009. ICECS 2009, Hammamet, 13–16 Dec 2009, pp. 37–40
44. L. Hernandez, E. Pun, E. Prefasi, S. Paton, Continuous time sigma-delta modulator based on binary weighted charge balance. Electron. Lett. 45(9), 458–460 (2009)
45. M. Kurchuk, Y. Tsividis, Signal-dependent variable-resolution clockless A/D conversion with application to continuous-time digital signal processing. IEEE Trans. Circuits Syst. I Regul. Pap. 57(5), 982–991 (2010)
46. A.M. Soliman, M. Ismail, Phase correction in two-integrator loop-filters using a single compensating resistor. Electron. Lett. 14(12), 375–376 (1978)
47. S. Paton, T. Pötscher, A. Di Giandomenico, K. Kolhaupt, L. Hernandez, A. Wiesbauer, M. Clara, R. Frutos, Linearity enhancement techniques in low OSR, high clock rate multi-bit continuous-time sigma-delta modulators, in Proceedings of the Custom Integrated Circuits Conference (CICC), San Jose, IEEE, 2004

Chapter 5

Considerations for Cost-Efficient Calibration of Scaled ADCs

Marian Verhelst, Erkan Alpman, and Hasnain Lakdawala

Abstract Observed ADC area and power scaling do not seem to follow the trends predicted using pure technology scaling arguments. A cubic improvement in area and power with gate length is observed in the literature, which has been enabled by the migration towards increasingly capacitor-based ADC architectures and the introduction of digitally-assisted performance enhancement strategies to overcome component mismatch. This paper assesses these trends and discusses the most relevant enhancement strategies for mismatch-limited ADCs. A trade-off analysis between mismatch compensation in the analog domain (digitally assisted trimming, possibly in combination with up-scaling) and in the digital domain (digital post-distortion) is required. The increasing use of digitally enhanced ADC architectures proves to be the main driver for the observed improvement in area and power with scaling.

1 Introduction

The need for increased mobility and portability of computing devices, together with ever-increasing data rate requirements, puts more and more stress on ADC performance. At the same time, cost and battery life issues demand continuous scaling of the ADC area and power consumption and require designs in ever smaller (CMOS) technologies. Furthermore, the increased dynamic range required for modern communication standards also pushes the required dynamic range of the ADCs with scaling. This causes significant noise and matching issues in several key ADC building blocks, as traditional scaling studies predict a flattening or even an increase of power and area. Nevertheless, a survey of published data indicates that

M. Verhelst (✉) • E. Alpman • H. Lakdawala
Intel Labs – Radio Integration Research, 2111 NW 25th Ave, Hillsboro, OR, USA
e-mail: [email protected]

M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_5, © Springer Science+Business Media B.V. 2012


ADC performance does improve significantly over technology. This contradiction is explained by new architecture and design innovations in ADC design that exploit the inherent improvements provided by CMOS technology scaling. These improvements include:
1. Metal finger capacitor (MFC) density, as well as MFC matching per pF, has improved significantly over the last technology generations.
2. The speed of digital gates increases, while their power and area reduce.

Analysis of performance enhancement techniques that exploit these advantages of scaling is necessary to understand the improved performance of ADC implementations and to extrapolate these lessons towards future scaled ADC designs. This paper starts by deriving the expected ADC area and power consumption trends from pure technology scaling in Sect. 2. Section 3 makes the comparison with observed trends from survey data on state-of-the-art ADCs of the last decade. Next, Sect. 4 focuses on several digital enhancement techniques to explain the inconsistency between the theoretical and observed trends. Finally, Sect. 5 derives strategies to incorporate these calibrations into ADC design in a cost-aware way and illustrates this with a design example.

2 Theoretical Performance Trends in Scaled ADCs

2.1 ADC Performance Limiters

Noise and distortion impose fundamental limits on ADC performance. Their impact on ADC area and power consumption has been covered extensively in literature [1–4]. This section summarizes these dependencies, which will be used in Sect. 2.2 to evaluate the impact of scaling. The conversion accuracy of ADCs is typically expressed in ENOB (effective number of bits), or SNDR (signal-to-noise-and-distortion power ratio):

$$\mathrm{ENOB} = \frac{\mathrm{SNDR(dB)} - 1.76}{6.02}, \qquad (5.1)$$

$$\mathrm{SNDR} = \frac{S}{N_{noise} + N_{mismatch} + N_{nonlin}} \qquad (5.2)$$

Where:

$S$: signal power at the ADC input, or $V_{sig,rms}^2$.

$N_{noise}$: input-referred noise power. Noise appearing in the ADC output signal is caused by a combination of quantization noise, thermal noise, flicker noise and input sampling jitter. Quantization noise, caused by the finite-resolution quantization intervals, sets the limit for the maximum achievable SNDR. Practical ADC designs are also limited by thermal noise, which can be characterized by the total integrated noise:

$$N_{noise} = \frac{k_B\,T}{C}, \qquad (5.3)$$

with $k_B$ the Boltzmann constant, $T$ the temperature and $C$ the effective input-referred noise capacitance.

$N_{mismatch}$: distortion due to mismatch. As shown in [2, 5], matching rather than thermal noise dictates the performance of low-resolution ADCs. Mismatch of critical circuit elements has different effects depending on the ADC architecture. In flash converters the random mismatch among the comparators' (or pre-amplifiers') offsets degrades performance, while in SAR ADCs the comparator offset is unimportant, but the sensitivity to capacitor (and hence radix) mismatch is large. Pipelined ADCs need carefully matched opamps and capacitors to maintain good ENOB, while time-interleaved ADCs suffer heavily from mismatch among the gain, offset, skew or bandwidth of the time-interleaved channels. This mismatch causes non-linear distortion, affecting dynamic and static ADC metrics such as SNDR, INL and DNL. The latter have to be reduced to a fraction of the LSB to avoid ADC performance degradation. One way to reduce circuit mismatch is to increase circuit area. This linear relationship is demonstrated in Eqs. 5.4 and 5.5 for amplifier differential offset voltage matching ($\sigma_{\Delta V_{gs}}$) [5–7], as well as capacitor matching ($\sigma_{\Delta C}$):

$$\sigma^2(\Delta V_{gs}) = \frac{1}{W \cdot L}\left[A_{VT}^2 + \frac{A_\beta^2}{4}\,(V_{gs} - V_T)^2\right] \qquad (5.4)$$

$$\sigma^2\!\left(\frac{\Delta C}{C}\right) = \frac{A_C^2}{N\,C_{unit}} \qquad (5.5)$$

with $A_{VT}$, $A_\beta$ and $A_C$ technology constants, $W \cdot L$ the transistor area and $N$ the unit capacitor multiplier. However, due to the deterministic nature of mismatch (unlike thermal noise), opportunities for smarter correction exist. They are the primary focus of this paper and will be covered extensively in Sect. 4.

[Figure 5.1: circuit schematics (a) and (b); labels include VDD, vbias2, i2, vo1, vid1/2, −vid1/2, iod1, W1, L1, vo2, Cpassive1, vi2, W2, L2 (M2), Cpassive2 and IB1.]

Fig. 5.1 Representative circuit model of ADC input stage (a) and output stage (b)

$N_{nonlin}$: device non-linearity. The linearity of an ADC is further degraded by device non-linearity. A well-known example of this is the input sampling stage, which is affected by charge injection and a varying input resistance [8]. The input resistance $R_{on}$ depends heavily on the sampled input voltage $V_{in}$:

$$R_{on} = \frac{dV_{in}}{dI} = \frac{1}{\frac{\mu}{2}\,\frac{W}{L}\,C_{ox}\,(V_{in} - V_T)} \qquad (5.6)$$

To limit the non-linear distortion, the variation of $R_{on}$ over the signal swing ($\Delta R_{on}$) has to be kept small. Assuming a transmission gate:

$$\max \Delta R_{on} = \frac{R_{on,max}}{R_{on,min}} = \frac{V_{dd} - V_T}{(V_{dd}/2 - V_T)/2} = \frac{2k - 2}{k/2 - 1} \qquad (5.7)$$

This parameter, however, degrades significantly with process scaling, as the ratio $k = V_{dd}/V_{th}$ decreases rapidly with technology (trending < 2.5 for some low-leakage < 45 nm CMOS processes). Luckily, gain boosting and bootstrapping resolve most of this signal dependency, but at the cost of input bandwidth, area and power [9]. Again, due to the deterministic nature of the impairment, post-distortion techniques have proven to provide additional improvement [8].

Amplifying stages can also be a source of non-linear distortion. Although even-order harmonics are typically cancelled out by employing differential circuits, odd-order distortion does affect the ADC performance. For the differential pair of Fig. 5.1a, and assuming ideal square-law devices, it can be shown that [10]:

$$\frac{i_{od}}{I_B} = x\,\sqrt{1 - \frac{x^2}{4}}, \quad \text{with } x = \frac{v_{id}}{V_{gs1} - V_T} \qquad (5.8)$$

and hence by approximation:

$$\frac{g_m}{g_{m3}} \cong 8\,\frac{I_B}{2 \cdot K_n \cdot W_1/L_1} = 8 \cdot (V_{gs1} - V_T)^2 \qquad (5.9)$$

with $g_m$ being the transconductance, $g_{m3} = \partial^3 i_{od}/\partial V_{gs}^3$ (third harmonic of the transconductance) and $K_n = \frac{\mu_n}{2}\,C_{ox}$.
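The relations of this section lend themselves to a quick numerical check. The sketch below evaluates Eq. 5.1, sizes a sampling capacitor from Eq. 5.3 for the case where thermal noise alone sets the SNR, and compares the square-law characteristic of Eq. 5.8 with its cubic approximation, from which the factor 8 of Eq. 5.9 follows. The function names, the 0.25 V rms signal level and the 300 K temperature are assumptions made for illustration only.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant [J/K]


def enob(sndr_db):
    """Eq. 5.1: effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def min_sampling_cap(snr_db, v_sig_rms, temp_k=300.0):
    """Smallest C for which kT/C noise alone (Eq. 5.3) still allows snr_db."""
    noise_power = v_sig_rms**2 / 10.0 ** (snr_db / 10.0)
    return K_B * temp_k / noise_power


def diff_pair_current(x):
    """Eq. 5.8: i_od / I_B for x = v_id / (V_gs1 - V_T), |x| <= sqrt(2)."""
    return x * math.sqrt(1.0 - x * x / 4.0)


print(enob(62.0))                               # bits for 62 dB SNDR
print(min_sampling_cap(62.0, v_sig_rms=0.25))   # farads; 0.25 V rms is assumed
x = 0.1
print(diff_pair_current(x), x - x**3 / 8.0)     # exact law vs. cubic approximation
```

For small x the characteristic follows x − x³/8, so the ratio of the linear to the cubic coefficient is 8 in units of the overdrive voltage squared, in agreement with Eq. 5.9. For the assumed signal level, a 62 dB thermal-noise-limited SNR needs a capacitor of roughly 0.1 pF.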

2.2 Fixed Performance ADC Scaling

The noise and distortion formulae of the previous section allow assessing the impact of scaling on ADC performance. More specifically, this section derives trends in ADC area and power consumption with iso-performance technology scaling. Broader studies of scaling effects in analog and mixed-signal circuits can be found in [6, 7, 11–14]. The circuits shown in Fig. 5.1 will be used as representative circuits for an ADC input stage (Fig. 5.1a) and output stage (Fig. 5.1b), respectively, driving a passive capacitor $C_{passive}$. These circuits are relevant for a multitude of recent ADCs based on open-loop amplifiers and passive capacitors, like pipelined, sigma-delta or SAR ADCs. While noise and matching constraints mainly impact the input stage, device linearity has to be addressed for both the input and the output stage. Throughout this study, the following assumptions are made:
• The square-law MOSFET model is used, with the understanding that short-channel effects limit the accuracy of this model in scaled technologies. However, this assumption allows a first-order calculation to reveal trends.
• L scales with technology. Design rules of deep-submicron technologies do not allow long-channel devices without severe area and leakage penalties. The scaling factor of L per technology generation will be denoted $s_L$ (≈ 0.7).
• Voltage scaling is pursued less aggressively in the latest technology generations. Leakage concerns cause the threshold voltage to be almost flat. As a result, $V_{DD}$ scales at a slower pace to keep sufficient voltage overdrive. Therefore, a voltage scaling factor $s_V$, different from $s_L$, is used. Recently this factor has been trending towards $\sqrt{s_L}$ or even less when different supply voltages are used for analog and digital blocks.
• Gate $t_{ox}$, and as a result $C_{ox}$, has not been keeping up with feature scaling due to gate leakage concerns. Survey data [15] shows $C_{ox}$ scaling of $1/\sqrt{s_L}$ (towards $1/s_L$ for high-k gates). Interconnect $t_{ox}$ scales with $1/s_L$.
• Absolute matching coefficients $A_{VT}$ and $A_C$ (Eqs. 5.4 and 5.5) improve with every new technology generation. Lately this improvement rate has been trending around $\sqrt{s_L}$ [15, 16]. $A_\beta$, which does not see similar improvements, has not been taken into account in this study, resulting in a slightly optimistic (smaller area, power) outcome.

• A passive metal finger capacitor (MFC) is assumed. MFC density scales with technology as $1/s_L$ [16].
• The input stage operates in Class A, or $I_{B1}/i_{od1}$ = constant. This implies that $(V_{gs} - V_T)$ cannot scale faster than the input signal swing, or $s_V$.
• Iso-bandwidth (conversion rate) scaling is pursued in all scenarios (unless other iso-performance requirements force the bandwidth to be larger). This poses the following constraints for the representative circuits:

Input stage (linear settling):

$$\frac{g_{m1}}{C_{load1}} = \text{constant} \tag{5.10}$$

Output stage (slew rate limited):

$$\frac{C_{load2} \cdot (v_o)_{max}}{i_2} = \text{constant} \tag{5.11}$$

Table 5.1 shows the effect on the most important circuit parameters when scaling device length ($s_L$) and supply voltage ($s_V$) under different scenarios.

2.2.1 Iso-(device-)linearity Scaling

In a first scenario, an iso-linearity scaling of the input stage is pursued, assuming a constant load capacitor $C_{passive1}$. Only the intrinsic non-linearity of the transistor is considered; non-linearity due to mismatch is covered later (iso-matching). As can be derived from Eq. 5.9, this linearity is maintained as long as the overdrive voltage $V_{ov} = (V_{gs1} - V_T)$ scales proportionally to $v_{id}$. This is realized by scaling both voltages with $s_V$, resulting in a device width scaling:

$$W \propto \frac{I \cdot L}{C_{ox} (V_{gs} - V_T)^2} \propto s_I \cdot s_L \cdot \sqrt{s_L} \cdot s_V^{-2} \tag{5.12}$$
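As a worked check of Eq. 5.12, the width scaling can be written out explicitly. This is a sketch under stated assumptions: $s_I$ is taken to be the scaling factor of the bias current, set equal to $s_V$ as listed for $I$ in the iso-linearity column of Table 5.1, and the slowed-down voltage scaling $s_V \approx \sqrt{s_L}$ is used in the last step.

```latex
W \;\propto\; s_I \cdot s_L \cdot \sqrt{s_L} \cdot s_V^{-2}
  \;\overset{s_I = s_V}{=}\; s_L^{3/2}\, s_V^{-1}
  \;\overset{s_V \approx \sqrt{s_L}}{\approx}\; s_L
```

Under these assumptions the active area $W \cdot L$ then scales as $s_L^{5/2} s_V^{-1} \approx s_L^2$, in line with the iso-linearity entries of Table 5.1.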

Iso-(device-)linearity scaling of the output driver stage (Fig. 5.1b) requires both $(V_{gs} - V_T)$ of M2 and the voltage drop over the output stage bias transistor to scale with $s_V$. Assuming constant bandwidth, the bias current can be decreased according to Eq. 5.11. As a result, iso-linearity enforces a scaling on the output driver stage similar to the one derived for the input stage (Eq. 5.12). As can be seen from Table 5.1, iso-(device-)linearity scaling (without other noise or matching requirements) results in an almost perfect scaling with technology, where both area and power scale down with a factor $s_L$ to $s_L^2$ (using $s_V \approx \sqrt{s_L}$). From this observation, linearity does not seem to be affected by scaling if the input swing range is allowed to be reduced with the supply, predicting ever decreasing area and power numbers for scaled technologies. However, system, noise and matching requirements will make it harder and harder to scale the input swing with $s_V$, which will be reflected in an area and power penalty as seen in the following scenarios.

Table 5.1 Effect on design parameters, area and power consumption of the input stage reference circuit of Fig. 5.1a under iso-performance scaling. Entries in parentheses (green formulae in the original) assume $s_V \approx \sqrt{s_L}$.

| Parameter | Iso-linearity | Iso-noise | Iso-matching | Iso-SNDR |
|---|---|---|---|---|
| $L$ | $s_L$ | $s_L$ | $s_L$ | $s_L$ |
| $V_{dd}$ | $s_V$ | $s_V$ | $s_V$ | $s_V$ |
| $(v_o)$, $(v_i)_{max}$ | $s_V$ | $s_V$ | $s_V$ | $s_V$ |
| $C_{passive}$ | 1 | $s_V^{-2}$ | $s_L$ | $s_V^{-2}$ (noise) |
| $I$ | $s_V$ | $s_V^{-1}$ | $s_L^{-3/2}$ (passive load) | $s_L^{-3/2}$ (linearity) |
| $W$ | $s_L^{3/2} s_V^{-1}$ | $s_L^{3/2} s_V^{-3}$ | $s_V^{-2}$ (offset) | $s_V^{-2}$ (matching) |
| Power | $s_V^2$ ($\approx s_L$) | 1 | $s_L^{-3/2} s_V$ ($\approx s_L^{-1}$) | $s_L^{-3/2} s_V$ ($\approx s_L^{-1}$) |
| Area (active) | $s_L^{5/2} s_V^{-1}$ ($\approx s_L^2$) | $s_L^{5/2} s_V^{-3}$ ($\approx s_L$) | $s_L s_V^{-2}$ ($\approx 1$) | $s_L s_V^{-2}$ ($\approx 1$) |
| Area (passives) | $s_L$ | $s_L s_V^{-2}$ ($\approx 1$) | $s_L^2$ | $s_L s_V^{-2}$ ($\approx 1$) |

2.2.2 Iso-thermal Noise Scaling

To keep the signal-to-thermal-noise ratio constant in a scaled technology, assuming the input swing scales proportionally to the supply, circuit noise has to be suppressed with $s_V^2$. As a result, the passive capacitive load $C_{passive}$ has to be scaled up with the same factor. Accounting for the larger $g_m$ requirement due to this load under iso-bandwidth constraints, and only allowing limited overdrive voltage scaling to maintain Class-A operation, this requires approximately flat device width scaling. Table 5.1 shows the impact on area and power consumption. Note that flicker noise, important for low-frequency ADC designs, has been neglected here. Flicker-noise-limited designs can use architectural solutions like correlated double sampling, or need to keep input device sizes large to limit the flicker noise.

2.2.3 Iso-matching Scaling

In an iso-matching scenario the amount by which active and passive circuits can scale is limited, and directly tied to the improvement over technology of $A_{VT}$ and $A_C$ (Eqs. 5.4 and 5.5). Additionally, the scaled supply voltage and input swing increase the threshold voltage matching requirement for the active devices with $s_V$. The decrease in required $g_m$ (smaller load) does however not allow significant power savings, due to the Class-A operating requirement. The design is no longer iso-bandwidth, but is forced to increase bandwidth with $s_L^{-3}$ at a larger power cost. As shown in Table 5.1, area is flat for active devices, while passive area decreases due to improved capacitor density and matching. Note that a purely passive load is assumed.
An active load (with matching requirements, as in current-steering DACs or CT ΣΔ ADCs) scales slower, resulting in a $1/s_L^{3/2}$ times higher power consumption.

2.2.4 Iso-SNDR

A good ADC implementation will always use up all excess margin in every performance limiter: device non-linearity, noise and matching. As a result, when such a design has to scale down, performance across all three has to be improved simultaneously. The last column in Table 5.1 shows the result on the circuit's area and power consumption for such an iso-SNDR scaling (iso-linearity + iso-noise + iso-matching). The size of the passives will typically be determined by noise, and that of the actives in many designs (e.g. flash) by threshold voltage matching, while the current is set by the linearity constraint of Eq. 5.9, taking the increased $W$ due to matching into account. Assuming $s_V \approx \sqrt{s_L}$, this scenario results in iso-bandwidth scaling as well.
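The entries of Table 5.1 are products of powers of $s_L$ and $s_V$, so they can be evaluated numerically per technology generation. The sketch below is illustrative only: the exponent pairs are transcribed from the power and area rows of Table 5.1, the names `TABLE_5_1` and `scale_factor` are hypothetical, and the defaults $s_L = 0.7$ and $s_V = \sqrt{s_L}$ are the assumptions stated in the text.

```python
import math

# (exponent of s_L, exponent of s_V) per quantity, as read from Table 5.1.
TABLE_5_1 = {
    "iso-linearity": {"power": (0, 2), "area_active": (2.5, -1), "area_passive": (1, 0)},
    "iso-noise": {"power": (0, 0), "area_active": (2.5, -3), "area_passive": (1, -2)},
    "iso-matching": {"power": (-1.5, 1), "area_active": (1, -2), "area_passive": (2, 0)},
    "iso-SNDR": {"power": (-1.5, 1), "area_active": (1, -2), "area_passive": (1, -2)},
}


def scale_factor(exponents, s_l=0.7, s_v=None, generations=1):
    """Multiplicative change of a quantity after `generations` technology steps."""
    if s_v is None:
        s_v = math.sqrt(s_l)  # slowed-down voltage scaling assumed in the text
    e_l, e_v = exponents
    return (s_l ** e_l * s_v ** e_v) ** generations


for scenario, rows in TABLE_5_1.items():
    summary = {name: round(scale_factor(exp), 3) for name, exp in rows.items()}
    print(scenario, summary)
```

Passing `s_v=s_l` instead of the default reproduces the comparison with the older, more aggressive voltage scaling trend.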

2.3 Conclusion

An interesting observation from Table 5.1 is the difference in scaling between active and passive devices. While active devices seem to suffer less from scaling in a noise-limited scenario, passive devices scale better under matching constraints. This relates to the ongoing shift of ADC designs towards oversampling implementations (relying more on active devices) for noise-limited designs, while more passive capacitor-based designs (like SAR ADCs) gain popularity for low SNDR requirements (matching limited [17]). However, as can be concluded from Table 5.1, no significant overall area or power improvements can be expected from pure technology scaling alone under fixed performance constraints. The ADC area and power seem solely dependent on and tied to its SNDR requirement. Due to the slowed-down voltage scaling ($s_V \approx \sqrt{s_L}$ rather than $s_L$), both are more or less flat, or slightly increasing over generations. If $s_V$ had maintained its old trend $s_L$, power would have scaled better at the cost of additional area. The increasingly common use of dual supplies for the ADC, with the analog supply being higher and scaling slower than the digital supply, helps in terms of area scaling. The next section compares these trends with survey data from recent ADC implementations.

3 ADC Area and Power Survey

Based on the survey data from [18], an assessment can be made of the actual trend in area and power consumption of iso-SNDR ADC implementations over the past decade. The power consumption scaling per calendar year is studied extensively in [17]. That study reveals a significant scaling divergence between high-SNDR ADCs (>75 dB SNDR), limited by thermal noise (technology), and lower-SNDR ADCs, which are mismatch limited. The remainder of this paper focuses on trends for mismatch-limited ADCs, as they make up the majority of recent ADC designs. This section investigates the trend in area scaling, and quantifies the area and power scaling over technology generations.

Figure 5.2 shows an analysis of all ADC implementations presented at the IEEE International Solid-State Circuits Conference (ISSCC) and the VLSI Circuit Symposium during the last decade [18]. In this plot, the area efficiency (area divided by the Nyquist sampling rate $f_{nyquist}$) is plotted as a function of the achieved SNDR.

Fig. 5.2 ADC performance data (ISSCC 2000–2010, VLSI Circuit Symposium 2000–2010). Area efficiency (area / $f_{nyquist}$ in mm²/Hz, logarithmic axis) plotted as a function of SNDR (dB). The 2000–2001 and 2010 designs are highlighted; linear fits are shown for all data, for 2000–2001 and for 2010.

The oldest (2000–2001) and most recent (2010) implementations are highlighted. A first observation is the large spread of the data around their best linear fit, which can be explained by the different performance metrics targeted by the various ADC designs, not all of which are reflected in this drawing: area, power, bandwidth, or a combination of them. However, due to the abundance of data, interesting conclusions can still be drawn from averaged data through linear regression models. Figure 5.2 shows linear fits constructed from all mismatch-limited ADCs (SNDR < 75 dB) of different publication years. Based on these lines, a clear improvement of area efficiency from 2000 to 2010 can be identified. This trend is also observed in three generations of similar sigma-delta ADCs at Intel in scaled technologies (Fig. 5.3) [19].

This observation contradicts the theoretical area scaling effect derived in the previous section. One partial explanation is a shift toward passive capacitor-based ADC designs, which rely on passive rather than active device matching. As shown in Table 5.1, these devices still scale quite well over technology. Figure 5.4 confirms this trend: the fraction of SAR ADC implementations has increased significantly over the past years. Sigma-delta (SD) and pipeline ADCs, which also rely heavily on passive capacitors, remain popular as well.

Fig. 5.3 Three generations of similar delta-sigma ADC implementations (MASH 2-2 in 90 nm CMOS, 1X area; MASH 2-0 in 45 nm CMOS, ~0.2X; MASH 2-2-0 in 32 nm CMOS, ~0.1X), demonstrating the ongoing area improvement over technology generations

Fig. 5.4 Fraction of ADC architectures (pipeline, flash, SD (SC+CT), SAR, other) published in ISSCC and VLSI in different calendar years (2000–2010)

However, as derived in Table 5.1, this design shift can only (partially) explain an improvement in area efficiency. Improvements in power efficiency (power consumption divided by the Nyquist sampling rate $f_{nyquist}$) are not expected from the scaling study of Sect. 2. Figure 5.5 however shows that the power efficiency of ADC designs demonstrates a similar decrease over the last decade. This scatter plot visualizes the power versus area FOM (figure-of-merit):

$$\text{power FOM} = \frac{P}{f_{nyquist} \cdot 2^{ENOB}}, \qquad \text{area FOM} = \frac{A}{f_{nyquist} \cdot 2^{ENOB}} \tag{5.13}$$
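The two figures of merit of Eq. 5.13 are straightforward to compute for a given design. The sketch below assumes the standard sine-wave relation ENOB = (SNDR − 1.76)/6.02, which is not stated in this section, and uses hypothetical function names and a made-up example design rather than a survey data point.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def power_fom(power_w, f_nyquist_hz, enob):
    """Power FOM of Eq. 5.13, in joule per conversion step."""
    return power_w / (f_nyquist_hz * 2 ** enob)


def area_fom(area_mm2, f_nyquist_hz, enob):
    """Area FOM of Eq. 5.13, in mm^2 per (Hz * conversion step)."""
    return area_mm2 / (f_nyquist_hz * 2 ** enob)


# Hypothetical design: 1 mW, 0.1 mm^2, 100 MHz Nyquist rate, 60 dB SNDR.
enob = enob_from_sndr(60.0)
print(power_fom(1e-3, 100e6, enob), area_fom(0.1, 100e6, enob))
```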

Fig. 5.5 Power figure-of-merit vs. area figure-of-merit (FOM) for the ISSCC and VLSI data, both on logarithmic axes, with the 2000–2001 and 2010 designs highlighted

The scattered data can again be attributed to the different design optimization metrics pursued. A clear power-area trade-off locus can be observed, which steadily improves over the years for both area and power. A similar trend can be observed when computing the expected area and power consumption of an iso-SNDR ADC in different technology generations. Based on a linear fit of both area and power as a function of SNDR over the ISSCC/VLSI survey data for every CMOS technology generation between 600 and 65 nm (not enough data points are available for 45 nm), the area and power consumption of a typical communications ADC targeting 60 dB SNDR is predicted. Figure 5.6 plots the result: a perfect scaling with $s_L^2$ for both metrics...

To understand why this perfect scaling with technology is possible, despite the contradictory theoretical derivation of Table 5.1, let's look at two interesting data points in Fig. 5.5, the most power-efficient and the most area-efficient design to date, indicated with stars:

• The most power-efficient ADC design, described in [20], is a 4.4 fJ/conversion-step charge redistribution (SAR) ADC, heavily relying on metal-plate capacitor matching.
• The most area-efficient ADC design, described in [21], is a 0.01 mm² flash ADC using minimum-size input devices in 65 nm. To compensate for the resulting non-linearities and offsets in the comparator and track-and-hold, the ADC employs digital compensation techniques, both calibrated during startup.

Fig. 5.6 Predicted power consumption and area (fitted area / $f_s$ and power / $f_s$) for a 60 dB SNDR ADC over technology generation, from 650 nm down to 65 nm, normalized to the 650 nm data point (1 µm²/Hz, 9.5 nW/Hz). Curves: mean area scaling, mean power scaling, and scaling with $s_L^2$.

These two examples beautifully illustrate the two most important strategies followed in many of the ISSCC/VLSI survey designs to overcome mismatch limitations and maintain aggressive area and power scaling over technology:

1. Rely on metal-plate capacitor matching instead of device matching whenever possible (see also Fig. 5.4).
2. Add digital enhancements to the ADC to boost performance.

Although a majority of the recent ISSCC/VLSI ADC implementations relied heavily on digital enhancements, only a few of them demonstrated these in actual silicon. As a result, their true power and area cost is often not taken into account in the reported performance metrics. The remainder of this paper focuses on various digital enhancement strategies for ADCs, and quantifies their benefits and penalties. This information helps the ADC designer make smart design choices to optimize the overall area and power of analog plus digital.

4 ADC Performance Through Digital Enhancements

Designers have been using digital enhancements for many decades to boost ADC performance at a reduced power/area cost compared to traditional up-scaling [22, 23]. Analog power/area is saved, at the expense of more digital gates. Finding the best trade-off between the two is not straightforward and requires a thorough understanding of the impact of these enhancements. This section reviews different digital enhancement techniques and their influence on both analog and digital performance metrics. This data can be used to understand the $s_L^2$ scaling trends seen for ADC area and power consumption, and to investigate whether this trend is expected to continue in the future. A well-known and thoroughly studied digital enhancement technique for overcoming SNDR limitations in thermal-noise-limited ADCs is oversampling [24]. This section therefore focuses solely on enhancements for mismatch-limited ADCs.

Fig. 5.7 (b–f) Enhancement techniques to improve matching performance over the minimum size, thermal noise-limited, baseline design (a)

4.1 Non-digital Enhancements

Figure 5.7 gives a classification of various strategies to improve the matching performance of circuit components, and by extension of ADCs.

The baseline reference design which has to be improved is drawn in Fig. 5.7a. It consists of a set of N mismatched circuit components, which can be either active or passive devices, depending on the ADC under study. For example, in a flash ADC these could be N pre-amplifiers which require careful voltage offset matching, while in a SAR ADC these could be N capacitors. In the reference design of Fig. 5.7a these N components have the minimal size required to meet the signal-to-thermal-noise-ratio constraint (target SNDR). We denote this size as '1'. As shown in Table 5.2, such an ADC consumes a reference (= '1') analog area $A_{analog}$ and power $P_{analog}$, and does not need any digital area $A_{digital}$, digital power $P_{digital}$ or calibration time $T_{cal}$. Its performance is limited by mismatch, with a component variation of $\sigma_{(a)}$, and the resulting SNDR is again normalized to '1'.

A first way to improve matching between the fundamental circuit components of the reference design of Fig. 5.7a in a fixed silicon technology is to up-size the individual circuit components. Up-scaling every device area with a factor U, as shown in Fig. 5.7b, improves the component matching and reduces their variance by U (Eqs. 5.4 and 5.5). As a result, the 'voltage accuracy' improves by $\sqrt{U}$, or an SNDR (power ratio) improvement of a factor U is achieved at the cost of a U times analog area and power increase (see Table 5.2). From the discussions earlier in this paper, it is clear that this is not the way mismatch and non-linearity are overcome in modern ADCs.

The oldest calibration techniques to improve ADC matching and linearity are based on analog feedback (e.g. opamps). Drawbacks of these analog feedback loops are the requirement for the circuit to remain active during the whole circuit operation, the very stringent gain-bandwidth (GBW) and linearity requirements for the feedback opamps, and reproducibility and yield concerns. Designing under these requirements becomes problematic in scaled technologies and has a detrimental impact on system power consumption and area. This trend, together with the ever decreasing cost of digital gates over technology [2], pushes designers towards digital performance enhancements, which improve performance with a smaller area/power penalty. Three techniques are described: digitally assisted analog selection (including analog redundancy), digitally assisted analog trimming and digital post-distortion.

4.2 Analog Redundancy and Digitally Assisted Analog Selection

A straightforward approach to avoid designing accurate analog components is to create analog redundancy, and average the outcome of the redundant, inaccurate (min-size) elements in the digital domain (shown in Fig. 5.7c) to enhance performance. This is used, for example, in flash converters to reduce sensitivity to offset voltage, by having several comparators evaluate the same input voltage and using their outputs in a voting mechanism [25]. Through statistical averaging, the designer can get away with small devices.

Table 5.2 Costs and SNDR improvement of enhancement techniques. The last column shows the effect of technology scaling on every enhanced ADC (assuming $s_V \approx \sqrt{s_L}$). SNDR is defined by Eq. (5.2).

| Technique | $A_{analog}$ | $A_{digital}$ | $P_{analog}$ | $P_{digital}$ | $T_{cal}$ | SNDR | Technology effect on (P; A) |
|---|---|---|---|---|---|---|---|
| Min size design | 1 | / | 1 | / | 0 | 1 | $s_L^{-1}$; 1 (Table 5.1) |
| Upscaled design | U | / | U | / | 0 | U | $s_L^{-1}$; 1 (Table 5.1) |
| Analog redundancy | X | Small, log2(X) | X | Small, log2(X) | 0 | X | $s_L^{-1}$; 1 (Table 5.1) |
| Component selection | Y | Minimal | >1 | Minimal (at startup) | Y | set by $\mathrm{erfcinv}(\mathrm{erfc}(3)^{1/Y})/3$ (Eq. (5.14)) | $s_L^{-1}$; 1 (Table 5.1) |
| Component trimming | $1 + Z/2 \cdot f_{trim}$ ($\cong 1 + 3\sigma_{(a)}$) | Minimal | 1 to $1 + 6\sigma_{(a)}$ | Minimal (at startup) | log2(Z) | $6\sigma_{(a)}/f_{trim}$ (= Z) | $s_L^2$; $s_L^2$ (Eq. (5.16)) |
| Digital post-distortion | ≥1 (e.g. 1.2) | Significant (e.g. LUT) | ≥1 (e.g. 1.2) | Significant (e.g. LUT) | 0 (if run in background) | ↑↑, depends on impairment, f(analog redundancy, LUT size) | $s_L^{3/2}$; $s_L^2$ (digital scaling) |

Fig. 5.8 Area and SNR impact of different performance enhancement techniques, normalized to the thermal noise-sized design (1,0). Assumptions for baseline design: 7% mismatch, $f_{trim,min}$ = 1/10, trimming overhead = 1/10

As shown in Table 5.2, increasing the number of redundant devices by a factor of X only reduces the equivalent device mismatch standard deviation by $\sqrt{X}$, hence improving SNDR by X. A small area and power penalty in the digital domain is paid to implement the averaging. As a result, analog redundancy shows a similar, or even slightly worse, performance cost compared to classical up-scaling. Similarly, Fig. 5.8, which plots the SNDR improvement as a function of area increase, shows an identical SNDR-area relationship for device up-scaling and analog redundancy, which both linearly reduce the component variation with increasing area (see the distribution histograms in Fig. 5.8).

Pure analog redundancy is hence not a good strategy to enhance ADC performance. It can however be extended with digitally assisted analog component selection, which is much more interesting [26]. This technique (Fig. 5.7d) aims at digitally selecting the best devices out of the pool of redundant analog components. It has been applied to flash converters to reduce input offset voltages, where for an N-bit converter each of the $2^N$ comparator (+ preamplifier) stages is replaced by Y identical copies of the same component. During a training phase, the best matching comparator (smallest offset) is selected out of every pool of Y comparators [26, 27]. This “selection step” reduces the variation of the remaining components much more effectively than adding redundancy. It can be derived that the $3\sigma_{(d)}$ spread (determining SNDR) after component selection in scheme (d) is reduced from the $3\sigma_{(a)}$ spread of the reference design of scheme (a) as:

$$3\sigma_{(d)} = \mathrm{erfcinv}\left(\mathrm{erfc}(3)^{1/Y}\right) \cdot \sigma_{(a)} \tag{5.14}$$

with erfc and erfcinv the complementary error function and its inverse. The resulting “peaking” distribution is depicted in Fig. 5.8. Power savings are more significant than area savings, since power is saved by shutting down the non-selected components. However, depending on how the non-selected components are gated, they might still load the input stage, resulting in some additional power consumption compared to the baseline design. The overhead of the off-line calibration required to implement this approach can have a significant impact on the system and should not be neglected. Contrary to the previous solutions (Fig. 5.7), calibration time ($T_{cal}$) has to be foreseen in the manufacturing environment or when powering up the device to run the selection procedure. Depending on the configuration stability, this could however be a one-time tune-and-store process.
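Eq. 5.14 can be evaluated with the Python standard library alone. The sketch below is an illustration, not the authors' implementation: `erfcinv` is built here from the normal quantile function (using erfc(x) = 2Φ(−x√2)), and `selection_spread_ratio` is a hypothetical helper name that returns the ratio of the 3σ spread after selection to the 3σ spread of the reference design.

```python
import math
from statistics import NormalDist


def erfcinv(y):
    """Inverse complementary error function for 0 < y < 2, via the normal quantile."""
    return -NormalDist().inv_cdf(y / 2.0) / math.sqrt(2.0)


def selection_spread_ratio(y_copies):
    """Ratio sigma_(d) / sigma_(a) after picking the best of Y copies (Eq. 5.14)."""
    return erfcinv(math.erfc(3.0) ** (1.0 / y_copies)) / 3.0


for y in (1, 2, 4, 8):
    print(y, round(selection_spread_ratio(y), 3))
```

With Y = 1 the ratio is one, as expected for a design without selection, and it shrinks as more copies are added.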

4.3 Digitally Assisted Analog Trimming

While analog component selection allows a drastic reduction of ADC power consumption for a fixed SNDR, it does not come with a significant area breakthrough. Even more importantly, as shown in the last column of Table 5.2, it does not allow breaking with the traditional scaling laws presented in Table 5.1 and hence does not explain the observed scaling of Fig. 5.6. The enhanced designs (b)–(d) don't scale any better than the reference case (a)!

It is more interesting to trim components instead of selecting them [28–30]. In component trimming (Fig. 5.7e), component variation is reduced by inserting or removing small fractions of the component post-manufacturing. A well-known example is trimming capacitor values by connecting or disconnecting small capacitors to the main capacitor [30]. Similar trimming can be done to match current sources, $g_m$, etc. [28, 29]. As shown in Fig. 5.8, this kind of trimming reduces the spread of the component drastically, since it cuts the tails of the component variation distribution. Both the number of trimming steps Z and the size of every trim step depend on the original variation of the baseline device ($\sigma_{(a)}^2$) and the target SNDR. All trim steps together should cover the $6\sigma_{(a)}$ spread of the original component. Or, defining $f_{trim}$ as the ratio of the trim step size to the original component size:

$$(Z + 1) \cdot f_{trim} = 6 \cdot \sigma_{(a)} \tag{5.15}$$

Fig. 5.9 Fundamental unit of digital trimming: a trim element $C_{trim}$ connected to the original component $C_{orig}$ at $V_{out}$ through a switch controlled by a select signal, with switch parasitics $C_p$ and $R_p$

The resulting SNDR will then show an improvement of approximately a factor Z ($\approx 6\sigma_{(a)}/f_{trim}$) over the baseline design. This of course comes at a small area cost due to the component selection switches and interconnect overhead. The power cost is often negligible, since the extra load of the switches can be incorporated in the design. Certainly the most important observation is that digitally controlled trimming decouples device sizing from the target SNDR, since the device is tuned post-manufacturing to the required SNDR. As a result, the area and power cost is nearly independent of SNDR, as seen in Fig. 5.8 by the steep increase in SNDR at almost no area cost. This would lead to the conclusion that minimum-sized components (thermal noise limited) can in theory be used as trimmable devices. This however does not hold in practice, for two reasons:

1. The size of a trim component cannot be made arbitrarily small. Technology limits the trim step $f_{trim}$. As a result, the maximally achievable SNDR improvement for a design depends on the original mismatch $6\sigma_{(a)}$ and the best achievable trim ratio $f_{trim}$.
2. The component selection area and power overhead is not negligible for close-to-minimum-size devices.

To assess this, let's look into a representative circuit which can be used as a “fundamental unit of digital trimming”: a digitally controlled switch plus a passive element ($R_{trim}$ or $C_{trim}$) (Fig. 5.9). The overhead of using this passive element as a trim component consists of the parasitics of the switch, which load the component under trim, consume area and power, and diminish the impact of the trim. In order to keep the overhead marginal and ensure a predictable impact of the trim, the value of the passive trim component should be significantly (e.g. 5X) larger than the parasitic of the switch.
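The trimming procedure built on Eq. 5.15 can be sketched numerically. The model below is an assumption made for illustration, not the authors' implementation: the Z + 1 trim settings are taken to be evenly spaced and centred on zero, Z is rounded up to an integer, and the function names are hypothetical. The example values (7% mismatch, trim step 1/10) are the baseline assumptions listed in the caption of Fig. 5.8.

```python
import math
import random


def trim_steps(sigma_a, f_trim):
    """Smallest integer Z with (Z + 1) * f_trim >= 6 * sigma_a (Eq. 5.15, rounded up)."""
    return max(0, math.ceil(6.0 * sigma_a / f_trim - 1.0))


def trim(deviation, z, f_trim):
    """Return (setting, residual) after applying the closest of the Z + 1 trim settings."""
    # Setting k corrects by (k - Z/2) * f_trim, so the settings are centred on zero.
    k = round(deviation / f_trim + z / 2.0)
    k = min(max(k, 0), z)  # clamp to the available settings
    return k, deviation - (k - z / 2.0) * f_trim


random.seed(1)
sigma_a, f_trim = 0.07, 0.1
z = trim_steps(sigma_a, f_trim)
residuals = [trim(random.gauss(0.0, sigma_a), z, f_trim)[1] for _ in range(10000)]
inside = sum(abs(r) <= f_trim / 2 for r in residuals) / len(residuals)
print(z, inside)
```

In this model every deviation within the ±3σ range ends up within half a trim step of the nominal value, which is the tail-cutting effect described for Fig. 5.8; only rare deviations outside the covered range remain uncorrected.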

As a result, these switch parasitics ($C_p$ or $R_p$) determine the maximal achievable SNDR improvement and hence the sizing and power consumption of the original component under trim. Since switch parasitics, and hence $f_{trim,min} \propto C_p, R_p$, heavily depend on technology, this directly links the design cost to achieve a certain SNDR to the silicon technology used. Parasitics are moreover characterized by the following scaling rules [31]:

$$C_{p,switch} \propto W \cdot L \propto s_L^2 \tag{5.16}$$

$$R_{p,switch} \propto L/W \propto 1 \tag{5.17}$$

Interconnect parasitics follow different scaling rules [31]:

$$C_{p,interc} \propto \frac{W \cdot L}{t_{ox}} \propto \begin{cases} s_L & \text{(local interconnect)} \\ 1 & \text{(constant length interconnect)} \end{cases} \tag{5.18}$$

$$R_{p,interc} \propto \frac{L}{W \cdot H} \propto \begin{cases} 1/s_L^{1/2} & \text{(local interconnect)} \\ 1/s_L^{3/2} & \text{(constant length interconnect)} \end{cases} \tag{5.19}$$

The relationship between $f_{trim,min}$ and $C_p$, together with Eq. 5.16, justifies a scaling of the ADC design cost (area and power) with $s_L^2$ over technology: the intrinsic accuracy with which components can be trimmed in a certain technology improves by $s_L^2$, which does explain the observed trends of Sect. 3. This conclusion holds for C-based trimming, and as long as interconnect parasitics do not dominate. Since interconnect parasitics are becoming more and more relevant relative to device parasitics, a slowdown of this area and power scaling trend is to be expected. It is also clear that R-based trimming is not favorable in advanced silicon technologies.

Once the trimming accuracy limit of a certain technology is reached, the only way to increase SNDR further in the analog domain is to increase the original device sizes. This up-scaling results in a relative decrease of $f_{trim}$. It however again has a linear effect on area and power consumption. An alternative that sticks with minimal size devices is to increase SNDR in the digital domain, by using digital post-distortion when technology prevents further trimming.
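To see how quickly switch and interconnect parasitics diverge, the scaling rules of Eqs. 5.16 to 5.19 can be compounded over several technology generations. This is a minimal sketch: the function name and dictionary keys are hypothetical, and $s_L = 0.7$ per generation is the value assumed earlier in the text.

```python
def parasitic_scaling(generations, s_l=0.7):
    """Relative parasitics after a number of technology generations (Eqs. 5.16-5.19)."""
    g = generations
    return {
        "C_p_switch": (s_l ** 2) ** g,               # Eq. 5.16
        "R_p_switch": 1.0,                           # Eq. 5.17
        "C_p_interc_local": s_l ** g,                # Eq. 5.18, local interconnect
        "C_p_interc_const_len": 1.0,                 # Eq. 5.18, constant length
        "R_p_interc_local": (s_l ** -0.5) ** g,      # Eq. 5.19, local interconnect
        "R_p_interc_const_len": (s_l ** -1.5) ** g,  # Eq. 5.19, constant length
    }


for gen in (1, 2, 4):
    print(gen, {k: round(v, 3) for k, v in parasitic_scaling(gen).items()})
```

The switch capacitance shrinks fastest, while constant-length interconnect capacitance stays flat and interconnect resistance grows, which is the mechanism behind the expected slowdown.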

4.4 Digital Post-processing

Due to the decreasing cost of digital gates over technology generations, digital post-distortion (Fig. 5.7f) to boost ADC performance becomes less and less costly [2]. A multitude of digital post-distortion techniques for ADCs have been developed and published over the past decade, and they are very diverse in nature. The two most common, but very distinct, classes of digital post-processing are “look-up-table (LUT)”-based [32] and filter-based [8]. The former corrects analog impairments by using the bare ADC output as the index into a look-up table containing the corrected sample data. This approach is very straightforward and easy to implement, but only suitable for low-SNDR (small number of bits) ADCs. It is also not well suited for on-line calibration.

Filter-based correction eliminates impairments by sending the bare ADC output data through a digital filter. The filter coefficients are adapted (on-line or off-line) based on the detected actual ADC impairments. The type of filter required (linear vs. non-linear, order, length, etc.) depends heavily on the nature of the impairment(s) under correction. For example, offset and gain mismatches between channels in a time-interleaved ADC can simply be corrected with a linear, single-tap 'filter', while skew or non-linearity correction of the same ADC needs complex higher-order implementations [8, 33]. These types of adaptive filters are especially attractive in medium to high SNDR ADCs, where LUT approaches become infeasible and where the cost of extra digital gates is relatively low compared to the analog power. Moreover, these filters lend themselves perfectly to on-line training and background adaptation, hence eliminating the need for (and cost of) startup calibration time $T_{cal}$, but only if initial settling transients are acceptable before achieving full performance.

While very distinct in nature, digital post-distortion techniques have several key characteristics in common: they are all capable of achieving very large SNDR improvements, and they allow close-to-minimum-size ADC front-end designs. As in the case of digitally assisted trimming, this partly decouples component sizing, area and power requirements from the target SNDR, explaining the observed improvement over technology. However, unlike in the case of digitally assisted trimming, this comes at a significant (digital) area and power penalty.

Table 5.3 Analog overhead required for digital impairment correction

| Impairment type | Required analog overhead |
|---|---|
| Offset | ADC dynamic range + max offset |
| Gain error | ADC dynamic range * max gain error |
| Radix mismatch (e.g. in SAR) | Use nominal radix < 2, to ensure max radix ≤ 2. Typically radix ≅ 1.7 |
| Linearity correction | Ensure worst case DNL still within spec (smaller nominal DNL) |
Additionally, a penalty in the analog domain has to be paid as well: almost all forms of digital post-distortion require analog overhead to allow correction in the digital domain. This overhead is necessary to ensure that no unrecoverable information gets lost when digitizing the data. Table 5.3 lists the required overhead for common digitally corrected ADC impairments. The resulting analog overhead in area and power consumption depends on the (3σ value of the) expected impairment, but can typically be estimated to be between 10% and 20%. Finally, it has to be noted that the digital power and area penalty can sometimes vanish if the required processing is leveraged from other DSP blocks already present in the system [34].

5 Cost-Aware Calibration Design 5.1 Cost-Aware Enhancement Selection The previous section gave an overview of strategies for digital ADC enhancement. The effect of every strategy on the area, power consumption, the calibration time, as well as the SNDR was discussed. As shown, the various strategies improve component matching, which in turn allows analog component sizes, and hence area and power consumption, to be scaled down. However, often a penalty on either the digital area, power or the calibration time has to be paid. As a result, the selection of the optimal strategy for a particular ADC design is not straightforward. It requires a careful analysis of the expected analog savings due to the enhancement compared to its cost. Depending on the relative importance αi of area vs. power vs. calibration time in the overall cost factor K of the particular design, a different solution will be preferred:

$$K = \alpha_1 \left(A_\mathrm{analog} + A_\mathrm{digital}\right) + \alpha_2 \left(P_\mathrm{analog} + P_\mathrm{digital}\right) + \alpha_3\, T_\mathrm{cal} \tag{5.20}$$
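To make the role of the weights in Eq. 5.20 concrete, the following Python sketch evaluates the cost factor K for two candidate designs under two weightings. All numbers (normalized areas, powers, calibration times and weights) and the names `cost_factor`, `upscaled` and `trimmed` are hypothetical and serve only to show how the preferred solution can flip with the weights.

```python
def cost_factor(a_analog, a_digital, p_analog, p_digital, t_cal, alphas):
    """Eq. 5.20: weighted sum of total area, total power and calibration time."""
    alpha1, alpha2, alpha3 = alphas
    return (alpha1 * (a_analog + a_digital)
            + alpha2 * (p_analog + p_digital)
            + alpha3 * t_cal)


# Two hypothetical candidates in arbitrary normalized units.
upscaled = dict(a_analog=4.0, a_digital=0.0, p_analog=4.0, p_digital=0.0, t_cal=0.0)
trimmed = dict(a_analog=1.5, a_digital=0.5, p_analog=1.0, p_digital=0.25, t_cal=2.0)

area_power_driven = (1.0, 1.0, 0.0)    # calibration time is free
startup_time_driven = (0.1, 0.1, 5.0)  # calibration time dominates the cost

k_upscaled_ap = cost_factor(**upscaled, alphas=area_power_driven)
k_trimmed_ap = cost_factor(**trimmed, alphas=area_power_driven)
k_upscaled_t = cost_factor(**upscaled, alphas=startup_time_driven)
k_trimmed_t = cost_factor(**trimmed, alphas=startup_time_driven)

print("area/power driven:", k_upscaled_ap, k_trimmed_ap)
print("startup-time driven:", k_upscaled_t, k_trimmed_t)
```

With these made-up inputs the trimmed design wins when only area and power count, while the up-scaled design wins once calibration time is weighted heavily.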

Often a combination of different strategies offers the best trade-off. This “strategy selection/combination” problem can be treated as an optimization problem. Starting from a minimum sized design, without any enhancements, the overall cost factor K has to be minimized under an SNDR constraint by successively applying several enhancement strategies (Fig. 5.10). At every point in this optimization process, the enhancement strategy x with the highest SNDR vs. cost sensitivity $S_{\mathrm{SNDR},K,x}$ should be applied first, until the target SNDR is achieved:

$$S_{\mathrm{SNDR},K,x} = \frac{\partial\,\mathrm{SNDR}/\partial x}{\partial K/\partial x}, \quad \text{applying enhancement strategy } x \tag{5.21}$$

Fig. 5.10 Optimizing K under SNDR constraint by successively applying performance enhancement with largest sensitivity
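The selection procedure described above can be sketched as a greedy loop. The Python example below is only an illustration under simplifying assumptions: each strategy is modelled as a list of discrete steps with a known SNDR gain and a strictly positive cost, and all step values and strategy entries are hypothetical. The function name `select_enhancements` is likewise an assumption, not something defined in the text.

```python
def select_enhancements(base_sndr, target_sndr, strategies):
    """Greedy selection: always apply the available step with the highest dSNDR/dK.

    strategies maps a name to a list of successive (delta_sndr_db, delta_cost)
    steps; a strategy is exhausted when its list runs out. Costs must be > 0.
    """
    next_step = {name: 0 for name in strategies}
    sndr, cost, applied = base_sndr, 0.0, []
    while sndr < target_sndr:
        best_name, best_sens = None, 0.0
        for name, steps in strategies.items():
            i = next_step[name]
            if i < len(steps):
                d_sndr, d_cost = steps[i]
                sens = d_sndr / d_cost  # discrete stand-in for Eq. 5.21
                if sens > best_sens:
                    best_name, best_sens = name, sens
        if best_name is None:
            raise ValueError("target SNDR not reachable with the given strategies")
        d_sndr, d_cost = strategies[best_name][next_step[best_name]]
        next_step[best_name] += 1
        sndr += d_sndr
        cost += d_cost
        applied.append(best_name)
    return applied, sndr, cost


# Hypothetical step tables (dB gained, cost added); trimming saturates quickly.
strategies = {
    "trimming": [(6.0, 0.5), (3.0, 0.5)],
    "up-scaling": [(3.0, 1.0)] * 4,
    "post-distortion": [(12.0, 3.0)],
}
applied, sndr, cost = select_enhancements(30.0, 45.0, strategies)
print(applied, sndr, cost)
```

In this made-up case trimming is used until it is exhausted, after which post-distortion offers a better sensitivity than further up-scaling. A greedy loop of this kind can overshoot the target and does not guarantee a global optimum.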

5 Considerations for Cost-Efficient Calibration of Scaled ADCs


This sensitivity is a measure of the expected cost investment to achieve SNDR improvement with this strategy. As can be seen from Fig. 5.8 and Table 5.2, up-scaling and analog redundancy have an equal SNDR-cost sensitivity of ‘1’. Analog selection demonstrates a larger sensitivity, which is however still far off from the sensitivity of digitally assisted trimming, being 2/ftrim considering area cost and even larger when taking power cost into account. However, the benefit of this large SNDR sensitivity to trimming is limited, since technology restricts the achievable ftrim. When more SNDR improvement is needed, additional up-scaling can be combined with trimming, resulting in a joint SNDR-cost sensitivity of:

$$S_{\mathrm{upscale+trim}} = \frac{\left(6\,\sigma(a)/\sqrt{U}\right)/f_\mathrm{trim} \cdot U}{(U-1) + 3\,\sigma(a)/\sqrt{U}} \approx \frac{6\,\sigma(a)/\sqrt{U}}{f_\mathrm{trim}} \quad (\text{for large } U) \tag{5.22}$$

with σ(a) and ftrim respectively the component standard deviation and trim ratio of the minimum size baseline design. This sensitivity decreases when U gets larger. The SNDR-cost sensitivity of digital post-distortion cannot be put into formulae that easily, as it heavily depends on the type (LUT vs. filter) and complexity of the implemented enhancement. Moreover, the sensitivity will in general be larger for complex, high resolution ADCs, since the area and cost adder will be relatively smaller compared to the overall ADC area and power consumption [2]. This sensitivity should be compared to the (decreasing) $S_{\mathrm{upscale+trim}}$, to determine at what point in the design it no longer makes sense to add additional trimming (in combination with up-scaling) and post-distortion should be deployed instead due to its larger sensitivity.
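The behaviour of the joint sensitivity can be checked numerically. The Python sketch below assumes that Eq. 5.22 reads S = ((6σ(a)/√U)/ftrim · U) / ((U − 1) + 3σ(a)/√U), with the large-U approximation (6σ(a)/√U)/ftrim; this reading is a reconstruction, and the values σ(a) = 5% and ftrim = 0.125 are used purely as illustrative inputs.

```python
import math


def s_upscale_trim(u, sigma_a, f_trim):
    """Joint sensitivity of up-scaling by U combined with trimming (assumed Eq. 5.22)."""
    per_unit = 6.0 * sigma_a / math.sqrt(u) / f_trim
    return per_unit * u / ((u - 1.0) + 3.0 * sigma_a / math.sqrt(u))


def s_upscale_trim_large_u(u, sigma_a, f_trim):
    """Large-U approximation of the same expression."""
    return 6.0 * sigma_a / math.sqrt(u) / f_trim


sigma_a, f_trim = 0.05, 0.125  # illustrative baseline values
sweep = [(u, s_upscale_trim(u, sigma_a, f_trim),
          s_upscale_trim_large_u(u, sigma_a, f_trim)) for u in (1, 2, 4, 16, 64)]
for u, exact, approx in sweep:
    print(f"U={u:3d}  S={exact:7.3f}  large-U approx={approx:7.3f}")
```

Under this reading, U = 1 reproduces the pure trimming sensitivity 2/ftrim quoted in the text, and the value falls monotonically as U grows, which matches the statement that the sensitivity decreases for larger U.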

5.2 Practical Example: TI SAR ADC Calibrations In this section, the cost K of a time interleaved (TI) successive approximation (SAR) ADC will be optimized by applying different enhancement strategies. The performance of the 7-bit, 2.5 GHz 45 nm TI SAR ADC [35] (architecture in Fig. 5.11) is severely impacted by mismatch of the following parameters:

• Capacitor values within every ADC: The linearity of a SAR ADC depends on the ADC’s capacitor ratios, which have to be equal to 2.
• Offset and gain mismatch: Different (comparator) offset voltages and gains for the different TI channels also cause spurious tones.
• Sampling skew mismatch between the different TI channels.

5.2.1 Capacitor Value Mismatch As depicted in Fig. 5.11, the SAR ADC under study does not rely on a binary weighted DAC, but uses a C-2C DAC instead [36]. This has the advantage that the


Fig. 5.11 Architecture of the TI SAR ADC (* = parasitic capacitor)

DAC size only increases linearly with the resolution and small, fixed capacitor sizes can be used. However, a C-2C DAC suffers heavily from the parasitic capacitances at the intermediate nodes, which distort the capacitor ratios and can serve as a resolution limit [36]. The proposed SAR however incorporates these parasitic capacitors into the design: The ‘Cu/2’ capacitors in this design (Fig. 5.11) represent the total parasitic capacitances at the intermediate nodes but are also included as an integral part of the DAC [35]. Their nominal (parasitic) value is around 40 fF, which sets the limit of the minimum size design of this ADC, which is hence not thermal noise limited. The values of these parasitic capacitors will be severely mismatched and have to be corrected by calibration. When assessing the SNDR-cost sensitivity of both capacitor trimming and digital post-distortion, capacitor trimming is to be preferred here due to the achievable ftrim being:

$$f_\mathrm{trim} = \frac{C_\mathrm{trim}}{40\,\mathrm{fF}} = 0.125, \quad \text{with } C_\mathrm{trim} = 5 \cdot C_{p,\mathrm{switch}} = 5\,\mathrm{fF} \text{ in 45 nm CMOS design} \tag{5.23}$$

As shown in Fig. 5.11, a slightly (2.2X) smaller effective ftrim can even be achieved by placing several trim capacitors in series. The trimming structure is able to compensate the range of 6σ mismatch with a Z = 6, assuming a mismatch of σ = 5% on the nominally 40 fF parasitic capacitors. The resulting measured performance improvement can be seen in Fig. 5.12.


Fig. 5.12 INL improvement after capacitor trimming of individual SAR ADC

Fig. 5.13 Automated offset calibration loop in TI SAR ADCs [35]

The remaining capacitors ‘Cu’ and ‘2Cu’ are formed by metal finger capacitors (MFC). In the technology used (45 nm LP CMOS), their matching is more than sufficient to achieve 7-bit resolution without the need for additional calibration.

5.2.2 Offset and Gain Mismatch Once mismatch within the individual ADCs is calibrated, the mismatch across ADCs has to be addressed. In this design, an un-calibrated offset mismatch up to ±85 mV is expected, while un-calibrated gain mismatch is typically limited to less than 5%. Both offset and gain mismatches can easily be detected in the digital domain. The circuitry to correct for them also does not require a lot of digital gates. However, due to the large offset mismatch (3σ spread typically equals ±85 mV = 34% of comparator input range) the analog cost adder of the required extra analog redundancy to allow digital post-distortion is very large. As a result, the SNDR-cost sensitivity of digital post-distortion will be poor for offset mismatch correction, while it is good for gain mismatch correction. Therefore, the ADC gain is calibrated in the digital domain using a multiplier, while an automated offset calibration loop is implemented in the analog domain (Fig. 5.13).
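A digital gain correction with one multiplier per channel can be sketched in a few lines of Python. The estimator used here (matching the RMS value of every channel to that of channel 0) is an assumption made for illustration, not the method of the referenced design, and the channel count, gain errors and test tone are hypothetical.

```python
import math


def split_channels(samples, m):
    """De-interleave the combined ADC output into m per-channel streams."""
    return [samples[ch::m] for ch in range(m)]


def gain_correction_factors(samples, m):
    """One multiplier per channel so that every channel matches channel 0.

    Assumes all channels see the same signal statistics, so differences in
    RMS value are attributed entirely to gain mismatch.
    """
    rms = [math.sqrt(sum(x * x for x in ch) / len(ch))
           for ch in split_channels(samples, m)]
    return [rms[0] / r for r in rms]


def apply_gain_correction(samples, factors):
    """Multiply each sample by the factor of the channel that produced it."""
    m = len(factors)
    return [x * factors[i % m] for i, x in enumerate(samples)]


# Hypothetical 4-channel example with gain errors of up to 5 %.
m = 4
gains = [1.00, 1.05, 0.97, 1.02]
n = 4000
ideal = [math.sin(2 * math.pi * 0.0125 * k) for k in range(n)]
raw = [gains[k % m] * ideal[k] for k in range(n)]

factors = gain_correction_factors(raw, m)
corrected = apply_gain_correction(raw, factors)
residual = max(abs(c - i) for c, i in zip(corrected, ideal))
uncorrected = max(abs(r - i) for r, i in zip(raw, ideal))
print(factors, residual, uncorrected)
```

The correction itself is a single multiplication per sample, which is why the digital cost of gain calibration is small compared with the analog redundancy that offset correction in the digital domain would require.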

[Figure 5.14 panels: PSD [dB] versus input frequency [Hz] (×10^8). Left: SNDR = 31.4 dB, SFDR = 42.8 dB. Right: SNDR = 36.0 dB, SFDR = 49.1 dB.]

Fig. 5.14 Performance of full ADC before (left) and after (right) TI channel mismatch correction

5.2.3 Sampling Skew Mismatch Sampling skew mismatch affects the maximum ADC performance by [37]:

$$\mathrm{SNR}_\mathrm{max} = 20 \log_{10}\!\left(\frac{1}{\sigma_\mathrm{skew}\, f_\mathrm{signal}/f_\mathrm{sample}}\right) - 10 \log_{10}\!\left(\frac{4\pi^2 (M-1)}{M}\right)\ \mathrm{dB} \tag{5.24}$$

with σskew the sampling error standard deviation and M the number of time interleaved ADC channels. The original sampling skew mismatch in our design is estimated to have σskew ≈ 1.5% of the sampling period. This sampling skew mismatch between different ADCs is much harder to detect than gain or offset mismatch, and has attracted a lot of attention from the research community. The most common estimation approach is a digitally implemented LMS optimization loop [33]. While the estimation is implemented in the digital domain, the correction can be executed either digitally (digital post-distortion) or in the analog domain by using digitally assisted trimming. The required compensation step resolution is found from Eq. 5.24 to be about 0.2% of the total ADC sampling period, or 1 ps. Digital post-distortion involves the implementation of a time-varying fractional delay filter, often realized with a (poly-phase) Farrow filter [38]. The instantiation of this digital filter, which typically has at least 30 taps [33], comes with a large area and power penalty, as it is always on and runs at the full rate. The digitally assisted trimming counterpart consists of a digitally tuned delay line. This can be realized by cascaded inverters, loaded by a programmable capacitor bank (16 × 2.5 fF capacitors). Both the area (45 µm²/channel) and power cost (50 µW/channel) of this are negligible on the total ADC design. The realized performance enhancement of the full TI SAR ADC under the described channel mismatch correction (offset, gain, timing) can be derived from Fig. 5.14.
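The skew numbers quoted above can be cross-checked with a short Python calculation. The sketch assumes that Eq. 5.24 has the form SNRmax = 20·log10(1/(σskew·fsignal/fsample)) − 10·log10(4π²(M − 1)/M), with σskew expressed as a fraction of the sampling period; this form is a reconstruction. The channel count M = 8 and the choice of a Nyquist-rate input are hypothetical, since neither is stated in this section.

```python
import math


def snr_max_db(sigma_skew, f_ratio, m):
    """Assumed Eq. 5.24.

    sigma_skew: skew standard deviation as a fraction of the sampling period
    f_ratio:    f_signal / f_sample
    m:          number of time interleaved channels
    """
    return (20 * math.log10(1 / (sigma_skew * f_ratio))
            - 10 * math.log10(4 * math.pi ** 2 * (m - 1) / m))


M = 8            # hypothetical channel count
F_RATIO = 0.5    # input at the Nyquist frequency
T_SAMPLE = 1 / 2.5e9  # sampling period of a 2.5 GS/s converter, in seconds

snr_uncal = snr_max_db(0.015, F_RATIO, M)  # original skew of about 1.5 %
snr_cal = snr_max_db(0.002, F_RATIO, M)    # after trimming to about 0.2 %
step_seconds = 0.002 * T_SAMPLE

print(f"uncalibrated: {snr_uncal:.1f} dB, calibrated: {snr_cal:.1f} dB")
print(f"0.2 % of the sampling period = {step_seconds * 1e12:.2f} ps")
```

Under these assumptions a 0.2% skew gives a limit in the mid-40 dB range, which is comparable to the ideal quantization SNR of a 7-bit converter, and 0.2% of a 400 ps period is slightly below 1 ps.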


6 Conclusions Theoretical analysis of ADC scaling over technology predicts a flat to increasing trend in the ADC’s area and power consumption. A study of survey data of all published state-of-the-art ADC designs over the last decade however shows a cubic area and power improvement with technology gate length. The same survey data seems to indicate that this improvement is realized by heavily relying on capacitor-based ADC architectures, as well as by exploiting digitally assisted performance enhancement techniques. Several digitally assisted techniques to improve performance of mismatch-limited ADCs are evaluated. Digitally assisted trimming and digital post-distortion not only show the largest SNDR-cost sensitivity, but also demonstrate the same cubic improvement relationship with the technology gate length, which explains the observed ADC scaling. A practical design example is used to describe the ADC enhancement selection for different mismatch parameters. More work is required on fast, low-complexity system-level techniques for background mismatch estimation, to limit the calibration time penalty of both digitally assisted trimming and post-distortion.

References
1. P. Scholtens, D. Smola, M. Vertregt, Systematic power reduction and performance analysis of mismatch limited ADC designs, in Proceedings of ISPED, Bordeaux, Aug 2005, pp. 78–83
2. B. Murmann, Limits on ADC power dissipation, in Analog Circuit Design, ed. by A.H.M. van Roermund, H. Casier, M. Steyaert (Springer, Dordrecht, 2006), pp. 351–367
3. Y. Chiu, B. Nikolic, P.R. Gray, Scaling of analog-to-digital converters into ultra-deep-submicron CMOS, in Proceedings of IEEE CICC, San Jose, 2005, pp. 375–382
4. K. Uyttenhove, M. Steyaert, Speed–power–accuracy tradeoff in high-speed CMOS ADCs. IEEE Trans. Circuits Syst. (CAS-II) 49(4), 280–287 (2002)
5. P. Kinget, M. Steyaert, Impact of transistor mismatch on the speed-accuracy-power trade-off of analog CMOS circuits, in Proceedings of IEEE CICC, Rochester, 1988, pp. 333–336
6. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circuits 24(5), 1433–1440 (1989)
7. P. Kinget, Device mismatch and tradeoffs in the design of analog circuits. IEEE J. Solid-State Circuits 40(6), 1212–1224 (2005)
8. P. Nikaeen, B. Murmann, Digital compensation of dynamic acquisition errors at the front-end of high-performance A/D converters. IEEE J. Sel. Top. Signal Process. 3(3), 499–508 (2009)
9. A. Abo, P. Gray, A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(4), 599–606 (1999)
10. K. Laker, W. Sansen, Design of Analog Integrated Circuits and Systems (McGraw-Hill, New York, 1994)
11. P. Woerlee, M. Knitel, R. van Langevelde, D. Klaassen, L. Tiemeijer, A. Scholten, A. Zegers-van Duijnhoven, RF-CMOS performance trends. IEEE Trans. Electron. Devices 48(8), 1776–1782 (2001)
12. C. Diaz, D. Tang, J. Sun, CMOS technology for MS/RF SoC. IEEE Trans. Electron. Devices 50(3), 557–566 (2003)


13. Q. Huang, F. Piazza, P. Orsatti, T. Ohguro, The impact of scaling down to deep submicron on CMOS RF circuits. IEEE J. Solid-State Circuits 33(7), 1023–1036 (1998)
14. M. Vertregt, P. Scholtens, Scalable high-speed analog circuit design, in Analog Circuit Design, ed. by M. Steyaert, J.H. Huijsing, A.H.M. van Roermund (Kluwer, Boston, 2003), pp. 3–21
15. C.-H. Jan et al., RF CMOS technology scaling in high-k/metal gate era for RF SoC (System-on-chip) applications, in Proceedings of IEEE IEDM, 2010, San Francisco, 2010, pp. 27.2.1
16. International Technology Roadmap for Semiconductors, 2009 edition: www.16.net/Links/ 200916/Home2009.htm
17. B. Murmann, A/D converter trends: power dissipation, scaling and digitally assisted architectures, in Proceedings of IEEE CICC, 2008, San Jose, 2008, pp. 105–112
18. B. Murmann, ADC performance survey 1997–2010, [Online]. Available: www.stanford.edu/~murmann/adcsurvey.html
19. P. Malla, H. Lakdawala, K. Kornegay, K. Soumyanath, A 28 mW spectrum-sensing reconfigurable 20 MHz 72 dB-SNR 70 dB-SNDR DT ΔΣ ADC for 802.11n/WiMAX receivers, in Proceedings of IEEE ISSCC, 2008, San Francisco, 2008, pp. 496–497
20. M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E. Klumperink, B. Nauta, A 1.9 µW 4.4fJ/Conversion-step 10b 1MS/s charge-redistribution ADC, in IEEE ISSCC, 2008, San Francisco, 2008, pp. 244–245
21. H. Chung, A. Rylyakov, Z. Toprak Deniz, J. Bulzacchelli, G.-Y. Wei, D. Friedman, A 7.5-GS/s 3.8-ENOB 52-mW flash ADC with clock duty cycle control in 65 nm CMOS, in Proceedings of Symposium on VLSI Circuits, 2009, Kyoto, 2009, pp. 268–269
22. B. Murmann, Digitally assisted analog circuits. IEEE Micro 26(2), 38–47 (2006)
23. B. Murmann, C. Vogel, H. Koeppl, Digitally enhanced analog circuits: system aspects, in Proceedings of ISCAS, 2008, Seattle, 2008, pp. 560–563
24. R. van de Plassche, A sigma-delta modulator as an A/D converter. IEEE TCAS 25(7), 510–514 (1978)
25. S. Weaver, B. Hershberg, D. Knierim, U. Moon, A 6b stochastic flash analog-to-digital converter without calibration or reference ladder, in IEEE ASSCC, 2008, Fukuoka, 2008, pp. 373–376
26. L. Pileggi, G. Keskin, X. Li, K. Mai, J. Proesel, Mismatch analysis and statistical design at 65 nm and below, in IEEE CICC 2008, San Jose, 2008, pp. 9–12
27. M. Flynn, C. Donovan, L. Sattler, Digital calibration incorporating redundancy of flash ADCs. IEEE Trans. Circuits Syst. CAS-II 50(5), 205–213 (2003)
28. B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, G.V. der Plas, A 2.2 mW 1.75 GS/s 5 bit folding flash ADC in 90 nm digital CMOS. IEEE J. Solid-State Circuits 44(3), 874–882 (2009)
29. D.C. Daly, A.P. Chandrakasan, A 6-bit, 0.2 V to 0.9 V highly digital flash ADC With comparator redundancy. IEEE J. Solid-State Circuits 44(11), 3030–3038 (2009)
30. G.V. der Plas, S. Decoutere, S. Donnay, A 0.16pJ/conversion-Step 2.5 mW 1.25GS/s 4b ADC in a 90 nm digital CMOS process, in IEEE ISSCC, 2006, San Francisco, 2008, p. 2310
31. J. Rabaey, A. Chandrakasan, B. Nikolic (eds.), Digital Integrated Circuits, 2nd edn. (Prentice Hall, Upper Saddle River, 2003)
32. C. Grace, P. Hurst, S. Lewis, A 12b 80MS/s pipelined ADC with bootstrapped digital calibration, in IEEE ISSCC, 2004, San Francisco, 2004, p. 460
33. C. Vogel, S. Saleem, S. Mendel, Adaptive blind compensation of gain and timing mismatches in M-channel time-interleaved ADCs, in Proceedings of IEEE ICECS, Malta, Sept 2008, pp. 49–52
34. Y. Oh, B. Murmann, System embedded ADC calibration for OFDM receivers. IEEE Trans. Circuits Syst. CAS-I 53(8), 1693–1703 (2006)
35. E. Alpman, H. Lakdawala, R. Carley, K. Soumyanath, A 1.1 V 50 mW 2.5GS/s 7b time-interleaved C-2C SAR ADC in 45 nm LP digital CMOS, in IEEE ISSCC, 2009, San Francisco, 2009, pp. 76–77
36. S. Iyer et al., A 0.5 mm2 integrated capacitive vibration sensor with sub-10 zF/rt-Hz noise floor, in IEEE CICC, 2005, San Jose, 2005, pp. 93–96


37. Y.-C. Jenq, Digital spectra of nonuniformly sampled signals: a robust sampling time offset estimation algorithm for ultra high-speed waveform interleaving. IEEE Trans. Instrum. Meas. 39(1), 71–75 (1990)
38. C. Farrow, A continuously variable digital delay element. Proc. IEEE ISCAS 3, 2641–2645 (1988)

Chapter 6

A 12b 2.9 GS/s DAC with IM3 < −60 dBc Beyond 1 GHz in 65 nm CMOS K. Bult, C.-H. Lin, F. van der Goes, J. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci, and X. Liu

Abstract A 12b 2.9 GS/s current-steering DAC implemented in 65 nm CMOS is presented, with an IM3 < −60 dBc beyond 1 GHz while driving a 50 Ω load with an output swing of 2.5 Vppd and dissipating a power of 188 mW. The SFDR measured at 2.9 GS/s is better than 60 dB beyond 340 MHz while the SFDR measured at 1.6 GS/s is better than 60 dB beyond 440 MHz. The increase in performance at high frequencies, compared to previously published results, is mainly obtained by adding local cascodes on top of the current-switches with “always-ON” biasing.

1 Introduction In large Systems-on-Chips (SoC), Data Converters are critical for connecting signals to the real world, often limiting the accuracy and speed of the overall system. The clear trend towards digital signal processing of traditional analog functions, reducing the Analog Front-End (AFE) and Analog Back-End (ABE) to a bare minimum, by connecting the Data Converter directly to the terminals, requires very high bandwidth and sampling speeds. As an example, 10GBASE-T Ethernet requires a Transmit DAC sampling at 1.6 GS/s with at least 70 dB IM3 up to a frequency of 400 MHz, while supporting an amplitude of 2.5 Vppd driven into a 50 Ω load. In recent years many papers have been published on high-speed DAC design [1–9]. Most of these designs concentrate on obtaining good low-frequency

K. Bult () • F. van der Goes • J. Westra • J. Mulder • Y. Lin • E. Arslan • E. Ayranci • X. Liu Broadcom Netherlands BV, Bunnik, The Netherlands e-mail: [email protected] C.-H. Lin Broadcom Corporation, Irvine, CA, USA

M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_6, © IEEE. Reprinted, with permission, from K. Bult, C.-H. Lin, F. van der Goes, J. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci, and X. Liu, A 12b 2.9 GS/s DAC with IM3 < −60 dBc Beyond 1 GHz in 65 nm CMOS, Journal of Solid State Circuit, December 2009


[Figure 6.1 axes: IM3 [dB] (ticks at –70, –60, –50) versus Fsig [MHz] (ticks at 100, 200, 400, 800, 1600)]

Fig. 6.1 IM3 requirements for this design

performance and use techniques like calibration [1] or Dynamic Element Matching [10]. However, these techniques do not help the performance at higher frequencies since at those frequencies matching problems are not the limiting factor anymore. The goal of this paper is to design a DAC which achieves a minimum of 70 dB IM3 at low frequencies, maintains that performance up to a minimum of 400 MHz and achieves at least 60 dB IM3 at 800 MHz, as is shown in Fig. 6.1. Currently, no published designs meet these requirements. Section 2 introduces the chosen architecture. Sections 3 and 4 deal with various distortion mechanisms that are dominant at low and high frequencies respectively. In Sect. 5 the DAC-Cell for the thermometer-coded Coarse-DAC is shown and the implemented techniques are discussed in detail. The test-chip and set-up are discussed in Sect. 6 and the results of measurements along with a comparison with the theory (as discussed in Sect. 5) as well as with previously published data are shown in Sect. 7. Finally, Sect. 8 summarizes the conclusions.

2 Architecture For fast-sampling applications the Current-Steering architecture is the architecture of choice [1–9]. Because of the obvious advantages of thermometer-coded DACs with respect to DNL, glitch-energy, monotonicity and linearity on the one hand and on the other hand the advantages of binary-coded DACs with respect to compactness and simplicity, a segmented design was chosen (Fig. 6.2). Using the approach outlined in [2] the optimal segmentation ratio was determined and a 6b thermometer-coded DAC was combined with a 6b binary-coded DAC. A transformer connects the DAC to the load. For proper termination, the impedance on each side of the transformer equals 100 Ω and the total effective impedance seen by the DAC equals 50 Ω.
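The 6b/6b segmentation can be illustrated with a small behavioural model in Python. This is only a sketch of the coding scheme, not of the circuit: the function names are hypothetical, and the model uses 63 coarse unit cells, which is the minimum a 6b thermometer code needs and is an assumption about the cell count.

```python
def segment_code(code, msb_bits=6, lsb_bits=6):
    """Split an input code into a thermometer (coarse) and a binary (fine) part."""
    if not 0 <= code < (1 << (msb_bits + lsb_bits)):
        raise ValueError("code out of range")
    msb = code >> lsb_bits
    lsb = code & ((1 << lsb_bits) - 1)
    n_unit = (1 << msb_bits) - 1  # 63 unit cells (assumed cell count)
    thermometer = [1 if i < msb else 0 for i in range(n_unit)]
    binary = [(lsb >> b) & 1 for b in range(lsb_bits)]  # least significant bit first
    return thermometer, binary


def dac_output_lsb(thermometer, binary, lsb_bits=6):
    """Recombine: each coarse cell weighs 2**lsb_bits LSBs, binary cell b weighs 2**b."""
    coarse = sum(thermometer) * (1 << lsb_bits)
    fine = sum(bit << b for b, bit in enumerate(binary))
    return coarse + fine


thermo, fine_bits = segment_code(2500)
print(sum(thermo), fine_bits, dac_output_lsb(thermo, fine_bits))
```

In the thermometer part a code increment only ever switches additional equal-weight cells on, which is the origin of the monotonicity and glitch advantages mentioned above, while the binary part keeps the number of cells small.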

[Figure 6.2 labels: Therm. DAC (6 MSBs), Binary DAC (6 LSBs), row decoding + latches, latches, 100 Ω, 100 Ω cable-imp., 2.5 V]

Fig. 6.2 Top-level architecture

[Figure 6.3 labels: row decode, column decoding + latches]

Fig. 6.3 Thermometer-coded Coarse DAC

The Thermometer-coded MSB section (Coarse-DAC) determines the overall performance with respect to linearity and most of the design effort was focused on that block. The architecture is shown in Fig. 6.3. Although the final layout was not done in a matrix-style layout but rather in a single-line layout, the cells are driven by row and column decoders as this turns out to be area efficient. Latches are placed at the input of the DAC, after the column and row-decoding and inside the cells.

3 Low Frequency Performance This section deals with error-mechanisms that are frequency-independent and as such will dominate the accuracy at low frequencies.

[Figure 6.4 labels: Vss, Isupply, IR]

Fig. 6.4 IR-drop in the supply-line causes DNL

3.1 Matching Transistor-mismatch in the current source of the DAC-cell is a source of nonlinearity, especially in the coarse-DAC. Care has to be taken to obtain good matching by using sufficient gate-area [11], short distances between the transistors and equal environments through the use of dummy-transistors. Many papers have been written about various techniques to improve matching between current sources, like calibration techniques [1] or Dynamic Element Matching (DEM) [10]. Although there are ample examples of papers that show that for low-frequencies this does indeed improve the linearity of the DAC, for high-frequencies these techniques in general do not help and often even degrade the performance. This is mostly because both techniques complicate the design significantly, including the layout, whereas high-frequency performance mostly benefits from simplicity and low parasitic capacitances. As a result no matching improvement techniques were used other than proper layout and proper transistor-sizing.

3.2 IR-Drops on the Supply Routing of the supply-lines to the DAC current sources is a critical step. As the supply-lines carry current, any resistance will cause voltage drops (“IR”, see Fig. 6.4) on the supply line, which in turn can cause the individual gate-source voltages to be unequal [12]. As a result the value of the currents will become smaller (or bigger) along the supply-line (Fig. 6.4), causing not only a DNL problem, but, without any randomization, could also cause significant INL problems. Solutions can be found in using wide power supply lines and binary trees for the supply-grid.

[Figure 6.5 labels: load resistors RL, output Vout, branch currents (N+n)·Io/2 and (N–n)·Io/2, branch impedances 2Zo/(N+n) and 2Zo/(N–n)]

Fig. 6.5 The effect of finite current source output impedance on DAC linearity

3.3 Finite Output Impedance Finite output impedance of DAC current sources leads to distortion, as has been pointed out by several authors [4, 8, 9]. Figure 6.5 shows a thermometer-coded DAC with ideal current sources, but with finite current source output impedance Ro. In this graph, N equals the total number of current sources, Io is the value of one current source and n equals the digital input code (−N ≤ n ≤ N). Depending on the digital input code, more or fewer current sources are connected to the left and the right output nodes. Along with that, more or fewer output impedances are connected in parallel with the load impedance RL. This means that the total effective load impedance becomes signal-dependent. Since the output voltage is a simple multiplication of the current and the output impedance, a signal-dependent load impedance leads to distortion. A straightforward calculation of the transfer-function from the digital input word (n) to the output voltage Vout in Fig. 6.5 leads to the following expression:

$$V_\mathrm{out} = R_L N I_o \left[ \frac{n}{N} + \left(\frac{n}{N}\right)^{3} \left[\frac{R_L N}{2 R_o}\right]^{2} \right] \tag{6.1}$$

For full swing conditions (n = N) the expression for third order distortion becomes:

$$\mathrm{HD3} = \left[\frac{R_L N}{4 R_o}\right]^{2} \tag{6.2}$$

As Eq. 6.2 shows, low third-order distortion (due to this mechanism only) requires a high output impedance from each current source: Ro ≫ N·RL. This can be fairly easily achieved at low frequencies by cascoding and if necessary double or active cascoding. For high frequencies this is not a sufficient solution, as will be shown in Sect. 4.
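The closed-form distortion estimate can be compared against a direct evaluation of the Fig. 6.5 model. The Python sketch below assumes that Eq. 6.2 reads HD3 = (RL·N/(4·Ro))², that each side of the output sees k sources of Io/2 with output impedance 2Ro each in parallel with RL, and that the input is a full-scale sine; the numerical values N = 64, RL = 50 Ω and Ro = 39 kΩ are example inputs.

```python
import math


def vout_exact(n, n_total, i_o, r_l, r_o):
    """Differential output of the Fig. 6.5 model for input code n (-N <= n <= N)."""
    def side(k):
        # k sources of i_o/2, each with 2*r_o, in parallel with r_l
        return k * i_o * r_l * r_o / (k * r_l + 2 * r_o)
    return side(n_total + n) - side(n_total - n)


def harmonic_amplitude(samples, h):
    """Amplitude of harmonic h over exactly one period of samples."""
    m = len(samples)
    s = sum(v * math.sin(2 * math.pi * h * k / m) for k, v in enumerate(samples))
    c = sum(v * math.cos(2 * math.pi * h * k / m) for k, v in enumerate(samples))
    return 2 * math.hypot(s, c) / m


def hd3_simulated(n_total, r_l, r_o, points=1024):
    """HD3 (amplitude ratio) of the exact model driven by a full-scale sine."""
    wave = [vout_exact(n_total * math.sin(2 * math.pi * k / points),
                       n_total, 1.0, r_l, r_o) for k in range(points)]
    return harmonic_amplitude(wave, 3) / harmonic_amplitude(wave, 1)


def hd3_formula(n_total, r_l, r_o):
    """Closed-form estimate (assumed Eq. 6.2)."""
    return (r_l * n_total / (4 * r_o)) ** 2


sim = hd3_simulated(64, 50.0, 39e3)
est = hd3_formula(64, 50.0, 39e3)
print(f"simulated HD3: {20 * math.log10(sim):.1f} dB")
print(f"formula HD3:   {20 * math.log10(est):.1f} dB")
```

For these inputs the two results agree to within about 1 dB; the remaining gap appears to come from the small gain reduction caused by the loading, which the third-order expansion neglects.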


4 High Frequency Performance High frequency performance is quite a different problem from its low frequency counterpart and special measures have to be taken.

4.1 Switch Gate Driving It is well-known that it is important to keep the voltage across the current sources as stable as possible during the output current transition from one side to the other [2, 8]. Special gate-driving circuits have been developed for that purpose [8]. However, it is also important that the transition itself is as short as possible. The main reasons for that are to reduce the effects of switch driver mismatch, clock jitter, decoding feed through and, to some extent, device noise. CML-drivers are often used to obtain maximum possible clock speed (because of the reduced swing), but the steepest transitions are obtained using regular CMOS logic. In this design the latch of Fig. 6.6 is used for two reasons, it creates the steepest transition [2] and has the shortest clock-to-Q delay known to the authors [8]. Inherently, this circuit has a low crossing of the Q and QB signals. If followed by inverters for Q and QB, the signals become high-crossing and can be used to drive the switch gates directly. It was found that the rapid transitions exhibited by this circuit have more benefits than any reduced swing circuit known to the authors.

4.2 Decoding Feedthrough Since decoding signals contain information about the input signal, any feedthrough from the decoding to the DAC output can lead to distortion. In [8] Data-Dependent Clock Loading is discussed and a solution is presented. Another error mechanism is non-symmetrical decoding. In the design presented here, decoding is done as if the current source cells were distributed in a matrix-style layout, using column and

[Figure 6.6 signal labels: clk, D, DB, Q, QB]

Fig. 6.6 Fast latch

[Figure 6.7 signal labels: clk, row, col, col+1, Q, QB]

Fig. 6.7 Latch with symmetrical decoding

row signals [2]. A cell can get activated either because the next column signal is active or the current column and the current row are active. The circuit in Fig. 6.7 has been used to make sure that in all cases the cell gets activated with the same signal strength [3]. At high speed the transition of digital signals can still have a finite settling effect at the end of the clock-period and a difference can emerge between the values of those signals that had to transition and those that already transitioned in a previous clock-period. Adding an extra latch reduces this effect considerably.
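The activation rule stated above (a cell is on when the next column signal is active, or when its own column and row signals are both active) can be checked with a short behavioural model in Python. The 8 × 8 organisation and the exact definitions of the row and column signals are assumptions made for this sketch; the text only gives the rule itself.

```python
def decode_signals(code, rows=8, cols=8):
    """Derive thermometer row and column signals from a coarse code (assumed scheme)."""
    full_cols, partial = divmod(code, rows)
    # col[c] is active once column c is at least partly filled; col[cols] never is.
    col = [1 if c <= full_cols else 0 for c in range(cols + 1)]
    # row[r] is active for the first `partial` rows of the partly filled column.
    row = [1 if r < partial else 0 for r in range(rows)]
    return row, col


def cell_active(r, c, row, col):
    """A cell is on if the next column is active, or its own column and row are."""
    return bool(col[c + 1] or (col[c] and row[r]))


def active_cells(code, rows=8, cols=8):
    """Count the cells that are switched on for a given coarse code."""
    row, col = decode_signals(code, rows, cols)
    return sum(cell_active(r, c, row, col) for c in range(cols) for r in range(rows))


print([active_cells(code) for code in (0, 1, 8, 9, 63)])
```

With these signal definitions exactly `code` cells are active for every coarse code from 0 to 63, so the row/column scheme reproduces a thermometer code without a dedicated decoder per cell.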

4.3 Switch Driver Mismatch Mismatch in switch driver transistors can lead to small differences in the speed of the transition and therefore also the exact switching moment. Deviations in switching moments are more important at higher signal speeds and therefore a degradation proportional to frequency is to be expected. For good high-frequency performance, care should be taken to obtain good matching of the driver transistors. However, since any spread in switching moment is proportional to the switching time, the absolute transition of the driving signal should be as fast as possible. Both good matching as well as a fast transition requires the driver transistors to be fairly large. The latch or logic preceding the driver is less sensitive and can be scaled down in size, leading to a tapered design in logic, latches and driver. Also the supply-line needs to be low-Ohmic and routed in such a way that the actual driving strength for all drivers is matching well.


4.4 Output Current Tree Equal delays are not only relevant in the clock-tree but just as much in the signal path. For that reason it is crucial to design the paths from the cells to the outputnodes as well-matched as possible. A binary output-current tree is used in an effort to keep all possible delays in the signal path the same.

4.5 Output Impedance at Higher Frequencies As discussed in Sect. 3.3, finite current source output impedance causes distortion in current-steering DACs. Equation 6.2 shows the expression for third harmonic distortion (HD3) at low frequencies. In the analysis it was assumed that the output impedance was purely real and equal to Ro. At higher frequencies this assumption is no longer valid. A better assumption would be that the output impedance can be modeled by the parallel connection of a resistor and a capacitor:

$$Z_o = R_o \,\|\, \frac{1}{j\omega C_o} \tag{6.3}$$

An expression for HD3 valid for all frequencies is found by substituting Ro in (6.2) by |Zo|:

$$\mathrm{HD3} = \left[\frac{R_L N}{4\,|Z_o|}\right]^{2} \tag{6.4}$$

As the impedance Zo shows a first-order roll-off with frequency, it can be expected that for higher frequencies (ω > 1/RoCo) HD3 degrades quadratically with frequency and this will become the dominant distortion mechanism. The effective output capacitance Co being switched back and forth between the positive and negative output will be referred to as the “switching capacitor” in the remainder of this paper and should be kept small. A numerical example may make the above more clear. A 12b DAC is built from a 6b Fine-DAC and a 6b Coarse-DAC and the Coarse-DAC is thermometer coded. The number of current sources in the Coarse-DAC is 64 (= N). Using Eq. 6.4 with a load resistance of 50 Ω (= RL) and HD3 ≤ −67.5 dB (= IM3 ≤ −70 dB) yields |Zo| ≥ 39 kΩ, which seems reasonable and could be achieved through cascoding. However, if this level of HD3 is also required at an output frequency of 600 MHz, the maximum capacitance in this impedance can be no more than 6.8 fF. This is very difficult to achieve indeed.
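The numerical example can be reproduced with a few lines of Python. The sketch assumes that Eq. 6.4 reads HD3 = (RL·N/(4·|Zo|))² with HD3 as an amplitude ratio, and that at 600 MHz the impedance is purely capacitive, so |Zo| = 1/(2πf·Co); the function names are hypothetical.

```python
import math


def required_zo(hd3_db, r_l, n_sources):
    """Invert the assumed Eq. 6.4: |Zo| = RL*N / (4*sqrt(HD3))."""
    hd3_ratio = 10 ** (hd3_db / 20)
    return r_l * n_sources / (4 * math.sqrt(hd3_ratio))


def max_switching_cap(z_o, freq):
    """Capacitance whose reactance equals z_o at freq (purely capacitive impedance)."""
    return 1 / (2 * math.pi * freq * z_o)


z_o = required_zo(-67.5, 50.0, 64)
c_o = max_switching_cap(z_o, 600e6)
print(f"|Zo| = {z_o / 1e3:.1f} kOhm")
print(f"Co   = {c_o * 1e15:.2f} fF")
```

Both results land on the values quoted in the text, roughly 39 kΩ and 6.8 fF, which supports the reading of Eq. 6.4 used here.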

5 The DAC-Cell After discussing various error mechanisms in Sects. 3 and 4, this section deals with the solution proposed here. The DAC-Cell designed to combat these issues is shown in Fig. 6.8.

[Figure 6.8 labels: DAC-CELL with transistors M1–M6, BIASING with transistors M7–M13, internal nodes 2, 3 and 4, Decoding + Latches block, 2.5 V supply, 1.0 V supply, voltage labels 2.50 V, 1.75 V, 0.75 V and 0.41 V]

Fig. 6.8 DAC-Cell

5.1 Low Frequency Issues To obtain good matching between the Coarse-DAC currents, it is of key importance that transistor M1 is well matched to its counterparts in the other DAC-cells. For this reason M1 is sized large enough to support the required matching [11] and is placed in a large array of transistors with ample dummies placed at each end of the array. Triple cascoding (M2, M3/M4 and M5/M6) is used to prevent finite outputimpedance of the current sources dominating the low frequency distortion.

5.2 High-Frequency Output Impedance The goal of this paper is to obtain good harmonic performance even at high frequencies. As pointed out in Sect. 4.5, it is crucial to keep the effective “switching” capacitance Co as small as possible. Therefore M2 is sized as small as possible to reduce the parasitic capacitance at the sources of switches M3 and M4, which themselves are also sized minimally. Since the switches operate with a large Vgs ,


their sizes are close to the minimum allowed in the technology. The device sizes in the DAC-cell are therefore large at the bottom (for good accuracy), getting smaller in the middle (just large enough to support the current) and close to the minimum at the switches. As discussed in Sect. 5.1, M5 and M6 are added in each cell to reduce the effect of finite output-impedance at low frequencies. Unfortunately, at higher frequencies this technique does not work that well. M5 and M6 do reduce the effect of parasitic capacitances from M1 to M4, but also add their own parasitic capacitances, primarily their own Cgs's [9]. When M3 and M4 are switching, so are the Cgs's of M5 and M6 and they become the dominant limitation of the switching output impedance of the current source.

5.3 The “Switching Impedance”

It is important at this point to realize that the part of the output impedance that hurts the distortion performance is only the part that is actually switching. Any impedances (such as parasitic capacitances) that are not switching, but are fixed at each side of the output, do not contribute to distortion. This observation leads to one of the key contributions of this paper. In the remainder of this paper the term “switching impedance” is used for the part of the output impedance that is actually switching, equivalent to Zo in Eqs. 6.3 and 6.4. At higher frequencies Zo is dominated by the “switching capacitance” Co.

5.4 The Proposed Solution

Adding small current sources (M7–M9 and M10–M12) to the sources of cascodes M5–M6 prevents the cascodes from being fully switched off. This means that even if the current from the current source M1/M2 is not routed through a particular cascode, that cascode remains active. In turn, the parasitic capacitances associated with nodes 3 and 4 can still be observed from the DAC-cell output, irrespective of the state of the switches M3–M4. Therefore these capacitances do not contribute to distortion degradation. The first switching capacitances that can now be observed are the Cgs’s of M3 and M4, but their effect on the finite output impedance is reduced by the intrinsic gain (gm·rout) of the cascode transistors M5–M6. The use of the cascode transistors M5–M6 together with the small current sources M7–M9 and M10–M12 reduces the effect of the finite switching output impedance, and therefore also the distortion, by an order of magnitude.

To make the proposed solution clearer, the effect discussed above is illustrated in Fig. 6.9, where the circuits of both the “on”-impedance Zo(on) and the “off”-impedance Zo(off) are shown. The “switching-impedance” Zo can be obtained mathematically through:

Fig. 6.9 Subtracting the “off”-impedance Zo(off) from the “on”-impedance Zo(on). The resulting “switching-impedance” Zo is shown on the right

$$\frac{1}{Z_o} = \frac{1}{Z_o(\mathrm{on})} - \frac{1}{Z_o(\mathrm{off})}, \qquad (6.5)$$

as is shown in Fig. 6.9. By “subtracting” the two circuits, as a result of which all the capacitances associated with M5/M6 are cancelled, the first capacitor seen from the output becomes the Cgs of M3/M4. However, this capacitance is reduced by the intrinsic gain of both M3/M4 and M5/M6, improving the performance by a factor (gm·rout)².

In order to keep the power dissipation as low as possible, the additional current sources should be kept as small as possible. Their purpose is to keep M5–M6 “on” so that the parasitic capacitances associated with nodes 3 and 4 remain observable at all times. However, if the additional current source values are too small, the switching of the DAC-cell will vary the Cgs’s of M5–M6 too much and a finite effect on distortion will be the result. Simulations have shown that a relatively small value of 1–2% of the main current source is sufficient to keep the Cgs’s of M5–M6 fairly constant.
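The cancellation expressed by Eq. 6.5 can be checked with a deliberately simplified model. As an assumption made here purely for illustration (it is not the simulated circuit), suppose the output sees a fixed capacitance $C_f$ in both states and the switching capacitance $C_o$ only in the “on” state:

```latex
\begin{align*}
Z_o(\mathrm{on}) &= \frac{1}{j\omega\,(C_f + C_o)}, \qquad
Z_o(\mathrm{off}) = \frac{1}{j\omega\,C_f},\\[4pt]
\frac{1}{Z_o} &= \frac{1}{Z_o(\mathrm{on})} - \frac{1}{Z_o(\mathrm{off})}
  = j\omega\,(C_f + C_o) - j\omega\,C_f
  = j\omega\,C_o .
\end{align*}
```

In this model the fixed capacitance $C_f$ drops out entirely and only $C_o$ remains, which is consistent with the observation that non-switching capacitances do not contribute to distortion.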

6 Test-Chip

The 12b DAC was built from a 6b thermometer-coded Coarse-DAC and a 6b binary-coded Fine-DAC (as shown in Fig. 6.2). The Coarse-DAC uses 63 DAC-cells as depicted in Fig. 6.8. As mentioned before, although the layout of the Coarse-DAC was implemented with all the current sources in a straight line (as opposed to a matrix-style layout), the decoding was implemented as column and row decoding using the circuit of Fig. 6.7, as this results in a very effective decoding structure.


A transformer was used (as depicted in Figs. 6.2 and 6.8) to connect to the load impedance. As the total effective load impedance is 50 Ω and a 2.5 Vppd swing is required by the application, the total available current for driving the load is 50 mA. As a result of that large signal swing, the center tap of the transformer is biased at 2.5 V, which necessitates the use of thick-oxide devices for M5 and M6.

Two Direct Digital Frequency Synthesizers (DDFS) were integrated along with the DAC to enable two-tone testing. This avoids the problem of having to bring high-frequency digital signals onto the chip for test purposes. In the real application that is of course not necessary either, because the digital signals that drive the DAC come from a dedicated Digital Signal Processor (DSP).

7 Measurements

The design was implemented in 65 nm CMOS technology and measures 0.31 mm². The layout is shown in Fig. 6.10. The power dissipation was 188 mW, drawn from the 1.0 V and 2.5 V supplies combined. The measured INL and DNL were 0.5 lsb and 0.3 lsb respectively at the 12b level. All dynamic measurements were performed using the two on-board DDFSs. Although the circuit was designed for a 1.6 GS/s application, it ran with good performance up to 2.9 GS/s. Many of the measurements were performed at 2.9 GS/s, others at the speed of the application, 1.6 GS/s.

7.1 IM3 Measurements

Figure 6.11 shows a spectrum of the output signal of a two-tone test centered at 1 GHz and sampling at 2.9 GS/s. As can be seen, the IM3 is better than 62 dB.

Fig. 6.10 DAC layout (blocks: biasing; tail current sources; switches + cascodes; decoding + registers)

Fig. 6.11 Two-tone measurement centered around 1 GHz while sampling at 2.9 GS/s

Fig. 6.12 Measured IM3 versus signal frequency (IM3 [dB] against frequency [MHz]; curves: Theory, this work @ 2.9 GS/s, [4] @ 1.0 GS/s, [5] @ 1.4 GS/s, [6] @ 0.4 GS/s). Note the differences in output swing and sampling frequency

Many of those measurements were taken and Fig. 6.12 shows the measured IM3 results versus signal frequency, sampling at 2.9 GS/s. The figure shows a 70 dB IM3 bandwidth of 550 MHz and a 60 dB IM3 bandwidth of more than 1 GHz.

Fig. 6.13 Simulated output impedances Zo(ON) and Zo(OFF) versus frequency (Zo [Ohm] against frequency). The switching impedance Zo is calculated using (6.5)

7.2 Comparison with Theory

In order to show the validity of the simple high-frequency distortion model discussed in Sect. 4.5, the output impedance of the DAC-cell shown in Fig. 6.8 was simulated, both for the output that is ON and for the output that is OFF. The results are shown in Fig. 6.13 as Zo(ON) and Zo(OFF). An effective switching impedance Zo was extracted using Eq. 6.5. By taking the imaginary part of Zo we can make an estimate of the “switching capacitance”. Both the imaginary part of Zo and the switching capacitance are shown in Fig. 6.14. For frequencies up to about 200 MHz the estimated switching capacitance is about 5 fF, after which the capacitance increases and peaks around 1.5 GHz at 9.5 fF. At a frequency of 600 MHz the value is about 6.5 fF, very close to what was needed for our goal of 70 dB IM3 at that frequency.

Using the data from Fig. 6.14 in Eq. 6.5, with RL = 50 Ω and N = 64, an estimate was made of the IM3 for higher frequencies. This is shown in Fig. 6.12 as the curve labeled “Theory”. As can be seen, for higher frequencies (>400 MHz) a very close match is obtained with the measured data, showing the validity of the theory. The IM3 at lower frequencies is clearly dominated by other effects, most likely a combination of matching errors, IR-drops and transformer effects. The peaking observed around 400 MHz can be explained by a cancellation effect: the low-frequency distortion mechanism (mismatch, IR-drops, transformer) cancels with the high-frequency mechanism (finite output impedance).
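The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation flow: the function names are hypothetical, the impedances are modelled as ideal capacitors, and the 50 fF fixed capacitance is an invented value. Only the 600 MHz frequency and the 6.5 fF target are taken from the text.

```python
import math

def switching_impedance(z_on: complex, z_off: complex) -> complex:
    """Eq. 6.5: 1/Zo = 1/Zo(on) - 1/Zo(off)."""
    return 1.0 / (1.0 / z_on - 1.0 / z_off)

def switching_capacitance(z_o: complex, freq_hz: float) -> float:
    """Equivalent capacitance from Im(Zo), assuming Zo ~ 1/(j*omega*Co)."""
    omega = 2.0 * math.pi * freq_hz
    return -1.0 / (omega * z_o.imag)

# Hypothetical, purely capacitive on/off impedances (illustration only).
f = 600e6                      # frequency quoted in the text
omega = 2.0 * math.pi * f
c_fixed = 50e-15               # invented non-switching capacitance
c_sw = 6.5e-15                 # switching capacitance quoted at 600 MHz

z_on = 1.0 / (1j * omega * (c_fixed + c_sw))
z_off = 1.0 / (1j * omega * c_fixed)

z_o = switching_impedance(z_on, z_off)
c_est = switching_capacitance(z_o, f)
print(f"|Zo| = {abs(z_o) / 1e3:.1f} kOhm, Co = {c_est * 1e15:.2f} fF")
```

With real simulation data, `z_on` and `z_off` would be complex samples at each frequency point, and the same two functions would be applied point by point.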

Fig. 6.14 Simulated imaginary part of the switching output impedance (1/jωCo, Zo [Ohm]) and the associated output capacitance Co [fF] versus frequency

7.3 SFDR Measurements

SFDR was measured at 1.6 GS/s and 2.9 GS/s. A spectrum of the output signal measured at 1.6 GS/s and producing an output tone of 125 MHz is shown in Fig. 6.15. Many such measurements were performed and Fig. 6.16 shows the results. It shows a 70 dB SFDR bandwidth of 225 MHz and a 60 dB SFDR bandwidth of 550 MHz.

The SFDR results clearly show a lower bandwidth than the IM3 results. This is to be expected and can be explained as follows. In the first place, IM3 measurements only look at close-in third-order components, while SFDR measurements look at all the tones. Secondly, and more importantly, all measurements have been performed through a transformer (see Figs. 6.2 and 6.8), which has a bandwidth of about 300 MHz. This means that a signal at a higher frequency gets attenuated. However, if spurious tones are generated and folded back to lower frequencies (below the transformer bandwidth), they do not get attenuated. This mechanism degrades the measured SFDR results significantly. The same mechanism has no effect on IM3 measurements, since the signal tones and the close-in tones are at approximately the same frequency and get the same attenuation.
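The folding mechanism described above can be illustrated with a short Python sketch. It is a simplified picture that only computes where a harmonic of the signal lands in the first Nyquist zone; the helper name and the 550 MHz example tone are chosen here for illustration, while the 1.6 GS/s sampling rate comes from the text.

```python
def folded_frequency(f_mhz: int, fs_mhz: int) -> int:
    """Frequency (MHz) at which a tone at f_mhz appears in the first Nyquist zone."""
    r = f_mhz % fs_mhz
    return r if r <= fs_mhz // 2 else fs_mhz - r

FS_MHZ = 1600    # sampling rate from the text (1.6 GS/s)
TONE_MHZ = 550   # example signal frequency, chosen for illustration

# Folded locations of the second and third harmonics of the tone.
harmonics = {k: folded_frequency(k * TONE_MHZ, FS_MHZ) for k in (2, 3)}
for k, f in harmonics.items():
    print(f"harmonic {k}: {k * TONE_MHZ} MHz folds to {f} MHz")
```

In this example the third harmonic folds to 50 MHz, well below the roughly 300 MHz transformer bandwidth, while the 550 MHz signal itself lies above it. The spur is therefore passed unattenuated while the signal is not, which is the degradation of measured SFDR that the text describes.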

7.4 Comparison with Literature

Table 6.1 shows a comparison with published data [4–7]. Although all designs are driving a 50 Ω load, the available current varies significantly, from 15 mA [7] to 50 mA (this work). This results in equally significant differences in signal swing.

Fig. 6.15 SFDR measurement producing a tone at 125 MHz while sampling at 1.6 GS/s

Fig. 6.16 Measured SFDR versus signal frequency (SFDR [dB] against frequency [MHz]; curves: this work @ 1.6 GS/s, this work @ 2.9 GS/s, [4] @ 1.0 GS/s, [5] @ 1.4 GS/s, [6] @ 0.4 GS/s, [7] @ 0.5 GS/s). Note the differences in output swing and sampling frequency

Table 6.1 Comparison with published data

Reference   Tech [nm]   Fclk [GHz]   Iload [mA]   Swing [Vppd]   Power [mW]   NPE [%]
This work   65          2.9          50           2.5            188          66
[4]         350         1.0          16           0.8            110          12
[5]         180         1.4          30           1.5            200          23
[6]         250         0.4          20           1.0            400          5
[7]         180         0.5          15           0.75           216          5

When comparing power dissipation, the power available for the load, P(Rload), should be considered as well. A comparison of absolute power dissipation would make no sense. Here we use a Normalized Power Efficiency (NPE) defined as:

$$\mathrm{NPE} = \frac{P_{peak}(R_{load})}{0.25\,P_{supply}} \qquad (6.6)$$

The factor 0.25 is used to allow the theoretical maximum of the Normalized Power Efficiency to be 100%. Note that the NPE varies significantly from design to design (5–66%) and that this work achieves the highest NPE.

The vast differences in maximum signal swing also have a strong influence on distortion performance. As is generally known, third-order distortion (relative to the signal) is proportional to the square of the signal amplitude. Figure 6.12 also shows the results presented by [4–6]. Although the results presented here outperform the previously published results for frequencies above 300 MHz, the real difference is, as a result of the vast differences in signal amplitude, much more pronounced. For IM3, the best results to compare to are presented by [6], which are produced at an output swing of 1.5 Vppd, compared to 2.5 Vppd presented here. This results in an additional (2.5/1.5)², or approximately 9 dB, difference in IM3 compared to [6], on top of the (approximately) 9 dB shown in the graph. Note that the results presented here are produced at a sampling speed of 2.9 GS/s, more than twice the sampling speed of the second best in Table 6.1.

Figure 6.16 shows the results of SFDR measurements of [4–7] and the results of this work. In this case the best results to compare to are presented by [4], but at an amplitude of 0.8 Vppd, compared to 2.5 Vppd presented here. The difference in amplitude is equivalent to an additional (2.5/0.8)², or approximately 20 dB, in SFDR. Table 6.2 gives an overview of the measured performance of this work.
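The NPE figures and the swing corrections quoted above can be reproduced with a short Python sketch. One assumption is made here that the text does not state explicitly: the peak load power is taken as P_peak = (swing/2)² / R_load, with the swing in Vppd. Under that assumption the computed values agree with the NPE column of Table 6.1 to within rounding. The function names are hypothetical.

```python
import math

R_LOAD = 50.0  # ohms, as stated in the text

# (swing in Vppd, supply power in mW), values taken from Table 6.1
designs = {
    "This work": (2.5, 188.0),
    "[4]": (0.8, 110.0),
    "[5]": (1.5, 200.0),
    "[6]": (1.0, 400.0),
    "[7]": (0.75, 216.0),
}

def npe_percent(swing_vppd: float, p_supply_mw: float) -> float:
    """NPE per Eq. 6.6, ASSUMING P_peak = (swing/2)^2 / R_LOAD."""
    p_peak_mw = (swing_vppd / 2.0) ** 2 / R_LOAD * 1e3
    return 100.0 * p_peak_mw / (0.25 * p_supply_mw)

def swing_correction_db(swing_a: float, swing_b: float) -> float:
    """dB equivalent of (swing_a/swing_b)^2, the relative third-order scaling."""
    return 20.0 * math.log10((swing_a / swing_b) ** 2)

npe = {name: npe_percent(*vals) for name, vals in designs.items()}
for name, value in npe.items():
    print(f"{name:10s} NPE = {value:5.1f} %")

print(f"IM3 correction, 2.5 vs 1.5 Vppd: {swing_correction_db(2.5, 1.5):.1f} dB")
print(f"SFDR correction, 2.5 vs 0.8 Vppd: {swing_correction_db(2.5, 0.8):.1f} dB")
```

The two corrections evaluate to roughly 8.9 dB and 19.8 dB, matching the approximately 9 dB and 20 dB quoted in the text.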

8 Conclusions

A 12b 2.9 GS/s current-steering CMOS DAC was presented with a 70 dB IM3 bandwidth of 550 MHz and a 60 dB IM3 bandwidth of 1.0 GHz. These results were obtained while driving a 50 Ω load with a 2.5 Vppd swing. The DAC presented combines the highest clock frequency (2.9 GS/s) with the highest output swing (2.5 Vppd) at the best power efficiency (66%), while simultaneously achieving the highest 60 dB and 70 dB IM3 bandwidth. The increase in performance at high frequencies compared to previously published results is mainly obtained by adding “always-ON” cascodes on top of the current switches.

Table 6.2 Performance overview

CMOS                     65 nm
NOB                      12
Fclk                     2.9 GS/s
Iload                    50 mA
Swing                    2.5 Vppd
Power                    188 mW
NPE                      66%
INL                      0.5 lsb
DNL                      0.3 lsb
70 dB IM3 Bandwidth      550 MHz
60 dB IM3 Bandwidth      1,020 MHz
70 dB SFDR Bandwidth     225 MHz
60 dB SFDR Bandwidth     550 MHz

References

1. H.J. Schouwenaars et al., An oversampled multibit CMOS D/A converter for digital audio with 115-dB dynamic range. IEEE J. Solid-State Circuits 26, 1775–1780 (1991)
2. C.-H. Lin, K. Bult, A 10-b 500-MSample/s CMOS DAC in 0.6 mm². IEEE J. Solid-State Circuits 33, 1948–1958 (1998)
3. K. Bult, C.-H. Lin, U.S. Patent 6,191,719, Digital to analog converter with reduced ringing, Feb 2001
4. A. Van den Bosch et al., A 10-bit 1-GSample/s Nyquist current-steering CMOS D/A converter. IEEE J. Solid-State Circuits 36(3), 315–324 (2001)
5. B. Schafferer, R. Adams, A 3V CMOS 400mW 14b 1.4GS/s DAC for multi-carrier applications, in ISSCC Digest Technical Papers 2004, Feb 2004, pp. 360–361
6. W. Schofield et al., A 16b 400MS/s DAC with <-80dBc IMD to 300MHz and <-160dBm/Hz noise power spectral density, in ISSCC Digest Technical Papers 2003, San Francisco, Feb 2003, pp. 126–127
7. K. Doris et al., A 12b 500MS/s DAC with >70dB SFDR up to 120MHz in 0.18μm CMOS, in ISSCC Digest Technical Papers 2005, San Francisco, Feb 2005, pp. 116–117
8. D.A. Mercer, Low-power approaches to high-speed current-steering digital-to-analog converters in 0.18-μm CMOS. IEEE J. Solid-State Circuits 42, 1688–1698 (2007)
9. P. Palmers, M. Steyaert, A 11mW 68dB SFDR 100 MHz bandwidth SD-DAC based on a 5-bit 1GS/s core in 130nm, in ESSCIRC Digest Technical Papers 2008, Sept 2008, pp. 214–217
10. K.L. Chan, J. Zhu, I. Galton, Dynamic element matching to prevent nonlinear distortion from pulse-shape mismatches in high-resolution DACs. IEEE J. Solid-State Circuits 43, 2067–2078 (2008)
11. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circuits SC-24, 1433–1439 (1989)
12. T. Miki et al., An 80-MHz 8-bit CMOS D/A converter. IEEE J. Solid-State Circuits SC-21, 983–988 (1986)

Part II

Short-Range Wireless Front-Ends

This second part of the book is on ‘short-range wireless front-ends’, which, as the name suggests, refers to wireless communication of data over short, or relatively (with respect to the application) short, distances. Six chapters will focus on this, from different angles of view. The first papers start from the application view: three of them discuss various forms of sensor networks, and one focuses on RFID. In most of the applications, low power, or ultra low power, plays a dominant role, whereas the data rates are relatively low. In case power dissipation can be kept low enough, energy scavenging comes into play, making the use of batteries even obsolete. The first paper, of William Scanlon, addresses sensor networks from a high level, with system and network implications, focusing on the free-space channel, and on the consequences, in the sense of requirements, for the physical layer. The second paper, of Guido Dolmans, addresses the field of ultra-low-power wireless body-area networks for medical applications, like EEG measurements. The ultra-low-power requirement requires cross-layer optimization, including MAC, PHY, protocols, standardization, architecture, and final circuit design. The design of a specific ultra-low-power transmitter and a super-regenerative receiver is shown. Tim Piessens then addresses low-power RF energy harvesting in the context of RFID tags, and Frank Henkel finally addresses sensor network frontends for military, agriculture, and industrial applications. The last two papers approach the field from a different angle of view: they focus on very-high-frequency technology, for future short-range communication. Reza Mahmoudi discusses microelectronic frontend design for 60 GHz and higher, for very-high data-rate applications. Here, microwave design, Maxwell equations, 2.5 and 3D simulation, parasitics, and high-frequency measurement techniques play a dominant role. 
Lorenzo Tripodi, finally, approaches the terahertz field: he discusses a transmitter and a receiver operating at very wide bandwidths, of about 200 GHz, for spectroscopy applications. This work is showing that the so-called ‘terahertz gap’ between electronics and photonics is entered now, which promises a successful future for a complete new field of terahertz electronics and a lot of new applications. Arthur van Roermund

Chapter 7

Short Range Radio Communication – Novel Applications and Their Physical Layer Requirements William G. Scanlon

Abstract Technologies and applications based on short range radio communication links have pervaded almost all aspects of modern life. From wireless doorbells through Bluetooth headsets and WiFi browsing to streaming high definition multimedia, we have never been so dependent on short range links. This paper outlines some of the physical layer requirements for short range radio systems and gives an overview of some of the novel applications and future trends for this area of technology.

1 Introduction

Since its discovery, wireless communication has always captivated mankind with its air of ‘magic’ and obvious application. However, it is not just the ability to travel over long distances that made wireless communication so attractive. Particularly in terms of personal communication, it was the freedom from the restrictions and limitations of cabling that made wireless so popular. And in the latter age of electronics, such tether-less operation brought about ‘freedom’ through unobtrusiveness and mobility.

Short range radio can be defined as those radio systems providing wireless communication over links covering modest distances, or over distances that are short relative to the scope of the application. Common and familiar examples include wireless LAN systems, children’s toys, home automation (garage door opener, heating thermostat), DECT home telephones or any Bluetooth application. However, as suggested above, many of the challenges facing short-distance links are also applicable to longer range links which are relatively short in the context of their application. For example, km-scale communication between an array of micro satellites forming a low-frequency radio telescope could be considered a short-range radio application.

This paper discusses some of the challenges and requirements of short range radio systems in Sect. 2. An example of a novel emerging application is described in Sect. 3.

W.G. Scanlon, Wireless Communications Research Group, Queen’s University, Belfast, UK; Telecommunication Engineering, University of Twente, Enschede, The Netherlands
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_7, © Springer Science+Business Media B.V. 2012

2 Short Range Radio Physical Layer Requirements

Regardless of the application, it is the nature of the radio channel itself that determines every other aspect of short range radio communications. For example, because of the dielectric losses of the medium, communication through more than a few hundred metres of sea water is almost impossible except at very low frequencies. In many cases it is the application itself that determines the defining features of the channel. For example, in body area networks (BANs), the movements of the human body perturb the radio channel in a way that is quasi-repetitive for regular movements yet uncontrolled, in that the user can move around as much as they like. Interestingly, the radio channel characteristics might themselves lead to novel applications. A good example is the regular change in impedance that a wearable antenna sees as the user breathes; many research groups are currently trying to build respiration-based physiological monitors around that particular feature.

It is the channel, then, that sets some of the key parameters for the short range radio system, including expected received power level, dynamic range and levels of inter-symbol interference or, in narrowband systems, multipath fading. However, the peculiar nature of the channel in some applications can also have unexpected effects on other radio system parameters such as co-channel and adjacent channel interference. This is discussed further in Sect. 3.

One of the driving forces behind the popularity of short range radio systems, particularly at UHF frequencies, is the unlicensed use of the spectrum, with most high volume short range radio applications operating in the Industrial Scientific and Medical (ISM) bands. Unlicensed use requires careful consideration of effective radiated power (ERP) and spectral characteristics to reduce interference, increase re-use and promote efficient use of the radio spectrum.
However, while many low data rate applications operate quite well under existing regulation in the popular ISM bands such as 868 MHz and 2.45 GHz, there is a desire for higher bandwidth services. This has led to renewed interest in short range links operating at 60 GHz and higher millimetre wave frequencies. In all but the most ‘open’ of line of sight (LOS) applications, the poor channel characteristics at these frequencies at first glance appear to be a major concern. However, sometimes the ease of blocking at these frequencies can lead to useful advantages such as frequency re-use and security as demonstrated in the soldier to soldier system outlined in [1].

Fig. 7.1 Example low duty cycle RFID average current (mA) versus active interval (s) (transceiver active 10 ms in every interval, 20 mA; idle current 0.02 mA)

Energy consumption, while only indirectly related to the propagation channel, is another extremely important aspect of short range radio systems. As portable or wearable devices get smaller due to chip integration developments, the relative size and weight of any battery becomes more significant. Recent advances in low power circuit design have helped to improve the situation, but reducing energy consumption at all stages, from the RF circuitry upwards, is a major requirement. Even the choice of power supply circuit is important, particularly in low duty cycle designs. For example, consider an active RFID transceiver based on the Texas Instruments CC1110 system on chip (low power 868 MHz transceiver with integrated 8051 microcontroller) with an active time of 10 ms every 2 s (0.5% duty cycle). Assuming an idle current of 0.02 mA including regulator losses, the average current for this duty cycle is 0.12 mA, given an active current of 20 mA. If the duty cycle is reduced further (Fig. 7.1), for this configuration the idle current becomes dominant after an active interval time of around 60 s. For this reason, many low cost wireless sensor node systems, notable for their long life requirements and extremely low duty cycle, operate without any voltage regulation.

Often short range radio devices need to wake regularly to listen for commands. In ultra low power systems one way to avoid this cost is to utilise “wake on radio” techniques. For example, the Zarlink ZL70102 medical implant communications chip has an integrated 2.45 GHz wake on radio circuit. Other important requirements for short range radio applications include the need for active antenna tuning in ultra low power medical implants and wearable body sensor nodes. These devices will often use electrically small antennas which, because of their high Q factor (stronger near field), are subject to detuning effects caused by the proximity of human body tissue.
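The duty cycle arithmetic in the RFID example can be written out as a short Python sketch. It assumes a simple two-state model (active or idle, with no wake-up transients), and the function names are hypothetical. The default values are those quoted in the text and in the caption of Fig. 7.1.

```python
def average_current_ma(interval_s: float,
                       active_s: float = 0.010,
                       active_ma: float = 20.0,
                       idle_ma: float = 0.02) -> float:
    """Time-weighted average current over one wake-up interval."""
    idle_s = interval_s - active_s
    return (active_ma * active_s + idle_ma * idle_s) / interval_s

def idle_share(interval_s: float,
               active_s: float = 0.010,
               active_ma: float = 20.0,
               idle_ma: float = 0.02) -> float:
    """Fraction of the charge per interval that is drawn while idle."""
    idle_charge = idle_ma * (interval_s - active_s)
    return idle_charge / (idle_charge + active_ma * active_s)

for interval in (2.0, 10.0, 60.0, 240.0):
    print(f"interval {interval:6.1f} s: "
          f"average {average_current_ma(interval):.4f} mA, "
          f"idle share {idle_share(interval):.0%}")
```

For the 2 s interval this gives about 0.12 mA, as stated in the text. In this simple model the idle contribution accounts for well over 80% of the average current at a 60 s interval, which illustrates why regulator quiescent losses matter so much in very low duty cycle designs.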


3 BAN to BAN Application Example

As the demand for BANs increases in medical, military, sport and entertainment applications, the issue of inter-body interference will become an important concern for protocol design and quality of service for the end user. In a BAN, communication is intended to be over the surface of the user’s body, for example from distributed biomedical sensors to a “controller” node for further processing or storage. However, it is widely recognised that, since wave propagation over the human body is subject to increased losses compared to free-space propagation, there is a potential major problem with multiple co-located BANs. Since wave propagation “off” the human body suffers from much less path loss, a wearable receiver node will experience high levels of either co-channel or adjacent channel interference, depending on the channel allocations in a particular situation.

This issue was investigated through a series of “live” channel measurements using off-the-shelf 2.45 GHz wireless transceivers. A total of 12 nodes were used, 6 on each body, and the path loss between each pair of nodes in the mesh was determined 17 times a second. Figure 7.2 shows the two BANs. Details of the experimental setup can be found in [2]. Wearing the jackets and nodes shown in Fig. 7.2, the test users initially sat in a common room environment and then moved away from each other, heading in different directions along a corridor until they were at opposite ends. Here the users waited for approximately 45 s before returning down the corridor to the common room area to finish the experiment by sitting once again. The entire test lasted about 215 s. To get a full understanding of the channel, the received signal strength (RSSI) time-series for the full mesh need to be studied to reveal how body movements affect the balance between on-body and off-body channels.
Due to space limitations, only a subset of results including both LOS and non-LOS (NLOS) cases are shown.

Fig. 7.2 Location of wearable nodes for BSN to BAN experiments: (a) BAN1 (nodes A–F), (b) BAN2 (nodes G–L)

Fig. 7.3 RSSI time-series for LOS on-body BAN link (H–I) (RSSI [dBm] against time [s]; phases marked: BAN leaving environment, BAN stationary and away from environment, BAN returning to environment, BAN back in environment)

Fig. 7.4 RSSI time-series for example off-body (interfering) link (B–I) (RSSI [dBm] against time [s]; same phases marked as in Fig. 7.3)

The RSSI time-series results in Fig. 7.3 show that for the LOS on-body link (H–I), channel fading is greatly increased when the user is mobile (see dynamic range, Table 7.1). Figure 7.4, however, shows a combination of fading and distance-related path loss as BAN2 moved away from BAN1 to the far end of the corridor. The key result from these RSSI time series and Table 7.1 is that, while the potential interference from nearby BANs is unpredictable and extremely variable, the average levels (Table 7.1) are significantly high. The relative user movements only serve to complicate the situation. In practical deployments, BAN systems will have to employ interference mitigation techniques such as dynamic channel assignment.

Table 7.1 Summary link statistics for victim receiver I (BAN2)

Link          Mean RSSI (dB)   Dynamic range (dB)
A–I           58.3             42.5
B–I           56.9             41.5
C–I           46.8             57.5
D–I           41.2             55.0
E–I           43.6             61.5
F–I           50.9             48.5
H–I (LOS)     53.5             37.0
L–I (NLOS)    54.0             34.5

Nonetheless, given that these are low cost, low power devices their RF front ends need to be carefully designed to cope with the expected large dynamic range and high levels of co- and adjacent channel interference.

4 Conclusions

A short overview of short range radio requirements and challenges has been presented. While the applications for short range radio systems continue to grow, there will be an increased need for innovation in a number of areas, particularly at the physical layer.

References

1. S.L. Cotton, W.G. Scanlon, B.K. Madahar, Millimeter-wave soldier-to-soldier communications for covert battlefield operations. IEEE Commun. Mag. 47(10), 72–81 (2009)
2. S.F. Heaney, E. Garcia-Palacios, W.G. Scanlon, Context-aware body area networks (CABAN) for interactive smart environments: interference characterization, in 5th International Conference Body Area Networks (Bodynets), Corfu, Greece, Sept 2010

Chapter 8

Ultra Low-Power Wireless Body-Area Sensor Networks G. Dolmans, F. Bouwens, A. Breeschoten, B. Busze, P. Harpe, L. Huang, X. Huang, M. Konijnenburg, V. Pop, M. Vidojkovic, Y. Zhang, C. Zhou, and H. de Groot

Abstract In wireless body area network (WBAN) applications, wireless sensors are used to collect, monitor and transmit vital signs and other medical information. In such scenarios, it is critical to maximize the autonomy, while satisfying application performance. A unique platform for introduction of such ultra-low power technology components is an electrocardiography (ECG) patch for BAN applications, and is taken in this work as an example to illustrate the development of an ultra low-power transceiver.

1 Introduction

The rapid growth in physiological sensors, low power integrated circuits and wireless communication has enabled a new generation of wireless sensor networks. Ultra-low power design and architectural level thinking are driving game-changing circuit and system innovation in wireless sensor networks. A number of intelligent, comfortable physiological sensors can be integrated into wearable wireless body area networks, which can be used for personalized, predictive, preventive and participatory health care. The sensor information is transmitted wirelessly to an external processing unit. The unit then forwards the information over the backbone network in real time to doctors anywhere in the world. If an emergency is detected, the physicians can inform the patient by sending appropriate messages or alarms. Currently the level of information provided and the energy resources capable of powering the sensors are limiting. The technology is still in its primitive stage and it is being widely researched. Once adopted, it is expected to be a breakthrough invention in healthcare for ubiquitous monitoring, real-time diagnostics, and patient-centric therapy.

Fig. 8.1 Short-range WPAN standardization from 1998 to 2011

G. Dolmans • F. Bouwens • A. Breeschoten • B. Busze • P. Harpe • L. Huang • X. Huang • M. Konijnenburg • V. Pop • M. Vidojkovic • Y. Zhang • C. Zhou • H. de Groot, Holst Centre/Imec, HTC31, P.O. Box 8550, 5605 KN, Eindhoven, The Netherlands
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_8, © Springer Science+Business Media B.V. 2012

2 WBAN 2.1 What Is a WBAN? A Wireless Body Area Network (WBAN) enables wireless communication between several miniaturized body sensor nodes and a single central unit. The development of WBAN technology can be seen as an extension of wireless personal area network (WPAN) technologies for communications on, near and around the human body. A WBAN system can use WPAN wireless technologies as gateways to reach longer ranges. WPAN systems have been standardized in IEEE 802.15.1 (Bluetooth) and IEEE 802.15.4 (Zigbee). The evolution of these 802.15 standards is shown in Fig. 8.1. The draft standard IEEE 802.15.6 [1] is a WBAN standard for short range, wireless communication in the vicinity of, or inside, a human body (but not limited to humans). It uses existing ISM bands as well as frequency bands approved

8 Ultra Low-Power Wireless Body-Area Sensor Networks


Fig. 8.2 WBAN communication paths: on-body to central unit, on-body to on-body, on-body to implant, implant to implant

by national medical and/or regulatory authorities. Support for quality of service, extremely low power, and data rates up to 10 Mbps is required. The study group Medical Body Area Network (MBAN) has recently been transformed into task group 802.15.4j. Its purpose is to define a physical layer for IEEE 802.15.4 in the 2,360–2,400 MHz band that complies with Federal Communications Commission (FCC) MBAN rules. The 4j task group may also define modifications to the MAC needed to support the new physical layer. The purpose of an international standard for WBAN and MBAN is to facilitate low-power and highly reliable wireless communication for use in close proximity to, or inside, a human body. Body area wireless communication scenarios are shown in Fig. 8.2. Standard radio solutions will co-exist in the market with proprietary solutions optimized for the required performance in healthcare networks at the lowest possible power. A proprietary ultra low-power radio solution is the focus of this paper. Key elements of the upcoming standards, such as the anticipated use of medical licensed bands close to the ISM bands, are taken into account in the radio design.

2.2 The Need for a New PHY and MAC

The state-of-the-art Bluetooth (LE) and Zigbee standardized radios for WPAN consume up to 100 times more than what is allowed for advanced health care applications. Their energy consumption is at the level of 50–100 nJ/bit. The transceiver design in this paper has a target efficiency of 1 nJ/bit. Furthermore, the standardized radios do not meet the medical (proximity to human tissue) regulation. Neither do they support the combination of reliability, Quality of Service (QoS), low power, data rate and non-interference that is required to address the broad class of body area network applications. There is a need for a PHY optimized for ultra-low power devices that operate on, in or around the human body for a variety of applications, including medical and personal entertainment. A WBAN has a typical operating distance of 3 m within the body area, and its power budget is stricter than that of a WPAN (Fig. 8.3). The PHY should be of low complexity and should support the ISM bands as well as the 2.36–2.4 GHz MBAN band for FCC and the 2.485–2.5 GHz band for ETSI. To support the broad class of applications, a variable data rate is preferred.


Fig. 8.3 Operational power of WBAN, WPAN, and WLAN systems

Fig. 8.4 Proposed frame format (superframe fields shown: Beacon, CFP1, CAP, CFP2, Inactive; channels shown: TSRP, AC1, AC2, TSRB). The control channels AC1 (medical) and AC2 (non-medical) are based on slotted Aloha. The data channels are based on TDMA on demand, with a periodic data channel TSRP and a non-periodic bursty data channel TSRB

There is also a need for a MAC optimized for WBAN operation. Satisfying both medical and consumer electronics (CE) applications with a uniform medium access control (MAC) protocol is a challenge. We propose a priority-guaranteed (PG) MAC protocol [2, 3]. With this protocol, data channels are separated from control channels to support collision-free high data rate communication for CE applications. Priority-specific control channels are adopted to provide a priority guarantee to life-critical medical applications. Traffic-specific data channels are deployed to improve resource efficiency and latency performance. The frame format is shown in Fig. 8.4. Compared with the IEEE 802.15.4 MAC and its improved versions, the PG-MAC demonstrates significant improvement in throughput, energy efficiency, scalability, reliability, and co-existence with other networks, with a tolerable penalty on the latency performance of bursty traffic in CE applications. The energy consumption per bit of data transmission is reduced by up to 60% compared to IEEE 802.15.4. Most important of all, it can minimize the access latency of higher-priority medical applications, which is crucial for life-critical applications. Another proposal is to use dynamic channel selection by spectrum sensing and adaptive channel search. This would reduce the impact of interference in a crowded ISM band.


Fig. 8.5 A schematic sketch of a wireless body area network node

2.3 Why Does a WBAN Need a Low-Power Radio?

A WBAN node typically consists of a sensor and read-out, actuators, an analog-digital interface, a microcontroller, a power management unit, and a wireless transceiver. The complete system is powered by a small energy source. The sensor and read-out module senses signals from the body, such as skin temperature, electroencephalography (EEG), electrocardiography (ECG), and electromyography (EMG), and converts them into continuous analog signals. These analog signals are then sampled by the analog-to-digital converter (ADC) module. The microcontroller module carries out low-level processing of the sampled signals, and manages the wireless transceiver module to transmit and receive the signals in accordance with the communication protocols being used. The power management module regulates and distributes the power for all the constituent modules. The micropower module is the power source that supplies the sensor node. A sketch of a WBAN node is shown in Fig. 8.5.

Low-power optimized designs can be made at the block level, but their success largely depends on power optimization at the architectural level. This strengthens the role of architectural-level thinking and of in-depth interactions during the electronics design, to avoid wasting time on sub-optimal system solutions. The ECG BAN application is taken further as an example to illustrate a power-optimized architecture. A streaming ECG system built with commercial state-of-the-art components would consume 1.3 mW, which results in a 2-week lifetime. The target is a 60-day lifetime, and therefore a power consumption of 300 µW for the total streaming ECG system. With a duty cycle of less than 30%, the transceiver in this paper is designed for a continuous power consumption below 1 mW. For non-streaming ECG applications, such as beat detection only, state-of-the-art low-power solutions exist.

However, there are two trends in vital body signal monitoring that justify the development of new low-power modules. One is the tendency to ask for more bandwidth by measuring multiple complex signals at once. Another is the shrinking size of the nodes, such that only small thin-film batteries can be used. The storage capacity of these batteries is small, so they can only power body area nodes with ultra-low power consumption.
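The lifetime argument above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming that the battery energy can be inferred from the quoted operating point (1.3 mW for 2 weeks) and that the same battery is used for the 60-day target; the function names are hypothetical and not part of the original design flow.

```python
# Minimal sketch: battery energy inferred from the quoted 1.3 mW / 2-week
# operating point (an assumption), then the power budget for 60 days.
HOURS_PER_DAY = 24.0


def battery_energy_mwh(power_mw, lifetime_days):
    """Energy drawn over the lifetime, in mWh."""
    return power_mw * lifetime_days * HOURS_PER_DAY


def power_budget_mw(energy_mwh, target_days):
    """Average power that empties the battery in exactly target_days."""
    return energy_mwh / (target_days * HOURS_PER_DAY)


def lifetime_days(energy_mwh, power_mw):
    """Lifetime in days for a given average power."""
    return energy_mwh / (power_mw * HOURS_PER_DAY)


energy = battery_energy_mwh(1.3, 14)      # energy implied by the 2-week figure
budget = power_budget_mw(energy, 60)      # average power allowed for 60 days
print(f"implied battery energy: {energy:.1f} mWh")
print(f"power budget for 60 days: {budget * 1000:.0f} uW")
```

Under these assumptions the budget comes out at roughly 300 µW, in line with the system target quoted in the text.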


2.4 WBAN Radio Market

The Body Area Network field is an interdisciplinary area which could allow inexpensive and continuous health monitoring with real-time updates of medical records via the Internet. Examples of applications are: EEG, ECG, EMG, vital signal monitoring (temperature (wearable thermometer), respiration, wearable heart rate monitor, wearable pulse oximeter, wearable blood pressure monitor, oxygen, pH value, wearable glucose sensor, implanted glucose sensor, cardiac arrhythmia), wireless capsule endoscope (gastrointestinal), wireless capsule for drug delivery, deep brain stimulator, cortical stimulator (visual neuro-stimulator, audio neuro-stimulator, Parkinson's disease, etc.), remote control of medical devices such as pacemakers, actuators and insulin pumps, hearing aids (wearable and implanted), retina implants, disability assistance such as muscle tension sensing and stimulation, wearable weighing scale, fall detection, and aiding sport training. These health applications are based on body-centric solutions for future wearable intelligent computer nodes. The same technology can provide effective solutions for personal entertainment as well, such as video streaming, audio streaming for headsets, surround sound streaming, data file transfer, image file transfer, small data transfer for remote controls, body motion capture, PC control signals, smart keys, identification, and gaming. The existence of a body area network standard will provide opportunities to expand these product features and to offer better healthcare and wellbeing for the users. It will therefore result in economic opportunities for technology component suppliers and equipment manufacturers.

3 Design Considerations of WBAN Radios 3.1 Requirements As indicated earlier, the use of a commercial low power radio would limit the lifetime of an ECG WBAN sensor to 14 days. An ultra low-power radio is needed with continuous power consumption below 1 mW. The use of the world-wide available 2.4–2.485 GHz ISM band is proposed, with an extension to the 2.36– 2.4 GHz FCC medical band and the 2.485–2.5 GHz ETSI medical band. The data rate should be programmable, to serve the diverse application space of a WBAN. A scalable data rate of 64 kbps to 1 Mbps has been chosen.

3.2 Duty Cycling and Data Rates One of the widely known solutions to minimize power consumption is duty cycling, which allows switching on the transceiver only for the instants where signals must


be transmitted or received. Thus, the average power consumption is significantly lower than the continuous peak power. The duty cycling can be made more aggressive when the data rate is much higher than the sensor information rate. The duty cycle is the fraction of time that the device spends in active mode. It is desirable that the node is awake mainly for active communication, by keeping the overhead of control signaling small. Beacon listening takes a significant part of the power consumption, especially for nodes with sporadic data communication. Regular beacon listening is used for synchronization and to obtain information about the control channel and the resource allocation. In a properly designed system, beacon listening in every superframe should be avoided.
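The relation between data rate, duty cycle and average power can be illustrated with a short calculation. This is a minimal sketch in Python under stated assumptions: the 64 kbps sensor information rate, the 1 Mbps air data rate, the 1 µW sleep power and the `overhead` parameter are illustrative values chosen here, not measured figures from this work.

```python
# Minimal sketch of duty-cycled average power. All numeric values below are
# illustrative assumptions, not measurements from the paper.

def duty_cycle(info_rate_bps, air_rate_bps, overhead=0.0):
    """Fraction of time the radio must be active.

    overhead is a hypothetical extra fraction for preambles, beacons and
    control signaling (0.1 means 10% more air time).
    """
    dc = info_rate_bps / air_rate_bps * (1.0 + overhead)
    return min(dc, 1.0)


def average_power_w(active_power_w, sleep_power_w, dc):
    """Time-weighted average of active and sleep power."""
    return dc * active_power_w + (1.0 - dc) * sleep_power_w


dc = duty_cycle(64e3, 1e6)                 # 64 kbps of data sent at 1 Mbps
p_avg = average_power_w(1e-3, 1e-6, dc)    # 1 mW active, 1 uW asleep
print(f"duty cycle: {dc:.3f}, average power: {p_avg * 1e6:.1f} uW")
```

The same function shows why a radio that consumes 1 mW continuously fits a 300 µW budget when it is active less than 30% of the time: `average_power_w(1e-3, 0.0, 0.3)` gives 0.3 mW.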

3.3 Tx Architectures Compared to transmitters in WPAN standards, the WBAN transmitter output level is lower. As the output power scales down, it becomes more difficult to maintain good overall power efficiency. A Tx frontend can be separated into a power amplifier (PA) stage and pre-PA stages. The pre-PA power consumption has to scale down proportionally with the output power to maintain a constant overall efficiency.

3.4 Rx Architectures Superheterodyne, low-IF, and direct conversion architectures are not the best candidates, because of complexity and power consumption. A subsampling receiver is promising, but is very challenging for narrowband systems due to noise folding and sampling jitter. A wideband FSK system is also promising, but high data rates are difficult to achieve. The proposed Rx architecture is based on a super-regenerative receiver and an envelope detector. This results in a design with a low complexity and low power consumption. The core block is an RF oscillator, which is periodically started and stopped by a quench oscillator.

3.5 Signal Processing A frame consists of a synchronization preamble (SHR and SFD), a physical layer header (PHR), and a data field. A preamble based timing acquisition algorithm has to be used, to compensate the unknown timing information between transmitter and receiver. The timing information is used to determine the start frame delimiter (SFD). A five-step


acquisition algorithm is proposed using noise power estimation, signal detection, confirmation, estimation of two timing parameters, and SFD detection. A scalable data rate is achieved by direct-sequence spread spectrum (DSSS) and/or duty cycling.

4 Analog BAN Radio IC Design

4.1 Receiver Frontend

Known superregenerative receivers support only low data rates, which can accommodate only a very limited number of WBAN applications. To overcome this limitation, we designed a high data rate frontend (up to 5 Mbps) that enables a much wider range of WBAN applications [4]. A basic sketch of a superregenerative receiver is shown in Fig. 8.6. The receiver frontend consists of the following parts:

• Low-noise amplifier (LNA) and voltage-controlled oscillator (VCO), providing selectivity and super-regenerative amplification
• Differential-to-single-ended converter (DSC), which suppresses the VCO common-mode variations
• Envelope detector (ED), which detects the baseband bit-stream
• Quench wave generator (QWG), which generates Iquench for the VCO

The receiver front-end is shown in Fig. 8.7. Low power consumption is achieved by a high-Q external inductor, and the use of a complementary cross-coupled VCO with bond-wire inductance. There are two operational modes:

1. Iquench < Icritical: selectivity mode
2. Iquench > Icritical: super-regeneration (amplification) mode

The critical current floor is programmed by a DAC and the saw-tooth waveform is made by an analog block (Fig. 8.8).

Fig. 8.6 Architecture of a superregenerative receiver (blocks shown: LNA, RF oscillator, quench oscillator, envelope detector, low-pass filter, VGA, ADC, data out)


Fig. 8.7 Schematic of ULP wireless BAN FE

Fig. 8.8 Schematic of analog/digital quench wave generator

4.2 Receiver Analog Baseband The RX analog baseband (BB) is composed of three blocks: a variable gain amplifier (VGA), an ADC and a bias current generator (BCG). The VGA [5] includes an analog integration function. It can be used either as an amplifier or as an integrator. The BCG generates the various bias currents required for the baseband of the receiver to reduce the number of external bias currents. The VGA has an open-loop first stage to save power. The second stage has a programmable gain and bandwidth (Fig. 8.9). The ADC is designed for a 5 MHz analog bandwidth (10 Msps Nyquist). An 8 bit resolution has been chosen. The ADC architecture is based on asynchronous successive approximation, where the power consumption is proportional to the sample rate [6, 7]. There are three control components in the system: main control, comparator control and DAC control. The choice has been made to use start/ready flags (instead of delay lines) and to use custom logic (instead of standard CMOS cells). Timing and schematic are shown in Fig. 8.10.
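The successive-approximation principle behind the ADC can be sketched as a binary search over the DAC codes, one comparator decision per bit. The following Python sketch is a behavioral illustration only: it assumes an ideal comparator and an ideal DAC with a hypothetical reference voltage `vref`, and it does not model the asynchronous start/ready handshaking or the custom logic described above.

```python
# Behavioral sketch of an 8-bit successive-approximation conversion.
# Assumptions: ideal comparator, ideal DAC, unipolar input in [0, vref).

def sar_convert(vin, vref=1.0, bits=8):
    """Return the output code after `bits` comparator decisions."""
    code = 0
    for bit in range(bits - 1, -1, -1):      # MSB first
        trial = code | (1 << bit)            # tentatively set this bit
        v_dac = vref * trial / (1 << bits)   # ideal DAC output for the trial
        if vin >= v_dac:                     # comparator decision
            code = trial                     # keep the bit
    return code


for v in (0.0, 0.25, 0.3, 0.5, 0.999):
    print(f"vin = {v:.3f} V -> code {sar_convert(v)}")
```

Each conversion needs exactly as many comparator decisions as there are bits, which is one reason why the power of a SAR converter scales with the sample rate.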


Fig. 8.9 Schematic of analog baseband VGA (labels shown: inputs INP and INN, input resistors Rin, feedback elements Rfb and Cfb, outputs OUTP and OUTN)

Fig. 8.10 Schematic and timing of SAR ADC

4.3 Transmitter Choice

The transmitter (TX) consists of a VCO, a power amplifier (PA), buffers as well as biasing circuitry [8]. It supports amplitude modulation in the form of amplitude-shift keying (ASK) and on-off keying (OOK) with pulse-shaping, and its 10 Mbps data rate makes it capable of handling data-intensive applications such as the transmission of different WBAN waveforms or high-quality personal audio/video streaming (Fig. 8.11).


Fig. 8.11 Transmitter architecture

Fig. 8.12 Circuit details of the WBAN transmitter

The 2.4 GHz carrier is generated by a VCO. The amplitude modulation (on/off) is directly applied at the PA, which results in faster start-up and no spectral artifacts. A swing detector adjusts the driving level for optimal power efficiency. A digital pulse-shaping technique is used to improve spectral efficiency. More circuit details are shown in Fig. 8.12. The VCO is a complementary cross-coupled oscillator with an on-chip integrated inductor, a switched varactor bank and analog varactor tuning. The PA is made of 15 pseudo-differential parallel NMOS pairs in class-AB operation.

4.4 SoC

The system-on-chip is based on the analog Tx and Rx, a phase-locked-loop system, and a digital baseband [9]. The digital baseband is designed to support a maximum data rate of 1 Mbps with an oversampling factor of 3. The phase-locked loop is shown in Fig. 8.13. The digital baseband part includes the blocks for pulse shaping for data transmission, data spreading and despreading to achieve scalable data rates, reliable timing synchronization and data detection algorithms, and CRC-16-CCITT encoding and decoding for packet validation. The digital baseband generates accurate RF Tx


Fig. 8.13 Phase-locked-loop to tune the radio channel in Tx and Rx mode

Fig. 8.14 Clock generation part implemented in the digital baseband section

and Rx timing (by delay lines), autonomous DC offset correction, automatic gain control, and receive timing tracing. The clock generation part is shown in Fig. 8.14. It is based on two delay lines with a 750 ps phase and duty cycle resolution. The SoC diagram is shown in Fig. 8.15. The transceiver chip photograph is shown in Fig. 8.16. This chip is implemented on an application-specific integrated circuit (ASIC) using a standard 90 nm complementary metal–oxide–semiconductor (CMOS) technology.
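The packet validation step mentioned above can be illustrated with a bitwise CRC-16-CCITT routine (generator polynomial 0x1021). This is a minimal sketch in Python; the initial value 0xFFFF, the absence of bit reflection and the big-endian placement of the checksum are assumptions of this sketch, since the text does not state which CRC-16-CCITT variant or frame layout the baseband uses.

```python
# Bitwise CRC-16-CCITT sketch (polynomial 0x1021, MSB first, no reflection).
# The init value 0xFFFF is an assumption; the chip's actual variant is not
# specified in the text.

def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8                      # feed next byte into the MSBs
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and divide
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


payload = b"123456789"
checksum = crc16_ccitt(payload)
frame = payload + checksum.to_bytes(2, "big")   # hypothetical frame layout
print(hex(checksum))                            # 0x29b1 for this test vector
print(crc16_ccitt(frame) == 0)                  # receiver-side check
```

With this variant, a receiver can validate a packet by running the same routine over the payload plus the appended checksum and testing for a zero remainder.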

Fig. 8.15 Schematic of WBAN transceiver

Fig. 8.16 Chip photograph of WBAN transceiver

4.5 Measurements

Fig. 8.17 Power breakdown of the WBAN transceiver

Fig. 8.18 WBAN Transceiver comparison with state-of-the-art

The power consumption of the WBAN radio is more than one order of magnitude lower than that of commercially available transceivers. The measured power consumption at 1.2 V when transmitting an OOK packet at 0 dBm is 2.5 mW. For a −10 dBm output level, the transmitter power consumption is 900 µW. For Rx, the PLL, RF frontend, analog baseband and digital baseband blocks consume 1,100, 468, 48 and 199 µW, respectively (see Fig. 8.17). The measured performance of the transceiver is compared to the state of the art in Fig. 8.18. In a wireless transmit-receive link we verified that there were no received packet errors (i.e. a packet error rate below 0.002) within 14 m of transmission distance at a data rate of 256 kbps. In the 64 kbps mode, the maximum distance is up to 30 m. This is sufficient to satisfy the required communication range of a WBAN, which is typically within a few meters.
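Power figures such as these are often compared through the energy spent per bit, which is simply the power divided by the data rate. The sketch below, in Python, shows the calculation. It assumes that the four receive-mode figures read as 1,100, 468, 48 and 199 µW for the PLL, RF front-end, analog baseband and digital baseband, and it evaluates the 2.5 mW transmit figure at two illustrative data rates; which data rate applies to which measurement is not stated here, so the printed values are examples rather than reported results.

```python
# Energy per bit = power / data rate. The grouping of the Rx figures and the
# data rates used below are assumptions made for illustration.
RX_BREAKDOWN_UW = {
    "PLL": 1100,
    "RF front-end": 468,
    "analog baseband": 48,
    "digital baseband": 199,
}


def energy_per_bit_nj(power_w, data_rate_bps):
    """Energy per bit in nJ for a given power (W) and data rate (bit/s)."""
    return power_w / data_rate_bps * 1e9


rx_total_uw = sum(RX_BREAKDOWN_UW.values())
print(f"Rx total: {rx_total_uw} uW")
for rate in (1e6, 10e6):
    e_tx = energy_per_bit_nj(2.5e-3, rate)
    print(f"Tx at 2.5 mW, {rate / 1e6:.0f} Mbps: {e_tx:.2f} nJ/bit")
```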


5 ECG Example

Cardiac monitoring is one of the earliest adopters of wearable healthcare technology, and the impact of technology on the efficiency of care and the reduction of hospitalization has been shown in various studies. Our first system is based on an ECG necklace node [10–12]. The node transmits to and receives from a basestation or a smart phone (Fig. 8.19). The ECG node can be used for 24/7 connectivity to the public network. The second system will be based on a thin, comfortable patch ECG node (Fig. 8.20). The starting point was a necklace node with commercial state-of-the-art components. The power consumption breakdown for such a sensor node is shown in Fig. 8.21. The current necklace system uses the WBAN transceiver and consumes 362 µW. The power breakdown of the ECG necklace based on the Imec WBAN transceiver is shown in Fig. 8.22. The RF transceiver is no longer the dominant part. In an ECG application, the signal could be processed locally to detect the R-peak. RR interval information is then sent wirelessly at each detected beat. Power optimization at the architectural level has been carried out during integration without affecting system functionality. The estimated power consumption results are illustrated in Fig. 8.23. It follows that the total power consumption for the ECG application equals 96 µW.
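The idea of local beat detection can be illustrated with a very simple detector: find local maxima above a threshold, enforce a refractory period, and report only the RR intervals. The Python sketch below is an illustration under stated assumptions, not the algorithm running on the node; the threshold, the 250 ms refractory period, the 100 Hz sample rate and the synthetic spike train are all hypothetical.

```python
# Illustrative threshold-based R-peak detector. This is not the detection
# algorithm used on the ECG node; all parameters are assumptions.

def detect_r_peaks(samples, fs_hz, threshold, refractory_s=0.25):
    """Return sample indices of local maxima above threshold."""
    refractory = int(refractory_s * fs_hz)
    peaks = []
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i - 1] < samples[i] >= samples[i + 1]
        if samples[i] > threshold and is_local_max:
            # ignore candidates that fall too close to the previous beat
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return peaks


def rr_intervals_s(peaks, fs_hz):
    """Time between consecutive detected beats, in seconds."""
    return [(b - a) / fs_hz for a, b in zip(peaks, peaks[1:])]


fs = 100                                   # hypothetical sample rate in Hz
ecg = [0.0] * 300                          # synthetic 3 s record
for idx in (50, 130, 210):                 # one spike every 0.8 s (75 bpm)
    ecg[idx] = 1.0

peaks = detect_r_peaks(ecg, fs, threshold=0.5)
print(peaks)
print(rr_intervals_s(peaks, fs))
```

Sending a handful of RR values per second instead of the full sampled waveform is what allows the radio duty cycle, and with it the average power, to drop in the beat-detection mode.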

Fig. 8.19 ECG necklace node and smart phone basestation


Fig. 8.20 ECG wireless patch

Fig. 8.21 State-of-the-art necklace. The total power consumption equals 1,299 µW


Fig. 8.22 Imec WBAN necklace. The total power consumption equals 362 µW

Fig. 8.23 Relative power consumption estimated for the ECG beat-detection application; 37% of the power is attributed to the DSP, radio (RF) and analog-to-digital converter (ADC) blocks; the total power consumption equals 96 µW


6 Conclusions

An ECG patch worn on the body cannot be on the air frequently at the power consumption of state-of-the-art transceivers. An ultra low-power WBAN transceiver has been designed with optimized sensitivity for high data rates. The 1 nJ/bit solution is a single-chip OOK transceiver fully optimized for on- and off-body communication, operating in the 2.36–2.5 GHz medical BAN and ISM bands. The transceiver is integrated in ECG necklace and patch monitoring systems. It is implemented in 90 nm CMOS and occupies 2.4 × 1.85 mm².

References

1. IEEE P802.15.6/D02, Draft trial-use standard; Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Wireless Personal Area Networks (WPANs) used in or around a body, Dec 2010, http://grouper.ieee.org/groups/802/15/pub/LB66/LB66.html
2. Y. Zhang, G. Dolmans, A new priority-guaranteed MAC protocol for emerging body area networks, in Fifth International Conference on Wireless and Mobile Communications, ICWMC 2009, Cannes/La Bocca, France, 23–29 Aug 2009, pp 140–145
3. Y. Zhang, G. Dolmans, Priority-guaranteed MAC protocol for emerging body area networks. Ann. Telecommun., special issue on Body Area Networks applications and technologies (2010). doi:10.1007/s12243-010-0232-9
4. M. Vidojkovic et al., A 500 µW 5 Mbps super-regenerative RF front-end, in European Solid-State Circuits Conference (ESSCIRC), Seville, Sept 2010
5. C. Zhou et al., A 56uW VGA with 5MHz bandwidth and 47dB gain-range in 90nm CMOS, in VLSI-DAT, Hsinchu, 26–29 Apr 2010
6. P. Harpe et al., A 12fJ/conversion-step 8bit 10Ms/s asynchronous SAR ADC for low energy radios, in European Solid-State Circuits Conference (ESSCIRC), Seville, Sept 2010
7. P. Harpe et al., A 30fJ/conversion-step 8b 0-to-10MS/s asynchronous SAR ADC in 90nm CMOS, in ISSCC 2010, San Francisco, 6–11 Feb 2010
8. X. Huang, P. Harpe, X. Wang, G. Dolmans, H. de Groot, A 0dBm 10Mbps 2.4GHz ultra-low power ASK/OOK transmitter with digital pulse-shaping, in RFIC 2010, Anaheim, 23–25 May 2010
9. M. Vidojkovic et al., A 2.4GHz ULP OOK single-chip transceiver for healthcare applications, in ISSCC 2011, San Francisco, 20–24 Feb 2011
10. V. Pop et al., Improving power diagnosis by architectural modeling in wireless autonomous transducer solutions, in Globecom 2010, Miami, Dec 2010
11. L. Huang et al., Ultra low power wireless and energy harvesting technologies – an ideal combination, in Proceedings IEEE International Conference on Communication Systems (ICCS), Singapore, Nov 2010
12. L. Huang et al., Performance evaluation of an ultra-low power receiver for Body Area Networks (BAN), in Personal, Indoor and Mobile Radio Conference (PIMRC), Istanbul, Sept 2010

Chapter 9

Low Power RF Power Harvesting Enabling More Active Tag Functionality

Tim Piessens, Yves Geerts, Wim Vanacken, Eldert Geukens, Bram De Muer, Tim Butler, and Bob Hamlin

Abstract This paper presents the analog part of a production integrated circuit (IC) for EPC Gen2 UHF RFID applications in the 900 MHz band. The tag is unique for its on-chip 32 kB non-volatile memory (NVM), its I2C functionality and its large reading and writing distance. To achieve these goals, a power-oriented architectural and block-level design approach has been followed. The main considerations concerning energy harvesting and RFID communication are presented in this paper, and some specific building blocks are elaborated in more detail, such as a 2.5% accurate clock reference consuming only 0.3 µA and a 6.25 µs Tari ASK demodulator. The chip is currently in production and is going to be used in aviation for airplane parts logging.

1 Introduction

1.1 RFID Tag Classification

Although Radio-Frequency IDentification (RFID) technology has been available since the Second World War [1], it took recent developments in semiconductor and telecommunication technologies for RFID to become ubiquitous. Currently the term RFID is a flag covering a broad range of applications, from cheap anti-theft systems in retail to expensive and complicated long-life logbook-type applications. A first classification can be made between active and passive devices, based on whether or not a battery is present on the tag. Table 9.1 shows a comparison between active

T. Piessens (✉) • Y. Geerts • W. Vanacken • E. Geukens • B. De Muer
ICsense, Gaston Geenslaan 9, 3001 Heverlee, Belgium
e-mail: [email protected]

T. Butler • B. Hamlin
TEGOinc, 375 Totten Pond Road, Waltham, MA, 02451 USA

M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_9, © Springer Science+Business Media B.V. 2012


T. Piessens et al.

Table 9.1 Passive versus active RFID tags

             Passive                 Active                                   EPCglobal Gen-2 passive
Power        None                    Battery                                  None
Frequency    135 kHz or 13.56 MHz    ISM bands: 433 MHz, 900 MHz, 2.45 GHz    860 MHz–960 MHz
Read range   Few meters              Tens of meters                           > Few meters
Tag life     Up to 10 years          3–8 years                                10 years and longer
Tag costs    $0.05                   $15–50                                   $0.1–4
Readers      Higher cost             Lower cost                               Higher cost

and passive tags [2]. Classic passive tags use near-field coupling to transmit power and data. Due to the $1/d^3$ attenuation in this frequency range, the operating distance and functionality of this type of tag are very limited. Communication is performed by load modulation of the incoming wave. These tags are mostly used in retail and inventory management and in animal husbandry. When higher security, environmental sensing or read/write functionality is needed, the active tag comes into the picture. Since power is not a real issue for the tag, higher frequencies and thus far-field communication can be used, reducing the attenuation to $1/d$ and thus drastically increasing the reading distance.

1.2 EPCglobal Gen-2 Passive Tags

This paper focuses on tags made for the EPC Gen2 standard [3], which was approved in December 2004. By its use of ultra-high frequencies (UHF), more energy can be harvested from the incoming radio wave. However, to ensure good power efficiency in practical implementations, the frequency needs to be limited. For instance, the power loss due to the substrate capacitance is given by (9.1) [4]:

\[ P_{loss} = \frac{1}{2} v^2 \frac{R_{sub}}{R_{sub}^2 + (\omega C_{sub})^{-2}} \approx \frac{1}{2} v^2 (\omega C_{sub})^2 R_{sub} \tag{9.1} \]

However, with the EPC Gen2 tags, a clever architectural and circuit implementation can harvest sufficient energy to provide active tag functionality in a passive device. Due to the absence of a battery, the passive tags provide much longer lifetimes without sacrificing the possibility to go beyond identification: the possibility to integrate sensors [5], more memory ([6, 7] and this work), more digital processing and I2C communication, including powering of external devices (this work). All this extra functionality opens up a plethora of new applications, such as decentralized logbook management in, e.g., aviation [7] and transportation, active monitoring in healthcare, better control of goods in logistics, and anti-theft control.

Fig. 9.1 Basic ASK (a) and PSK (b) modulation implementation

This all leads to harsh specifications for the design of the tag. The front-end not only needs to be tailored for optimal energy harvesting but also for minimal power consumption at all levels, analog and digital. A thorough design methodology aimed at minimizing the power consumption is a must for a successful design.

1.3 Backscattering

Since the tag does not have a power source, its most efficient communication mechanism is to backscatter the incoming wave. By modulating the reflection coefficient of the antenna, the reflected wave is changed and detected by the reader. Since the reflection coefficient is modulated by modulating the impedance, as illustrated in Fig. 9.1, two basic modulation types are possible: Amplitude Shift Keying (ASK) and Phase Shift Keying (PSK). In low-frequency applications with low bitrates, one state is much more active than the other. An optimal choice for communication is then to use ASK with the most active state perfectly matched and the other state maximally mismatched, with a short or open circuit at the antenna. This gives optimal power transfer in the most frequent state and optimal communication. For Gen-2 tags, with high bitrates, this would lead to half of the power being wasted. A better choice is to use a symmetrical mismatch in both states, giving a modulation index of $\pm m$ for ASK and $\pm jm$ for PSK [4]. Both modulation schemes provide the same reflected power:

\[ P_{bs} = m^2 \frac{P_{avail}}{L_{ant}} \tag{9.2} \]

with $L_{ant}$ the antenna loss factor and $P_{avail}$ the available power at the antenna. For the PSK case, the remaining power can be harvested for the ASIC:

\[ P_{RF,in} = (1 - m^2) P_{avail} \tag{9.3} \]


However, to create the needed modulation index in the ASK case, a considerable amount of power will be dissipated in the resistor that modulates the index. This series resistor can be calculated as:

\[ R_{series} = \frac{4m}{1 - m^2} R_{antenna} \tag{9.4} \]

resulting in the following available power:

\[ P_{RF,in} = \frac{1 - m^4}{(m + 1)^2} P_{avail} \tag{9.5} \]

For this work a PSK implementation has been chosen.
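The trade-off behind this choice can be made concrete by evaluating the backscatter expressions numerically. The following Python sketch reads (9.2) to (9.5) as $P_{bs} = m^2 P_{avail}/L_{ant}$, $P_{RF,in} = (1 - m^2) P_{avail}$ for PSK, $R_{series} = 4m R_{antenna}/(1 - m^2)$ and $P_{RF,in} = (1 - m^4) P_{avail}/(m + 1)^2$ for ASK; this reading, the function names and the example values ($m = 0.5$, a 12 Ω antenna, normalized available power) are assumptions of the sketch.

```python
# Sketch comparing harvested power for symmetric ASK and PSK backscatter,
# using the expressions (9.2)-(9.5) as read above. Example values are
# illustrative assumptions.

def backscatter_power(m, p_avail, l_ant=1.0):
    """Reflected power, identical for ASK and PSK at the same index m."""
    return m ** 2 * p_avail / l_ant


def psk_harvested(m, p_avail):
    """Power left for the ASIC with PSK modulation."""
    return (1 - m ** 2) * p_avail


def ask_harvested(m, p_avail):
    """Power left for the ASIC with ASK, after the series-resistor loss."""
    return (1 - m ** 4) / (m + 1) ** 2 * p_avail


def ask_series_resistance(m, r_antenna):
    """Series resistor needed to switch between the two ASK states."""
    return 4 * m / (1 - m ** 2) * r_antenna


m, p_avail = 0.5, 1.0
print(f"backscattered: {backscatter_power(m, p_avail):.3f}")
print(f"PSK harvested: {psk_harvested(m, p_avail):.3f}")
print(f"ASK harvested: {ask_harvested(m, p_avail):.3f}")
print(f"ASK series resistor: {ask_series_resistance(m, 12.0):.1f} ohm")
```

For any modulation index between 0 and 1 the ASK figure equals the PSK figure multiplied by $(1 + m^2)/(1 + m)^2$, which is below one, so under this reading PSK always leaves more power for the chip at equal backscattered power.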

2 Chip Architecture

Figure 9.2 shows the architecture of the proposed tag. A key feature of this ASIC is the large amount of NVM present. While passive tags in general have a limited memory of between 196 and 512 bits [8], this ASIC has a non-volatile memory of 262,144 bits, or 32 kB. This high-memory tag opens up a world of possibilities, since the memory allows, e.g.:

• Retention of the total life history and usage profile of a tagged item, which is important in logistics and expedition.
• Retention of the original creation record and the maintenance and repair events of tagged parts, e.g., aviation parts.
• Access to all stored information without dependency on a network, which adds reliability to the log, e.g., in healthcare.
• Storage of security certificates and encrypted data, e.g., for ID cards and Medicare.

The digital controller handles the communication protocols according to the EPCglobal Gen-2 standard [5] and has an I2C interface to connect the tag to external components such as sensors, memory and displays, and can even provide sufficient power to feed these components. The analog part consists of the PSK modulator

Fig. 9.2 Architecture of the proposed ASIC (blocks shown: PSK modulator, ASK demodulator, rectifier, regulators, power management, charge pump, data decoder, data encoder, command processor, memory manager, NVM memory, I2C)


and an ASK demodulator for communication between tag and reader, and a power management block which generates several voltages from the incoming rectified voltage. To save power, the digital part runs on the lowest possible voltage for the technology. A specific reading supply is also generated to communicate with the NVM macro. A charge pump with an integrated high-frequency oscillator is included to create the higher supply voltage needed for a write operation. Other general analog building blocks on the tag are an accurate 3 MHz clock generator to drive the digital part, a persistence block and a random number generator for unique addressing of the tag.

3 Analog Building Blocks

3.1 RF Rectifier

To generate the internal DC supply voltage, a Dickson's charge pump modified for UHF multi-stage rectification has been used, as depicted in Fig. 9.3. The rectifier is constructed with MiM capacitors and RF Schottky diodes available in the technology. The available DC voltage at the output of the rectifier can be calculated [9] as follows:

\[ V_{DC} = \sqrt{P_{avail} \cdot R_{ANT} \cdot 8} \cdot GAIN_{RECT} - V_{DIODEDROP} - Z_{OUT} \cdot I_{LOAD} \tag{9.6} \]

\[ GAIN_{RECT} = \frac{N \cdot \alpha}{1 + \dfrac{N \cdot G_{SUB}}{G_{ANT}}} \tag{9.7} \]

Fig. 9.3 4-stage Dickson's rectifier schematic (terminals shown: RFIN, BVDD, GND)


VDIODEDROP

ZOUT D

:ILOAD D nVT :N:ln IS

N 2 :2:RANT :˛ 2 : 1C

N:GSUB GANT

D

CC C CPAR CC

N .CC C CPARA/:f

(9.8)

(9.9)

2 (9.10)

With: RANT = 1/GANT the antenna resistance, N the number of diodes used in the charge pump, GSUB the substrate conductance, α the capacitive division between CC and the parasitic capacitance at the pumping node, β the coupling capacitor impedance transformation and γ an averaging factor to compensate for the true current profile. As calculated in (9.6), the incoming energy is converted into an output voltage, and several effects contribute to a loss in that output voltage. For the rectifier, the coupling capacitor and the number of diodes are the most important parameters. The tag presented in this work has a low antenna resistance of 12 Ω, to be able to use Q-boosting of the coupling. The number of diodes has been chosen to deliver the minimum output power needed at the maximal writing distance while maintaining a high efficiency. For smaller reading distances, the output voltage of the rectifier can become too high for the safe operating region of the process technology. Therefore, a smart limiter has been designed, as depicted in Fig. 9.4. The limiter needs to combine a fast turn-on time, to be able to limit a close-by incoming wave instantaneously, with little variation in the limiting level, because such variation would reduce the maximum writing distance. Typical solutions use thresholds based on MOS VT's [10] and thus need a sufficient safety margin with respect to process variations, leakage and temperature. To overcome this limitation, the architecture starts up with a fast but inaccurate reference, which limits the supply voltage to a safely low level. When the system's bandgap has started up, the limiter switches over to the bandgap reference. Since the bandgap provides a much more stable reference, the output voltage can be set to a higher value, increasing the amount of stored energy (proportional to CV²). At the same time, the bandwidth of the limiter can be pushed further, since the more stable bias current guarantees loop stability at higher bandwidths.
By extending the limiter's bandwidth, the voltage can be set even higher, since the limiter can then react to the transients imposed by the ASK modulation.
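As a rough numerical illustration of the first term of (9.6), the following minimal Python sketch evaluates the open-circuit peak antenna voltage, sqrt(8 · Pavail · RANT), for the 12 Ω antenna resistance quoted above. The available power levels and the helper names are assumptions chosen only for illustration; they are not measured values from this work.

```python
import math

R_ANT_OHM = 12.0  # antenna resistance quoted in the text


def dbm_to_watt(p_dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def open_circuit_peak_voltage(p_avail_w: float, r_ant_ohm: float) -> float:
    """Peak open-circuit antenna voltage, sqrt(8 * P_avail * R_ANT)."""
    return math.sqrt(8.0 * p_avail_w * r_ant_ohm)


if __name__ == "__main__":
    # Hypothetical available power levels (assumed, not taken from the source).
    for p_dbm in (-15.0, -10.0, -5.0, 0.0):
        v_peak = open_circuit_peak_voltage(dbm_to_watt(p_dbm), R_ANT_OHM)
        print(f"P_avail = {p_dbm:6.1f} dBm -> V_peak = {v_peak * 1e3:7.1f} mV")
```

With such a low antenna resistance the raw peak voltage stays well below a volt at these power levels, which is in line with the text's reliance on Q-boosting of the coupling and on a multi-stage rectifier to reach a usable DC voltage.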

3.2 Clock

In a Gen-2 communication system, the clock generator is one of the most important building blocks due to the high accuracy needed to enable communication [5, 11].

Fig. 9.4 Schematic of the voltage limiter at the rectifier's output

On the other hand, since the clock is always on during communication, its power consumption largely limits the maximum distance at which the tag can still operate. Figure 9.5 shows the schematic of the current-starved oscillator used to provide a stable clock frequency of 2.75 MHz. By synthesizing the clock's bias current with a specific temperature coefficient in the bandgap, the frequency varies by only 0.5% over the full temperature range from −40 °C to 125 °C. Over temperature and supply variations, the clock frequency is kept within 2.5%. The duty cycle varies from 37 to 60% over PVT. The total current consumption of the clock is 0.3 μA. The clock does not need to be trimmed in production to meet the EPC Gen2 specification.

3.3 ASK Demodulator

Figure 9.6 shows the schematic of the integrated ASK demodulator. The ASK demodulator needs to respond as quickly as possible to an incoming wave without unwanted toggling due to the ramping up of the power and reference signals. This is solved by adding a fixed offset in the biasing chain of the comparator following the envelope demodulator.

Fig. 9.5 Schematic of the current starved oscillator (with duty cycle restoring level shifter)

Fig. 9.6 Schematic of the implemented ASK demodulator

The communication frequency goes up to 160 kHz. This corresponds to a Type A Reference Interval (Tari) of 6.25 μs. The ASK demodulator complies with all Tari values and ASK modulation types specified in the EPC Gen2 specification. Its current consumption is only 200 nA.
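The two numbers quoted above are related by a simple reciprocal. As a quick check, assuming the shortest reference interval of 6.25 μs sets the highest communication frequency:

```latex
f_{max} = \frac{1}{\mathrm{Tari}_{min}} = \frac{1}{6.25\,\mu\mathrm{s}} = 160\,\mathrm{kHz}
```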


3.4 Random Number Generator

The Gen2 specification allows an unlimited number of tags to be in the field of the reader. The protocol is designed to handle this, and several anti-collision algorithms have been developed for Gen2 UHF RFID tags. These algorithms require a unique identifier for each tag in the field. Since a hard-coded unique identifier is logistically hard to achieve in a disposable item such as an RFID tag, an analog random number generator (RNG) is implemented in this work. During tag start-up, a random number is generated and used as a unique identifier for the tag. Due to its true randomness, the probability of having two identical identifiers within the range of one reader is very small. Since digital random number generators require considerable computational power, an analog implementation was chosen. The RNG consists of four uncoupled noisy clock generators. Their outputs are XOR'ed to form a random bit stream that is sampled by the digital part. The RNG starts up as soon as there is sufficient analog power. Once the digital part has started up, it samples the random stream and shuts down the RNG. Only during this limited time is a current of 1.8 μA needed to generate a true random number. Since the total start-up time of the digital part depends on the incoming power, this gives another level of randomness on top of the randomness due to the high clock jitter. The random number generator has been measured and compared with the Mersenne Twister algorithm [12] and shows at least equal randomness.
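The principle of XOR-ing several uncoupled jittery clocks and sampling the result with a slower clock can be illustrated with a small behavioural model. The sketch below is only an illustration under stated assumptions: the oscillator periods, the 20% jitter, the sampling period and all names are hypothetical, and a seeded software generator stands in for the physical noise, so the output is repeatable and not truly random as in the analog circuit.

```python
import random


class NoisyClock:
    """Free-running square-wave oscillator with Gaussian half-period jitter."""

    def __init__(self, period: float, jitter: float, rng: random.Random):
        self.period = period      # nominal period (arbitrary time units)
        self.jitter = jitter      # std. dev. of a half period, as a fraction
        self.rng = rng
        self.level = 0            # current output level (0 or 1)
        self.next_edge = self._half_period()

    def _half_period(self) -> float:
        half = self.period / 2.0
        # Clamp so that a large noise sample can never give a non-positive delay.
        return max(self.rng.gauss(half, self.jitter * half), 0.05 * half)

    def value_at(self, t: float) -> int:
        """Advance the oscillator to time t (non-decreasing) and return its level."""
        while self.next_edge <= t:
            self.level ^= 1
            self.next_edge += self._half_period()
        return self.level


def random_bits(n_bits: int, seed: int = 1) -> list:
    """Sample the XOR of four uncoupled noisy clocks n_bits times."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    # Hypothetical, mutually unrelated periods and 20 % jitter (assumptions).
    clocks = [NoisyClock(p, 0.2, rng) for p in (1.00, 1.37, 1.73, 2.11)]
    sample_period = 25.0       # sampling clock much slower than the noisy clocks
    bits = []
    for k in range(1, n_bits + 1):
        t = k * sample_period
        bit = 0
        for clk in clocks:
            bit ^= clk.value_at(t)
        bits.append(bit)
    return bits


if __name__ == "__main__":
    print("".join(str(b) for b in random_bits(16)))
```

The slow sampling clock lets the jitter of each oscillator accumulate over many cycles between samples, which is what decorrelates successive bits in this model.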

4 Tag ASIC and Prototype

Figure 9.7 shows a chip photograph of the implemented RFID tag. The 32 kB of memory can be clearly seen in the middle of the ASIC. On the left and right, two rectifier structures can be observed. In this way the tag can operate on two differently

Fig. 9.7 Chip photograph of the RFID tag


Fig. 9.8 Prototype (a) and product version (b) of the RFID tag

oriented antennas to increase the harvested energy, or the rectifiers can be connected as a Greinacher full-wave rectifier. Figure 9.8 shows the measurement prototype of the presented tag and an example of a production version. The tag is fully compliant with the EPC Gen2 specification. The production samples are available in several forms with the antenna on foil. The tag can perform read operations from a distance above 5 m and write operations from a distance greater than 1 m. Even with small antenna structures and when attached to metal, reading distances of a few meters can be obtained. Tags are currently in full production in different series for a wide variety of antenna structures.

5 Conclusion

In this paper a selection of analog building blocks from a production high-memory Gen2 RFID tag has been presented. Starting from high-level design considerations based on the EPC Gen2 specification, the analog blocks have been designed aiming at a maximum reading and writing distance. This required not only very careful, power-oriented, state-of-the-art building block design but also constant system and architectural consideration, with feedback between RF, analog and digital design. The building blocks covered in this paper are the power harvesting stage, the high-accuracy clock, the ASK demodulator and the random number generator. The RF rectifier has been designed for optimal power efficiency. For high incoming powers a voltage limiter has been implemented to protect the circuitry. By switching the references of this block, high accuracy and high speed could be combined with low power consumption. The communication protocol requires a very stable clock oscillator. The presented clock achieves a 2.5% accuracy over temperature and supply variation by carefully designing the temperature coefficient of the bias current. The tag is currently in production and is used in, amongst others, airplane part tracking applications.


Acknowledgements The authors wish to thank Tim Morlion and Ramses Valvekens of EASICS for their contribution to this chip development.

References

1. M. Roberti, “The history of RFID technology.” [Online]. Available: http://www.rfidjournal.com/article/view/1338
2. J.-P. Curty, M. Declercq, C. Dehollain, and N. Joehl, Design and Optimization of Passive UHF RFID Systems. Springer, 2007.
3. EPCglobal, “EPC radio-frequency identity protocols class-1 generation-2 UHF RFID protocol for communications at 860 MHz - 960 MHz version 1.2.0.” [Online]. Available: http://www.gs1.org/sites/default/files/docs/uhfc1g2/uhfc1g2_1_2_0-standard-20080511.pdf
4. U. Karthaus and M. Fischer, “Fully integrated passive UHF RFID transponder IC with 16.7-μW minimum RF input power,” IEEE Journal of Solid-State Circuits, vol. 38, no. 10, pp. 1602–1608, Oct. 2003.
5. J. Yin, J. Yi, M. K. Law, Y. Ling, M. C. Lee, K. P. Ng, B. Gao, H. C. Luong, A. Bermak, M. Chan, W.-H. Ki, C.-Y. Tsui, and M. Yuen, “A system-on-chip EPC gen-2 passive UHF RFID tag with embedded temperature sensor,” IEEE Journal of Solid-State Circuits, vol. 45, no. 11, pp. 2404–2420, Nov. 2010.
6. H. Nakamoto, D. Yamazaki, T. Yamamoto, H. Kurata, S. Yamada, K. Mukaida, T. Ninomiya, T. Ohkawa, S. Masui, and K. Gotoh, “A passive UHF RF identification CMOS tag IC using ferroelectric RAM in 0.35-μm technology,” IEEE Journal of Solid-State Circuits, vol. 42, no. 1, Jan. 2007.
7. B. Hamlin, “Beyond identification - high memory RFID in aviation,” in SAE AeroTech Congress, Nov. 2009.
8. C. Swedberg, “NXP boosts EPC Gen 2 tag memory, performance.” [Online]. Available: http://www.rfidjournal.com/article/view/3637
9. R. E. Barnett, J. Liu, and S. Lazar, “A RF to DC voltage conversion model for multi-stage rectifiers in UHF RFID transponders,” IEEE Journal of Solid-State Circuits, vol. 44, no. 2, Feb. 2009.
10. J.-P. Curty, N. Joehl, C. Dehollain, and M. J. Declercq, “Remotely powered addressable UHF RFID integrated system,” IEEE Journal of Solid-State Circuits, vol. 40, no. 11, Nov. 2005.
11. C. Diorio, “Watching the clock,” Jan. 2006. [Online]. Available: http://www.rfidjournal.com/article/view/2040/1
12. Nishimura and Matsumoto, “The Mersenne Twister homepage.” [Online]. Available: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

Chapter 10

Low Power RF Frontend for Wireless Sensor Networks Frank Henkel, Thomas Leineweber, Mohamed Gamal El-Din, and Ralf Wilke

Abstract An essential requirement for Wireless Sensor Networks (WSN) is low power consumption, so that the maximum time of operation can be achieved with the available energy. The RF frontend in particular plays an important role, since most of the energy is consumed here. After a short introduction, the different RF frontend architectures are discussed in Sect. 2. Sections 3 and 4 give details about the frontend circuit implementations for the RX and the TX path, respectively, with the focus on low power consumption.

1 Introduction to Wireless Sensor Networks

Wireless sensor networks (WSNs) have become popular for monitoring functions used in military, agriculture/farming, or industrial applications. The network consists of several single nodes whose number can range from two to over a thousand. Each of these nodes is connected to at least one sensor, which can in general measure physical quantities of any kind; common examples include temperature, light, pressure, humidity, sound, speed etc. [1]. The measured values are digitized and transmitted through the network to a dedicated base station. In a star network configuration the nodes communicate directly with the base station, while in a more advanced application the nodes communicate with each other to finally pass the data to the base station. This multi-hop network allows the network range to be extended, but each node must also be capable of receiving data. For transmitting and receiving, each node is equipped with a wireless transceiver radio and antenna. A microcontroller and an energy source complete the WSN node. The energy source might be a simple battery, but with the goal of maintenance-free

F. Henkel • T. Leineweber • M.G. El-Din • R. Wilke IMST GmbH, Carl-Friedrich-Gauss-Str. 2, 47475 Kamp-Lintfort, Germany e-mail: [email protected] M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_10, © Springer Science+Business Media B.V. 2012


sensors, more advanced concepts such as energy harvesting need to be applied. In any case, the available energy is limited, which leads to the demand for low power devices. One popular standard used for WSNs is IEEE 802.15.4, which operates in unlicensed ISM bands.

2 RF Frontend Architectures

The RF frontend can be split into a receiver (RX) and a transmitter (TX) path. On the RX side the signal is amplified by a Low Noise Amplifier (LNA) followed by a mixer as downconverter. On the TX side there is the Power Amplifier (PA), driven by the upconverter mixer unless, e.g., a direct PLL modulation approach is applied. The RF interface of the frontend may be either differential or single-ended. A fully differential structure benefits from higher immunity against common-mode distortion but requires an external balun if the antenna is single-ended. It might be possible to share a common RF IO for the LNA input and the PA output, because the frontend does not receive and transmit at the same time, but this requires integration of the antenna switch or power-down of the unused path.

For a low-power application there are two main configurations, the direct-conversion (also known as zero-IF) and the low-IF architecture, each with its own advantages and disadvantages. The direct-conversion architecture is known for its simplicity, since the absence of the image frequency eases the baseband signal filtering. However, its major drawbacks are DC offsets due to LO self-mixing, spurious signals near DC due to second-order intermodulation, and flicker noise, which may severely degrade the bit error rate [2]. According to [2], high-pass filtering can circumvent this if the flicker noise corner is low enough. To achieve this, some approaches suggest passive mixers [3, 4], which indeed minimize the flicker noise but also suffer from reduced RF gain (higher noise floor) and need a strong LO amplitude and hence higher power consumption. Furthermore, the above-mentioned DC offsets and intermodulation products in direct-conversion receivers require considerable effort in calibration techniques [5], which may cancel out the benefits of the simple signal path architecture. As summarized in [6], the zero-IF approach may indeed lead to higher chip area.

The alternative to direct conversion is the low-IF approach [2, 7], which overcomes the problems of LO self-mixing and flicker noise by converting the RF frequencies to a band a few MHz away from DC. This enables high-pass filtering without losing signal information.

Since the TX PA draws a significant share of the total supply current budget, the efficiency of the PA should be maximized for low power applications. It is well known that nonlinear PAs achieve better efficiency than linear versions, but their use is only possible with constant-envelope modulation.


The classical transmitter requires a mixer as part of the TX frontend to convert the analog baseband frequencies to RF. However, some approaches suggest direct modulation of the VCO [6, 7] which is very effective because it eliminates the need for the analog TX base band path including the upconverter.

3 RX RF Frontend Circuits

The main building blocks of the RX RF frontend circuits, the LNA and the downconverter mixer, are discussed in this section.

3.1 LNA

The low noise amplifier is an essential building block in wireless sensor network transceivers. Its function is to amplify the received signal with minimum added noise. In [8] the noise generation mechanisms in CMOS transistors were discussed and the optimum source impedance for minimum noise generation was derived. If we consider the transistor as a two-port network, the noise figure can be written in terms of equivalent resistances and conductances as

$$F = F_{min} + \frac{R_n}{G_s}\left[\left(G_s - G_{opt}\right)^2 + \left(B_s - B_{opt}\right)^2\right] \tag{10.1}$$

$$F_{min} = 1 + 2R_n\left[\sqrt{\frac{G_u}{R_n} + G_c^2} + G_c\right] \tag{10.2}$$

where Gopt and Bopt are the optimum source conductance and susceptance [8]. An essential problem in designing a low noise amplifier (LNA) is that the optimum impedance for minimum noise generation is not necessarily the optimum impedance needed for input matching. Using MOS devices to implement the LNA adds another problem, which is that the input impedance is mostly capacitive. Several amplifier topologies have been presented to address this problem, and theoretically there are clear differences in their noise, gain and power consumption characteristics. Nevertheless, when these structures are investigated, the differences in the realized noise figures become smaller and the main limitation is the loss associated with the passive elements used to realize the input matching. In this section two well-known LNA structures, namely the shunt-series feedback LNA and the inductively degenerated common source LNA, are discussed and compared regarding their noise figure, gain and power consumption characteristics. The target specifications for the LNA input stage are summarized in Table 10.1. Special emphasis was placed on lowering the power consumption, making this LNA suitable for operation in wireless sensor nodes.
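To make the dependence of the noise factor on the source admittance concrete, the following minimal Python sketch evaluates Eqs. 10.1 and 10.2. The two-port noise parameters used in the example are hypothetical values chosen only for illustration, and the expression for the optimum source conductance is the classical two-port result, which is an addition here and is not written out in the text.

```python
import math


def f_min(r_n: float, g_u: float, g_c: float) -> float:
    """Minimum noise factor, Eq. 10.2."""
    return 1.0 + 2.0 * r_n * (math.sqrt(g_u / r_n + g_c ** 2) + g_c)


def g_opt(r_n: float, g_u: float, g_c: float) -> float:
    """Optimum source conductance from classical two-port noise theory
    (assumption: G_opt = sqrt(G_u / R_n + G_c^2), not stated in the text)."""
    return math.sqrt(g_u / r_n + g_c ** 2)


def noise_factor(fmin: float, r_n: float, g_s: float, b_s: float,
                 g_o: float, b_o: float) -> float:
    """Noise factor for a source admittance G_s + jB_s, Eq. 10.1."""
    return fmin + (r_n / g_s) * ((g_s - g_o) ** 2 + (b_s - b_o) ** 2)


def to_db(f: float) -> float:
    """Convert a noise factor to a noise figure in dB."""
    return 10.0 * math.log10(f)


if __name__ == "__main__":
    # Hypothetical noise parameters (assumed for illustration only).
    r_n, g_u, g_c, b_o = 20.0, 2e-3, 1e-3, -5e-3
    fm = f_min(r_n, g_u, g_c)
    go = g_opt(r_n, g_u, g_c)
    # Noise figure with the optimum source and with a plain 50-ohm source.
    print(f"NF at optimum source : {to_db(noise_factor(fm, r_n, go, b_o, go, b_o)):.2f} dB")
    print(f"NF with 50-ohm source: {to_db(noise_factor(fm, r_n, 0.02, 0.0, go, b_o)):.2f} dB")
```

The gap between the two printed values is the noise penalty paid when the source presented to the transistor is chosen for impedance matching rather than for minimum noise, which is the conflict described above.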

Table 10.1 LNA target specification

Parameter              Target
Noise figure           <3 dB
Gain                   >10 dB
Power consumption      6 mW
Output referred IP3    10 dBm
Frequency              2.4–2.5 GHz
Technology             150 nm CMOS

Fig. 10.1 Shunt-series feedback LNA

3.1.1 Shunt-Series Feedback LNA

Using shunt-series feedback as shown in Fig. 10.1, an input impedance with a real part can be realized over a relatively wide bandwidth. Since the IEEE 802.15.4 frequency bands extend from below 1 GHz to above 2 GHz [9], this topology represents an attractive LNA candidate for multiband operation. An approximation of the input impedance, gain, and noise figure of this amplifier is given by Eqs. 10.3–10.5 [10].

$$Z_{in} \approx \frac{R_L + R_F}{1 + |A|} \tag{10.3}$$

$$|A| \approx g_m \left(R_L \parallel R_F\right) \tag{10.4}$$

$$NF \approx 1 + \frac{2}{3}\,\frac{1}{g_m R_s}\left(1 + \frac{R_s}{R_F}\right)^2 + \left(\frac{f}{f_T}\right)^2\left(\frac{2}{3}\,g_m R_s + \frac{R_s}{R_F}\right) \tag{10.5}$$

Based on the same concept, the current-reuse shunt-series feedback LNA (Fig. 10.2) was introduced in [10]. By stacking a PMOS with an NMOS transistor, the transconductance is increased for the same bias current, thus giving the designer a higher degree of freedom in choosing the feedback resistance RF and the load resistance RL. Another advantage is that the two transistors are kept in the saturation region during


Fig. 10.2 Shunt-series feedback LNA with current reuse

Fig. 10.3 NF, Gain and input/output matching of the shunt-series feedback LNA

their operation. To compensate for the gate capacitance, an additional inductor Lg is needed in front of the gate terminal. In this work this inductor is inserted in series with the feedback resistance; in this way the input as well as the output capacitance can be compensated. Using Eqs. 10.3–10.5, an LNA for operation in the 2.4–2.5 GHz band was designed in a 150 nm CMOS technology with thick metal for inductors. Figure 10.3 shows the small-signal performance of the shunt-series LNA. The input and output matching are better than 10 dB over the band from 1 to 3 GHz, and the gain is higher than 9 dB. Most importantly, the noise figure is below 3.1 dB in the same band. In the band from 2.4 to 2.5 GHz the noise figure is 2.8 dB. This high noise figure can be attributed to the thermal noise of the resistive feedback network. Another drawback of this topology is the high current consumption: in our case the amplifier draws 11.7 mA from the 1.8 V supply. The current can be reduced by reducing the transistors' gm, but as a result the gain is reduced and the noise figure increases. Figure 10.4 shows the input-referred IP3 to be 8 dBm for two-tone signals at 2.399 and 2.401 GHz. Another problem with this LNA is its poor reverse isolation, due to the feedback path.
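The trade-off between transconductance, input impedance and supply power can be sketched numerically. The following minimal Python example uses the approximations Zin ≈ (RL + RF)/(1 + |A|) and |A| ≈ gm (RL ∥ RF) of Eqs. 10.3 and 10.4; the component values are hypothetical and only the 11.7 mA and 1.8 V figures are taken from the text.

```python
def parallel(r1: float, r2: float) -> float:
    """Parallel combination of two resistances."""
    return r1 * r2 / (r1 + r2)


def gain_magnitude(g_m: float, r_l: float, r_f: float) -> float:
    """|A| ~ g_m * (R_L || R_F), Eq. 10.4."""
    return g_m * parallel(r_l, r_f)


def input_impedance(g_m: float, r_l: float, r_f: float) -> float:
    """Z_in ~ (R_L + R_F) / (1 + |A|), Eq. 10.3."""
    return (r_l + r_f) / (1.0 + gain_magnitude(g_m, r_l, r_f))


def dc_power_mw(i_supply_ma: float, v_supply: float) -> float:
    """DC power in mW for a supply current in mA and a supply voltage in V."""
    return i_supply_ma * v_supply


if __name__ == "__main__":
    # Hypothetical load and feedback resistances (assumed for illustration).
    r_l, r_f = 200.0, 500.0
    for g_m in (0.04, 0.08):  # assumed transconductances in siemens
        print(f"g_m = {g_m * 1e3:.0f} mS: |A| = {gain_magnitude(g_m, r_l, r_f):.1f}, "
              f"Z_in = {input_impedance(g_m, r_l, r_f):.1f} ohm")
    # Supply figures quoted in the text for the shunt-series feedback LNA.
    print(f"P_dc = {dc_power_mw(11.7, 1.8):.2f} mW")
```

In this model a lower gm lowers the gain and raises the input impedance away from the source impedance, which mirrors the statement that saving current costs gain and noise figure. The quoted 11.7 mA at 1.8 V corresponds to about 21 mW, well above the 6 mW listed in Table 10.1.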


Fig. 10.4 Two tone linearity test of the shunt-series feedback LNA

Fig. 10.5 Inductively degenerated common source LNA

3.1.2 Inductively Degenerated Common Source LNA

The inductively degenerated common source amplifier shown in Fig. 10.5 (which is a series-series feedback amplifier) is another way to match the input of the LNA to the 50 Ω source impedance. In contrast to the shunt-series feedback LNA, no resistance with its associated thermal noise is needed.


Fig. 10.6 NF, Gain and input/output matching of the common source LNA

Using the small-signal model of the transistor T1 and neglecting the gate-drain capacitance Cgd, the input impedance can be written as

$$Z_{in} = s\left(L_g + L_s\right) + \frac{1}{sC_{gs}} + \frac{g_m}{C_{gs}}\,L_s \tag{10.6}$$

The inductance Lg plays a supporting role in compensating the gate capacitance, since the main role of Ls is to tune the real part of the input impedance through the term $(g_m/C_{gs})L_s$. The cascode transistor T2 is intended to isolate the input from the output, thus improving the reverse isolation; it also helps to reduce the effect of the Cgd of transistor T1. The design process as shown in [8] starts by determining the optimum device width $W_{opt} = 1/(3\omega L C_{ox} R_s)$. Since the allowed drain current is known (from the power requirements), the device bias can be determined. Knowing the source impedance Rs and the device transconductance gm, the degeneration inductance Ls can be calculated using Eq. 10.6. The inductance Lg is then chosen such that, together with Ls, it resonates with the gate capacitance Cgs. The size of the cascode transistor is initially chosen to be equal to the size of transistor T1; this size is later optimized to achieve the linearity goals. The inductance Ld is chosen to resonate with the output capacitance of the cascode transistor T2. These guidelines were used to design an LNA for operation in the 2.4–2.5 GHz band for wireless sensor applications using a digital 150 nm CMOS process. The small-signal performance of the designed LNA can be seen in Fig. 10.6. In the band of interest (2.4–2.5 GHz) the amplifier has a gain of 12 dB and input and output return losses greater than 14 and 11 dB, respectively. The noise figure in the target band is smaller than 2.4 dB with a current consumption of 2.8 mA, which is significantly smaller than that of the shunt-series feedback LNA. Although


Fig. 10.7 Two tone linearity test of the shunt-series feedback LNA

Fig. 10.8 NF variation with bias current using different technology options

the inductively degenerated common-source LNA has a theoretical noise figure that is superior to that of the shunt-series feedback LNA, the difference in noise figure performance is not as large as expected (Fig. 10.6). In the common-source LNA a large inductor is used to resonate the small gate-source capacitance. This large inductance is difficult to realize with low loss (high quality factor) due to the loss in the conductive silicon substrate used in conventional digital CMOS processes. For realizing high-quality passives, some CMOS processes offer the possibility of back-etching the substrate under the inductor and replacing it with a dielectric. This option was investigated together with the option of using external components for the gate inductance Lg. Figures 10.8 and 10.9 show the effect of the supply

Fig. 10.9 S21 variation with bias current using different technology options

Table 10.2 Comparison of LNA key parameters

Ref        Tech         Freq (GHz)  Pdc (mW)  S21 (dB)  NF (dB)  FOM   Remarks
[11]       CMOS 150 nm  2.46        4.65      14        2.36     34.7  External matching
This work  CMOS 150 nm  2.5         5.04      12.3      2.303    31.9  Integrated inductor
This work  CMOS 150 nm  2.5         5.04      13.6      1.5      36.2  High Q integration
This work  CMOS 150 nm  2.5         5.04      14.1      1.1      38.9  External matching

current (and eventually the transistor transconductance) on the noise figure and gain (S21) using different technology options for realizing Lg. It can be seen that the losses in the gate inductor play a crucial role in determining the noise figure and the gain of the LNA and thus the NF of the whole receiver. To be able to compare this LNA to other results, the figure of merit below is used [11] (Table 10.2):

$$FOM = 10\log\left(100\cdot\frac{S_{21(lin)}\,f_0^2}{(F-1)\,P_{dc}(\mathrm{mW})}\cdot\frac{OIP3(\mathrm{mW})}{P_{dc}(\mathrm{mW})}\right) \tag{10.7}$$
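Returning to the input impedance of the inductively degenerated stage, Zin = s(Lg + Ls) + 1/(sCgs) + (gm/Cgs)Ls (Eq. 10.6), the inductor sizing steps can be sketched in a few lines of Python. The transconductance and gate-source capacitance below are hypothetical values chosen only for illustration, and the helper names are not from the source.

```python
import math


def degeneration_inductance(r_s: float, g_m: float, c_gs: float) -> float:
    """L_s that sets the real part (g_m / C_gs) * L_s of Eq. 10.6 equal to R_s."""
    return r_s * c_gs / g_m


def gate_inductance(f0: float, c_gs: float, l_s: float) -> float:
    """L_g such that L_g + L_s resonates with C_gs at the frequency f0."""
    w0 = 2.0 * math.pi * f0
    return 1.0 / (w0 ** 2 * c_gs) - l_s


def input_impedance(f: float, l_g: float, l_s: float,
                    c_gs: float, g_m: float) -> complex:
    """Input impedance of Eq. 10.6 evaluated at s = j * 2 * pi * f."""
    s = 1j * 2.0 * math.pi * f
    return s * (l_g + l_s) + 1.0 / (s * c_gs) + g_m * l_s / c_gs


if __name__ == "__main__":
    # Hypothetical device values (assumed): g_m = 30 mS, C_gs = 100 fF.
    r_s, g_m, c_gs, f0 = 50.0, 30e-3, 100e-15, 2.45e9
    l_s = degeneration_inductance(r_s, g_m, c_gs)
    l_g = gate_inductance(f0, c_gs, l_s)
    z_in = input_impedance(f0, l_g, l_s, c_gs, g_m)
    print(f"L_s = {l_s * 1e9:.3f} nH, L_g = {l_g * 1e9:.1f} nH")
    print(f"Z_in at f0 = {z_in.real:.2f} {z_in.imag:+.2e}j ohm")
```

With a small gate-source capacitance the required Lg comes out in the tens of nanohenries in this example, which illustrates why the quality factor of the gate inductor dominates the achievable noise figure, as discussed above.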

3.1.3 Passive Elements Options (Full Integration, High Quality Integrated Passives, External Components)

Securing the energy needed to power the wireless nodes, and its cost, is an important aspect which should be considered during the design of the whole network [12]. The use of external components for input matching offers the lowest noise figure and power consumption; nevertheless, the economical side of this choice


Fig. 10.10 Operation cost vs time using different passive technology options

should also be considered. Although the choice between external components and a fully integrated solution is a question of cost, it is important to differentiate between two kinds of costs: fixed costs and variable costs. The fixed costs are the costs of the external components and the PCB area needed, while the variable costs are the costs per unit time of the energy source used to power the wireless node. The operation cost of the node can be simply defined as

$$OC = FC + VC \cdot t \tag{10.8}$$

where OC is the operation cost, FC is the fixed cost, VC is the variable cost and t is time. Figure 10.8 shows the noise figure with respect to the supply current. When external components are used for input matching, a lower supply current and noisier active elements can be tolerated. With the fully integrated solution, a higher supply current is needed to minimize the noise generated by the active devices. Using external components therefore means higher fixed costs but lower maintenance costs, due to lower power consumption for the same noise figure, while the fully integrated solution means lower fixed costs but higher maintenance costs. To make this point clear, Fig. 10.10 shows the operation costs of the three possible solutions over time. The fixed cost of using external components was assumed to be 1 dollar and the fixed costs of the other two solutions 0.5 and 0.25 dollars; the same principle was applied to the variable costs. The time t is also a normalized variable; it can be weeks, months or years. Two breakeven points, B1 and B2, can be identified. B1 is the point where the operation cost of the high-quality integrated passives solution equals that of the fully integrated solution. If the wireless node is intended to be used beyond B1, it is better to use the high-quality integrated passives solution. If the wireless node lifetime extends beyond B2, external matching with its higher fixed cost can be justified.
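The breakeven reasoning follows directly from the linear cost model OC = FC + VC · t. The minimal Python sketch below computes the crossing points for three options. The fixed costs follow the values quoted in the text, with the assignment of 0.5 and 0.25 to the two integrated options being an assumption; the variable costs are likewise assumed (mirrored values), since the text does not list them.

```python
def operation_cost(fixed: float, variable: float, t: float) -> float:
    """Operation cost OC = FC + VC * t (Eq. 10.8)."""
    return fixed + variable * t


def break_even_time(fc_a: float, vc_a: float, fc_b: float, vc_b: float) -> float:
    """Time at which options a and b have the same operation cost."""
    if vc_a == vc_b:
        raise ValueError("equal variable costs: the cost lines never cross")
    return (fc_b - fc_a) / (vc_a - vc_b)


# (fixed cost, variable cost per normalized time unit); variable costs are assumed.
FULL_INTEGRATION = (0.25, 1.00)
HIGH_Q_PASSIVES = (0.50, 0.50)
EXTERNAL_MATCHING = (1.00, 0.25)

if __name__ == "__main__":
    b1 = break_even_time(*FULL_INTEGRATION, *HIGH_Q_PASSIVES)
    b2 = break_even_time(*HIGH_Q_PASSIVES, *EXTERNAL_MATCHING)
    print(f"B1 (full integration vs. high-Q passives): t = {b1}")
    print(f"B2 (high-Q passives vs. external matching): t = {b2}")
```

Under these assumed numbers the external matching option only becomes the cheapest one for lifetimes beyond B2, which matches the qualitative conclusion drawn from Fig. 10.10.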


Fig. 10.11 Simplified schematic of standard Gilbert-cell

3.2 Mixer Introduction

In RF transceivers the mixer is one of the key building blocks and often the limiting part, because the mixer properties determine the system linearity. There are mainly two types of mixers: passive and active. The linearity is described by the compression point P1dB and the input third-order intercept point IIP3. Generally, passive mixers have high linearity and low power consumption, but show conversion loss and a high noise figure [13]. Active mixers, on the other hand, provide acceptable gain, a lower noise figure and high port-to-port isolation. However, they have higher power consumption and reduced linearity. In accordance with the requirements for low power, low noise, high gain and high linearity, the most commonly used topology is the Gilbert-cell. A double-balanced Gilbert-cell mixer is shown in Fig. 10.11. Transistor T1 is the tail current source. Transistors T2 and T3 form the radio frequency (RF) input and act as a transconductance stage, which converts the RF input voltage into a current. The transistors T4–T7 act as switches and steer the current depending on the local oscillator (LO) signal. The load resistors R1 and R2 perform the current-to-voltage conversion and provide the differential intermediate frequency (IF) output. Unfortunately, the standard Gilbert-cell cannot fulfill all the mixer specifications required for the target wireless sensor network transceiver. To improve gain and linearity, the current through the mixer should be increased, which leads to higher power consumption. Additionally, a higher current through the switching quads and


the use of resistive loads can cause headroom problems. On the other hand, the use of active loads increases the noise figure. Furthermore, a larger current through the switching quads increases the flicker noise and also requires a larger LO drive current, which increases the DC power consumption of the LO signal generation block. It is necessary to find a solution that offers a compromise between low supply voltage, low power consumption, high linearity and low noise.

3.3 Down-Conversion Mixer

In the receiver path a mixer translates the 2.4 GHz RF signal down to a lower IF or to baseband. For low power applications the Gilbert-cell has some disadvantages. To keep the three stacked transistors in the saturation region, a high supply voltage is needed. Similarly, a strong LO voltage is needed, which is a serious disadvantage in terms of power consumption. A reduction of the supply voltage, however, results in poor conversion gain and linearity. Furthermore, the working range of the Gilbert-cell MOS transistors is limited to the saturation region, which limits the linearity. For low power applications the structure of the Gilbert-cell therefore has to be modified. When the RF input transistors operate in the saturation region and the LO switch transistors switch ideally, the conversion gain (CG) and IIP3 can be approximated as [14]:

$$CG \approx \frac{2}{\pi}\,R_L\,g_m = \frac{2}{\pi}\,R_L\sqrt{K_N\,I_{dsRF}} \tag{10.9}$$

$$IIP3 \approx 4\sqrt{\frac{2}{3}\,\frac{I_{dsRF}}{K_N}} \tag{10.10}$$

$$K_N = 2\,\mu_N C_{OX}\,\frac{W}{L} \tag{10.11}$$

Equation 10.9 shows the relationship between the transconductance gm and the conversion gain as a function of the load resistance RL. Equation 10.10 shows the dependence of the IIP3 on the drain current of the RF input, IdsRF. For better mixer performance, gm and IdsRF should be increased. A possible solution is described in the following sections (Fig. 10.12).

3.3.1 Modified Transconductance Stage

The linearity of the Gilbert-cell mixer mainly depends on the linearity of the transconductance stage. Several new approaches to improve the linearity of the transconductance stage have been presented in recent years. In [15, 16] a modified Gilbert-cell


Fig. 10.12 Simplified schematic of proposed down-conversion mixer

Fig. 10.13 IDS(VGS) for large-size and normal-size transistors around the threshold voltage VT: the same slope of current in two different curves means the same transconductance gm [17]

was presented without a tail current transistor to increase linearity and voltage headroom and to allow low-voltage operation. An RF CMOS subthreshold active mixer design, based on subthreshold biasing of the MOS transistors, is proposed in [17]. The approach in [17] makes it possible to reach a high transconductance gm with large transistors biased in subthreshold, similar to that of normally sized transistors biased above threshold, as shown in Fig. 10.13.
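For the conventional saturation-region case, the current dependence of conversion gain and linearity can be illustrated with the square-law approximations CG ≈ (2/π) RL sqrt(KN IdsRF) and IIP3 ≈ 4 sqrt(2 IdsRF / (3 KN)), with KN = 2 μN COX W/L. The Python sketch below is only an illustration: the process constant, device ratio, load resistance and currents are hypothetical, and treating the IIP3 result as a voltage amplitude is an assumption.

```python
import math


def k_n(mu_n_cox: float, w_over_l: float) -> float:
    """K_N = 2 * mu_N * C_OX * W / L."""
    return 2.0 * mu_n_cox * w_over_l


def conversion_gain(r_l: float, kn: float, i_ds_rf: float) -> float:
    """CG ~ (2 / pi) * R_L * sqrt(K_N * I_dsRF), ideal LO switching assumed."""
    return (2.0 / math.pi) * r_l * math.sqrt(kn * i_ds_rf)


def iip3(kn: float, i_ds_rf: float) -> float:
    """IIP3 ~ 4 * sqrt(2 * I_dsRF / (3 * K_N)), taken here as a voltage amplitude."""
    return 4.0 * math.sqrt(2.0 * i_ds_rf / (3.0 * kn))


if __name__ == "__main__":
    # Hypothetical values: mu_N*C_OX = 200 uA/V^2, W/L = 100, R_L = 1 kOhm.
    kn = k_n(200e-6, 100.0)
    for i_ds in (0.5e-3, 2.0e-3):
        cg_db = 20.0 * math.log10(conversion_gain(1e3, kn, i_ds))
        print(f"I_dsRF = {i_ds * 1e3:.1f} mA: CG = {cg_db:.1f} dB, "
              f"IIP3 = {iip3(kn, i_ds):.2f} V")
```

In this square-law model both quantities grow only with the square root of the bias current, so a fourfold increase in current buys a factor of two in gain and IIP3, which is why the text looks for alternatives such as subthreshold biasing instead of simply raising the current.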

Fig. 10.14 Single-balanced mixer with bleeding current source IBLD [19]

This reduces the power consumption of the mixer. Equivalently, a higher transconductance can be reached for the same drain current. Combining these techniques, the mixer transconductance stage operates in a kind of class-AB configuration: the quiescent current consumption is low, but for high input voltages the differential pair transistors are alternately pushed into saturation and draw more current. This improves the linearity of the transconductance stage and hence of the whole mixer. The high current, however, causes a large voltage drop across the loads. To prevent the LO switches from entering the ohmic region, the load stage can be implemented with active PMOS loads. These are controlled by common-mode feedback (CMFB) circuitry, which additionally improves the linearity by allowing the maximum output voltage swing.
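The benefit of subthreshold biasing described in Sect. 3.3.1 can be illustrated with first-order textbook models. The sketch below is not taken from [17]: it assumes the weak-inversion relation g_m = I_D / (n · U_T) and the square-law relation g_m = 2 · I_D / V_ov for a transistor in saturation. The slope factor n, the thermal voltage U_T, the overdrive voltage V_ov and the drain current are assumed values.

```python
def gm_weak_inversion(i_d, n=1.3, u_t=0.0259):
    """Subthreshold (weak inversion) model: g_m = I_D / (n * U_T)."""
    return i_d / (n * u_t)


def gm_strong_inversion(i_d, v_ov):
    """Square-law saturation model: g_m = 2 * I_D / V_ov."""
    return 2.0 * i_d / v_ov


def current_for_gm_strong(g_m, v_ov):
    """Drain current a square-law device needs to reach a given g_m."""
    return g_m * v_ov / 2.0


# Assumed operating point, for illustration only.
I_D = 100e-6    # drain current in A
V_OV = 0.2      # overdrive voltage of the superthreshold device in V

gm_sub = gm_weak_inversion(I_D)
gm_sat = gm_strong_inversion(I_D, V_OV)
i_needed = current_for_gm_strong(gm_sub, V_OV)

print(f"g_m in subthreshold:   {gm_sub * 1e3:.2f} mS")
print(f"g_m in superthreshold: {gm_sat * 1e3:.2f} mS")
print(f"Current for equal g_m in superthreshold: {i_needed * 1e6:.0f} uA")
```

With these assumed numbers the subthreshold device delivers roughly three times the transconductance for the same current. The price is a much larger device, since weak inversion at a given current requires a large W/L.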

3.3.2 Current Bleeding

In addition to the active loads, the current bleeding technique [18] is used. The bleeding source transistor reduces the current through the LO switches and loads. For a given current through the transconductance stage, the gain decreases by an amount that depends on the current through the bleeding source transistor, while the flicker noise caused by the LO switches and active loads is reduced. A further increase of the gain and a lower noise figure can be achieved at the cost of higher power consumption (Fig. 10.14).

Fig. 10.15 IDC(RF power)

Fig. 10.16 Compression points for standard and proposed mixer

3.3.3 Simulation Results

As test conditions, a radio frequency of 2.4 GHz and an LO frequency of 2.41 GHz, resulting in an IF of 10 MHz, have been assumed. The mixer is fed with an LO power of −12 dBm from a differential 100 Ω source. This corresponds to an amplitude of about 220 mVpp for LO+ and LO−. Figure 10.15 illustrates the dependence of the DC current on the RF power for the proposed mixer and a standard Gilbert-cell. The operating voltage is 1.8 V. The current consumption of the presented mixer depends directly on the voltage at the RF input. As described, the current for small input signals is relatively low (IDC < 600 µA for RF power < −20 dBm). Above a certain input power (around −20 dBm) the available current is not sufficient to amplify the input signal substantially. This limits the linearity and affects the compression point, shown in Fig. 10.16. The proposed structure has an 11 dB higher linearity.
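The test conditions can be cross-checked with a short calculation. The Python sketch below converts a power level in dBm delivered into a given resistance to a peak-to-peak voltage and computes the IF from the RF and LO frequencies. An LO level of −12 dBm into 100 Ω gives about 225 mVpp, which agrees with the quoted amplitude of about 220 mVpp. The function names are illustrative only.

```python
import math


def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10.0)


def vpp_from_dbm(p_dbm, r_ohm):
    """Peak-to-peak voltage of a sine wave delivering p_dbm into r_ohm."""
    v_rms = math.sqrt(dbm_to_watt(p_dbm) * r_ohm)
    return 2.0 * math.sqrt(2.0) * v_rms


F_RF = 2.40e9   # radio frequency in Hz
F_LO = 2.41e9   # LO frequency in Hz
f_if = abs(F_LO - F_RF)

print(f"IF = {f_if / 1e6:.0f} MHz")
print(f"LO swing = {vpp_from_dbm(-12.0, 100.0) * 1e3:.0f} mVpp")
```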


Fig. 10.17 Noise figure(f), improved noise figure with current bleeding

Table 10.3 Simulation results of down-conversion mixers Gilbert-cell Parameter Proposed with modified results mixer transcond. stage

Standard Gilbert-cell

Conversion gain Noise figure @ 10 MHz P1dB IIP3 DC current (mixer core)

9.4 dB 10 dB 19 dBm 10 dBm 580 µA @ PRF = −20 dBm

10.4 dB 14.3 dB 7.9 dBm 5.4 dBm 580 µA @ PRF = −20 dBm

10.8 dB 14.5 dB 8.9 dBm 6 dBm 560 µA @ PRF = −20 dBm

For small input signals the current is the same. The gain reaches up to 10.4 dB. Depending on the input signal strength, more current becomes available for amplification, which is a form of adaptive biasing. Figure 10.17 shows the influence of the current bleeding technique on the noise figure. For intermediate frequencies below 1 MHz the noise figure is reduced by 2 dB. A higher bleeding current ratio reduces the noise figure further, but increases the power consumption. The simulation results of the different structures are summarized in Table 10.3.

4 TX RF Frontend Circuits

In this section the main building blocks of the TX RF frontend, the power amplifier (PA) and the up-conversion mixer, are discussed.

4.1 Power Amplifier

At the transmitter side of the transceiver another kind of amplifier is needed: the power amplifier. Its most important characteristics are linearity, gain, and efficiency. For low power applications high efficiency power amplifiers are of great importance, since high efficiency means lower power consumption and longer battery life. At this point it is important to discuss how power amplifier efficiency is defined. The simplest definition is

\[ \eta = \frac{P_{RF}}{P_{DC}} \tag{10.12} \]

where PRF is the output RF power and PDC is the DC power. This definition is most suitable when the input of the amplifier is a signal with a constant envelope (like FM signals), since the output RF power is then constant over time.

To increase the spectral efficiency, modern communication systems use complex modulated signals with a variable envelope. These signals have a high peak-to-average power ratio (PAPR). To amplify such a signal with acceptable linearity, the amplifier is operated at back-off, so that it can accommodate the occasional high power peaks. Since the output power varies with time, the efficiency also varies with time. In this case the average efficiency is a more accurate measure of how efficiently the amplifier converts DC power to RF power. Since communication signals have a stochastic (non-deterministic) nature, the probability density function of the output signal must be known in order to calculate the average efficiency. For this reason the power amplifier should be designed with the modulation scheme of the targeted transmitter in mind. At the end of this section it is shown how the average efficiency can be calculated.

Several techniques have been proposed to efficiently amplify signals with high dynamic range, such as envelope elimination and restoration (EER) and Doherty amplifier systems. Nevertheless, these techniques are costly and justifiable only at high power levels. Wireless sensor networks are intended to be low cost and very low power systems; for this reason another approach is used. In wireless sensor networks O-QPSK can be applied. In this modulation scheme only one bit is changed at a time, and thus the signal has no zero crossing. In this way the signal dynamic range is minimized. Another optimization factor is the pulse shaping filter used to limit the spectrum of the digital signal and avoid inter-symbol interference.

In this work a class B power amplifier, shown in Fig. 10.18, was designed for a wireless sensor network node. Table 10.4 shows the required specifications. It is important to point out that although high efficiency operation is required, the final efficiency will partially depend on the form of the input signal, as mentioned previously. The design process starts with determining the maximum drain current; after that the optimum load impedance can be calculated using the load line method [20].

Fig. 10.18 Cascode class-B power amplifier

Table 10.4 Power amplifier target specification
Output P1dB             13 dBm
Gain                    >15 dB
Efficiency              >40%
Output referred IP3     15 dBm
Frequency               2.4–2.5 GHz
Technology              150 nm CMOS

Since the output power and supply voltage are known, the DC current (for class B operation) can be calculated using the following formula:

\[ P_{RF} = \frac{\pi I_{DC} V_{DC}}{4} \tag{10.13} \]

For class B operation:

\[ I_{DC} = \frac{I_{max}}{\pi} \tag{10.14} \]

Knowing Imax, the size of the transistor T1 can be determined. Since the output voltage swing is double the DC supply voltage, the optimum load can be calculated as follows:

\[ R_{opt} = \frac{2 (V_{DC} - V_k)}{I_{max}} \tag{10.15} \]

where VDC is the supply voltage and Vk is the knee voltage. Since in a sub-micron CMOS process the knee voltage can be as large as 50% of the supply voltage, the optimum load actually used will be larger than the calculated value, and the transistor will be pushed into the ohmic region during operation.
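The design steps above can be sketched numerically. The following Python example assumes the ideal class B relations P_RF = π · I_DC · V_DC / 4, I_DC = I_max / π and R_opt = 2 · (V_DC − V_k) / I_max. The 13 dBm output power is the target from Table 10.4. The 1.8 V supply and the knee voltage values are assumptions made for this illustration and are not stated for the amplifier in the text.

```python
import math


def class_b_design(p_rf_dbm, v_dc, v_knee=0.0):
    """Return (I_DC, I_max, R_opt) for an ideal class B stage.

    p_rf_dbm: target RF output power in dBm
    v_dc:     supply voltage in V
    v_knee:   knee voltage in V (0 for an ideal device)
    """
    p_rf = 1e-3 * 10 ** (p_rf_dbm / 10.0)      # dBm to W
    i_dc = 4.0 * p_rf / (math.pi * v_dc)       # P_RF = pi * I_DC * V_DC / 4
    i_max = math.pi * i_dc                     # I_DC = I_max / pi
    r_opt = 2.0 * (v_dc - v_knee) / i_max      # load line resistance
    return i_dc, i_max, r_opt


# Assumed supply and knee voltages; only the 13 dBm target is from Table 10.4.
for v_k in (0.0, 0.3):
    i_dc, i_max, r_opt = class_b_design(13.0, 1.8, v_k)
    print(f"Vk = {v_k:.1f} V: I_DC = {i_dc * 1e3:.2f} mA, "
          f"I_max = {i_max * 1e3:.2f} mA, R_opt = {r_opt:.1f} ohm")
```

With a zero knee voltage the ratio P_RF / (V_DC · I_DC) equals π/4, the familiar 78.5% upper bound of ideal class B operation, which is a useful sanity check on the three relations.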


Fig. 10.19 Simulated S11 and S22

The cascode transistor T2 has two functions. First, it isolates the input from the output and reduces the effect of the drain-gate capacitance, and thus increases stability. The second function can be understood by observing that the output voltage swing, which is 2·VDC, is applied across two stacked transistors; in this way the drain-gate voltage of each transistor can be kept within the safe limits of the technology. The resonant circuit formed by LD and CD is an open circuit at the fundamental and a short at all harmonics; additionally, the output capacitance of the amplifier can be absorbed into the capacitance CD. At this point it is important to test the stability of the amplifier and to determine the regions of stable operation in the Smith chart. The input matching circuit made of Cin and Lg is designed to match the large signal input impedance to 50 Ω. Although the loss in the gate inductor Lg affects the gain and efficiency negatively, it increases the stability of the amplifier and can replace the stabilization resistors if needed.

Figure 10.19 shows the simulated S11 and S21 of the designed amplifier. The input matching is better than 10 dB in the band of operation and the small signal gain is 17.34 dB at 2.5 GHz. Figures 10.20 and 10.21 show the single tone tests for the amplifier. The power gain is 17.25 dB and the output 1 dB compression point is at 13.15 dBm. The efficiency at this point is 43%; the maximum efficiency, however, is 48% at 3 dB compression, as shown in Fig. 10.21. The OIP3 of the amplifier is 15 dBm, as shown in Fig. 10.22.

Fig. 10.20 Gain and Pout at 2.5 GHz

Fig. 10.21 Efficiency at 2.5 GHz

As mentioned before, the average efficiency of the amplifier depends on the probability distribution function of the input signal. To calculate the average efficiency under realistic conditions the amplifier was excited with 4 O-QPSK signals with a root raised cosine filter for pulse shaping with a roll-off factor of 1. To estimate the power variation over time and its effect on efficiency, the histogram of the output envelope power was calculated and plotted together with the efficiency curves. Figure 10.23 shows the case of a root raised cosine filter with α = 1. The signal has a peak power of 13.57 dBm and a dynamic range of 3 dB. Most importantly, the output signal is at its peak only 1% of the time, where the efficiency is 45.4%, and at a lower level (12.89 dBm) 14% of the time, where the efficiency is 42.6%. The average efficiency can be calculated by summing the product of the two curves; in this case the average efficiency is 41.6%. At this point it is instructive to change the filter parameters and see the effect on the signal dynamic range and thus on the average efficiency. Changing the value of α to 0.33 increases the signal dynamic range, as shown in Fig. 10.24.

Fig. 10.22 Two tones linearity test

Fig. 10.23 Envelope power distribution for α = 1

The signal has a peak power of 14.12 dBm and a dynamic range of around 7 dB. In this case the average efficiency drops to 40%. The pulse shaping filter therefore has an effect on the final efficiency of the amplifier. In this case the efficiency drop is not large, but it shows the importance of considering the performance of the other system blocks when calculating the final amplifier efficiency.
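The averaging step described above, summing the product of the envelope power histogram and the efficiency curve, can be sketched as follows. Only the first two bins (1% of the time at 45.4% efficiency and 14% at 42.6%) come from the text. The remaining bins are hypothetical placeholders, so the printed value is not expected to reproduce the reported 41.6%.

```python
def average_efficiency(probabilities, efficiencies):
    """Probability-weighted mean of the efficiency over the power histogram.

    probabilities: fraction of time spent in each output power bin
    efficiencies:  amplifier efficiency (in %) at each bin
    """
    if len(probabilities) != len(efficiencies):
        raise ValueError("histogram and efficiency curve must have equal length")
    total = sum(probabilities)
    weighted = sum(p * e for p, e in zip(probabilities, efficiencies))
    return weighted / total   # normalise in case the bins do not sum to 1


# First two bins from the text; the other three are hypothetical.
time_fraction = [0.01, 0.14, 0.35, 0.30, 0.20]
efficiency_pct = [45.4, 42.6, 41.5, 40.8, 39.5]

print(f"Average efficiency: {average_efficiency(time_fraction, efficiency_pct):.1f} %")
```

A wider histogram, as obtained with a smaller roll-off factor, puts more weight on low power bins where the efficiency is lower, which pulls the average down.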

Fig. 10.24 Envelope power distribution for α = 0.33

4.2 Up-Conversion Mixer

The up-conversion mixer is one of the essential parts of the RF transmitter front end. It converts the incoming intermediate frequency to the radio frequency. Compared to the down-conversion mixer, the linearity requirements of an up-conversion mixer are higher when it drives the preamplifier or connects directly to the power amplifier. High linearity requires a large DC current through the transconductance stage, which increases the power consumption of the mixer. Accordingly, a larger current flows through the LO switches. The higher current requires larger transistors in order to preserve the voltage headroom.

4.2.1 Circuit Description

A modified transconductance stage with the current-reuse bleeding technique, as used in [19], is employed. In addition, the tail current source is removed [15, 16]. This ensures high linearity and sufficient gain. Figure 10.25 shows the proposed up-conversion mixer. The NMOS transistors T1 and T2 and the PMOS transistors P1 and P2 form the transconductance stage. P1 and P2 also act as bleeding current sources. The transistors T3 and T6 form the LO switches; R1 and R2 are the load elements.

Fig. 10.25 Simplified schematic of proposed up-conversion mixer

Table 10.5 Simulation results for up-conversion mixer
Parameter                    Results
Conversion gain              1.3 dB
Noise figure @ 10 MHz        12.5 dB
Input P1dB                   0.4 dBm
IIP3                         2.5 dBm
DC current (mixer core)      400 µA

4.2.2 Simulation Results

The conversion gain of the up-conversion mixer is 1.3 dB for an LO power of 0 dBm and a supply voltage of 1.8 V. In this case the current consumption of the mixer core is 400 µA. The linearity is simulated at an RF output of 2.4 GHz and an input frequency of 10 MHz (and 10.1 MHz for IP3). The resulting IIP3 and P1dB are 2.5 and 0.4 dBm, respectively. The simulation results of the proposed up-conversion mixer are summarized in Table 10.5.

5 Conclusion

Details of RF frontend circuit implementations for different topologies have been presented. They show the potential of these circuits for use in integrated radio transceivers for low power applications.


References

1. E. Estrin, R. Govindan, J. Heidemann, S. Kumar, Next century challenges: scalable coordination in sensor networks, in ACM MobiCom ’99, Washington, DC, USA, 1999, pp. 263–270
2. N. Scolari, C.C. Enz, Digital receiver architectures for the IEEE 802.15.4 standard, in Proceedings of the 2004 International Symposium on Circuits and Systems ISCAS ’04, vol. 4, Vancouver, 23–26 May 2004, pp. IV-345–348
3. G. Cornetta, A. Touhafi, D.J. Santos, J.M. Vazquez, A direct down-conversion receiver for low-power wireless sensor networks, in World Academy of Science, Engineering and Technology, Issue 51, March 2009
4. S. Chang, J. Park, K. Won, H. Shin, Design of a 2.4-GHz fully differential zero-IF CMOS receiver employing a novel hybrid balun for wireless sensor network. J. Semiconductor Technol. Sci. 8(2), 143–149 (2008)
5. L. Göpfert, F. Hofmann, G. Jacobasch, A 900 MHz CMOS RF transceiver including digital baseband and hardware-MAC for IEEE 802.15.4/ZigBee applications. Eur. Trans. Telecommun. Spec. Issue on IST Summit 2005. 17(2), 283–292 (2006)
6. J. Notor, A. Caviglia, G. Levy, CMOS RFIC Architectures for IEEE 802.15.4 networks, Whitepaper, http://www.cadence.com/whitepapers/cmosrficarchforieee80215.pdf
7. W. Kluge, F. Poegel, H. Roller, M. Lange, T. Ferchland, L. Dathe, D. Eggert, A fully integrated 2.4-GHz IEEE 802.15.4-compliant transceiver for ZigBee applications. IEEE J. Solid-State Circuits 41(12), 2767–2775 (2006)
8. T.H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, 2nd edn. (Cambridge University Press, New York, 2004)
9. IEEE standard 802.15.4d-2009
10. T. Taris, J.B. Begueret, Y. Deval, A low voltage current reuse LNA in a 130-nm CMOS technology for UWB applications, in European Microwave Conference, Munich, 9–12 Oct. 2007, pp. 1105–1108
11. V. Chandrasekhar, C.M. Hung, Y.C. Ho, K. Mayaram, A packaged 2.4 GHz LNA in a 0.15-µm CMOS process with 2 kV HBM ESD protection, in Proceedings of the 28th European Solid-State Circuits Conference, ESSCIRC, Sept. 2002, pp. 347–350
12. H. Long, Y. Liu, Y. Wang, R.P. Dick, H. Yang, Battery allocation for wireless sensor network lifetime maximization under cost constraints, in Proceedings of the International Conference on Computer-Aided Design, San Jose, California, USA, Nov. 2009, pp. 705–712
13. S.K. Alam, J. Degroat, A 2 GHz highly linear downconversion mixer in 0.18-µm CMOS, in 12th NASA Symposium on VLSI Design, Coeur d’Alene, 4–5 Oct. 2005
14. Q. Wan, C. Wang, A 0.18-µm CMOS high-performance up-conversion mixer for 2.4-GHz transmitter application. Frequenz, J. RF-Eng. Telecommun. 64(1–2), 14–18 (2010)
15. G. Yao, B. Chi, C. Zhang, Z. Wang, A low-power monolithic reconfigurable direct-conversion receiver RF front-end for 802.11a/b/g applications, Institute of Microelectronics, Tsinghua University, 100084, Beijing
16. T. Elesseily, T. Ali, K. Sharaf, A crystal-tolerant fully integrated CMOS low-IF dual-band GPS receiver. Analog Integr. Circuits Signal Process. 63(2), 143–159 (2010)
17. H. Lee, S. Mohammadi, A 500 µW 2.4 GHz CMOS subthreshold mixer for ultra low power applications, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, 47907, U.S.A.
18. R. Hedayati, S. Haddadian, H. Nabovati, A low voltage high linearity CMOS Gilbert cell using charge injection method, World Academy of Science, Engineering and Technology 38, 2008
19. S. Douss, F. Touati, M. Loulou, Design optimization methodology of CMOS active mixers for multi-standard receivers. Int. J. Electr. Comput. Eng. 2(9), 571–579 (2007)
20. S.C. Cripps, RF Power Amplifiers for Wireless Communications (Artech House, Norwood, 1999)

Chapter 11

Ultra High Data Rate CMOS Front Ends

Reza Mahmoudi and Arthur van Roermund

Abstract The availability of numerous mm-wave frequency bands for wireless communication has motivated the exploration of multi-band and multi-mode integrated components and systems in mainstream CMOS technology. This opportunity confronts the RF designer with the transition between schematic and layout. Modeling the performance of circuits after layout and taking into account the parasitic effects resulting from the layout are two issues that become more important and influential in high frequency design. Performing measurements using on-wafer probing at 60 GHz has its own complexities. The very short wavelength of the signals at mm-wave frequencies makes the measurements very sensitive to the effective length and bending of the interfaces. This paper presents different 60 GHz corner blocks implemented in CMOS technologies, e.g. a low noise amplifier, a zero-IF mixer, a phase-locked loop, a dual-mode mm-wave injection-locked frequency divider and an active transformed power amplifier. These results emphasize the feasibility of realizing 60 GHz integrated components and systems in mainstream CMOS technology.

R. Mahmoudi • A. van Roermund, Department of Electrical Engineering, Eindhoven University of Technology, Den Dolech 2, E.H. 5.26, 5600 MB Eindhoven, The Netherlands; e-mail: [email protected]; [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_11, © Springer Science+Business Media B.V. 2012

1 Introduction

Driven by customer demands, the last two decades have seen unprecedented progress in wireless portable devices capable of supporting multi-standard applications. The allure of “being connected” anytime, anywhere, and the desire for untethered access to information and entertainment “on the go” have created an ever increasing demand for higher data rates. As shown in Fig. 11.1, contemporary systems are capable of supporting light or moderate levels of wireless data traffic, as in Bluetooth and wireless local area networks (WLANs). However, they are unable to deliver data rates comparable to wired standards like gigabit Ethernet and high-definition multimedia interface (HDMI) [1]. Furthermore, as predicted by Edholm’s law [2], the required data rates (and associated bandwidths) have doubled every 18 months over the last decade. This trend is shown in Fig. 11.1 for cellular networks, wireless local area networks and wireless personal area networks (WPANs) over the last 15 years.

Fig. 11.1 Left: data rate and distance comparison for different WPAN and WLAN technologies. Right: increasing data rate trend according to Edholm’s law [2]

In 2001, spurred by the increasing demand for high data rate applications and the limitations of current wireless technologies, a 7 GHz contiguous bandwidth was allocated world-wide by the FCC. The regional regulatory bodies allocated local frequency bands with a slight shift and defined the maximum effective isotropic radiated power (EIRP). The maximum allowed EIRP at 60 GHz is much higher than for other existing WLANs and WPANs. This is essential to overcome the higher free-space path loss (according to the classic Friis formula) and the oxygen absorption of 10–15 dB/km, as shown in Fig. 11.2. These two loss mechanisms dictate the use of 60 GHz for short range multi-gigabit per second transmission. The attenuation also means that the system provides inherent security, as radiation from one particular 60 GHz radio link is quickly reduced to a level that does not interfere with other 60 GHz links operating in the same vicinity.

Using the 60 GHz band for high data rate indoor wireless transmission, a multitude of potential applications can be envisioned. The high definition multimedia interface (HDMI) cable could be replaced by a wireless system transmitting uncompressed video streams from DVD players, set-top boxes and PCs to a TV or monitor. Current wireless HDMI products utilize the 2.5 and 5 GHz unlicensed spectrum, where bandwidth is limited. As a result, these systems implement either lossy or lossless compression, significantly adding component and design cost, digital processing complexity and product size. The typical distance between these gadgets is 5–10 m, and the communication can be point-to-point or point-to-multipoint. The range of potential services and applications, together with the maturity of mainstream CMOS technology, has stimulated considerable activity toward realizing the required corner blocks and systems at 60 GHz in low-cost mainstream CMOS process technology.
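The two loss mechanisms mentioned above can be put into numbers with the Friis free-space path loss, FSPL = 20 · log10(4π · d · f / c), plus a gaseous absorption term that grows linearly with distance. The sketch below is only an illustration: the 10 m distance matches the typical range quoted in the text, the 15 dB/km absorption is the upper value quoted for oxygen, and the 2.4 GHz comparison frequency is an assumption chosen for contrast.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss between isotropic antennas (Friis formula)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def link_loss_db(distance_m, freq_hz, absorption_db_per_km=0.0):
    """Free-space loss plus gaseous absorption over the same path."""
    return fspl_db(distance_m, freq_hz) + absorption_db_per_km * distance_m / 1000.0


d = 10.0   # metres
print(f"2.4 GHz, {d:.0f} m: {link_loss_db(d, 2.4e9):.1f} dB")
print(f"60 GHz,  {d:.0f} m: {link_loss_db(d, 60e9, 15.0):.1f} dB")
print(f"60 GHz,  1 km: {link_loss_db(1000.0, 60e9, 15.0):.1f} dB")
```

At indoor distances the extra loss at 60 GHz is dominated by the frequency term of the Friis formula (about 28 dB relative to 2.4 GHz), while the oxygen absorption only becomes significant over hundreds of metres. That long-range attenuation is what limits interference between neighbouring links.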

Fig. 11.2 Gaseous absorption at 60 GHz (attenuation in dB/km versus frequency in GHz, with H2O and O2 absorption peaks; curve A: sea level, T = 20 °C, P = 760 mm, ρH2O = 7.5 g/m³; curve B: 4 km, T = 0 °C, ρH2O = 1 g/m³; annotation: 15 dB/km at 60 GHz)

Designing at 60 GHz requires dealing with multiple challenges which may be irrelevant or negligible in low frequency designs. One of the most important challenges of 60 GHz circuit design occurs in the transition between schematic and layout. Modeling the performance of circuits after layout and taking into account the parasitic effects resulting from the layout are two issues that become more important and influential in high frequency design. The pronounced impact of parasitics at such high frequencies makes it more difficult to obtain the desired level of performance from the circuits. In addition, the necessity of accurately modeling the parasitic effects adds further design complexity. In fact, these complexities force the design focus to shift iteratively from the schematic to the layout and back, rendering the design a more time consuming process. The electromagnetic modeling of complex structures, including the skin effect, substrate loss and the coupling between adjacent components, is another issue; it is sometimes impractical with the currently available simulation software, as it may require immense computational power. Therefore, the question facing designers is whether the currently available software and tools are computationally capable of including all the layout effects in their prediction of circuit performance, and how accurate such predictions can be given the aforementioned limitations and the accentuated impact of layout-level issues.

Performing measurements using on-wafer probing at 60 GHz has its own complexities. The very short wavelength of the signals at mm-wave frequencies makes the measurements very sensitive to the effective length and bending of the interfaces. Especially for on-wafer measurements, one must pay utmost attention to the rigidity of the interfaces connected to the probes, to keep all the connection lengths and orientations constant during the whole period of measurement and calibration. Special care must also be taken to preserve the position of the probes on the bondpads and impedance standard substrates, since the measurement accuracy can depend strongly on the positioning and landing of the probes. Another difficulty of mm-wave measurements arises from the overwhelming cost of the equipment needed for instrumentation.

2 A Noise and S-Parameter Measurement Setup

In this section, measurement setups are introduced that use waveguide interfaces to provide the required rigidity in the vicinity of the probes and magic-T single-ended-to-differential converters to facilitate the measurement of differential circuits. The noise measurement of a 60 GHz double-balanced zero-IF mixer (see Sect. 4) and the noise and S-parameter measurements of a differential 60 GHz LNA (see Sect. 3), using the introduced setup, are explained in the following sections.

2.1 Noise Measurement of a Double-Balanced Mixer

The waveguide setup used for on-wafer measurement of the differential circuits is illustrated in Fig. 11.3. In the case of the zero-IF mixer, four probes are needed. The probe at the top of the picture is an Eye-Pass probe used for biasing. The probe at the bottom of the picture is a GSGSG microprobe suitable for measurements up to 50 GHz and used here at the IF output of the DUT mixer. The other two probes, on the left and right side, are Infinity GSGSG probes suitable for mm-wave signals and used here at the differential RF and LO inputs of the mixer. The waveguide structures are mounted on metal plates which are screwed to the probe station, preventing any unintentional movement in the setup.

Fig. 11.3 Left: the waveguide-based setup including two magic-Ts for measuring a double-balanced mixer. Right: noise measurement setup for the mixer

Figure 11.3 shows the block diagram of the setup used for the noise measurement using the Y-factor method [3]. The network analyzer is used as a signal generator to produce the LO signal. The 60 GHz noise source is connected via an isolator and a waveguide to the magic-T and then to the RF port of the mixer. The differential IF output of the mixer is converted to single-ended via a hybrid and then connected to the spectrum analyzer via a low-frequency amplifier which covers 30 MHz to 4 GHz. The spectrum analyzer is set to Noise Figure mode and the DUT is specified as a downconverter with a 60 GHz LO. The RF frequency range is set to 30 MHz to 2 GHz.

The 60 GHz noise source generates noise only in the range of 60–75 GHz. Therefore, another noise source, capable of generating noise in the IF range, is needed for calibration of the output path and the spectrum analyzer. Figure 11.3 shows the block diagram of the noise calibration setup. The low-frequency amplifier is essential for obtaining good calibration results, because it amplifies the noise. Since two different noise sources are used, the ENR (excess noise ratio) lists of the two noise sources must be entered manually in the ENR table of the spectrum analyzer. Both noise sources are controlled by the spectrum analyzer. The effect of the low-frequency amplifier and of the cable connecting the IF balun to the low-frequency amplifier is automatically taken into account during the measurement, because they are part of the calibration setup. However, the impact of the IF balun and of the RF interfaces between the 60 GHz noise source and the input of the DUT must be calculated manually after the measurement.

The loss of the combination of the magic-T, waveguide structure, and Infinity probe can be measured with two methods. The first is a delta measurement: using the network analyzer as a signal generator, the amplitude of the 60 GHz signal is measured by the spectrum analyzer. Since the spectrum analyzer does not support 60 GHz measurements, a preselected millimeter mixer is used to downconvert the 60 GHz signal to the range of the spectrum analyzer. Keeping the same amplitude for the signal generated by the network analyzer, the magic-Ts and the probes are introduced into the setup. A through of an impedance standard substrate is used between the probes. The difference between the readings of the two steps gives the loss of the introduced interface. Assuming a negligible loss for the through and equal loss for the two probes and magic-Ts, the loss of the RF interface used between the noise source and the mixer input can be calculated by dividing this number by two.

In the second method, two one-port calibrations are performed using the network analyzer. First a cable, used in the next step for connecting the network analyzer to the magic-T and probe, is calibrated and the calibration dataset is saved. Then an on-wafer one-port calibration is performed using an impedance standard substrate, with the magic-T and the probe included in the setup. Again the calibration dataset is saved. From the two datasets, the magic-T and probe combination is characterized.

Fig. 11.4 Measured and simulated noise figure of the mixer (NF in dB versus IF frequency in GHz)

The results are the same as those of the first method (delta measurement). After accounting for the impact of the IF balun, the magic-T and waveguides, and the Infinity probe, the final noise measurement results are obtained, as shown in Fig. 11.4. The measurement results are close to the simulations.
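The post-processing described in this section can be sketched with the general textbook relations of the Y-factor method. This is not the authors' exact procedure. The sketch assumes that the noise figure follows from NF = ENR − 10 · log10(Y − 1), with Y the ratio of the noise powers measured with the noise source on and off, that the cold source is at the 290 K reference temperature, and that a passive loss in front of the DUT at that temperature adds its value in dB to the measured noise figure. All numerical values are hypothetical.

```python
import math


def y_factor_noise_figure_db(enr_db, p_hot_dbm, p_cold_dbm):
    """Noise figure from the Y-factor: NF = ENR - 10*log10(Y - 1), in dB."""
    y = 10 ** ((p_hot_dbm - p_cold_dbm) / 10.0)
    return enr_db - 10.0 * math.log10(y - 1.0)


def single_interface_loss_db(level_direct_dbm, level_with_interfaces_dbm):
    """Delta measurement: two equal interfaces back to back, so halve the drop."""
    return (level_direct_dbm - level_with_interfaces_dbm) / 2.0


def remove_input_loss_db(nf_measured_db, input_loss_db):
    """De-embed a passive loss (at 290 K) placed in front of the DUT."""
    return nf_measured_db - input_loss_db


# Hypothetical readings, for illustration only.
interface_loss = single_interface_loss_db(-10.0, -18.0)
nf_raw = y_factor_noise_figure_db(enr_db=15.0, p_hot_dbm=-60.0, p_cold_dbm=-61.0)
nf_dut = remove_input_loss_db(nf_raw, interface_loss)

print(f"Loss of one probe and magic-T interface: {interface_loss:.1f} dB")
print(f"Raw noise figure: {nf_raw:.2f} dB")
print(f"Noise figure referred to the DUT input: {nf_dut:.2f} dB")
```

Halving the back-to-back loss relies on the same assumption as in the text: a negligible loss in the through and identical probe and magic-T assemblies on both sides.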

2.2 Noise Measurement of a Differential LNA

The noise measurement of the 60 GHz LNA is complicated by the fact that the output of the LNA is at a higher frequency than the spectrum analyzer supports. Even the preselected mixer of the previous section cannot be used here, because the Noise Figure mode of the spectrum analyzer does not support it and it cannot be used with an external LO either. Therefore a passive mm-wave mixer is used in the noise measurement setup, as shown in Fig. 11.5, to down-convert the output of the LNA to the range of the spectrum analyzer. The passive mixer can be included in the calibration setup, as shown in Fig. 11.5, which makes the post-measurement calculations much easier. The measured noise is in close agreement with the simulated values, as shown in Fig. 11.6.

Fig. 11.5 Left: LNA noise measurement setup. Right: noise calibration for the LNA (blocks shown: 60 GHz noise source, magic-T + probe, LNA (DUT), 60 GHz amplifier, low-frequency amplifier, spectrum analyzer, network analyzer)

Fig. 11.6 Left: S-parameter measurement and calibration setup of a differential two-port circuit (network analyzer and two magic-T + probe assemblies, connected to the LNA (DUT) during measurement and to the impedance standard substrate during calibration). Right: measured and simulated noise figure and transducer gain of the 60 GHz LNA (in dB versus frequency in GHz)

2.3 S-Parameter Measurement

Performing S-parameter measurements on differential circuits with a two-port network analyzer is also facilitated by the magic-Ts. As shown in Fig. 11.6, each port of the network analyzer is connected to a magic-T and then to the probes. SOLT (Short-Open-Load-Through) calibrations are performed on an impedance standard substrate suitable for GSGSG probes. The impedance standard substrate is then replaced by the DUT and the measurement is carried out. The measured transducer gain of the 60 GHz LNA, using this setup, is compared with simulation results in Fig. 11.6. The following considerations improve the accuracy of the measurements and calibrations:

• Accurate definition of the impedance standard substrate in the network analyzer or in the software which controls the network analyzer
• Precise positioning of the probes on the bondpads or on the impedance standard substrate
• Repeating the calibration periodically, since the calibration results lose their validity after a certain period
• Using undamaged samples of the impedance standard substrate

Employing the waveguide-based measurement setup enabled accurate and repeatable measurements on 60 GHz receiver components. The fixed waveguide structures, specially provided for the probe station, make the setup robust, as they circumvent the need for cables, which are by nature difficult to keep rigid, in the vicinity of the probes. With the magic-Ts, differential mm-wave circuits can be measured with a two-port network analyzer rather than a much more expensive four-port one. Furthermore, the differential circuit can be driven by the single-ended noise source needed for the noise measurement. The noise and S-parameter measurements performed on a 60 GHz mixer and LNA yield results consistent with the simulations.

3 Fully Balanced 60 GHz LNA

The market demand for RF transceivers providing communication links with data rates of several Gb/s motivates the use of the broadband WPAN ISM band at 60 GHz. These systems require receivers with a low noise figure (NF) and a flat band response because of the complex modulation scheme. The combination of low NF, sufficient bandwidth, high gain and low-voltage operation is therefore an important property of LNAs. The design of mm-wave LNAs in CMOS poses many challenges because of lossy passives and the Miller capacitance. Several LNAs have been reported in recent years [4]. This section describes a fully differential 60 GHz LNA (Fig. 11.7) in bulk CMOS employing transformer feedback, which results in a flat and broadband response. The Miller effect is defeated using gate-drain capacitance neutralization [5], which is achieved when the following equation is satisfied (n is the transformer turn ratio and k is its coupling factor):

\[ \frac{n}{k} = -\frac{C_{gs}}{C_{gd}}, \qquad n = \sqrt{\frac{L_d}{L_s}} \qquad (11.1) \]

3.1 Design Procedure

The main design goal for the LNA is a low NF combined with a high gain. Both are functions of the MOS transistor bias and width, the choice of passives, and the source impedance Zsrc. The MOS transistor bias was chosen as a compromise between noise and gain performance. The small-signal circuit is shown in Fig. 11.7.


Fig. 11.7 Left: circuit of the V-V transformer FB LNA as discussed in [5]. The coupling is indicated by the symbols next to the coils. Right: small signal circuit of the V-V transformer feedback LNA. For reasons of clarity the single ended circuit is shown

3.2 Transformer Specifications and Voltage Gain

To achieve Cgd neutralization, the transformer turn ratio n divided by the coupling factor k should equal the negative of the ratio between Cgs and Cgd (11.1); this ratio is approximately 2.3 in the technology used. To maximize gain, the turn ratio should be as high as possible and Ls should resonate with (n²Cgd + Cgs) to tune out these parasitic capacitances. The former, together with (11.1), leads to a high |k| (which is 1 at maximum), and the latter sets the inductance values of the inductors used in the transformer. The resulting voltage gain then converges to n. For a given MOS transistor width at the chosen bias, the transformer properties are thereby known.

3.3 Transformer Design

The transformer used in the LNA was constructed using EM simulation software (ADS Momentum). The resulting structure is shown in Fig. 11.8. The transformer has been optimized for high |k| and high-Q inductors [6]. To satisfy Eq. 11.1, a turn ratio n of 1.8 has been chosen along with a coupling factor k of −0.76. The simulated Q-factors of the inductors are higher than 10 at the frequency of interest. The simulated values of Ld and Ls are 137 pH and 42 pH, respectively. A patterned shield has been placed underneath the transformers to reduce substrate coupling.
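As a quick consistency check on the quoted transformer values, the short Python sketch below reads (11.1) as n/k = −Cgs/Cgd together with n = sqrt(Ld/Ls) and plugs in the numbers given in the text. The variable names are illustrative only, and the Cgs/Cgd ratio of 2.3 is the approximate figure quoted above, so the result is an order-of-magnitude check rather than a design calculation.

```python
import math

# Values quoted in Sects. 3.2 and 3.3
n_chosen = 1.8          # transformer turn ratio
k = -0.76               # coupling factor (negative sign as quoted)
L_d = 137e-12           # simulated Ld, henry
L_s = 42e-12            # simulated Ls, henry
cgs_over_cgd = 2.3      # approximate Cgs/Cgd ratio of the technology

# Turn ratio implied by the inductances: n = sqrt(Ld / Ls)
n_from_inductors = math.sqrt(L_d / L_s)

# Neutralization condition: n / k = -Cgs / Cgd
lhs = n_chosen / k
rhs = -cgs_over_cgd
relative_mismatch = abs(lhs - rhs) / abs(rhs)

print(f"n from inductors: {n_from_inductors:.2f}")
print(f"n/k = {lhs:.2f}, -Cgs/Cgd = {rhs:.2f}, mismatch = {relative_mismatch:.1%}")
```

With these inputs the inductance ratio gives n of about 1.81, and n/k comes out near −2.37, within roughly 3% of the quoted ratio.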

Fig. 11.8 Left: transformer structure used. For clarity, the vias connecting the two bottom metals are shown only at the beginning and at the end of the metal strips; in reality many vias are distributed along the metal lines. The top inductor (Ls) connects two metal lines in parallel to lower the inductance and increase the Q-factor. The lower inductor (Ld) has two turns. The two inductors are placed exactly on top of each other to achieve the highest possible coupling (|k| ≈ 1). The width of the metal lines is 3 µm. Right: schematic of the realized two-stage LNA

3.4 Layout Consideration

The layout of the core of the LNA is shown in Fig. 11.9. The differential input of the first stage is at the left and the differential output of the second stage is at the right. The two stages are connected to each other with a DC-blocking capacitor between the output of TF1 and the input of Lg2. All RF interconnects longer than 10 µm were simulated in ADS Momentum, and Cadence RC extraction was used for all other structures. Lg1 and Lg2 are approximately 110 pH and 150 pH, respectively. The transistors are indicated in Fig. 11.9 and are situated underneath the metal lines connecting the transformer structures. The transistor width is 35 µm in stage 1 and 25 µm in stage 2. The vertical lines surrounding the transformers are the DC power and biasing lines of the LNA. Shielded coplanar waveguides (CPWs) connect the different components to each other, which results in low coupling to the substrate and between components. The input and output of the LNA are connected to the bondpads using CPWs (see Fig. 11.9), which introduces losses and an impedance shift. The resulting source and load impedance of the circuit at the input and output reference planes indicated in Fig. 11.9 is approximately 37 + j10 Ω. Open-short-load structures are added to de-embed the circuit. Considerable effort has been put into making the design as symmetrical as possible to reduce the common-mode response.

Fig. 11.9 Left: layout of the LNA (330 × 170 µm²). Only the top metal layers are shown to clarify the structure. Patterned shields are used underneath the inductors, transformers and coplanar waveguides (not shown). Input and output reference planes are indicated by the dashed lines. Right: complete LNA chip with bondpads and one de-embedding structure. Die size = 960 × 980 µm, LNA size = 330 × 170 µm²

3.5 Simulation Results

The design was an iterative process between circuit simulations, EM simulations and RC extraction. The first circuit simulation gave a Gt of 13 dB with an NF of 3.1 dB at 61 GHz. The IIP3 of the LNA was approximately 2.6 dBm, with a 1 dB compression point of −11.8 dBm. After EM simulation and RC extraction the performance changed due to parasitic effects: Gt decreased by 2.3 dB to 10.7 dB and the NF increased by 0.5 dB to 3.6 dB. These simulation results are shown in Fig. 11.6. The IIP3 increased to 4 dBm and the 1 dB compression point increased to −9.8 dBm. The simulated Gt variation in the band of interest is smaller than ±0.15 dB, and the 3 dB bandwidth is approximately 50–73 GHz, which is approximately 37% of the center frequency of 61 GHz. The simulated power consumption is 35 mW at a 1.2 V supply and 0.8 V gate bias. All simulations were performed using a source impedance of 30 Ω, which was chosen as a compromise between NF and Gt. This differs from the conventional 100 Ω for a differential topology because the antenna could be connected directly to the LNA, allowing a different antenna (source) impedance.
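The arithmetic behind the quoted post-layout figures can be reproduced in a few lines. The sketch below assumes the reading that Gt drops by 2.3 dB and NF rises by 0.5 dB after EM simulation and RC extraction, and that the fractional bandwidth is taken relative to the 61 GHz center frequency; the variable names are illustrative.

```python
# Values quoted in Sect. 3.5
f_low_ghz, f_high_ghz, f_center_ghz = 50.0, 73.0, 61.0
fractional_bw = (f_high_ghz - f_low_ghz) / f_center_ghz

gt_schematic_db, gt_drop_db = 13.0, 2.3
nf_schematic_db, nf_rise_db = 3.1, 0.5
gt_post_layout_db = gt_schematic_db - gt_drop_db   # after EM + RC extraction
nf_post_layout_db = nf_schematic_db + nf_rise_db

print(f"Fractional 3 dB bandwidth: {fractional_bw:.1%}")
print(f"Post-layout Gt: {gt_post_layout_db:.1f} dB, NF: {nf_post_layout_db:.1f} dB")
```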

3.6 Measurements and Verifications

To verify the behavior of the LNA, a number of measurements were performed using a differential measurement setup. The DC power consumption equals the simulated value of 35 mW. The NF and S-parameters were verified independently by the Eindhoven University of Technology and NXP Research. The S-parameters were measured using an Agilent E8361A PNA, and the calibration was verified using WinCal XE software. After de-embedding, the measured Gt with Zsrc = 30 Ω is 10 dB at 61 GHz (Fig. 11.6). The measured in-band deviation is ±0.25 dB. The s12-parameter is below −47 dB over the entire measured band of 55–67 GHz, and the group delay is 20 ps and constant over the band of interest. The differential stability factor (K-factor) stays above 30 in the measured band. In common mode, the maximum transducer gain is −2 dB, resulting in a CMRR of 12 dB. The common-mode s12-parameter is below −42 dB, and the K-factor stays above 70 in this case.

NF was measured in the band 59.5–66 GHz (Fig. 11.6). Zsrc during this measurement is 37 + j10 Ω, while the input reflection coefficient for the noise source stays below −15 dB. The average measured value in this band is 3.8 dB. To the authors' knowledge this is the lowest value reported in the literature around 60 GHz. NFmin of the circuit is found to be 3.7 dB using a load-pull setup at NXP. During this measurement the source impedance for NFmin was also checked against the simulated value. The measured IIP3 is 5 dBm at 57.5 GHz and 4 dBm at 60 GHz, which is in close agreement with the simulation. The measured 1 dB compression point is −4.6 dBm and deviates from the simulated value because a Zload of 100 Ω was used in simulation.

3.7 Benchmarking

The performance of existing 60 GHz LNAs is compared with this work in Table 11.1. The LNAs presented in [7–9] are single-ended, and [10] has a differential output. The work presented in this section shows the lowest NF along with the largest bandwidth. The relatively low gain is due to the use of only two CS stages. The use of feedback results in a high IIP3.

4 60 GHz Zero-IF Mixer Utilized with a Three Dimensional Tuning

The zero-IF receiver architecture is a promising candidate for mm-wave high data rate communication. While offering the possibility of low-cost and compact receivers operating in the license-free band around 60 GHz, the zero-IF architecture suffers from problems such as dc offset, flicker noise, and second order intermodulation distortion. In this section the wideband minimization of second order intermodulation distortion (IMD2) in a 60 GHz mixer is investigated. The multi-Gb/s applications envisioned for the 60 GHz band require the zero-IF mixer to provide around 1 GHz of IF bandwidth. Any IMD2 cancelation mechanism applied to such a mixer must therefore be functional across a wide frequency range, so narrowband IMD2 cancelation techniques are not beneficial in this case. In particular, conventional single-parameter and double-parameter tuning techniques appear to be ineffective for high IF bandwidth applications. Therefore, in this section a three-parameter tuning method is proposed and is shown both in theory and in measurement to be effective in wideband cancelation of IMD2.

Table 11.1 Benchmarking

| Reference | Process (nm) | Topology | Gt (dB) | NF (dB) | 3 dB BW (%) | IIP3 (dBm) | Vdd (V) | PDC (mW) |
|---|---|---|---|---|---|---|---|---|
| [7] | 90 | 3 stage CS | 15 | 4.4 | 10 | N/A | 1.3 | 4 |
| [8] | 65 (SOI) | 2 stage casc. | 12 | 8 | 22 | N/A | 2.2 | 36 |
| [9] | 90 | 2 stage casc. | 14.6 | <5.5 | 25 | 6.8 | 1.5 | 24 |
| [10] | 65 | 2 casc. + 1 CS | 22.3 (Av) | 6.1 | 13 | N/A | 1.2 | 35 |
| [4] (LNA + mixer, HG) | 45 | 2 stage casc. | 26 | 6 | N/A | 12 | 1.1 | 23 |
| This work | 65 | 2 stage CS | 10 | 3.8 | 37 | 4 | 1.2 | 35 |

4.1 Second Order Intermodulation Mechanisms

The downconversion mixer is normally the main contributor to second order nonlinearity distortion in a zero-IF receiver. The low-frequency second-order distortion generated in the RF path preceding the mixer can easily be filtered by RF coupling or band-pass filtering.

Fig. 11.10 Left: circuit schematic of the Gilbert-cell-like mixer with tunable output impedance and tunable gate biasing. Right: die photo of the mixer

Figure 11.10 shows a typical Gilbert-cell-like mixer used in this article [11]. The input RF voltage is applied via two RF-coupling capacitors to the switching stage. The transistors M1–M4 are responsible for switching and downconverting the RF signal. At the output, the downconverted signal is converted from the current domain to the voltage domain by means of the resistors (RL). The CL capacitors represent the input capacitance of the following stage as well as the parasitics of the switching transistors at the output node. The differential output IMD2 voltage (Vimd2,out) comes from two sources: (1) the common-mode output IMD2 current combined with output load mismatch and (2) the differential-mode output IMD2 current, as defined in (11.2):

\[ I_{imd2,CM} = \frac{I_{imd2,1} + I_{imd2,2}}{2}, \qquad I_{imd2,Diff} = I_{imd2,1} - I_{imd2,2} \qquad (11.2) \]

where Iimd2,1 and Iimd2,2 are as shown in Fig. 11.10. The differential output IMD2 voltage is described as a function of these currents in (11.3):

\[ V_{imd2,out} = I_{imd2,1} Z_{L,1} - I_{imd2,2} Z_{L,2} \qquad (11.3) \]

where ZL,1 and ZL,2 are the impedances seen from the Vout+ and Vout− nodes to the RF ground, respectively, as given in (11.4), and Rout is the resistance seen from the output node:

\[ Z_{L,i} = \frac{R_{out,i}}{1 + R_{out,i} C_{L,i}\, j\omega} \qquad (11.4) \]

Defining a nominal value for the output impedance as in (11.5), the differential output IMD2 voltage can be rewritten as a function of the common-mode and differential-mode output IMD2 currents, as given in (11.6).

\[ Z_{L,1} = Z_L (1 + \delta z_L), \qquad Z_{L,2} = Z_L (1 - \delta z_L) \qquad (11.5) \]

\[ V_{imd2,out} = 2 I_{imd2,CM} Z_L\, \delta z_L + I_{imd2,Diff} Z_L \qquad (11.6) \]
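The step from (11.3) to (11.6) can be written out explicitly. Substituting the impedances of (11.5) into (11.3) and grouping the currents according to the definitions in (11.2) gives the following short derivation (symbols are written with subscripts here for readability):

```latex
\begin{aligned}
V_{imd2,out} &= I_{imd2,1}\, Z_L (1 + \delta z_L) - I_{imd2,2}\, Z_L (1 - \delta z_L) \\
             &= \left(I_{imd2,1} - I_{imd2,2}\right) Z_L
                + \left(I_{imd2,1} + I_{imd2,2}\right) Z_L\, \delta z_L \\
             &= I_{imd2,Diff}\, Z_L + 2\, I_{imd2,CM}\, Z_L\, \delta z_L
\end{aligned}
```

The factor of 2 in the common-mode term follows directly from the 1/2 in the definition of the common-mode current.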

Iimd2,CM is a function of the even-order nonlinearities of the input stage and the switching stage and is present in the output current even if there is no mismatch in the circuit. However, its contribution to the differential output voltage vanishes if ZL,1 and ZL,2 are perfectly matched. Three mechanisms are responsible for the generation of Iimd2,Diff: self-mixing, input stage nonlinearity combined with switching pair mismatches, and switching pair nonlinearity combined with its mismatches [12]. Self-mixing is a result of the leakage of the RF signal to the LO and vice versa. This mechanism is in general a function of the layout parameters, and its contribution is zero in an ideally matched fully balanced downconverter. In practice, however, any kind of mismatch in the LO or RF paths can activate this mechanism. The contribution of the second and third mechanisms is determined by the mismatch between the transistors in the switching pair [13]. The mismatch between the two transistors in a differential pair can be represented by an equivalent voltage offset at the gate of one of them [14]:

\[ V_{off} = \Delta V_T + \frac{\partial f}{\partial \beta}\,\Delta\beta + \frac{\partial f}{\partial \theta}\,\Delta\theta \qquad (11.7) \]

\[ f = \frac{\theta I_{DS}}{\beta}\left(1 + \sqrt{1 + \frac{2\beta}{\theta^{2} I_{DS}}}\right) + V_T \qquad (11.8) \]

where IDS is the biasing current of the transistor, β = µnCoxW/L, θ is the factor taking into account the velocity saturation effect, and VT is the threshold voltage. These mechanisms can therefore be activated only if there is mismatch between the switching stage transistors, and the effective mismatch between the transistors can be controlled by modifying either the threshold voltage or the biasing voltage of the transistor gates. The latter approach, a circuit-level parameter tuning, is adopted in this work to avoid the requirement for very accurate tuning of the threshold voltage in the process.

4.2 Wideband IMD2 Cancellation

Vimd2,out can be minimized by tuning different parameters. Single-parameter tuning methods can adjust Voff to vary Iimd2,Diff [15]. They can also adjust the output resistance mismatch (δRout) or the output capacitance mismatch (δCL) to vary δzL [16, 17]. For the two terms in (11.6) to cancel each other when only one parameter is tuned, the following must be satisfied:

\[ \delta z_L = \frac{\delta R_{out} - R_{out} C_L\, \delta C_L\, j\omega_b}{1 + R_{out} C_L\, j\omega_b} = -\frac{I_{imd2,Diff}}{2 I_{imd2,CM}} \qquad (11.9) \]

where ωb is the IMD2 frequency at the output of the mixer. Higher powers of δRout and δCL are neglected in this approximation of δzL. However, tuning only one parameter can satisfy (11.9) at a single frequency point only, because the tunable parameter has a different optimum at each frequency. Even two-dimensional tuning involving δRout and δCL is not sufficient [17], because it can only set (11.6) to zero at a single frequency point. One might suggest using higher-order filters as ZL, which can null (11.6) at multiple frequency points, but that would complicate the baseband filter design, and the number of parameters to be tuned would increase with the required flatness of IMD2 over the IF band.

The approach chosen in this work is to tune all three parameters at the same time. This results in simultaneous nullification of both terms in (11.6), as shown in (11.10). Since both δRout and δCL are set to zero in this approach, the nullification of δzL is (ideally) frequency-independent. Under the assumption of a narrowband interferer at RF, Voff can be chosen such that the three mechanisms responsible for Iimd2,Diff cancel each other.

\[ \delta z_L = \frac{\delta R_{out} - R_{out} C_L\, \delta C_L\, j\omega_b}{1 + R_{out} C_L\, j\omega_b} = 0, \qquad I_{imd2,Diff}(V_{off}) = 0 \qquad (11.10) \]
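The frequency dependence of the cancelation condition can be illustrated numerically. The Python sketch below evaluates (11.6) and (11.9) for a resistance-only trim, reading (11.9) with the minus signs implied by (11.6). All component values, the fixed capacitance mismatch, and the assumption that the ratio of differential to common-mode IMD2 current is a real, frequency-independent constant are hypothetical choices made only for illustration; they are not taken from the design described here.

```python
import math

def required_delta_r(freq_hz, r_out, c_load, delta_c, diff_to_cm):
    """Relative resistance mismatch that satisfies (11.9) at one frequency.

    delta_c is the fixed relative capacitance mismatch and diff_to_cm is the
    ratio I_imd2,Diff / I_imd2,CM. The result is complex in general.
    """
    jx = 1j * 2 * math.pi * freq_hz * r_out * c_load
    return jx * delta_c - diff_to_cm * (1 + jx) / 2

def normalized_imd2(freq_hz, r_out, c_load, delta_r, delta_c, diff_to_cm):
    """|V_imd2,out / (Z_L * I_imd2,CM)| from (11.6): |2*delta_z + ratio|."""
    jx = 1j * 2 * math.pi * freq_hz * r_out * c_load
    delta_z = (delta_r - jx * delta_c) / (1 + jx)
    return abs(2 * delta_z + diff_to_cm)

# Hypothetical values, chosen only for illustration
R_OUT = 500.0                  # ohm
C_LOAD = 200e-15               # farad
DELTA_C = 0.02                 # relative capacitance mismatch
DIFF_TO_CM = 0.01              # I_imd2,Diff / I_imd2,CM
F_LOW, F_HIGH = 60e6, 970e6    # two IMD2 frequencies inside a 1 GHz IF band

need_low = required_delta_r(F_LOW, R_OUT, C_LOAD, DELTA_C, DIFF_TO_CM)
need_high = required_delta_r(F_HIGH, R_OUT, C_LOAD, DELTA_C, DIFF_TO_CM)

# A physical resistance trim is real-valued, so only the real part can be set
delta_r_trim = need_low.real
residual_low = normalized_imd2(F_LOW, R_OUT, C_LOAD, delta_r_trim, DELTA_C, DIFF_TO_CM)
residual_high = normalized_imd2(F_HIGH, R_OUT, C_LOAD, delta_r_trim, DELTA_C, DIFF_TO_CM)

# Three-parameter case of (11.10): both mismatches and I_imd2,Diff are zero
residual_three = normalized_imd2(F_HIGH, R_OUT, C_LOAD, 0.0, 0.0, 0.0)

print("required delta_R at 60 MHz :", need_low)
print("required delta_R at 970 MHz:", need_high)
print("residual after R-only trim :", residual_low, residual_high)
print("residual, three-parameter  :", residual_three)
```

Under these assumptions the value of δRout demanded by (11.9) is complex and changes with frequency, so a single real trim leaves a residual that grows toward the upper end of the IF band, whereas setting both mismatches and the differential IMD2 current to zero removes the residual at every frequency.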

4.3 Circuit Design

Variable resistors and varactors are added to the output to provide the tunability of the output impedance required by (11.10). The variable resistors take the simple form of series transistors biased in the triode region. The biasing of the gates of the switching pair transistors can be adjusted separately for each half-circuit, as required by (11.10). The circuit is designed and fabricated in 45 nm CMOS technology, and the die photo is shown in Fig. 11.10. The supply voltage (VDD) is 1.1 V. VR1 and VR2, which control the value of the variable resistors, are differentially tuned around 100 mV. VC1 and VC2 control the varactors to tune the output capacitance and are differentially tuned around 500 mV. VG1 and VG2 tune the biasing voltage of the gates of the switching pair transistors and are differentially tuned around 0.9 V. IBias draws 300 µA, and a current mirror copies approximately the same current to I1 and I2 in Fig. 11.10. The circuit in Fig. 11.10 therefore draws less than 600 µA from VDD. The complete chip includes the mixer core shown in Fig. 11.10 as well as two active baluns and matching networks at the RF and LO inputs. In addition, an IF buffer is used at the IF output to drive the 50 Ω load of the measurement equipment. Four inductors are used in the design. Two of them are used in the input matching networks and the other two are the loads of the active baluns.

Fig. 11.11 Left: IMD2 vs. (a) variable resistance control voltage (b) varactor control voltage. Right: IMD3 and fundamental term versus VG, varied over a range twice as large as needed

[Axis labels in Fig. 11.11: IMD2 (dBm) versus VR (mV), with curves for IMD2 at 60 MHz and at 970 MHz; power at IF (dBm) versus VG (mV)]

4.4 Measurement and Experimental Results

To test the capability of IMD2 cancelation across a wide IF frequency range, a 3-tone out-of-band signal is applied to the RF input of the mixer to emulate an out-of-band interferer. The three tones are at 61.07 GHz, 61.13 GHz, and 62.1 GHz. The LO signal is at 60 GHz. The resulting IMD2 terms are therefore at 60 MHz and 970 MHz, and these are measured as a function of the tuning parameters. There is another IMD2 term at 1,030 MHz, which is considered out-of-band and is not measured. The closest fundamental term of the downconverted interferer is at 1,070 MHz; it is also measured as a function of the tuning parameters to see how much the conversion gain is affected by IMD2 cancellation. One of the IMD3 terms is also measured to observe the variation of IMD3 due to IMD2 cancelation.

First, single-parameter tuning is examined. Figure 11.11 shows the variation of IMD2 as a function of the control voltage of the variable resistors. This voltage is varied differentially around a common value of 100 mV. IMD2 at 60 MHz is minimized at a differential control voltage of around 5 mV, whereas IMD2 at 970 MHz is minimized at around 20 mV. In fact, when IMD2 at 970 MHz is minimized, IMD2 at 60 MHz is significantly degraded. The same problem is observed when only one of the other two parameters, VG or VC, is tuned. Figure 11.12 shows the variation of IMD2 when VR and VC are changed simultaneously while keeping VG equal to zero. VR is swept from −40 to 40 mV in steps of 0.5 mV, and in each step VC is swept from −300 to 300 mV in steps of 5 mV. According to Fig. 11.12, IMD2 at 60 and 970 MHz never reach their lowest points simultaneously, which demonstrates the ineffectiveness of two-dimensional tuning in this case.
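The baseband frequencies quoted for the three-tone test follow from simple differences, as the short Python sketch below shows. It works in integer MHz to avoid floating-point rounding; the variable names are illustrative. The second-order difference products do not depend on the LO frequency, while the fundamentals are the tone frequencies minus the LO.

```python
from itertools import combinations

LO_MHZ = 60_000
TONES_MHZ = [61_070, 61_130, 62_100]   # the three interferer tones

# Down-converted fundamentals: tone frequency minus LO frequency
fundamentals_mhz = [tone - LO_MHZ for tone in TONES_MHZ]

# Second-order difference products between every pair of tones
imd2_mhz = sorted(high - low for low, high in combinations(sorted(TONES_MHZ), 2))

print("Down-converted fundamentals (MHz):", fundamentals_mhz)
print("IMD2 difference terms (MHz):", imd2_mhz)
```

The two lowest difference terms fall inside the roughly 1 GHz IF band, and their spacing of 910 MHz is what makes this a wideband test.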

Fig. 11.12 Left: IMD2 tuning by simultaneous variation of VC and VR while keeping VG constant at 0. Right: IMD2 tuning by simultaneous variation of VC and VR while keeping VG constant at 10 mV

However, as shown in Fig. 11.12, when VG is also tuned, an optimum point can be found where both the 60 and 970 MHz IMD2 terms are reduced to −70 dBm. In this case three-dimensional tuning improves the IMD2 components at 60 and 970 MHz by 10 and 20 dB, respectively.

These results demonstrate, both in theory and in measurement, that three-dimensional tuning is beneficial for wideband cancelation of second order intermodulation distortion (IMD2) in a zero-IF downconverter. The resistance and capacitance at the output of the mixer as well as the gate biasing of the switching pairs are tuned together to suppress IMD2 across a wide bandwidth. A 60 GHz zero-IF mixer is designed and measured on wafer to show that the proposed tuning mechanism can simultaneously suppress two IMD2 tones with a frequency difference of 910 MHz while having only a minor effect on conversion gain and third order intermodulation distortion.

5 A 40-GHz Phase-Locked Loop for 60-GHz Sliding-IF Transceivers

Figure 11.13a illustrates a generalized two-step down-conversion architecture, sometimes also referred to as a sliding-IF architecture. The incoming RF signal fRF is first down-converted by mixing with the RF local oscillator signal fRF-LO, producing a difference component at fRF − fRF-LO. The second down-conversion to baseband uses the output of the prescaler, fIF-LO. The factor 'M' is an integer frequency multiplier, which usually ranges between 1 and 3. A value of 1 implies a direct connection between the oscillator and the mixer, whereas the values 2 and 3 imply a frequency doubler and tripler, respectively. The factor 'P' is the division ratio of the prescaler and can also have a value between 1 and 3. The overall division ratio of the synthesizer is separated into 'P' and 'N' because the requirements and use of the prescaler in mm-wave synthesizers are distinct from those of the lower-frequency divider chain.

Fig. 11.13 Generalized PLL architecture for 60 GHz transceivers

[Labels in Fig. 11.13. Panel a: fRF, fRF-LO, fIF-LO, baseband, ×M multiplier, fOSC, ÷P prescaler, ÷N divider, LPF, PFD, CP. Panel b: 60 GHz RF, 40 GHz oscillator, ÷2 prescaler providing 20 GHz I-Q, 20 GHz IF, baseband, ÷N divider, LPF, PFD, CP]

The frequency conversion to baseband is carried out as:

\[ f_{RF} - M f_{osc} = \frac{f_{osc}}{P} \quad\Rightarrow\quad f_{osc} = f_{RF}\,\frac{P}{MP + 1} \qquad (11.11) \]

Using different values for M and P between 1 and 3 in (11.11) yields synthesizers operating at different frequencies. For instance, for M = 1, P = 1 the synthesizer operates at 30 GHz and provides both the RF-LO and IF-LO signals. This architecture is termed "half-RF" and offers the lowest possible LO without doublers or triplers. However, it has two major drawbacks: third harmonic image and LO-IF feed-through. Other demonstrated combinations include M = 3, P = 2 and M = 2, P = 2. The former operates the synthesizer at 17 GHz and uses a frequency tripler to down-convert the RF signal to 8.5 GHz; the conversion to baseband then uses the outputs of the prescaler [18]. The latter uses a 24 GHz PLL, with 48 and 12 GHz as the first and second down-conversion steps. In this work, a fully integrated 40 GHz PLL is presented (using M = 1, P = 2), as shown in Figs. 11.13b and 11.14. The required quadrature IF-LO is provided by the prescaler to down-convert the 20 GHz IF signal to baseband. This architecture avoids the need for doubler or tripler circuits, which tend to be lossy at these frequencies and typically do not provide quadrature outputs. Furthermore, it provides a good trade-off between tuning range and phase noise requirements and makes it possible to satisfy the IEEE 802.15.3c channelization requirements.
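The frequency plans mentioned above can be tabulated with a few lines of Python. The sketch reads (11.11) as f_osc = f_RF · P / (M·P + 1), with the RF-LO at M·f_osc and the IF-LO at f_osc / P; the function name and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction

def sliding_if_plan(f_rf_ghz, m, p):
    """Return (f_osc, f_rf_lo, f_if_lo) in GHz for multiplier m and prescaler p."""
    f_osc = Fraction(f_rf_ghz) * Fraction(p, m * p + 1)
    return f_osc, m * f_osc, f_osc / p

for m, p in [(1, 1), (3, 2), (2, 2), (1, 2)]:
    f_osc, f_rf_lo, f_if_lo = sliding_if_plan(60, m, p)
    print(f"M={m}, P={p}: f_osc={float(f_osc):.2f} GHz, "
          f"RF-LO={float(f_rf_lo):.2f} GHz, IF-LO={float(f_if_lo):.2f} GHz")
```

In every case the RF-LO and IF-LO frequencies add up to the 60 GHz RF frequency, which is the defining property of the two-step conversion.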

5.1 Circuit Design

Frequency synthesizers operating below 10 GHz can utilize broadband static prescalers, so synchronization between the VCO and the prescaler (together termed the PLL front-end) is not an issue, as the latter can easily cover the complete tuning range of the VCO. In contrast, mm-wave synthesizers generally use LC-based VCOs and prescalers, and their frequency selectivity necessitates careful alignment of their respective working ranges. Any frequency mismatch due to design inaccuracy or layout parasitics can reduce the effective operating range of the synthesizer or, in the worst case, prevent it from locking at all.

Fig. 11.14 40 GHz PLL block diagram

[Labels in Fig. 11.14: PFD, CP, loop filter (74.2 pF, 5.74 pF, 2 kΩ), ÷2 I-Q ILFD, selector, divide-by-64, CML-to-CMOS converter; signals fref, fdiv, fdiv_out, fRF-LO, fIF-LO, fext, Vtune, Iout, VbiasCP (to bond-pad)]

The complete schematic of the PLL front-end is shown in Fig. 11.15. The 40 GHz VCO, shown on the left-hand side, is based on an NMOS-only cross-coupled topology, and the tank is formed by a top-metal single-turn inductor and a varactor setup. The maximum and minimum capacitances are 106 and 30 fF, resulting in a Cmax/Cmin ratio of 3.53. The Q-factor of the varactor setup is between 6 and 20 for a tuning voltage of 0–1.2 V. The post-layout simulation of the VCO yields a frequency tuning range (FTR) from 38 to 45 GHz (16%). The VCO consumes 5 mA from a 1.2 V supply, and the peak-to-peak amplitude is about 1.5 V.

The quadrature injection-locked frequency divider (ILFD) is also shown in Fig. 11.15. The differential outputs of the VCO are injected into the input transistors M3 and M6 of the two separate stages of the ILFD, which are coupled in anti-phase to generate outputs spaced 90° apart. As the output swing of the VCO is sufficiently large, buffers are not required between the ILFD and the VCO, which greatly simplifies the routing during layout and decreases the power consumption of the overall system. The divide-by-64 block consists of six cascaded divide-by-2 stages, which are optimized individually for low power consumption and the required output power. Each divide-by-2 stage is based on current-mode-logic (CML) D-latches in negative feedback. The differential small-swing output from the last stage is converted to rail-to-rail square pulses for comparison in the PFD by means of a differential to single-ended converter followed by a pair of inverters.

Fig. 11.15 Left: PLL front-end including LC-VCO and quadrature ILFD. Middle: PLL back-end components, PFD. Right: charge pump

Fig. 11.16 Left: die micrograph of the 40 GHz PLL. Right: close-in spectrum of a locked PLL frequency, with the ~3.5 MHz synthesizer loop bandwidth highlighted

5.2 Layout and Technology

The PLL is fabricated (Fig. 11.16) in a TSMC bulk CMOS 65 nm LP (low-power) process with six metallization layers. The process offers MIM capacitors and poly-silicon resistors. The measured fT of the NMOS and PMOS transistors is 140 and 80 GHz, respectively. The layout is kept compact to avoid parasitics, especially in the PLL front-end. Due to bond-pad limitations, only the ILFD output is measured. Transmission lines (TLs) are used for all RF inputs and outputs. These TLs are coplanar-waveguide based, with a lateral ground plane consisting of all metal layers. The width of the TL is 5 µm and the spacing from the ground plane is 4.22 µm. The total chip area of the synthesizer including bond-pads is 1.67 × 0.745 mm².


Fig. 11.17 Measured phase noise of the PLL

5.3 Measurement Results

The PLL is measured on wafer using an Agilent PSA (E4446A) and a LeCroy real-time oscilloscope (Wave Master 830Zi). The free-running center frequency of the PLL is observed at 20.2 GHz, and the VCO and ILFD consume 5 and 9 mA from a 1.2 V supply, respectively. The divide-by-64 block is included in the circuit by keeping the selector voltage HIGH, and the divided frequency of 315 MHz is observed on the oscilloscope. The divide-by-64 circuit consumes 6 mW, and the corresponding output buffer, a cascade of four inverter stages, consumes 2 mW. Both the divided signal and the reference signal are observed on the oscilloscope. The reference signal is varied in steps from 290 to 344 MHz, which corresponds to an output frequency of 18.5–22 GHz (or 37–44 GHz at the VCO output). Within this range, the ILFD output of the synthesizer locks between 19.1 and 21.8 GHz. The corresponding locked frequency range at the VCO output is 38.2–43.6 GHz. The PLL can thus down-convert a 60 GHz signal within a range of 57.3–65.4 GHz, covering all four high-rate PHY (HRP) channels of the IEEE 802.15.3c standard.

A locked spectrum for a reference frequency of 306.2 MHz is shown in Fig. 11.16. In a typical PLL the sideband noise is cleaned up within the loop bandwidth, which is evident in the highlighted area in Fig. 11.16. The loop bandwidth estimated from the screenshot is about 3.5 MHz, as opposed to the calculated value of 4 MHz. The output power of 8.55 dBm also includes the 1.5 dB of cable and other measurement-related losses.

The phase noise of the synthesizer is measured by the spectrum analyzer at the ILFD output and reflects its loop performance. Figure 11.17 shows one typical plot for a locked frequency of 20.12 GHz from 100 to 10 MHz. The values at 1, 4 and 10 MHz offset from the carrier are −95.7, −100 and −118 dBc/Hz, respectively. The first of these values is the in-band phase noise (within the loop bandwidth), the second is at the calculated loop bandwidth (forming the "knee"), and the third corresponds to the out-of-band phase noise. The variation of the phase noise over the synthesizer operation range is +2.5 dB. The phase noise at the VCO output (at double the frequency, i.e. 42.24 GHz) can be estimated by adding 6 dB to the above-mentioned values, resulting in −89.7, −94 and −112 dBc/Hz at 1, 4 and 10 MHz offsets from the carrier, respectively. The presented PLL is compared with published works in Table 11.2. It is the only design that covers all four HRP channels of the IEEE 802.15.3c standard. It demonstrates the lowest power consumption with the second highest locking range.
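The frequency bookkeeping in these measurements can be retraced with the short Python sketch below. It assumes the divider chain described in the text (ILFD divide-by-2 followed by divide-by-64) and the sliding-IF relation RF = VCO + VCO/2; the function and variable names are illustrative. The final power sum is an assumption about how the 22.8 mW entry for this work in Table 11.2 is composed, namely VCO plus ILFD plus divide-by-64 without the 2 mW output buffer.

```python
import math

DIVIDE_RATIO = 64      # divide-by-64 chain after the ILFD
PRESCALER_RATIO = 2    # the ILFD divides the VCO frequency by 2

def ilfd_ghz_from_ref(f_ref_mhz):
    """ILFD output frequency in GHz for a given reference frequency in MHz."""
    return f_ref_mhz * DIVIDE_RATIO / 1000.0

def rf_ghz_from_ilfd(f_ilfd_ghz):
    """RF frequency served by the PLL: RF = RF-LO + IF-LO = VCO + VCO/2."""
    f_vco = PRESCALER_RATIO * f_ilfd_ghz
    return f_vco + f_vco / PRESCALER_RATIO

ref_sweep_ghz = [ilfd_ghz_from_ref(f) for f in (290, 344)]
rf_range_ghz = [rf_ghz_from_ilfd(f) for f in (19.1, 21.8)]

# Phase noise referred to the VCO output rises by 20*log10(2) dB
pn_shift_db = 20 * math.log10(PRESCALER_RATIO)
pn_vco_1mhz = -95.7 + pn_shift_db

# Assumed composition of the power figure: VCO + ILFD currents, plus divider
core_power_mw = 1.2 * (5 + 9) + 6

print("ILFD output for 290 and 344 MHz reference (GHz):", ref_sweep_ghz)
print("RF range covered (GHz):", rf_range_ghz)
print(f"Phase-noise shift: {pn_shift_db:.2f} dB, VCO phase noise at 1 MHz: {pn_vco_1mhz:.1f} dBc/Hz")
print(f"Core power: {core_power_mw:.1f} mW")
```

The scaling by 20·log10(2) is where the 6 dB correction between the ILFD output and the VCO output comes from.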

6 A Dual-Mode mm-Wave Injection-Locked Frequency Divider The proposed ILFD shown in Fig. 11.18 achieves dual-mode operation by preserving both even and odd harmonics, and features an increased locking range by improving both injection efficiency and varactor tunability. The former is achieved by direct differential injection via M3–M4, at the same time enhancing noise immunity as well as symmetric loading for a differential VCO. De-tuning of the Miller capacitances, introduced for the first time in an ILFD, improves the input– output transfer of the injection signal and cancels out the fixed tank capacitance, thus widening the locking range. This is achieved by a transformer feedback, which further accomplishes input matching without need for extra components. In addition, the over-drive voltage of the injection transistors M3–M4 is increased resulting in an enhancement of their effective transconductance. AC and transient simulations were used to determine the optimum transformer and injection-transistor parameters for achieving a wide locking range with minimum input power. The center-tap inductor of the LC-tank is a 9 m wide single-turn top-metal coil having an inductance of 192pH and a Q-factor of 28 around 20 GHz. The varactors provide a capacitance tuning of 150–39 fF and a Q-factor of 8 20,for a tuning voltage of 0–1.2 V. Operating at 0.8 V, the circuit is also suitable for future CMOS processes. The RF-input is AC coupled on-chip and the

Fig. 11.18 Left: dual-mode ILFD circuit schematic and output buffer. Right: chip micrograph of DM-ILFD

Table 11.2 Performance summary and comparison of PLL

Reference | Tech. [nm] | Supply [V] | VCO range [GHz] | Phase noise [dBc/Hz] | Fref [MHz] | Ref. spurs [dBc] | Power [mW] | Area [mm²]
ISSCC 10 [31] | 65 CMOS | 1.2 (a) | 17.5–20.94 (17.9%), 35–41.88 (17.9%) | −100 (at 20 GHz) | | | 80 | 1.6 × 1.9
ISSCC 08 [32] | 90 CMOS | 1.2 | 39.1–41.6 (6.2%) | −90 (@ 1 MHz) | 50 | −54 | 64 (b) | 1.77 × 0.87
JSSC 07 [33] | 130 CMOS | 1.5 | 45.85–50.6 (9.8%) | −72 (@ 1 MHz) | 45.1 | −40 to −27 | 45.8 (b) | 1.16 × 0.75
ISSCC 07 [34] | 90 CMOS | 1.2 | 58–60.4 (4.1%) | −85.1 (@ 1 MHz) | 234.1 | −50.4 | 80 | 0.95 × 1
This work | 65 CMOS | 1.2 | 38.2–43.6 (13.2%) | −89.7 (@ 1 MHz) | 300 | −42 | 22.8* | 1.67 × 0.745

(a) Supply for PFD and CP is 1.8 V
(b) Excluding buffers


output employs a 50 Ω matched differential common-source buffer (M5–M6) for measurement purposes. The basic purpose of the transformer, as mentioned, is to transfer the injection signal without loss to the drain and source of M3–M4 for signal mixing. As the required inductance is below 100 pH, a small structure with a high self-resonance frequency could be chosen. To achieve high coupling and Q-factor, the top two metal layers (Me6, Me5) are used and placed exactly on top of each other (Fig. 11.18). Me1 is placed below the transformer to improve substrate isolation and reduce capacitive coupling to the substrate.

6.1 Layout and Technology

Figure 11.18 shows the micrograph of the 65 nm bulk-CMOS DM-ILFD IC. A separate transformer, along with de-embedding structures (open, short, load), is also fabricated. Coplanar transmission lines, wideband 50 Ω matched including the bond-pads, are used at the input and output. The core DM-ILFD circuit occupies 200 × 150 µm² and the total chip area is 800 × 500 µm². The transformer is smaller than a ‘ground’ bond-pad and occupies only 54 × 52 µm².

6.2 Measurement Results

The measured inductances of the primary and secondary coils of the transformer are 70 and 88 pH at 60 GHz, and 62 and 80 pH at 35 GHz. At 60 GHz, the measured values differ by 3 and 14 pH from the corresponding EM-simulated values (Momentum and Sonnet V12). The measured coupling factor at the two frequencies is 0.69 and 0.67, respectively. The input sensitivity curves of the DM-ILFD are shown in Fig. 11.19. The free-running frequency of the DM-ILFD lies between 16.8 and 19.2 GHz for a tuning voltage (Vtune in Fig. 11.18) of 0–1.2 V while consuming 4 mW from a 0.8 V supply. A shift in center frequency due to an estimated 50 fF interconnect capacitance is observed. In divide-by-2 mode, with an input power less than 2 dBm, the locking range for each Vtune is about 3 GHz (8.27%) and the total operating range is 33–39.5 GHz (17.93%). In divide-by-3 mode, the required input power is less than +1 dBm, the locking range for each Vtune is about 4 GHz (7.4%), and the total operating range is 48.5–59.5 GHz (20.4%). Figure 11.20 shows an example of the locking operation. The de-embedded output power and phase noise vary by at most 3 and 1.15 dB, respectively, over the complete operating range, as shown in Fig. 11.19 for different tuning voltages in the two division modes. The phase noise at a locked output of 19.05 and 16.6 GHz in divide-by-2 and divide-by-3 mode is −130.6 and −132 dBc/Hz at 1 MHz offset, respectively (see Table 11.3). These values are within 0.2 dB of the theoretical 6 and 9.5 dB improvement (due to frequency division) over the generator phase noise.
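The percentages quoted for the operating ranges and the 6 and 9.5 dB figures can be reproduced with a few lines of arithmetic. The sketch below assumes that the relative range is the range width divided by its center frequency, and that an ideal divide-by-N stage improves phase noise by 20·log10(N). The function names are illustrative and are not taken from the original work.

```python
import math

def relative_range_percent(f_low_ghz, f_high_ghz):
    """Width of a frequency range relative to its center, in percent."""
    center = (f_low_ghz + f_high_ghz) / 2
    return 100 * (f_high_ghz - f_low_ghz) / center

def ideal_division_gain_db(ratio):
    """Phase-noise improvement of an ideal divide-by-`ratio` stage."""
    return 20 * math.log10(ratio)

# Total operating ranges quoted in the text (GHz)
div2_range = relative_range_percent(33.0, 39.5)
div3_range = relative_range_percent(48.5, 59.5)

print(f"divide-by-2 operating range: {div2_range:.2f} %")
print(f"divide-by-3 operating range: {div3_range:.2f} %")
print(f"ideal improvement, divide-by-2: {ideal_division_gain_db(2):.2f} dB")
print(f"ideal improvement, divide-by-3: {ideal_division_gain_db(3):.2f} dB")
```

Under these assumptions the two ranges evaluate to about 17.9% and 20.4%, and the ideal improvements to about 6.0 and 9.5 dB, in line with the values in the text.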


Fig. 11.19 Left: measured input sensitivity of DM-ILFD in divide-by-2 and divide-by-3 mode. Middle and Right: measured output power and phase noise variation in the two division modes for different tuning voltages

Fig. 11.20 Left: Figure-of-merit principle and definitions. Middle and Right: measured locking operation in the two division modes

6.3 ILFD Figure-of-Merit

For proper comparison with state-of-the-art ILFDs, two new figures of merit (FOMs) are introduced here. Varactor tuning is widely adopted to increase the operating range of ILFDs [19–21]. It is noticed, however, that two different definitions are used for locking ranges obtained with [22] or without [20] varactor tuning. An appropriate comparison therefore demands a FOM that incorporates the number of varactor tunings required to cover the complete operating range. Furthermore, input injection power and DC power consumption are important benchmarks that should be reflected in the FOMs. To this end, the total locking range is divided (Fig. 11.20) into p parts $f_1, f_2, \dots, f_p$, related to tuning voltages $V_{tune1}, V_{tune2}, \dots, V_{tunep}$ and minimum input powers $P_{min1}, P_{min2}, \dots, P_{minp}$. Averaging both locking range and minimum input power leads to $f_{avg} = (f_1 + f_2 + \dots + f_p)/p$ and $P_{min,avg} = (P_{min1} + P_{min2} + \dots + P_{minp})/p$. The total locking range ($f_{lock}$) thus equals $n \times f_{avg}$. FOMPin, also shown in Fig. 11.20, reflects the injection efficiency by assessing the average injection power $P_{min,avg}$ (in watts) required for an average relative tuning range ($f_{avg}/f_{center}$), and FOMPdc reflects the tuning efficiency by assessing the DC power consumption needed for the same average relative tuning range. Both FOMs comprise n, and clearly a lower value of n is preferred. It should be noted that without varactor tuning n equals 1. A higher FOM value indicates


a better ILFD. A comparison with reported designs using the proposed FOMs and the underlying parameters is shown in Table 11.3. The FOMPin of the presented DM-ILFD, in divide-by-2 and divide-by-3 mode, is 28 and 34.5 dB better than that of the dual-mode ILFD in [21] (ISSCC 2009), reflecting a considerable improvement in injection efficiency, whereas the FOMPdc is comparable. Compared to the single-mode dividers, the 20.4% operating range of the DM-ILFD in divide-by-3 mode is better than that of [19, 20], resulting in a better or comparable FOMPin. The FOMPdc, on the other hand, is lower than that of the single-mode ILFDs in [23, 24]. Finally, the operating ranges of 11 and 6.5 GHz at 60 and 35 GHz can easily cover the respective mm-wave bands of a multi-mode synthesizer.
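The averaging step and the factor n can be sketched as follows. The defining expressions of the two FOMs appear only in Fig. 11.20, which is not reproduced here, so the logarithmic form used in `ilfd_foms` is an assumption: the average relative range (LR/n) divided by a power in watts, expressed in dB. This form was chosen because it reproduces the "This work" entries of Table 11.3 when the average minimum input power is read as a negative dBm value. The per-tuning values passed to `averages` are hypothetical.

```python
import math

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)

def averages(ranges_ghz, p_min_dbm):
    """Average locking range (GHz) and average minimum input power (W)."""
    p = len(ranges_ghz)
    f_avg = sum(ranges_ghz) / p
    p_min_avg_w = sum(dbm_to_watt(x) for x in p_min_dbm) / p
    return f_avg, p_min_avg_w

def factor_n(f_lock_ghz, f_avg_ghz):
    """Number of average-sized locking ranges needed to span f_lock."""
    return f_lock_ghz / f_avg_ghz

def ilfd_foms(lr_total, n, p_min_avg_w, p_dc_w):
    """Assumed FOM form: average relative range per watt, in dB."""
    rel_avg = lr_total / n  # f_avg / f_center
    fom_pin = 10 * math.log10(rel_avg / p_min_avg_w)
    fom_pdc = 10 * math.log10(rel_avg / p_dc_w)
    return fom_pin, fom_pdc

# Hypothetical per-tuning data: three overlapping locking ranges
f_avg, p_avg = averages([2.9, 2.8, 2.8], [-46.0, -45.0, -45.0])
n_example = factor_n(6.5, f_avg)

# Table 11.3, this work, divide-by-2: LR 17.9 %, n 2.29, -45.3 dBm, 4 mW
fom_pin, fom_pdc = ilfd_foms(0.179, 2.29, dbm_to_watt(-45.3), 4e-3)

print(f"n for the hypothetical data: {n_example:.2f}")
print(f"FOM_Pin = {fom_pin:.1f} dB, FOM_Pdc = {fom_pdc:.1f} dB")
```

With the divide-by-2 inputs the assumed form evaluates to roughly 64.2 and 12.9 dB, matching the tabulated FOMPin and FOMPdc. Note that n need not be an integer, since the individual locking ranges may overlap.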

7 Fully Integrated 60 GHz Distributed Transformer Power Amplifier

The realization of high-speed short-range wireless communication systems has motivated the use of the available 6 GHz bandwidth around 60 GHz. Cost-effective solutions for those applications can be achieved through fully integrated transceivers that include the digital circuits in mainstream CMOS technology. However, the low breakdown voltage of the active devices and the poor quality factor of passive components in CMOS technology complicate the realization of power amplifiers capable of achieving the required output power with decent efficiency. Achieving high power levels demands high-ratio trans-impedance matching networks, which degrade the efficiency. As an alternative, one may use a distributed active transformer (DAT) power amplifier [25], which enables simultaneous power combining and impedance matching. The inequality of the transformers’ input impedances engenders common-mode and unequal differential voltage swings, which might prevent the achievement of maximum output power. To surmount these problems we present two universal methods, adding auxiliary tuning components and introducing a different device size for each combining stage, instead of complicating the transformer design [26, 27].

7.1 DAT Topologies

A DAT topology is composed of two sets of magnetically coupled inductors, of which the secondary is connected to the output and the set of primaries is connected to amplifiers, creating power combining stages. Since DAT topologies accumulate the voltages of all combining stages, proper performance of a DAT demands that each stage achieves its maximum voltage swing and possesses a negligible common-mode component. A literature survey reveals the existence of different DAT topologies [25, 28], shown in Fig. 11.21, which share a common

Table 11.3 Comparison with published results of ILFD

Reference | Division ratio | CMOS Tech. (nm) | Op. Freq (GHz) | LR (%) | Pmin-avg (dBm) | Pdc (mW) | Ph. Noise (dBc/Hz @ 1 MHz) | Factor ‘n’ | FOMPin | FOMPdc
[19] ASSCC 07 | 3 | 130 | 66–72 | 9.1 | −13 | 2 | −120.8 | 1.63 | 30.4 | 12.6
[20] MWCL 09 | 3 | 65 | 48.8–54.6 | 11.2 | −55.2 | 3 | −115 | 5 | 69.7 | 8.7
[23] TMTT 08 | 2 | 90 | 35.7–54.9 | 42.3 | −26 | 0.8 | −118.4 | 1 | 52.2 | 27.2
[24] MWCL 08 | 2 | 130 | 25–31.2 | 22 | −39 | 1.86 | −130 | 1 | 62.4 | 20.7
[21] ISSCC 09 | 2 | 130 | 35.6–39.3 | 9.8 | −18 | 3.12 | −133.7 | 1.57 | 35.9 | 13.0
 | 3 | | 53.8–57.8 | 7.1 | −12 | | −131.5 | 2 | 28.9 | 12.0
This work | 2 | 65 | 33–39.5 | 17.9 | −45.3 | 4 | −130.6 | 2.29 | 64.2 | 12.9
 | 3 | | 48.5–59.5 | 20.4 | −45 | 4 | −132 | 2.86 | 63.5 | 12.5


Fig. 11.21 Left: DAT topologies. (a) Inline non-alternating topology with center output [28]. (b) Inline alternating topology with side output [28]. (c) Ring topology [25]. Primary inductors are in red and secondary inductors are in blue. Right: simplified equivalent half-circuit diagram of topology (a)

problem regarding the inequality of input impedances. This inequality can occur between the values of the differential input ports ($Z_{in,i}$) as well as between the individual nodes of those ports ($Z_{nod,i}$). The latter engenders unbalanced voltage swings at the output of the differential amplifier stages (a common-mode component), while the former imposes unequal voltage swings among those differential amplifier stages due to unequal load-line terminations [29]. This inequality might occur due to a combination of three mechanisms: different physical distances of the power combining stages from the output port (see Fig. 11.21), asymmetric physical position of the input nodes with respect to the virtual AC ground, and asymmetric interwinding capacitances (see Fig. 11.21). Note that the AC ground results from the differential output. Without loss of generality, the right part of Fig. 11.21 illustrates the half-circuit diagram of topology (a). Different voltage levels can be noticed at nodes V5 and Vx with respect to the virtual AC ground. These differences, in combination with the asymmetric values of the interwinding capacitances, create common-mode voltages and unequal differential voltage swings.

7.1.1 Compensating Common-Mode Effect

In order to alleviate the impact of the common-mode effect, one can use the biasing connection at the center tap of the primary inductors. However, the impact of this method depends on the DAT topology and its contribution could

Table 11.4 Results examples common-mode compensation

Fig. 11.21 | Znod,1 (Ω) | Znod,2 (Ω) | Znod,3 (Ω) | Znod,4 (Ω)
(a) U | 9.2 + j7.8 | 17.9 − j3.4 | 17.4 + j3 | 25.2 − j5
(a) C | 11.3 + j5.7 | 12.5 − j5.1 | 19.6 − j0.3 | 19.6 − j7.8
(c) U | 12.7 + j1.5 | 8.8 − j1.3 | 7.7 + j0.6 | 10.2 − j0.2
(c) C | 10.9 + j1.4 | 9.2 − j0.1 | 8.7 − j0.9 | 8.9 − j0.3

Fig. 11.22 RF voltages at nodes 3 and 4, related to Fig. 11.21c. (a) Before CM compensation. (b) After CM compensation. The dashed line indicates $V_{RF,max}$

turn out to be positive or negative. Furthermore, one can adjust the transformer layout to reduce the common-mode effect. However, these methods impose a complex and iterative design procedure. To surmount the common-mode effect, we propose the insertion of auxiliary components, e.g. capacitances and resistances, to equalize the node impedances. Employing mixed-mode parameters facilitates the determination of the common and differential components of each input port. A general analysis procedure targeting zero common-mode voltages leads to a set of required conditions. As an example, the required conditions for Fig. 11.21a, using the half-circuit of Fig. 11.21, can be formulated as:

$$Z_{cd,11} + Z_{cd,12} = 0, \qquad Z_{cd,22} + Z_{cd,21} = 0 \tag{11.12}$$

in which $Z_{cd,ij}$ represents the differential-to-common impedance parameters of the transformer. Employing the equivalent circuit presented in Fig. 11.21 enables the determination of the values and the location of the auxiliary components. The result of this example is presented in Table 11.4, alongside another example regarding an implemented ring variant ([30], Fig. 11.21c) in TSMC 65 nm CMOS technology, with the location of the auxiliary components shown in Fig. 11.21. In Table 11.4, ‘U’ denotes the uncompensated node impedances and ‘C’ denotes the compensated node impedances. For the latter example the RF voltages at nodes 3 and 4 are shown in Fig. 11.22 before (a) and after (b) compensation.
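To see what the compensation does to the node impedances, the entries of Table 11.4 for the ring variant (Fig. 11.21c) can be compared pairwise. The sketch below makes several assumptions: nodes 1–2 and 3–4 are taken to form the two differential ports, signs that were lost in reproduction are taken as negative, and the magnitude of the difference between two node impedances is used as a simple imbalance measure (it is not a metric defined in the text).

```python
# Node impedances (ohm) of the ring variant from Table 11.4.
# Signs missing in the source are assumed to be negative.
uncompensated = {1: 12.7 + 1.5j, 2: 8.8 - 1.3j, 3: 7.7 + 0.6j, 4: 10.2 - 0.2j}
compensated = {1: 10.9 + 1.4j, 2: 9.2 - 0.1j, 3: 8.7 - 0.9j, 4: 8.9 - 0.3j}

# Assumption: nodes (1, 2) and (3, 4) are the two differential ports.
PORTS = [(1, 2), (3, 4)]

def node_imbalance(z_nodes, ports=PORTS):
    """Magnitude of the impedance difference between the nodes of each port."""
    return {port: abs(z_nodes[port[0]] - z_nodes[port[1]]) for port in ports}

before = node_imbalance(uncompensated)
after = node_imbalance(compensated)

for port in PORTS:
    print(f"nodes {port}: {before[port]:.2f} ohm -> {after[port]:.2f} ohm")
```

With these assumptions the imbalance drops for both ports, which is consistent with the equalization that the auxiliary components are meant to achieve.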


Table 11.5 Results example load-line adjustment by device scaling

W1 = W4 (µm) | W2 = W3 (µm) | Zin,1 = Zin,4 (Ω) | Zin,2 = Zin,3 (Ω)
293 | 293 | 23.6 + j0.5 | 17.3 + j0.6
293 | 258 | 21.9 + j0.1 | 18.5 + j0.15

7.1.2 Compensating Unequal Differential Voltage-Swings

Assuming that the common-mode effect is negligible, one may compensate the inequality of voltage swings between the different combining stages. This goal can be achieved by combining currents and voltages, as suggested in [27]. However, this method complicates the design procedure when a higher output level is targeted and a larger transimpedance ratio is necessary. In conventional amplifier design, the required relation between the voltage and current swing is dictated by the impedance of the output termination. However, the magnetic and capacitive coupling between the different stages of a DAT topology facilitates a new approach: the voltage swing of one stage can be adjusted by altering the amplitude and phase of the currents flowing in the other amplifier stages. Taking advantage of this distinguishing property, the voltage swings can be equalized by deliberately introducing inequality in the physical dimensions (e.g. width) of the implemented amplifiers, which in turn alters the load lines required for maximum output swing of each stage. Table 11.5 presents the optimized transistor sizes with the resulting load-line values of the ring variant mentioned in the previous section. After applying compensation the voltage swings become approximately equal.

7.2 Circuit Design

Due to the restriction on the layout size, a compact two-stage inline alternating structure has been implemented in TSMC 45 nm CMOS technology. A cascoded differential pair topology forms the core of the combining stages. These amplifiers operate in Class A, providing maximum gain and linearity. Interconnection loss has been minimized by an appropriate arrangement of the transistors. The transistor widths of the first and second combining stages are 150 µm and 160 µm, respectively, and the transistors are biased according to the 0.29 mA/µm guideline for optimal linearity. Fulfilling the DC electromigration design rules has imposed a larger line width for the primary (red) windings. The transformer design has been optimized in Momentum, leading to an efficiency of 76%, primary inductors of 70 pH and a secondary inductor of 157 pH. A T-junction power divider delivers the required power level to each amplifier stage and has angles of 45° to reduce the length and losses. A pair of LC-matching networks together with the power divider transforms the 4.6 − j8.7 Ω

Fig. 11.23 Left: two-stage inline alternating transformer. Right: chip photograph of two-stage DAT PA. The core is shown within the dashed box

amplifier input impedances into 100 Ω. The large ratio of the required transimpedance matching networks reduces the overall bandwidth. This effect might be alleviated by adding pre-amplifiers, which would raise the impedance level of the amplifier input ports. The required passive and distributed components have been designed in Agilent Momentum. Figure 11.23 shows the chip photograph of the realized PA, which has a core size of 210 µm × 270 µm.
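The bias guideline and the impedance transformation can be checked with simple arithmetic. The sketch below assumes that each combining stage is a differential pair with two transistors of the quoted width, and it treats the input match as a single lossless L-section between 4.6 Ω and 100 Ω with the reactive part absorbed. Both are simplifications made for illustration; the real network (LC sections plus power divider) will differ.

```python
import math

BIAS_MA_PER_UM = 0.29           # bias guideline from the text
STAGE_WIDTHS_UM = [150, 160]    # first and second combining stage
TRANSISTORS_PER_STAGE = 2       # assumption: one differential pair per stage

def stage_bias_ma(width_um, per_stage=TRANSISTORS_PER_STAGE):
    """Bias current of one combining stage in mA."""
    return per_stage * width_um * BIAS_MA_PER_UM

def l_section_q(r_low, r_high):
    """Q required of a single lossless L-section matching r_low to r_high."""
    return math.sqrt(r_high / r_low - 1)

total_bias_ma = sum(stage_bias_ma(w) for w in STAGE_WIDTHS_UM)
match_q = l_section_q(4.6, 100.0)

print(f"estimated total bias current: {total_bias_ma:.1f} mA")
print(f"impedance ratio: {100.0 / 4.6:.1f}, single-section Q: {match_q:.2f}")
```

The estimate of about 180 mA is close to the 178 mA total bias current reported for the measurements, and a Q of roughly 4.6 illustrates why a transformation ratio above 20 narrows the bandwidth.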

7.3 Measurement Results

7.3.1 Small-Signal Measurements

On-wafer small-signal measurements are carried out using a 67 GHz Agilent PNA and Cascade Microtech Infinity probes with integrated baluns, with a supply voltage of 1.8 V and a total bias current of 178 mA. Figure 11.24 shows the simulated results alongside the measured results, in which a maximum measured gain of 7.8 dB can be observed at 56 GHz. A discrepancy is noticeable at the peak frequency of |S21| and the notch frequency of |S11|. The measurement results reveal a 1 dB bandwidth of 5.5 GHz (from 53.6 to 59.1 GHz).

7.3.2 Large-Signal Measurements

Due to the inaccuracy involved in the modeling of passive components, a small discrepancy has been observed between the simulated and the measured load-line terminations. A proper measurement of the output power level has therefore been carried out by employing a 60 GHz Focus MPT load-pull system. Comparison of the simulated and measured optimum load reflection coefficients reveals a discrepancy only in the phase component, of 45°. The measured value of the 1 dB compression

Fig. 11.24 Left: S-parameters at Vdd = 1.8 V and Idc = 178 mA. Right: output power, power gain and PAE at 60 GHz for Vdd = 1.8 V and Idc = 178 mA

output power at the reference plane directly after the transformers’ output shows only a 0.7 dB discrepancy between the load-pull system and the nominal 100 Ω termination. The large-signal measurement has resulted in a 1 dB compression point of 13.2 dBm, a saturated power level of 16.3 dBm and a maximum PAE of 8.7%, as illustrated in Fig. 11.24. The power gain is taken into account in the definition of the PAE. Employing a 60 GHz Agilent signal generator, a 67 GHz Agilent PNA and an Agilent spectrum analyzer in combination with the 60 GHz Focus MPT load-pull system has enabled an accurate two-tone measurement setup. The two-tone measurement results reveal an OIP3 of 18.7 dBm for a center frequency of 60.02 GHz and a tone spacing of 40 MHz. Table 11.6 summarizes the performance of the presented differential DAT power amplifier alongside the latest reported CMOS power amplifiers at 60 GHz. The table shows that the presented work has a good output power level and a moderate efficiency compared with the reported amplifiers.
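As a sanity check on the large-signal numbers, the DC power and the efficiencies can be computed directly. The sketch assumes the usual definition PAE = (Pout − Pin)/Pdc, which is consistent with the remark that the power gain is taken into account. The 5 dB gain used in the example is a hypothetical value chosen for illustration, not a measured one.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

def drain_efficiency(p_out_dbm, p_dc_mw):
    """Output power divided by DC power (ignores the input drive)."""
    return dbm_to_mw(p_out_dbm) / p_dc_mw

def pae(p_out_dbm, gain_db, p_dc_mw):
    """Power-added efficiency: (Pout - Pin) / Pdc."""
    p_out_mw = dbm_to_mw(p_out_dbm)
    p_in_mw = dbm_to_mw(p_out_dbm - gain_db)
    return (p_out_mw - p_in_mw) / p_dc_mw

P_DC_MW = 1.8 * 178.0  # supply voltage (V) times total bias current (mA)

eff_sat = drain_efficiency(16.3, P_DC_MW)  # at the saturated output power
pae_example = pae(13.2, 5.0, P_DC_MW)      # hypothetical 5 dB gain at P1dB

print(f"DC power: {P_DC_MW:.1f} mW")
print(f"drain efficiency at Psat: {100 * eff_sat:.1f} %")
print(f"PAE at P1dB with 5 dB gain: {100 * pae_example:.1f} %")
```

The drain efficiency at saturation, about 13%, is an upper bound on the PAE at that power level, so a maximum PAE of 8.7% is plausible for a stage with modest gain.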

8 Conclusion

The parasitic effects due to layout, which are more influential at high frequencies, are taken into account by performing automatic RC extraction and manual L extraction. The long signal lines are modeled with distributed RLC networks. The problem of substrate losses is addressed by using patterned ground shields in inductors and transmission lines. It is observed that accurate simulation of all these parasitic effects is sometimes very time-consuming or even infeasible. For instance, electromagnetic simulation of a transformer in the presence of all the dummy metals is beyond the computational capability of existing EM simulators. The on-wafer measurements on the 60 GHz circuits designed in this work are performed using a waveguide-based measurement setup. The fixed waveguide structures, specially provided for the probe station, serve for the robustness of

Table 11.6 Comparison with reported PAs in literature

Reference | Description | Bandwidth 1 dB/3 dB (GHz) | Gain (dB) | Output power 1 dB/sat (dBm) | PAE max (%) | Technology (nm)
[29] | 3-Stage DAT cascode | 55–71/NA | 26 | 14.5/18 | 12.2 | 90
[35] | 2-Stage cascode | 55–65/NA | 16 | 12.7/14.5 | 25 | 65 SOI
[36] | 3-Stage differential CS | NA/57–62 | 15.8 | 2.5/11.5 | 11 | 65
[37] | 3-Stage DAT cascode | NA/53–68 | 15.5 | 11.5/18 | 3.6 | 65
[38] | 3-Stage CS | NA/51–61 | 19 | NA/7.9 | 19.4 | 45
[39] | 2-Stage push pull | 50–67/NA | 6 | 11/13.8 | 7 | 45
[40] | 2-Stage single-ended cascode | 50–60/47–60 | 16 | 7.6/12 | 12.3 | 45
[41] | 2-Stage differential cascode | 55–65/NA | 20 | 11.2/14.5 | 14.4 | 45
This work | 1-Stage DAT differential cascode | 53–59/NA | 6 | 13.2/16.3 | 8.7 | 45


the setup as they circumvent the need for cables, which are by nature difficult to keep rigid, in the vicinity of the probes. Taking advantage of magic-Ts, it is possible to measure differential mm-wave circuits with a two-port network analyzer rather than a much more expensive four-port one. Noise, S-parameter, and phase noise measurements are performed using the mentioned setups. In practice, however, we have succeeded in realizing different 60 GHz corner blocks with excellent performance. These results have proven the capability of mainstream CMOS technology, combined with a proper and time-consuming design procedure, for the realization of 60 GHz RF corner blocks.

Acknowledgement The author would like to thank Hammad M. Cheema, Jaap Essing, Pooyan Sakian and Erwin Janssen from the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands; Anton de Graauw and Edwin van der Heijden from NXP Semiconductors, The Netherlands; and Paul T.M. van Zeijl from Philips Research, The Netherlands, for their contribution to this work; NXP Semiconductors and Philips Research for giving access to the CMOS technologies; and STW for the financial support.

References

1. Yong Su Khiong, Chong Chia-Chin, An overview of multigigabit wireless through millimeter wave technology: potentials and technical challenges. EURASIP J. Wirel. Commun. Netw. 1 (2007), pp. 50
2. S. Cherry, Edholm’s law of bandwidth. IEEE Spectr. 41(7), 58–60 (2004)
3. Noise Figure Measurement Accuracy – The Y-Factor Method, Agilent Application note 57–2, http://cp.literature.agilent.com/litweb/pdf/5952-3706E.pdf
4. J. Borremans, K. Raczkowski, P. Wambacq, A digitally controlled compact 57-to-66 GHz front-end in 45 nm digital CMOS. ISSCC 2009, San Francisco, Feb 2009
5. D.J. Cassan, J.R. Long, A 1 V transformer-feedback low-noise-amplifier for 5 GHz wireless LAN in 0.18 µm CMOS. IEEE J. Solid-State Circ. 38(3), 427–435 (2003)
6. H.M. Cheema, E. Janssen, R. Mahmoudi, A. van Roermund, Monolithic transformers for high frequency bulk CMOS circuits, in IEEE Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems 2009. SiRF ’09, ed. by J. William (IEEE, San Diego, 2009), pp. 1–4
7. E. Cohen, S. Ravid, D. Ritter, An ultra low power LNA with 15 dB gain and 4.4 dB NF in 90 nm CMOS process for 60 GHz phase array radio. IEEE RFIC Symposium of Digest, June 2008, pp. 61–64
8. A. Siligaris, C. Mounet, B. Reig, P. Vincent, A. Michel, CMOS SOI technology for WPAN. Application to 60 GHz LNA. IEEE ICIDT, International Conference. 2008
9. T. Yao, M.Q. Gordon, K.K.W. Tang, K.H.K. Yau, M.-T. Yang, P. Schvan, S.P. Voinigescu, Algorithmic design of CMOS LNAs and PAs for 60 GHz radio. IEEE J. Solid-State Circ. 42(5), 1044–1057 (2007)
10. C. Weyers, P. Mayr, J.W. Kunze, U. Langmann, A 22.3 dB voltage gain 6.1 dB NF 60 GHz LNA in 65 nm CMOS with differential output. ISSCC Digest of Technical Papers, Feb 2008
11. P. Sakian, R. Mahmoudi, P. van Zeijl, M. Lont, A. van Roermund, A 60-GHz double-balanced homodyne down-converter in 65-nm CMOS process, 2009. European Microwave Integrated Circuits Conference, Rome, Sept. 2009
12. C.E. Shannon, Communication in the presence of noise. Proc. IEEE 72(9), 1192–1201 (1984)
13. D. Manstretta, M. Brandolini, F. Svelto, Second-order intermodulation mechanisms in CMOS downconverters. IEEE J. Solid State Circ. 38(3), 394–406 (2003)
14. J. Wang, A.K.K. Wong, Effects of mismatch on CMOS double-balanced mixers: a theoretical analysis. IEEE Hong Kong Electron Devices Meeting, Hong Kong, 2001
15. K. Dufrene, R. Weigel, A novel IP2 calibration method for low-voltage down conversion mixers. IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, June 2006
16. K. Kivekas, A. Parssinen, J. Ryynanen, J. Jussila, K. Halonen, Calibration techniques of active BiCMOS mixers. IEEE J. Solid State Circ. 37(6), 766–769 (2002)
17. M. Hotti, J. Ryynanen, K. Kievekas, K. Halonen, An IIP2 calibration technique for direct conversion receivers. IEEE International Symposium on Circuits and Systems (ISCAS), 2004
18. S.K. Reynolds, B.A. Floyd, U.R. Pfeiffer, T. Beukema, J. Grzyb, C. Haymes, B. Gaucher, M. Soyuer, A silicon 60 GHz receiver and transmitter chipset for broadband communications. IEEE J. Solid-State Circ. 41(12), 2820–2831 (2006)
19. C.-H. Wang, C.-C. Chen, M.-F. Lei, et al., A 66–72 GHz divide-by-3 injection-locked frequency divider in 0.13-µm CMOS technology. IEEE Asian Solid-State Circuits Conference, Nov 2007, pp. 344–347
20. Yu Xiao Peng, A 3 mW 54.6 GHz divide-by-3 injection locked frequency divider with resistive harmonic enhancement. IEEE Microw. Wirel. Compon. Lett. 19(9), 575–577 (2009)
21. H.-K. Chen, H.-J. Chen et al., A mm-wave CMOS multimode frequency divider. ISSCC Digest of Technical Papers, Feb. 2009, pp. 280–281
22. S.-W. Tam, H.-T. Yu, Y. Kim, E. Socher, M.C.F. Chang, T. Itoh, A dual band mm-wave CMOS oscillator with left-handed resonator. IEEE Radio Frequency Integrated Circuits Symposium, June 2009, pp. 477–480
23. Luo Tang-Nian, Chen Yi-Jan Emery, A 0.8-mW 55 GHz dual-injection-locked CMOS frequency divider. IEEE Trans. Microw. Theory Tech. 56(3), 620–625 (2008)
24. H.-K. Chen, D.-C. Chang, Y.-Z. Juang, S.-S. Lu, A 30-GHz wideband low-power CMOS injection-locked frequency divider for 60 GHz wireless-LAN. IEEE Microw. Wirel. Compon. Lett. 18(2), 145–147 (2008)
25. I. Aoki, S. Kee, D. Rutledge, A. Hajimiri, Distributed active transformer – a new power combining and impedance-transformation technique. Microw. Theory Tech., IEEE Trans. 50(1), 316–331 (2002)
26. T. Cheung, J. Long, Y. Tretiakov, D. Harame, A 21–27 GHz self-shielded 4-way power-combining PA balun, in Custom Integrated Circuits Conference, 2004. Proceedings of the IEEE, 2004
27. J.-W. Lai, A. Valdes-Garcia, A 1 V 17.9 dBm 60 GHz power amplifier in standard 65 nm CMOS, in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010. IEEE International, 2010
28. P. Haldi, D. Chowdhury, P. Reynaert, G. Liu, A. Niknejad, A 5.8 GHz 1 V linear power amplifier using a novel on-chip transformer power combiner in standard 90 nm CMOS. Solid-State Circ. IEEE J. 43(5), 1054–1063 (2008)
29. U.R. Pfeiffer, D. Goren, A 23-dBm 60-GHz distributed active transformer in a silicon process technology. Microw. Theory Tech., IEEE Trans. 55(5), 857–865 (2007)
30. Y. Pei, A 60 GHz, 12.5 GHz 1 dB bandwidth fully integrated power amplifier using a distributed ring transformer in CMOS 65 nm. Master thesis, Eindhoven University of Technology, Aug 2010
31. O, Richard et al., A 17.5-to-20.94 GHz and 35-to-41.88 GHz PLL in 65 nm CMOS for wireless HD applications. IEEE International Solid-State Circuits Conference, Feb. 2010, pp. 252–253
32. S. Pellerano, R. Mukhopadhyay, A. Ravi, J. Laskar, Y. Palaskas, A 39.1-to-41.6 GHz ΔΣ fractional-N frequency synthesizer in 90 nm CMOS. IEEE International Solid-State Circuits Conference, Feb. 2008, pp. 484–485
33. C. Changhua, D. Yanping, K.O. Kenneth, A 50 GHz phase-locked loop in 0.13-µm CMOS. IEEE J. Solid-State Circ. 42(8), 1649–1656 (2007)
34. C. Lee, S.I. Liu, A 58-to-60.4 GHz frequency synthesizer in 90 nm CMOS. IEEE International Solid-State Circuits Conference, Feb 2007, pp. 196–197
35. A. Siligaris, Y. Hamada, C. Mounet, C. Raynaud, B. Martineau, N. Deparis, N. Rolland, M. Fukaishi, P. Vincent, A 60 GHz power amplifier with 14.5 dBm saturation power and 25[…]. IEEE J. Solid-State Circ. 45(7), 1286–1294 (2010)
36. W. Chan, J. Long, M. Spirito, J. Pekarik, A 60 GHz-band 1 V 11.5 dBm power amplifier with 11[…], in Solid-State Circuits Conference – Digest of Technical Papers, 2009. ISSCC 2009. IEEE International, 2009, pp. 380–381
37. B. Martineau, V. Knopik, A. Siligaris, F. Gianesello, D. Belot, A 53-to-68 GHz 18 dBm power amplifier with an 8-way combiner in standard 65 nm CMOS, in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010. IEEE International, 2010, pp. 428–429
38. E. Cohen, S. Ravid, D. Ritter, 60 GHz 45 nm PA for linear OFDM signal with predistortion correction achieving 6.1[…]. Radio Frequency Integrated Circuits Symposium, 2009. RFIC 2009. IEEE, 2009
39. K. Raczkowski, S. Thijs, W. De Raedt, B. Nauwelaers, P. Wambacq, 50-to-67 GHz ESD-protected power amplifiers in digital 45 nm LP CMOS, in Solid-State Circuits Conference – Digest of Technical Papers, 2009. ISSCC 2009. IEEE International, 2009, pp. 382–383, 383a
40. T. Kjellberg, M. Abbasi, M. Ferndahl, A. de Graauw, E. van der Heijden, H. Zirath, A compact cascode power amplifier in 45-nm CMOS for 60 GHz wireless systems, in Compound Semiconductor Integrated Circuit Symposium, 2009. CISC 2009. Annual IEEE, 2009, pp. 1–4
41. M. Abbasi, T. Kjellberg, A. de Graauw, E. van der Heijden, R. Roovers, H. Zirath, A broadband differential cascode power amplifier in 45 nm CMOS for high-speed 60 GHz system-on-chip, in Radio Frequency Integrated Circuits Symposium (RFIC), 2010 IEEE, May 2010, pp. 533–536
42. H.-H. Hsieh, Y.-C. Hsu, L.-H. Lu, A 15/30 GHz dual-band multiphase voltage-controlled oscillator in 0.18-µm CMOS. IEEE Trans. Microw. Theory Tech. 55(3), 474–483 (2007)
43. Y.-N. Jen, J.-H. Tsai, T.-W. Huang, H. Wang, Design and analysis of a 55–71 GHz compact and broadband distributed active transformer power amplifier in 90-nm CMOS process. Microw. Theory Tech., IEEE Trans. 57(7), 1637–1646 (2009)

Chapter 12

Extremely Wideband CMOS Circuits For Future THz Applications

Lorenzo Tripodi, Marion K. Matters-Kammerer, Dave van Goor, Xin Hu, and Anders Rydberg

Abstract Recent results in IC design have demonstrated the possibility of realizing CMOS circuits working in the 100 GHz–1 THz band. In this chapter the design and measurements of a CMOS nonlinear transmission line and a CMOS Schottky diode sampling bridge are presented. Large-signal measurements of the nonlinear transmission lines from 6 to 168 GHz are shown. Time-domain measurements demonstrating the possibility of sampling ultrafast signals with a fall time of 4.6 ps are also described. These two extremely wideband devices will be used as essential building blocks for the future implementation of a CMOS-based coherent THz spectrometer and imager.

L. Tripodi • D. van Goor: Philips Research, High Tech Campus 37, 5656AE Eindhoven, The Netherlands
M.K. Matters-Kammerer: Electrical Engineering, TU Eindhoven, 5600MB Eindhoven, The Netherlands
X. Hu • A. Rydberg: Uppsala University, Uppsala, Sweden

M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_12, © Springer Science+Business Media B.V. 2012

1 Introduction

Technologies and devices able to produce, detect and guide radiation in the 100 GHz–1 THz band have long attracted the attention of the scientific and industrial community [1]. Applications related to space exploration have been the major driver for the development of new devices for decades, but in recent years, and especially since the invention of the terahertz time-domain spectroscopy technique [2], new commercial applications in multiple fields including biology and medicine have appeared [1, 3]. Today, femtosecond lasers are the core components of many state-of-the-art optical THz systems. Such devices have paved the way towards modern THz applications and, in some cases, have been transformed into successful products [4].

An alternative approach to producing and detecting THz radiation is based on solid-state electronics. The compactness, lower cost and large commercial success of solid-state devices make them an attractive route to a more widespread use of THz radiation. Devices based on advanced band-gap engineering, such as Quantum Cascade Lasers (QCLs) [5, 6], have proven to be powerful THz sources. Another mechanism for producing and detecting THz radiation, based on plasma waves in a transistor channel [7], is also currently being explored. With this technique, room-temperature THz emission [8] and detection [9] have been achieved. Apart from these solid-state approaches based on new devices, recent publications also show an effort to exploit standard CMOS technology for THz radiation detection and generation. The use of an already available, standard commercial technology such as CMOS would allow innovative miniaturized THz devices to be implemented, opening novel commercial opportunities in the field of THz imaging and spectroscopy.

In this work we show some recent advances in this direction. The design and measurements of two essential building blocks of a future THz imager and spectrometer are presented. Such a miniaturized device is planned to be small enough overall that it can also be used for handheld applications. This chapter is organized as follows. Section 2 gives some background information on the architecture of the THz imager and spectrometer. A short summary of current research on THz CMOS electronics is given in Sect. 3. The design and measurements of an extremely wideband THz signal generator are reported in Sect. 4. Section 5 focuses on the receiving part of the overall THz system. Conclusions are drawn in Sect. 6.

2 CMOS-Based THz Imager and Spectrometer

Systems based on GaAs technology able to carry out THz imaging and spectroscopy were demonstrated in the 1990s [10, 11]. Those pioneering experiments used integrated nonlinear transmission lines and Schottky diode sampling bridges to generate and detect THz radiation. The THz electronics were usually integrated in GaAs technology, while the low-frequency part was implemented with external instruments such as signal generators and oscilloscopes. In this work a further step towards the transformation of those devices into actual commercial products is taken. CMOS technology is chosen instead of GaAs because it is more widespread and has lower cost. Another advantage of CMOS is the capability to integrate complex digital and analog functions together, leading to a substantial reduction in overall system size and, very often, also cost.

The architecture of an integrated THz CMOS imager and spectrometer is shown in Fig. 12.1. In the transmitter a sinusoidal signal with a relatively low frequency (say 10 GHz) is generated by an oscillator. An amplifier increases the amplitude of the signal to a suitable level before injecting it into a nonlinear transmission line (NLTL). The NLTL reshapes the signal, generating numerous harmonics from 10 GHz up to a maximum frequency that can be above 200 GHz. A suitable antenna emits the signal towards a sample. The interaction with the sample changes the shape of the signal and its harmonic content. At the receiver side a signal similar to the one generated by the NLTL in the transmitter is used to down-convert the signal reflected by the sample. The down-conversion is carried out in the diode sampling bridge, and the small frequency difference between the fundamentals used in the receiver and the transmitter determines, together with the total number of harmonics, the maximum IF frequency generated. The IF signal can be analyzed in the time and frequency domains to extract important features of the sample under test.

Variations of the general architecture shown in Fig. 12.1 are possible. Transmitter and receiver could be placed on two separate substrates/packages to allow transmission-mode analysis instead of reflection mode. The electronic components can be implemented in CMOS or GaAs, and a hybrid approach (instead of the fully integrated one shown in Fig. 12.1), in which some external components are implemented in non-IC technology, is also possible. In the following sections the focus is on the design and characterization of the NLTL and the sampling bridge blocks.

Fig. 12.1 Schematic of the electronic THz spectrometer (block labels: Transmitter, Oscillator, Amplifier, NLTL, Tx antenna, Sample, Rx antenna, Receiver, Oscillator, Amplifier, NLTL, Balun/differentiator, Sampling bridge, IF, Substrate)
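The relation between the harmonic comb and the IF spectrum described above can be sketched numerically. This is a minimal illustration, not the authors' tooling: it assumes that harmonic n of the transmitter mixes with harmonic n of the receiver strobe, so that its IF image falls at n times the offset. The 10 GHz fundamental and 200 GHz upper limit are the example values from the text; the 1 MHz offset `DELTA_F` and the function name `if_spectrum` are hypothetical.

```python
def if_spectrum(f_tx_hz, delta_f_hz, f_max_hz):
    """Map each transmitted harmonic n*f_tx to its IF image n*delta_f.

    All frequencies are integers in Hz so the arithmetic is exact.
    """
    n_max = f_max_hz // f_tx_hz            # highest harmonic inside the band
    rows = []
    for n in range(1, n_max + 1):
        rf = n * f_tx_hz                   # harmonic emitted by the transmitter NLTL
        lo = n * (f_tx_hz + delta_f_hz)    # matching harmonic of the receiver NLTL
        rows.append((n, rf, lo - rf))      # IF = LO - RF = n * delta_f
    return rows


F_TX = 10_000_000_000      # 10 GHz fundamental (example value from the text)
F_MAX = 200_000_000_000    # 200 GHz highest harmonic considered
DELTA_F = 1_000_000        # 1 MHz offset: hypothetical, chosen for illustration

table = if_spectrum(F_TX, DELTA_F, F_MAX)
n_harmonics = len(table)
max_if_hz = table[-1][2]
print(f"{n_harmonics} harmonics, maximum IF = {max_if_hz / 1e6:.0f} MHz")
```

Under these assumptions the maximum IF frequency is simply the number of harmonics times the offset, which is why the text states that both quantities together set the IF bandwidth the back-end must handle.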

3 Above-fT Electronics

Driven by the needs of modern applications for faster communication links, there is a continuous trend towards circuits working at higher frequencies. Circuits based on III–V technologies working above 100 GHz are not new but, recently, circuits based on CMOS working in the same frequency range have also been demonstrated. The challenge in these CMOS designs is mainly related to the transition frequency fT and the maximum oscillation frequency fmax, which are usually lower than in III–V devices and limit the maximum operating frequency of a transistor. In recent research, though, new techniques have been explored to design circuits working far above the transition frequency of the technology. As an example, a signal generator working at 324 GHz, i.e. at about twice the fmax of the CMOS node used (90 nm), has been demonstrated [12]. In [12], four signals at 81 GHz are generated using coupled voltage-controlled oscillators (VCOs). Because the signals have a frequency that is half the fmax of the technology used, oscillation start-up is not problematic. The signals are combined using a linear superposition technique. Because the combination does not require transistor gain, going above fmax is possible. With a different approach, the generation and transmission via an on-chip antenna of a signal at 410 GHz using 45 nm CMOS has been shown [13]. In this case a push-push oscillator is used, in which two fundamental signals are combined to cancel the fundamental and odd harmonics and enhance the second harmonic. On the reception side, a 600 GHz CMOS receiver has been demonstrated [14]. In this case the principle of self-mixing is exploited. This allows direct (or incoherent) detection at 600 GHz with a CMOS technology node with a transition frequency of only 35 GHz. In all three examples given above, the trick adopted to beat the technological limitations is to avoid the use of transistor gain directly at high frequency. Another possibility to extend the maximum frequency of CMOS circuits is to use traditional frequency multiplication techniques and adapt them to CMOS technology.
This requires models of CMOS components that describe their behavior far beyond the fT of the technology. With these models, component layouts optimized for this high frequency range can be developed. An example of such a device in CMOS technology is the Schottky diode, which is typically not part of the foundry libraries. Recent work [15] on Schottky diodes in 65 nm CMOS revealed the importance of the parasitic stray capacitance in the estimation of the cut-off frequency of the diodes. Figure 12.2 shows the cross-section of the studied Schottky diodes [15]. The Schottky contact consists of Ni-silicide on n-type silicon. This is a non-standard contact in the chosen 65 nm CMOS process that was specifically requested to realize the circuits of this investigation. The cathode contact is made of highly n-doped silicon. Arrays of vias from the first metallization level to the Schottky contact and to the n+ silicon are placed to reduce the series resistance of the diode. As already mentioned in [16], the layout of the Schottky contact strongly influences the high-frequency behavior. In [15], various layouts with identical junction area but different anode layout have been compared and the equivalent circuit parameters have been extracted. Figure 12.3 shows two of the layouts. While in Fig. 12.3a the anode area consists of one single junction, this junction is split into 64 minimum-sized junctions in the layout of Fig. 12.3b. By doing this, the path length between the anode and the cathode is minimized and the series resistance RS1 is reduced. At the same time the stray capacitance increases significantly.

Fig. 12.2 Cross-section of a Schottky diode in CMOS (labels: 1st-level interconnect metal, vias, ILD, SiO2, silicide, N+ doping, N-well, silicon substrate)

Fig. 12.3 Layout of the Schottky diodes processed in 65 nm CMOS. (a) The anode consists of one single junction; (b) the anode consists of 64 minimum sized junctions

In reverse bias, the small-signal equivalent circuit of the Schottky diode, as proposed in [15], is shown in Fig. 12.4. In addition to the bias-dependent junction capacitance Cjunction, the stray capacitance Cstray is included. The stray capacitance is mainly the capacitive coupling between the metal contacts of the cathode and the anode through the back-end layers of the CMOS process. The resistance RS1 mainly represents the resistance of the path through the n-well. The resistance RS2 mainly represents the resistance of the anode metal structure. The equivalent circuit parameters for the two layouts of Fig. 12.3 are summarized in Table 12.1.


Fig. 12.4 Equivalent circuit model for the forward (left) and reverse (right) bias condition

Table 12.1 Extracted equivalent circuit parameters of Schottky diodes in 65 nm CMOS as published in [15]

Parameter                 Diode (b)   Diode (a)
Cjunction,0V (fF)         6.5         6.6
Cstray (fF)               7.8         1
RS1 (Ω)                   57          104
RS2 (Ω)                   8           –
fC,junction,0V (GHz)      430         231

In conclusion, the cut-off frequency of the Schottky diode with the structured anode is nearly twice as high, but the stray capacitance of these diodes is nearly 8 times higher than that of a single-junction diode of the same junction area. The speed of the Schottky junction is expressed by its cut-off frequency, which is set by the RC time constant of the junction capacitance and the series resistance RS1. The effect of the stray capacitance has to be carefully de-embedded to prevent overestimation of the junction speed.

$$ f_{C,\mathrm{junction},0V} = \frac{1}{2\pi R_{S1}\, C_{\mathrm{junction},0V}} \tag{12.1} $$

Diode models that only use a simple series R-C equivalent circuit, as proposed in [16], are no longer suitable for highly integrated CMOS processes. The cut-off frequency of these models overestimates the junction speed significantly:

$$ f_{\mathrm{series}} = f_{C,\mathrm{junction},0V}\, \frac{C_{\mathrm{stray}} + C_{\mathrm{junction}}}{C_{\mathrm{junction}}} \tag{12.2} $$

In the case of diode (b) this leads to an overestimation of the diode speed by a factor of nearly two.
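As a quick numerical check, Eqs. (12.1) and (12.2) can be evaluated with the extracted values of Table 12.1. The sketch below is illustrative only; the helper name `cutoff_ghz` and the dictionary layout are assumptions made for this example, while the numbers are those of the table.

```python
import math


def cutoff_ghz(r_ohm, c_ff):
    """Cut-off frequency 1/(2*pi*R*C) in GHz, with R in ohm and C in fF."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_ff * 1e-15) / 1e9


# Extracted parameters from Table 12.1 (capacitances in fF, resistance in ohm)
diodes = {
    "a": {"c_j": 6.6, "c_stray": 1.0, "r_s1": 104.0},
    "b": {"c_j": 6.5, "c_stray": 7.8, "r_s1": 57.0},
}

results = {}
for name, d in diodes.items():
    f_c = cutoff_ghz(d["r_s1"], d["c_j"])            # Eq. (12.1)
    factor = (d["c_stray"] + d["c_j"]) / d["c_j"]    # ratio appearing in Eq. (12.2)
    results[name] = (f_c, factor)
    print(f"Diode ({name}): f_C = {f_c:.0f} GHz, overestimation factor = {factor:.2f}")
```

The computed cut-off frequencies agree with the tabulated 430 GHz and 231 GHz to within about 1 GHz, and the ratio for diode (b) is 2.2, which is the "nearly a factor two" quoted in the text.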

4 Extremely Wide-Band Frequency Generation

A known way to produce THz radiation is frequency multiplication by means of nonlinear transmission lines (NLTLs) [17]. This approach has been extensively studied in the past [18–21], and NLTLs have proven to be crucial building blocks for complete all-electronic THz spectrometers [10, 11]. Compared to the optoelectronic technologies commonly used nowadays to produce THz radiation, NLTLs have the advantage of being well-known devices that can be manufactured as a standard integrated circuit (IC). NLTLs studied at the end of the 1980s and in the 1990s were built using Gallium Arsenide (GaAs) IC technologies with optimized diodes to improve performance. More recently, though, standard CMOS technology has been adopted to implement NLTLs [20, 22], with encouraging preliminary results but with measured data only below 50 GHz. The main challenges in CMOS compared to GaAs are the lossy substrate and the intrinsically lower speed and power-handling capabilities. The losses affect signal propagation and, as will become evident, limit the bandwidth of the signal that can be generated. Lower breakdown voltage and device speed also complicate the design, because the power injected into the line must be limited to avoid damage to the devices. If, in spite of those challenges, the generated power is sufficient at higher frequencies, integration of CMOS NLTLs with other digital or analog electronics would open new application fields and possibly pave the way to a very compact and low-cost THz spectrometer/imager.

Fig. 12.5 General schematic of a nonlinear transmission line (linear transmission-line sections of length d loaded with voltage-dependent capacitances Cd(V))

Nonlinear transmission lines are linear transmission lines periodically loaded with voltage-dependent capacitances (Fig. 12.5). A signal propagating along the line is reshaped by the nonlinearity of the capacitances [17, 21]. With the variable capacitances used in this work, the falling edge of an input sinusoid is progressively sharpened and harmonics are generated up to a frequency that ultimately depends on the cut-off frequency of the devices and the losses of the linear transmission line.
The signal generated in this way has an extremely wideband spectrum that can be used to carry out THz spectroscopy, provided that sufficient output power can be generated [23]. In the majority of the past work using GaAs technology, the linear part of the line consisted of a coplanar waveguide (CPW) and the variable capacitances consisted of reverse-biased Schottky diodes (see for instance [19, 21]). Standard CPWs are not suitable for CMOS implementation because of the large losses that the low-resistivity silicon of the chip substrate would introduce. As for the diodes, Schottky diodes are not part of the standard CMOS library used in this work. A slightly different approach has therefore been adopted, similar to what is suggested in [22, 24]. A modified CPW (m-CPW) has been used that complies with the CMOS design rules and is shielded both laterally and from the substrate below, as shown schematically in Fig. 12.6.

Fig. 12.6 Approximate cross section of the modified CPW used in this work (labels: signal line S, GND, surrounding metal GND shields, dielectric, Si substrate; dimension labels wg, ws, wsg)

Fig. 12.7 Approximation of the NLTL using a non-linear ladder network (each section: Ll, Cl, and a varactor branch Cd with Rd)

In this way the signal is confined to the back-end, which is composed of dielectric layers with a low loss tangent. Also, interference with nearby circuitry is minimized. CMOS RF varactors of the n+ poly/n-well type are used instead of Schottky diodes as nonlinear elements. The metal layers in close proximity to the varactors have been modified slightly with respect to the standard library devices to reduce their parasitic capacitance. The voltage-dependent capacitance of the varactors is denoted by Cd(V). If, as in this case, the length d of an m-CPW section is short enough compared to the wavelength of the fastest propagating signal, the linear line sections can be modeled with simple L-C circuits (the m-CPW losses are neglected for the moment for simplicity). The inductance of a single section is Ll and the capacitance is Cl. The approximated structure, much easier to treat mathematically than the original one (Fig. 12.5), is shown in Fig. 12.7. The number of sections is N. The parasitic series resistance Rd associated with the varactors is also shown in the figure and taken into account. The stray capacitance of the varactors is ignored in this simplified model, but is included in the final simulation (see Fig. 12.8).

The Bragg frequency (fB), the unloaded characteristic impedance (Zl), the loaded characteristic impedance (ZLS), the large-signal capacitance (CLS) and the diode cut-off frequency (fc) are defined as in [21]. In the work described here some parameters are fixed by external conditions. ZLS was chosen to be 50 Ω to simplify measurements with standard equipment. A large Zl would help reduce the fall time of the sharpened sinusoid, but the maximum achievable Zl is limited to about 87 Ω by the design rules of the CMOS technology for the chosen m-CPW. The maximum possible Zl is therefore chosen in the design.

A clear design methodology for NLTLs is described in [25]. The NLTL equations underlying that methodology are reported in [21]. In this work, an algorithm based on the same equations has been used to carry out the design. Starting from the fixed parameters mentioned above, the algorithm determines the varactor size that gives the minimum simulated fall time of the output signal. The schematic shown in Fig. 12.7 is used to carry out time-domain simulations. The input signal injected into the NLTL is a step with a 10–90% fall time of 16 ps. The size of the varactors is changed at each simulation and the algorithm stops when the minimum fall time has been achieved. After this phase all the design parameters of the desired NLTL are known. The losses of the varactors are included in the simulations, but the losses of the m-CPW are not taken into account at this stage, so the minimum fall time determined is an underestimate. A more accurate value is obtained below. A summary of the parameters of the implemented line and some simulation results are shown in Table 12.2. The minimum fall time achieved by the optimization algorithm is denoted tfmin. The line has ZLS = 50 Ω and Zl = 87 Ω. The m-CPW line was realized using the top aluminum metal layer to achieve a higher inductance per meter and reduce the overall line length.

Fig. 12.8 Final test-bench used to simulate the NLTLs (labels: Vg, Rg, Rs, Cp, Cs, Tx line circuit model, Cd(V), Cpp, Rd, RL, VOUT)

Table 12.2 NLTLs geometrical parameters and simulation results

Parameter                          Value
Ll [pH]                            22.8
Cl [fF]                            3
fB [GHz]                           699
fc [GHz]                           507
CLS [fF] (Vl = −2 V, Vh = 2 V)     6.1
Rd [Ω]                             51.4
Cd(Vh)–Cd(Vl) [fF]                 6.14
tfmin [ps]                         2.7
N                                  190
d [μm]                             37.4
Total line length L [mm]           7.1
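The entries of Table 12.2 can be cross-checked against each other. The chapter defers the definitions of fB, Zl, ZLS and fc to [21] and does not spell them out, so the expressions below are an assumption: they are the usual lumped-ladder relations for a periodically loaded line, chosen here because they reproduce the tabulated numbers. The function name `nltl_figures` is hypothetical.

```python
import math

# Values from Table 12.2, converted to SI units
L_L = 22.8e-12       # section inductance Ll [H]
C_L = 3.0e-15        # section line capacitance Cl [F]
C_LS = 6.1e-15       # large-signal varactor capacitance CLS [F]
R_D = 51.4           # varactor series resistance Rd [ohm]
N_SECTIONS = 190
D_SECTION = 37.4e-6  # section length d [m]


def nltl_figures(l_l, c_l, c_ls, r_d):
    """Assumed ladder-network relations (not quoted in the chapter)."""
    z_l = math.sqrt(l_l / c_l)                                  # unloaded impedance
    z_ls = math.sqrt(l_l / (c_l + c_ls))                        # loaded impedance
    f_bragg = 1.0 / (math.pi * math.sqrt(l_l * (c_l + c_ls)))   # Bragg frequency
    f_c = 1.0 / (2.0 * math.pi * r_d * c_ls)                    # varactor cut-off
    return z_l, z_ls, f_bragg, f_c


z_l, z_ls, f_bragg, f_c = nltl_figures(L_L, C_L, C_LS, R_D)
total_length_mm = N_SECTIONS * D_SECTION * 1e3

print(f"Zl  = {z_l:.1f} ohm, ZLS = {z_ls:.1f} ohm")
print(f"fB  = {f_bragg / 1e9:.0f} GHz, fc = {f_c / 1e9:.0f} GHz")
print(f"total length = {total_length_mm:.1f} mm")
```

With these assumed definitions the results land within rounding of the quoted 87 Ω, 50 Ω, 699 GHz, 507 GHz and 7.1 mm, which suggests the table is internally consistent.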


Fig. 12.9 Chip microphotograph of the NLTLs

The length d of the m-CPW line sections is chosen as:

$$ d = \frac{L_l}{L_m} = \frac{C_l}{C_m}, \tag{12.3} $$

where the inductance and capacitance per unit length (Lm and Cm) of the modified CPW are determined from the simulated S-parameters obtained using a commercial electromagnetic simulator. The section length d is 37.3 μm. The total length of the line is 7.1 mm with N = 190 sections. To calculate Lm and Cm from the S-parameters of the m-CPW, the method discussed in [26] is used.

In order to simulate the NLTLs completely in the time domain and obtain a more accurate fall time, the losses of the linear m-CPWs neglected up to now must also be included in the circuit schematics. From the simulated S-parameters of the m-CPWs a wideband circuit model is generated using a commercial electromagnetic simulator. The advantage of a lumped equivalent circuit model is that it can be used directly for non-linear simulations in a circuit simulator. A test bench with the schematic shown in Fig. 12.8 has been used to predict the behavior of the lines under several bias conditions. Rg and RL are the 50 Ω source and load impedances. Cpp is the parasitic stray capacitance of the varactors. Cp, Cs and Rs are used to model the 40 μm × 40 μm wafer probe pads. The measured parasitic capacitance of the 40 μm × 40 μm pads is around 28 fF. A photograph of part of the fabricated NLTLs in 65 nm CMOS technology is shown in Fig. 12.9.

Small-signal S-parameter measurements were made up to 67 GHz using an Agilent network analyzer type PNA E8361A. Large-signal power measurements were carried out in bands. From DC to 50 GHz an Agilent signal generator type PSG E8257D and a 50 GHz Agilent spectrum analyzer type PSA E4448A were used. Adding an Agilent mixer type 11970V to the setup made it possible to measure in the band from 50 to 67 GHz. From the raw measurement of the available power at the generator and the measured power at the spectrum analyzer, the actual power at the input and the output of the line under test can be calculated if the losses of the system are known.

Fig. 12.10 Measured vs. simulated output power of the NLTL at 0 V bias (vertical axis: output power [dBm]; horizontal axis: frequency [GHz], harmonics from 6 to 162 GHz). Input frequency and input power are 6 GHz and 18 dBm. The lines are added as a guide for the eye. The power is generated only at the harmonics of 6 GHz indicated by the points

In the band from 78 to 168 GHz, the output spectrum was measured using an Agilent spectrum analyzer type E4440A with an Agilent harmonic mixer type 11970W (78–108 GHz), and a Pacific Millimeter Products (PMP) harmonic mixer DM connected with a diplexer type MD1A (114–168 GHz). The conversion loss of the PMP harmonic mixer was calibrated using a WR-6 band signal source, a WR-6 band attenuator, an Agilent spectrum analyzer type PSA E4440A, and an Erickson submillimeter power meter type PM2. The insertion losses of the RF input cable and the input Cascade Microtech Infinity probe (I67-75-S) were compensated. An Anritsu RF/microwave signal generator type MG3694A was used. In all the power measurements the power actually entering the lines was 18 dBm at 6 GHz.

Figure 12.10 shows the combined large-signal measurements compared with simulations. In simulations, the input signal is a sinusoid with 2.5 V peak voltage at the input of the line. Due to matching, the signal before the source impedance is 5 V peak. This corresponds to an input power of 18 dBm on the 50 Ω-matched line. In measurements, the following procedure has been used to de-embed the cable losses. In the 6–66 GHz band an on-chip thru has been used to determine the losses of the cables, the connectors and the probes. Because the length of the on-chip thru is only 200 μm, which is small compared to the millimeters-long NLTLs, and because everything is well matched to 50 Ω, it is possible to determine with good precision the losses that the signal experiences before the input and after the output of the NLTLs. The measured losses of one cable plus probe are around 2 dB at 6 GHz and 9 dB at 66 GHz. The conversion losses of the mixer were automatically corrected by the spectrum analyzer using data from the manufacturer. In the 78–168 GHz range, the insertion loss of the output probe and the conversion loss of the harmonic mixers (shown in Table 12.3) have been compensated in the measured output power.

As is clear from Fig. 12.10, simulations (with an estimated stray capacitance of 2 fF) and measurements match very well up to 156 GHz. At higher frequencies the agreement degrades. Figure 12.11 shows that, as expected from simulation, tuning the bias voltage of the varactors at the end of the line makes it possible to optimize the performance (i.e. the output power at high frequency) of the NLTLs. This is because, by changing the bias, the nonlinearity of the varactors is better exploited by the signal swing. The best result is obtained for a bias of −0.5 V, with an improvement of more than 11 dB at 168 GHz compared to the case with +0.5 V bias. Compared to the unbiased line, the improvement with −0.5 V bias is 3.5 dB at 168 GHz. It is important to stress that the adverse effect of the output pads has not been de-embedded for the power measurements.

Table 12.3 Conversion loss of the harmonic mixers: Agilent 11970W (78–108 GHz), and PMP harmonic mixer DM (114–168 GHz) connected with a diplexer MD1A

Frequency (GHz)   Conversion loss (dB)
78                38.1
84                38.4
90                39.4
96                40.1
102               40.6
108               41.4
114               54.9
120               49.3
126               56.2
132               54.5
138               50.7
144               53.9
150               50.9
156               51.9
162               59.3
168               59.5

Fig. 12.11 Measured output power for different biasing conditions of the NLTL (−0.5 V, 0 V and +0.5 V bias; vertical axis: output power [dBm]; horizontal axis: frequency [GHz], 114–168 GHz). Input frequency and input power are 6 GHz and 18 dBm
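Two pieces of arithmetic from this section can be made explicit: the conversion of a 2.5 V peak sinusoid on a 50 Ω line to roughly 18 dBm, and the compensation of mixer conversion loss and probe loss in a raw reading. The sketch below is only an illustration of that bookkeeping. It assumes the analyzer reading has not already been corrected, so both losses are added back in dB; the raw reading and the 2 dB probe loss in the usage lines are hypothetical numbers, not measured data from the chapter.

```python
import math


def sine_power_dbm(v_peak, r_load=50.0):
    """Average power of a sinusoid with peak voltage v_peak across r_load, in dBm."""
    p_watt = v_peak ** 2 / (2.0 * r_load)
    return 10.0 * math.log10(p_watt / 1e-3)


# Conversion loss in dB versus frequency in GHz, from Table 12.3
CONVERSION_LOSS_DB = {
    78: 38.1, 84: 38.4, 90: 39.4, 96: 40.1, 102: 40.6, 108: 41.4,
    114: 54.9, 120: 49.3, 126: 56.2, 132: 54.5, 138: 50.7, 144: 53.9,
    150: 50.9, 156: 51.9, 162: 59.3, 168: 59.5,
}


def corrected_power_dbm(reading_dbm, freq_ghz, probe_loss_db):
    """Add back mixer conversion loss and output-probe loss (all in dB)."""
    return reading_dbm + CONVERSION_LOSS_DB[freq_ghz] + probe_loss_db


p_in = sine_power_dbm(2.5)
print(f"2.5 V peak on 50 ohm: {p_in:.2f} dBm")

# Hypothetical raw reading and probe loss, for illustration only
example = corrected_power_dbm(reading_dbm=-95.0, freq_ghz=168, probe_loss_db=2.0)
print(f"corrected output power at 168 GHz: {example:.1f} dBm")
```

The first result, about 17.96 dBm, is the 18 dBm quoted for the simulated input signal.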

5 High Speed Samplers

High-speed samplers have been realized in GaAs in the past [27–29], and sampling frequencies up to nearly 1 THz have been achieved. The results presented in this paper show that it is possible to realize this type of sampler in commercial CMOS technology as well.

The interest in realizing such a circuit in CMOS lies in the fact that the high-volume production capabilities of CMOS are an important step towards commercially viable THz applications. In order to realize such a circuit, a number of limitations of CMOS technology have to be overcome:

• Schottky diode layouts are not available in most CMOS foundry libraries and need to be developed first.
• The speed of the Schottky diodes in CMOS is lower than in GaAs due to the higher losses in the semi-conducting silicon substrate.
• The layout freedom is strongly limited by the strict CMOS design rules and the attenuation is typically higher than in equivalent layouts in GaAs.

The function of the high-speed sampler described here is to reproduce a high-speed periodic signal at a reduced speed. This allows measurements with an oscilloscope. An important assumption for the working principle of this type of sampler is that the signal is periodic. Because of the periodicity of the signal, it is possible to sample one single narrow section of the signal every signal period and to combine these sections to form a representation of the high-speed signal at lower speed.

5.1 Sampler Schematic

Figure 12.12 shows the schematic of the sampler. The circuit consists of four parts:

1. The strobe generator: A high-speed voltage step from the nonlinear transmission line (NLTL) is split at the T-junction. The two signals travel on the balanced transmission line and are simultaneously reflected at the short circuits on either side at a distance d. The sum of the incident and the two reflected voltage signals at the location of the diode bridge switches the diodes on and then off again after a short period of time T. The time interval T depends on the amplitude and fall time of the strobe pulse, on the chosen diodes, on their DC bias and on the length of the balanced transmission line sections.
2. The signal input: The signal Vsignal that needs to be sampled is fed to the central point of the diode bridge and charges the two hold capacitors during the time that the diodes are forward biased. In the case of a spectrometer as shown in Fig. 12.1, the input signal is composed of frequency components n·frf (n = 1, 2, 3, ...) and the sampling pulse repetition frequency is chosen as frf + Δf, with Δf/frf ≪ 1. Due to this small frequency shift between the strobe pulses and the signal, the sampling window slowly shifts over the envelope of the signal in time.
3. The diode bridge: The diode bridge consists of the high-speed diodes D1 and D2 and two hold capacitors Chold. When the diodes are forward biased, the signal charges the hold capacitors. The charge accumulated on the capacitors corresponds to the average signal during this short time interval.
4. The DC and IF circuitry: The biasing of the diodes D1 and D2 with the bias voltages DC1 and DC2 has, on the one hand, to be chosen such that the output amplitude of the NLTL is sufficiently large to drive the diodes into forward bias. On the other hand, it has to be chosen sufficiently low that the time interval of forward biasing is small enough to sample with sufficient accuracy. The output signal of the sampler contains a signal S(t) at the IF frequencies n·Δf.

Fig. 12.12 Schematic of the sampler, including the four different sub-functions: strobe-generator, sampling bridge, signal input and DC and IF connections (labels: Oscillator f0, NLTL, 50 Ω, Vsignal, D1, D2, Chold, Rbias, RIF, DC1, DC2, IFout, d)

5.2 Realization in 65 nm CMOS Technology

Figure 12.13 shows a photo of the sampler realized in 65 nm CMOS technology. The strobe NLTL is connected to the sampler from the lower side. The horizontal lines are the balanced transmission lines that are terminated in a short circuit to generate the reflected signal. This reflected signal is superimposed on the incident NLTL signal, and the signals sum to the voltage spikes that forward-bias the diodes of the sampling bridge. The hold capacitors are visible on either side of the balanced transmission line. The diodes are embedded in the silicon substrate and are not visible. The diodes chosen for this circuit are described in Sect. 3 and are of type Diode (b). The signal input is to the left and forms a 50 Ω transmission line together with the balanced transmission line of the strobe generator. As indicated in the photograph, a 10 dB attenuator is inserted between the output of the signal NLTL and the sampler. This reduces the amplitude of the input signal, so that it does not influence the bias conditions of the diodes. If this sampler is used in a spectroscopy application, it will typically receive the signal with an antenna.

Fig. 12.13 Photo of the sampler integrated in 65 nm CMOS technology (labels: Ground, DC 1, IF out, DC 2, RF-signal input, 10 dB attenuator, short circuits, diode bridge and hold capacitors, strobe input)

The cross-section of the transmission line and diode bridge layout is shown in more detail in Fig. 12.14. The transmission line layout consists of three metal lines on which different modes are used. The upper two metal lines of width w and distance d are used for the slot-line mode required for strobe generation. They are processed in a thick Cu metal layer of the CMOS back-end and have a distance d of only 3 μm, a width of only 2 μm and a metal thickness of 3.4 μm. In the slot-line mode, the field is confined mainly between the lines. For the transmission of the RF signal to the diode bridge, the two upper metal lines act as RF ground for the signal line on the intermediate metal layer, forming a modified coplanar waveguide. The modified coplanar waveguide mode has an impedance of 50 Ω; the slot-line mode has an impedance of 100 Ω. The layout of the diode bridge has to be very compact in order to minimize the parasitic impedances as much as possible (Fig. 12.14b). Vertical via interconnects a few micrometers long have been chosen between the slot lines and the upper capacitor plates. The interconnect length between the signal line and the central point between the two diodes is also only a few micrometers.

5.3 Time-Domain Measurements

At the input of the strobe NLTL an 18 dBm signal at 20 GHz is injected. At the input of the signal NLTL an 18 dBm signal at 20 GHz + Δf is injected, with Δf equal to either 1 MHz or 10 kHz. The NLTLs are designed such that they sharpen the falling edge of this sinusoidal signal. The IF output is connected to an oscilloscope and the 90% to 10% fall time is measured. The measured fall time is then converted back to the original fall time by multiplying by Δf/20 GHz.
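The time-scale conversion described above is a single multiplication, but the magnitudes involved are easier to appreciate with numbers. The following sketch applies the stated factor, Δf/20 GHz, in both directions. The function names are hypothetical; the 4.6 ps value is the minimum fall time reported for this sampler, and the IF-side fall times are derived from it rather than taken from the measurements.

```python
F_RF_HZ = 20e9  # strobe fundamental used in the measurements


def to_real_time(t_if_s, delta_f_hz, f_rf_hz=F_RF_HZ):
    """Convert a time interval seen at the IF output to the original time scale."""
    return t_if_s * delta_f_hz / f_rf_hz


def to_if_time(t_real_s, delta_f_hz, f_rf_hz=F_RF_HZ):
    """Inverse mapping: how long a real-time interval appears at the IF output."""
    return t_real_s * f_rf_hz / delta_f_hz


T_FALL_REAL = 4.6e-12  # minimum fall time reported in the chapter [s]

for delta_f in (1e6, 10e3):
    t_if = to_if_time(T_FALL_REAL, delta_f)
    print(f"delta_f = {delta_f:.0e} Hz: 4.6 ps appears as {t_if * 1e9:.0f} ns at the IF output")
```

A 4.6 ps edge is therefore stretched to about 92 ns for Δf = 1 MHz and to about 9.2 μs for Δf = 10 kHz, both comfortably within the range of an ordinary oscilloscope.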


Fig. 12.14 Cross-section of the sampler layout: (a) Combined slotline and signal line layout, (b) Integration of the diode bridge

Fig. 12.15 Measured voltage envelope of the IF signal. The time-scale is converted to the original speed of the signal

An example of such a measurement is shown in Fig. 12.15. The falling edge of the signal is visibly sharper than the rising edge. This shows the pulse-sharpening property of the NLTL described in Sect. 4. The detailed analysis of the fall time as a function of the diode bias voltage is shown in Fig. 12.16 for the IF frequencies of 10 kHz and 1 MHz. As a number of resistors are placed in series and in parallel with the diode, the actual diode voltage is only a fraction of the voltage applied to the DC pads of the circuit. As expected, the choice of the IF frequency has only a minor influence on the measured fall times. The measured fall time is reduced for lower bias voltage at the DC ports. This optimization of the sampler is already mentioned in [27]. When the diodes are biased close to the threshold voltage, the strobe signal forward-biases the diodes for a longer time interval than when the diodes are biased further away from the threshold voltage. As the time interval of forward biasing determines the time resolution of the sampler, a lower bias voltage enables the measurement of smaller fall times. At a DC voltage of ±6 V, the strobe signal no longer forward-biases the diodes. This also means that the minimum measured fall time of the signal, 4.6 ps, is an upper limit for the real fall time of the NLTL signal.

Fig. 12.16 Measurement of the fall time as a function of the DC-biasing for two IF frequencies (IF1 = 10 kHz and IF2 = 1 MHz)

6 Conclusions

Generation of signals with a bandwidth exceeding 160 GHz using integrated NLTLs implemented in 65 nm CMOS technology has been demonstrated. Simulations that agree well with the measurements have been carried out up to 156 GHz. The effect of biasing on the NLTLs has been shown. Compared to the unbiased line, an increase of 3.5 dB can be achieved at 168 GHz by biasing the end of the line with −0.5 V. An ultrafast CMOS Schottky-diode bridge able to sample signals with a fall time of 4.6 ps has also been presented. The working principle of the sampler has been explained and time-domain measurements have been reported.

Acknowledgement The authors thank Dr. S. Cheng for help in assembling the measurement setup. This work is funded by the European Community's Seventh Framework Programme under grant agreement no. FP7-224189 (ULTRA project, www.ultra-project.eu)


References

1. P.H. Siegel, Terahertz technology. IEEE Trans. Microw. Theory Tech. 50, 910–928 (2002)
2. B.B. Hu, M.C. Nuss, Imaging with terahertz waves. Opt. Lett. 20(16), 1716–1718 (1995)
3. P.H. Siegel, Terahertz technology in biology and medicine. IEEE Trans. Microw. Theory Tech. 52, 2438–2447 (2004)
4. See for instance the following web sites: teraview.com, picometrix.com, zomega-terahertz.com
5. J. Faist, F. Capasso, D.L. Sivco, C. Sirtori, A.L. Hutchinson, A.Y. Cho, Quantum cascade laser: a new optical source in the mid-infrared. Science 264(5158), 553–556 (1994)
6. R. Köhler, A. Tredicucci, F. Beltram, H.E. Beere, E.H. Linfield, A.G. Davies, D.A. Ritchie, R.C. Iotti, F. Rossi, Terahertz semiconductor-heterostructure laser. Nature 417, 156–159 (2002)
7. M. Dyakonov, M. Shur, Shallow water analogy for a ballistic field effect transistor: new mechanism of plasma wave generation by dc current. Phys. Rev. Lett. 71(15), 2465–2468 (1993)
8. N. Dyakonova, Room-temperature terahertz emission from nanometer field-effect transistors. Appl. Phys. Lett. 88(14), 141906–1–141906–3 (2006)
9. W. Knap, F. Teppe, Y. Meziani, N. Dyakonova, J. Lusakowski, F. Boeuf, T. Skotnicki, D. Maude, S. Rumyantsev, M.S. Shur, Plasma wave detection of sub-terahertz and terahertz radiation by silicon field-effect transistors. Appl. Phys. Lett. 85(4), 675–677 (2004)
10. J.S. Bostak, All-electronic terahertz spectroscopy system with terahertz free-space pulses. J. Opt. Soc. Am. B 11(12), 2561–2565 (1994)
11. Y. Konishi, Picosecond electrical spectroscopy using monolithic GaAs circuits. Appl. Phys. Lett. 61(23), 2829–2831 (1992)
12. D. Huang, T.R. LaRocca, L. Samoska, A. Fung, M.-C.F. Chang, 324 GHz CMOS frequency generator using linear superposition technique, in IEEE International Solid-State Circuits Conference (ISSCC) Digital Technical Papers, Feb. 2008, pp. 476–629
13. E. Seok et al., A 410 GHz CMOS push-push oscillator with an on-chip patch antenna, in IEEE International Solid-State Circuits Conference (ISSCC) Digital Technical Papers, Feb. 2008, pp. 472–629
14. U. Pfeiffer, E. Ojefors, A 600-GHz CMOS focal-plane array for terahertz imaging applications. 34th European Solid-State Circuits Conference (ESSCIRC), 2008
15. M.K. Matters-Kammerer, L. Tripodi, R. van Langenvelde, J. Cumana, R.H. Jansen, RF characterization of Schottky diodes in 65-nm CMOS. IEEE Trans. Electron Devices 57(5), 1063–1068 (2010)
16. S. Sankaran, K.O. Kenneth, Schottky barrier diodes for millimeter wave detection in a foundry CMOS process. IEEE Electron Device Lett. 26(7), 492–494 (2005)
17. R. Landauer, Parametric amplification along nonlinear transmission lines. J. Appl. Phys. 31(3), 479–484 (1960)
18. C.J. Madden, M.J.W. Rodwell, R.A. Marsland, D.M. Bloom, Y.C. Pao, Generation of 3.5-ps fall-time shock waves on a monolithic GaAs nonlinear transmission line. IEEE Electron Device Lett. 9(6), 303 (1988)
19. D.W. van der Weide, Delta-doped Schottky diode nonlinear transmission lines for 480-fs, 3.5 V transients. Appl. Phys. Lett. 65(7), 881–883 (1994)
20. E. Afshari, A. Hajimiri, Nonlinear transmission lines for pulse shaping in silicon. IEEE J. Solid-State Circ. 40(3), 744–752 (2005)
21. M.J.W. Rodwell, GaAs nonlinear transmission lines for picosecond pulse generation and millimeter-wave sampling. IEEE Trans. Microw. Theory Tech. 39(7), 1194–1204 (1991)
22. M. Li et al., Low-loss low-cost all-silicon CMOS NLTLs for pulse compression, in IEEE/MTT-S International Microwave Symposium, 3–8 June 2007, pp. 449–452
23. D.W. van der Weide, J.S. Bostak, B.A. Auld, D.M. Bloom, All-electronic generation of 880 fs, 3.5 V shockwaves and their application to a 3 THz free-space signal generation system. Appl. Phys. Lett. 62(1), 22–24 (1993)


24. M. Li et al., CMOS varactors in NLTL pulse-compression applications, in Proceedings of 37th European Microwave Conference, Munich, 2007, pp. 1405–1408
25. A. Jrad et al., A simple and systematic method for the design of tapered nonlinear transmission lines, IEEE/MTT-S Digest, 1998, pp. 1627–1630
26. W.R. Eisenstadt, Y. Eo, S-parameter-based IC interconnect transmission line characterization. IEEE Trans. Comp. Hybrids Manuf. Tech. 15(4), 483–490 (1992)
27. R.A. Marsland, V. Valdivia, C.J. Madden, M.J.W. Rodwell, D.M. Bloom, 130 GHz GaAs monolithic integrated circuit sampling head. Appl. Phys. Lett. 55(6), 592–594 (1989)
28. R.Y. Yu, M. Case, M. Kamegawa, M. Sundaram, M.J.W. Rodwell, A.W. Gossard, 275 GHz 3-mask integrated GaAs sampling circuit. Electron. Lett. 26(13), 949–951 (1990)
29. S.T. Allen, U. Bhattacharya, M.J.W. Rodwell, 4 THz sidewall-etched varactors for sub-mm-wave sampling circuits. GaAs IC Symposium, 2003, pp. 285–287

Part III

Power Management and DC-DC

Power management and integrated DC-DC converters are becoming an important research topic. For that reason, this part gives an overview of the area and discusses some state-of-the-art achievements. It starts with an overview of, and trends in, integrated DC/DC converters. The working principles of both inductive and capacitive techniques and their advantages are discussed. Some solutions using different topologies, such as gearbox and multiphase topologies, are explained. An overview of integrated power converters in terms of maximum output power is presented. The second paper presents techniques for controlling DC-DC converters, and especially the pulse-width modulators. The trend to implement those functions with more digital building blocks is discussed, and examples of implementations are presented. The third work focuses more on the low power requirements and the high efficiencies required in medical implantable applications. High voltage and high efficiency with a reduced number of components, and single-inductor and multi-output switching regulators, are discussed. In particular, the conductive losses versus switching frequency are becoming important even for low-power applications. The fourth paper is related to the control architecture: by using feed-forward techniques, big improvements related to load dumps can be achieved. This is done by using both voltage and current information, and as a result reduced ripple can be achieved. The fifth paper describes the need for specially optimized power MOSFET transistors, using mixed-mode simulations to determine their internal losses and to reduce ringing effects. As a result, highly efficient high-power converters can be achieved. The last work deals with control techniques for fully integrated DC-DC converters in CMOS technologies. Architectures for both inductive and capacitive DC-DC converters are discussed. Several techniques, such as PWM versus PFM and constant on/off timing versus semi-constant on/off timing, are compared. The advantage of integrated topologies and the possibilities towards multi-phase architectures allow fast control loops which can handle very fast load dumps.

Michiel Steyaert

Chapter 13

State-of-the-Art of Integrated Switching Power Converters

Gerard Villar Piqué and Henk Jan Bergveld

Abstract This paper discusses the state-of-the-art of integrated switched-capacitor and inductive power converters. After introducing applications that drive the need for integrated switching power converters, implementation issues to be addressed for integrated switched-capacitor and inductive converters are given, as well as design examples. At the end of the paper, various integrated power converters are compared in terms of the main specifications.

1 Introduction

In many applications there is an increasing demand for power converters that convert one voltage into another and that can be accommodated in a small volume. Examples are portable devices, which become smaller while hosting an increasing number of features and associated voltage conversions, and energy scavengers, where efficient and small-volume voltage conversion is a key feature. In general, two approaches exist for converting one voltage into another [1]. The first approach involves a continuous-time circuit with a dissipative element and is generally referred to as a linear regulator. Only voltage down conversion is possible in this case. The second approach involves a switched-mode circuit with one or more energy-storage elements and enables both voltage up and down conversion as well as inversion. The type of energy-storage element can either be a capacitor, yielding a switched-capacitor converter, or an inductor, leading to an inductive converter. The use of linear regulators to accommodate voltage conversion is not preferred in many applications due to the maximum efficiency of Vo/Vin, where input

G. Villar Piqué • H.J. Bergveld
NXP Semiconductors, Central Research and Development, High Tech Campus 32, 5656AE, Eindhoven, The Netherlands
e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_13, © Springer Science+Business Media B.V. 2012


voltage Vin is always larger than output voltage Vo . Particularly in cases where the difference between Vin and Vo is large this would lead to too much dissipation in a small volume and associated high temperatures. Switching power converters offer the potential to implement the desired voltage conversion steps at a higher efficiency. However, reactive components, i.e. inductors and/or capacitors, are needed to implement these converters. Since available volumes are generally small, this has led to a trend towards small-form-factor switching power converters with integrated reactive components, i.e. no external capacitors or inductors, which are the main focus of this paper. The trend towards lower power consumption of ICs, particularly digital ICs, also increases the need for integrated switching power converters. In order to reduce the active and idle power consumption of a digital circuit, voltage-scaling and body-bias techniques have been developed [2, 3]. In the case of voltage scaling, the digital circuit is divided into independent voltage islands with different voltage supplies to minimize their power consumption while keeping the desired performance. By applying body-bias techniques, the circuit characteristics can be modified. When applying a reverse body bias, the gate leakage of transistors can be decreased by increasing the threshold voltage. This reduces the leakage of the digital circuit in idle mode. When applying a forward body bias, the speed and thereby performance of the digital circuit can be enhanced in active mode due to the lower threshold voltage of the transistors. Voltage scaling implies integrated down converters, either switched-capacitor or inductive, with output powers in the range of 100–400 mW, or in the mW-range for sub-threshold processors [4]. Body biasing requires up or down conversion, or voltage inversion. 
Because of the lower output power of body-bias voltage generators, switched-capacitor converters are favored owing to their better integration possibilities compared to inductive converters.

For improved power-delivery quality and space reduction, integrating the power converter with the load is desired. Especially in the case of voltage scaling and body-bias voltage generation, which are mostly applied in state-of-the-art nm-CMOS IC processes, this implies that the integrated switching power converter should be realized in nm-CMOS as well. Since nm-CMOS processes combine high cost per area with low energy-storage density, this complicates the integration of the necessary reactive components. Two main methodologies can be distinguished. First of all, a different and dedicated technology can be used to integrate the reactive components. This approach is generally referred to as a System-in-Package (SiP) approach. Secondly, the reactive components can be monolithically integrated on the same die as the active electronics, optionally using post-processing steps to reduce the reactive-component silicon area. All integration approaches yield smaller capacitance and inductance densities than those of their non-integrated counterparts. Therefore, switching frequencies of integrated switching power converters are high. As a result, optimizing converter efficiency, trading off area taken up by the reactive components for switching frequency, becomes important. The efficiency of the integrated switching power converter should be higher than that of a linear regulator.


This paper gives an overview of state-of-the-art integrated switching power converters. Section 2 focuses on integrated switched-capacitor converters, while Sect. 3 discusses the main characteristics of integrated inductive converters. Section 4 compares the main specifications of switched-capacitor and inductive converters discussed in Sects. 2 and 3. Finally, conclusions are drawn in Sect. 5.

2 Integrated Switched-Capacitor Converters

2.1 Averaged Modeling of Switched-Capacitor Converters

A common way to model the static input-to-output transfer function of a Switched-Capacitor Converter (SCC) is by means of an ideal DC transformer plus a series resistance connected to the output, as shown in Fig. 13.1a. While the voltage conversion ratio M of the transformer is determined by the topology itself (how the capacitors are connected during the different clock phases), the value of the average equivalent output resistance Rout depends not only on the topology but also on the switching frequency fs, the capacitance C of the floating capacitors (i.e. the capacitors that change their connections between clock phases), the on-resistance Ron of the switches and the duty cycle of the clock signal. Additionally, second-order effects such as the dead time of the switching signals and the parasitic resistances of capacitors and connections might also affect the output resistance (and consequently Vo) to a lesser degree. In 1995, Makowski and Maksimović established all the possible positive conversion ratios of an SCC given its number of floating capacitors [5]. All these values can be obtained from Eq. 13.1:

$$M = \frac{P[k]}{Q[k]} \qquad (13.1)$$

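Fig. 13.1 (a) Averaged model of a switched-capacitor converter (input Vin, ideal 1:M transformer, series resistance Rout, output capacitor Cout, output voltage Vo and load current Io); (b) Rout as a function of the switching frequency fs (asymptotic and real curves, with the SSL and FSL regions meeting at the corner frequency fc and the FSL asymptote at RFSL)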


Table 13.1 Possible conversion ratios (M) vs. number of floating capacitors (N-1)

| # Floating capacitors | Possible conversion ratios |
|---|---|
| 1 | 1/2; 1; 2 |
| 2 | 1/3; 1/2; 2/3; 1; 3/2; 2; 3 |
| 3 | 1/5; 1/4; 1/3; 2/5; 1/2; 3/5; 2/3; 3/4; 4/5; 1; 5/4; 4/3; 3/2; 5/3; 2; 5/2; 3; 4; 5 |
| 4 | 1/8; 1/7; 1/6; 1/5; 1/4; 2/7; 1/3; 3/8; 2/5; 3/7; 1/2; 4/7; 3/5; 5/8; 2/3; 5/7; 3/4; 4/5; 5/6; 6/7; 7/8; 1; 8/7; 7/6; 6/5; 5/4; 4/3; 7/5; 3/2; 8/5; 5/3; 7/4; 2; 7/3; 5/2; 8/3; 3; 7/2; 4; 5; 6; 7; 8 |
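The entries of Table 13.1 can be reproduced with a short enumeration. The sketch below rests on an assumption that is consistent with the table but is not taken verbatim from [5]: the attainable ratios are all reduced fractions P/Q with 1 ≤ P, Q ≤ F(N+1), where F(1) = F(2) = 1 and N counts the floating capacitors plus the output capacitor. The function names are illustrative only.

```python
from fractions import Fraction


def fibonacci(k: int) -> int:
    """Return the k-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 2):
        a, b = b, a + b
    return b if k >= 2 else a


def conversion_ratios(n_floating: int) -> list[Fraction]:
    """Enumerate candidate ratios M = P/Q for n_floating floating capacitors.

    Assumption (matches Table 13.1): P and Q each range over 1..F_(N+1),
    where N = n_floating + 1 also counts the output capacitor.
    """
    n_total = n_floating + 1
    bound = fibonacci(n_total + 1)
    # Fraction reduces automatically, so the set removes duplicates such as 2/4.
    ratios = {
        Fraction(p, q)
        for p in range(1, bound + 1)
        for q in range(1, bound + 1)
    }
    return sorted(ratios)


if __name__ == "__main__":
    for n in range(1, 5):
        ratios = conversion_ratios(n)
        print(n, len(ratios), "; ".join(str(r) for r in ratios))
```

The number of distinct ratios grows quickly with the number of floating capacitors, which is what makes multi-ratio converters attractive later in the chapter.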

where the positive integers P[k] and Q[k] represent the k-th elements F_k of the Fibonacci series for $2 \le k \le (N+1)$, and N is the total number of capacitors including the output capacitor Cout. Table 13.1 shows the available conversion ratios for different numbers of floating capacitors. Regarding the Rout value, it was identified in [6] that it presents two different asymptotic behaviors as a function of the switching frequency (Fig. 13.1b), and that it can be approximated by the square root of the sum of the squared asymptotic values. However, it was in 2008 that Seeman and Sanders [7] presented a simple systematic way to analyze any two-phase SCC in order to model its Rout. This method is based on the Slow-Switching Limit (SSL) and the Fast-Switching Limit (FSL) operating zones of an SCC, coinciding with the asymptotic behaviors shown in [6]. According to this model, the Rout value corresponds to:

$$R_{out} = \sqrt{R_{FSL}^2 + R_{SSL}^2} \qquad (13.2)$$

The RFSL and RSSL values in Eq. 13.2 represent the asymptotic values shown in Fig. 13.1b for the FSL and SSL limits, respectively. In the case of a 50% duty-cycle clock signal, with C being the sum of all the floating capacitances and the same on-resistance Ron for all the switches, the RSSL and RFSL values are:

$$R_{SSL} = \frac{m}{f_s C}; \qquad R_{FSL} = p R_{on} \qquad (13.3)$$

where the positive coefficients m and p are characteristics that change with different topologies, even when they provide the same conversion factor M. As an example, Fig. 13.2 shows two different topologies that provide M = 1/4 but have different output-impedance values. Unless additional constraints on the switching frequency and/or the size of the components preclude it, an optimized design will fall around the corner frequency fc at the meeting of the SSL and the FSL asymptotic limits (Fig. 13.1b), since higher fs

Fig. 13.2 Example of two different SCCs, (a) and (b), with 3 floating capacitors (C1, C2, C3) that provide the same conversion ratio (M = 1/4) but have different output impedances. Values annotated in the figure: m = 9/16 and m = 1; p = 20/16 and p = 38/16

values (design in FSL) would increase the switching losses without further reducing Rout . On the other hand, if a higher Rout value needs to be designed, the use of small floating capacitors would be preferred, rather than a lower fs value especially in integrated implementations. Additionally, a design that provides the lowest Rout value required by the applications around fc offers a higher degree of freedom when it comes to regulating the output voltage.
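The role of the corner frequency can be made concrete with a small numerical sketch of Eqs. 13.2 and 13.3. Setting RSSL equal to RFSL gives fc = m / (p·Ron·C). The component values below (1 nF, 1 Ω) are hypothetical, and the pairing m = 9/16 with p = 20/16 is assumed to belong to one of the topologies of Fig. 13.2.

```python
import math


def r_ssl(m: float, f_s: float, c_total: float) -> float:
    """Slow-switching-limit asymptote of Eq. 13.3: m / (f_s * C)."""
    return m / (f_s * c_total)


def r_fsl(p: float, r_on: float) -> float:
    """Fast-switching-limit asymptote of Eq. 13.3: p * R_on."""
    return p * r_on


def r_out(m: float, p: float, f_s: float, c_total: float, r_on: float) -> float:
    """Output resistance of Eq. 13.2: root-sum-square of both asymptotes."""
    return math.hypot(r_fsl(p, r_on), r_ssl(m, f_s, c_total))


def corner_frequency(m: float, p: float, c_total: float, r_on: float) -> float:
    """Switching frequency at which R_SSL equals R_FSL."""
    return m / (p * r_on * c_total)


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    m, p = 9 / 16, 20 / 16       # assumed pairing from Fig. 13.2
    c_total, r_on = 1e-9, 1.0    # 1 nF total floating capacitance, 1 ohm switches
    f_c = corner_frequency(m, p, c_total, r_on)
    for f_s in (f_c / 10, f_c, 10 * f_c):
        value = r_out(m, p, f_s, c_total, r_on)
        print(f"fs = {f_s:.3g} Hz -> Rout = {value:.3f} ohm")
```

Well above fc the output resistance flattens at p·Ron, so raising fs further only adds switching losses, which is the argument made in the text.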

2.2 Efficiency and Power-Loss Distribution

From Fig. 13.1a, it can be observed that the average output voltage Vo of an SCC is equivalent to that of an LDO, with the difference that Vin has been multiplied by M. Consequently, the power efficiency η of an ideal SCC (that is, without considering switching losses) can be similarly stated as follows:

$$\eta = \frac{V_o}{M V_{in}} \qquad (13.4)$$

Thus, the efficiency might be unacceptably low unless Vo is very close to (but smaller than) the product M·Vin, a condition that is difficult to satisfy in applications where Vin or Vo, or both, present a wide range of values. As stated previously, one of the main loss sources of an SCC is the conduction loss (Pcond) represented by Rout in the averaged model (in reality, it occurs in the parasitic resistances of the circuit while charging/discharging the capacitors). Once the M factor has been selected by the design, Pcond is set by the input and output voltages and the output current Io as shown in Eq. 13.5.

$$P_{cond} = (M V_{in} - V_o) I_o \qquad (13.5)$$


Hence, having chosen the most appropriate M factor, the efficiency will be determined by the switching losses Psw, with Eq. 13.4 being the upper limit. These can be subdivided into two categories:

• Bottom-plate losses. These switching losses are due to the parasitic capacitances of the floating capacitors of the SCC. The energy is spent in charging and discharging the capacitances at the terminals of the floating capacitors during the circuit phase changes. Because integrated capacitors are most commonly planar, these losses are mainly related to the parasitic capacitance of the bottom plate of the capacitors, though strictly speaking the upper plate also contributes to the parasitics. The junction capacitances of the drain/source diffusions of the switches connected to the floating capacitors contribute as well. Although they strongly depend on the technology, 'bottom-plate' losses can be the dominant switching losses. It is also interesting to note that different topologies providing the same M factor can generate different amounts of bottom-plate losses. This issue has been addressed in [8, 9], and the former even proposes a particular switching scheme to reduce the amount of bottom-plate losses.
• Driver losses. This category includes the energy spent in the drivers of the power switches. In CMOS technologies, the drivers are normally implemented by tapered buffers. Consequently, in a properly optimized design, the drivers should be co-designed with their corresponding switches so that the total switching losses are minimized [10].
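As a rough numerical illustration of Eqs. 13.4 and 13.5, the sketch below computes the ideal efficiency bound, the conduction loss, and the efficiency once a lumped switching loss is added. The lumped term p_sw (bottom-plate plus driver losses) and all numerical values are assumptions made for the example, not figures from the chapter.

```python
def ideal_efficiency(v_in: float, v_out: float, m: float) -> float:
    """Upper efficiency bound of Eq. 13.4, valid for 0 < v_out <= m * v_in."""
    if not 0 < v_out <= m * v_in:
        raise ValueError("v_out must satisfy 0 < v_out <= m * v_in")
    return v_out / (m * v_in)


def conduction_loss(v_in: float, v_out: float, i_out: float, m: float) -> float:
    """Conduction loss of Eq. 13.5 in watts."""
    return (m * v_in - v_out) * i_out


def efficiency(v_in: float, v_out: float, i_out: float, m: float, p_sw: float) -> float:
    """Efficiency including a lumped switching loss p_sw (an assumed model)."""
    p_out = v_out * i_out
    return p_out / (p_out + conduction_loss(v_in, v_out, i_out, m) + p_sw)


if __name__ == "__main__":
    # Hypothetical operating point: 3.6 V input, 1.5 V output, M = 1/2, 100 mA load.
    v_in, v_out, m, i_out = 3.6, 1.5, 0.5, 0.1
    print("ideal bound:", ideal_efficiency(v_in, v_out, m))
    print("conduction loss [W]:", conduction_loss(v_in, v_out, i_out, m))
    print("with 10 mW switching loss:", efficiency(v_in, v_out, i_out, m, 0.01))
```

With p_sw set to zero the last function reduces to Eq. 13.4, which confirms that Eq. 13.5 is simply the power dropped across Rout in the averaged model.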

2.3 Multi-ratio Switched-Capacitor Converters

The possibly low power efficiency of an SCC in applications with a wide range of Vin and/or Vo values can be overcome by a Multi-Ratio SCC (MR-SCC). Figure 13.3 shows the theoretical efficiency (that is, without including switching losses and allowing for Rout = 0 at the peaks) of different SCCs with 1, 2 and 3 floating capacitors. As observed, when the input-to-output voltage ratio varies over a wide range, the use of an MR-SCC can result in significant benefits. For this reason, several authors targeting an integrated SCC implement an MR-SCC [4, 8, 11, 12]. However, the required increase in the number of switches and drivers, and the obvious inability to provide the very low output impedance that would correspond to the peaks of the waveforms in Fig. 13.3, result in a smaller-than-expected increase of efficiency due to the multi-ratio configuration. In Fig. 13.4, the efficiency of two different MR-SCCs with N = 3 and N = 4 is depicted as a function of the current density. At every point, the designs have been optimized to maximize the average efficiency throughout the whole Vin range. As observed, the increase of efficiency due to the multi-ratio structure only becomes noticeable for low power densities (large area for the same output power). This is


Fig. 13.3 Theoretical efficiency comparison of MR-SCC with 1, 2 and 3 floating capacitors as a function of the input voltage Vin (Vo = 1.5 V). The average efficiencies are 71.4%, 84%, 92%, respectively

Fig. 13.4 Efficiency comparison of two different MR-SCC with 2 and 3 floating capacitors as a function of the current density. The designs are optimized to maximize the efficiency considering switching losses

due to the relatively lower switching frequency required for low power densities. In this situation, the efficiency is mainly determined by the conduction losses which are minimized by using a higher number of available ratios.
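The theoretical benefit of a multi-ratio converter can be sketched by always selecting the smallest available ratio that still satisfies M·Vin ≥ Vo and applying Eq. 13.4. The code below is a minimal illustration that ignores switching losses and assumes Rout = 0 at the peaks, as in Fig. 13.3. The ratio lists come from Table 13.1; the Vin sweep is hypothetical, so the averages it produces are not expected to match the figure.

```python
from fractions import Fraction

# Ratio sets for 1 and 2 floating capacitors, taken from Table 13.1.
RATIOS_1_CAP = [Fraction(1, 2), Fraction(1), Fraction(2)]
RATIOS_2_CAPS = [
    Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1),
    Fraction(3, 2), Fraction(2), Fraction(3),
]


def best_ratio(v_in, v_out, ratios):
    """Smallest available M with M * v_in >= v_out, or None if none exists."""
    feasible = [m for m in ratios if float(m) * v_in >= v_out]
    return min(feasible) if feasible else None


def ideal_mr_efficiency(v_in, v_out, ratios):
    """Eq. 13.4 evaluated with the best available ratio (0.0 if unreachable)."""
    m = best_ratio(v_in, v_out, ratios)
    return 0.0 if m is None else v_out / (float(m) * v_in)


def average_efficiency(v_in_values, v_out, ratios):
    """Arithmetic mean of the ideal efficiency over a list of input voltages."""
    values = [ideal_mr_efficiency(v, v_out, ratios) for v in v_in_values]
    return sum(values) / len(values)


if __name__ == "__main__":
    sweep = [2.0 + 0.1 * i for i in range(21)]  # hypothetical 2.0 V to 4.0 V sweep
    for name, ratios in (("1 cap", RATIOS_1_CAP), ("2 caps", RATIOS_2_CAPS)):
        print(name, round(average_efficiency(sweep, 1.5, ratios), 3))
```

Adding ratios can only raise this ideal average, since every extra ratio gives the selector one more option; the practical gain is smaller because of the extra switches and drivers.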


2.4 Control Strategies Applied to Switched-Capacitor Converters

Because of the complex behavior of the output impedance of an SCC, there are different control strategies to provide line and load regulation. In the following, an overview is given, highlighting their main characteristics. Most of the control strategies (except conversion ratio control) base the regulation action on the modification of Rout to adjust its voltage drop. As stated in Eq. 13.5, the conduction losses Pcond are determined by the topology and the application. Thus, the difference in the resulting efficiency, especially for low Io values, is related to how the control minimizes the switching losses Psw.

1. Conversion ratio control selects the proper conversion ratio M to provide the required output voltage. Thus, it is the only control strategy that reduces the conduction losses. However, M can only take discrete values that are not linearly spaced, and the Rout characteristics (p and m values) also change non-linearly with the change of topology. Because of this, most designs adjust the topology open-loop based on Vin and/or the reference for Vo, rather than including the voltage drop Io·Rout in a closed loop [4, 8, 11, 12]. However, in [13, 14] the conversion ratio is controlled closed-loop via sensing Io, though the SCC is not fully integrated. Because of the unavailability of sufficient M values, it is generally combined with other control methods to provide fine regulation. Since it does not modify the fs value, the generated noise spectrum is quite predictable.
2. Duty-cycle control is quite uncommon in integrated SCCs because of its lack of efficiency benefits. It is based on the dependency of the RFSL value on the duty cycle (d) of the switching signal. Though the dependency RFSL = f(d) is a function of the topology used, the minimum RFSL value is obtained at d = 50%. This does not result in any efficiency improvement over other strategies since the switching activity remains independent of the output power. Thus, Psw remains constant for the whole Io range, which results in a more predictable spectrum of the generated noise. In [15, 16] the authors used this control method combined with a programmable fs in order to reduce the switching losses at low power.
3. Switching Frequency Modulation (PFM). Many control loops of integrated SCCs use control strategies that result in some kind of switching frequency modulation (PFM, hysteretic control, ...) [8, 9, 12, 17–19]. This is preferred because it keeps Psw proportional to Io, which results in a rather constant η = f(Io). The main drawback is the unpredictable spectrum of the generated noise because of the fs modulation as a function of the required regulation actions.
4. Ron modulation. Another way to control the output voltage in the face of variations of Vin and/or Io is to modulate the on-resistance of one or more switches of the SCC, especially in FSL designs. Considering that the switches are mainly implemented by MOS transistors, two main mechanisms enable modifying the Ron value:


• Modulation of the transistor channel width by using segmented switches [11]. The disadvantage is that it only provides discrete Ron values, and all the required control signals can be quite complex to generate and route.
• Modulation of the switch resistance by modulating the MOS Vgs voltage. Though theoretically it can generate continuous values up to completely switching off the transistor channel, the highly non-linear characteristic of the MOS Ids = f(Vgs) function can become a source of instability in the control loop, especially when Vgs gets close to the transistor threshold voltage. As an example, in [14] the authors use some switches from the SCC as the pass transistor of an LDO.

Either of the two Ron modulation mechanisms results in a variation of the switching losses, either because smaller switches have to be driven, or because the power spent to drive a MOS gate is a quadratic function of the applied voltage swing. Hence, this is a control mechanism that reduces Psw when the output power needs to be reduced without modifying fs, which makes it interesting for noise-sensitive applications.

5. Series LDO. Because of its straightforward approach, the series connection of a Low-Drop-Out (LDO) continuous-time voltage regulator with the SCC has become quite common [20, 21]. The concept is to bridge most of the Vin-Vo voltage gap by means of an SCC and regulate Vo by means of a series-connected LDO. As additional benefits, if connected at the input of the SCC, it can reduce the input current ripple; if connected at the output, it can reduce the Vo ripple and provide faster dynamics to the load regulation. The main drawback is that in order to allow for the voltage drop across the LDO, the Io·Rout drop of the SCC has to be even lower, which implies the use of a higher fs, larger capacitors, larger switches, or a combination of them. Any of these would increase the switching losses.
6. Floating-capacitor size modulation. This technique aims to keep Psw proportional to Io without modifying fs. The concept is to split the floating capacitors into smaller ones in order to use only the required amount of floating capacitance at different levels of output power [22]. As a result, this has a direct impact on the amount of bottom-plate losses. A natural extension of this concept is to split the whole SCC into smaller ones (that is, not only the floating capacitors but also the switches), and use them dynamically as separate parallel modules [12]. The sizing of the resulting smaller SCC modules will determine the amount of variation of η = f(Io) and also the variation of the Vo ripple, given that the output buffer capacitor is kept constant. Additionally, the obtained discrete values of the supplied output power may lead to a limit-cycle behavior of Vo, unless it is further regulated with another method.

Table 13.2 qualitatively compares all control strategies. Up arrows show a benefit and the opposite applies for down arrows (– for neutral). The last column shows the availability of a continuous regulation of Vo, instead of discrete values. Finally, it should be noted that an interesting and common approach is to combine some of the described control strategies to get their different benefits.
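The contrast between fixed-frequency control (constant Psw) and frequency modulation (Psw proportional to Io) can be illustrated with a simple loss budget. The sketch below is a first-order model under stated assumptions: conduction loss follows Eq. 13.5, the full-load switching loss p_sw_full is a single lumped number, and under PFM it scales linearly with Io / i_max. All values are hypothetical.

```python
def efficiency_fixed_fs(i_out, v_in, v_out, m, p_sw_full):
    """Efficiency when the switching loss stays constant over the load range."""
    p_out = v_out * i_out
    p_cond = (m * v_in - v_out) * i_out  # Eq. 13.5
    return p_out / (p_out + p_cond + p_sw_full)


def efficiency_pfm(i_out, v_in, v_out, m, p_sw_full, i_max):
    """Efficiency when the switching loss scales with the load (assumed linear)."""
    p_out = v_out * i_out
    p_cond = (m * v_in - v_out) * i_out  # Eq. 13.5
    p_sw = p_sw_full * i_out / i_max
    return p_out / (p_out + p_cond + p_sw)


if __name__ == "__main__":
    # Hypothetical converter: 3.6 V in, 1.5 V out, M = 1/2, 10 mW switching loss at 100 mA.
    v_in, v_out, m, p_sw_full, i_max = 3.6, 1.5, 0.5, 0.01, 0.1
    for i_out in (0.001, 0.01, 0.1):
        fixed = efficiency_fixed_fs(i_out, v_in, v_out, m, p_sw_full)
        pfm = efficiency_pfm(i_out, v_in, v_out, m, p_sw_full, i_max)
        print(f"Io = {i_out * 1e3:5.1f} mA  fixed fs: {fixed:.3f}  PFM: {pfm:.3f}")
```

In this model the PFM efficiency is independent of Io, while the fixed-frequency efficiency collapses at light load, which matches the qualitative entries for switching losses in Table 13.2.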


Table 13.2 Comparison of different control strategies of SCC

| Control strategy | Cond. losses | Switch. losses | Noise | Discrt./Cont. |
|---|---|---|---|---|
| Conversion ratio | ↑ | – | ↑ | Discrt. |
| Duty-cycle | – | ↓ | ↑ | Cont. |
| Switching frequency modulation | – | ↑↑ | ↓↓ | Cont. |
| Ron modulation (Wch) | – | ↑ | ↑ | Discrt. |
| Ron modulation (Vgs) | – | ↑ | ↑ | Cont. |
| Series LDO | – | ↓↓ | ↑ | Cont. |
| Capacitor size modulation | – | ↑↑ | ↑ | Discrt. |

2.5 Multi-phase Switched-Capacitor Converters

A further extension of splitting the SCC into smaller ones is the Multi-Phase SCC (MP-SCC) [11, 17, 18]. Conceptually, an MP-SCC is a single SCC that is split into multiple smaller ones, in a modular approach, with their clock phases evenly distributed along the whole switching period. The main purpose of this is the reduction of the output voltage ripple, because charge is injected into the output by means of smaller charge packages at different time instants. Correspondingly, the input current presents a much smaller ripple, because charge is taken from the input voltage source in smaller charge packages, too. The combined result is a much lower magnitude of generated noise. Since only one or a few phases are injecting charge into the output node at any time, most designs can strongly reduce the size of Cout, or even avoid it (since the floating capacitors of the other phases may act as the output capacitor). This results in a very significant reduction of the size of the integrated SCC. On top of that, since an MP-SCC is already split into smaller converters, controlling the number of active converters can produce a rather constant efficiency as a function of Io. However, turning some of the modules on or off modifies the generated noise spectrum, and its impact should also be analyzed in noise-sensitive applications.

2.6 Technology Options for Switched-Capacitor Converters

The main difficulty in fully integrating an SCC is the on-chip implementation of all the required capacitors. Even though parallel-plate capacitors are relatively easy to integrate in a planar process, non-dedicated standard processes normally provide low capacitive density values (obtained capacitance per unit area), which results in a large area occupation by the capacitors, and correspondingly high implementation cost. Consequently, in most of the fully integrated SCC in standard


CMOS technologies, more than 80% of the area is occupied by the capacitors. Hence, the achievable power density and efficiency depend strongly on the technology used. In the following, the main advantages and drawbacks of the different technology options for integrated SCCs are discussed. The three main items to consider are capacitive density, amount of 'bottom-plate' parasitics and implementation cost.

• Current bulk CMOS technologies can provide a capacitance density of 4 up to 12 nF/mm² when using the gate capacitance of MOS transistors (MOSCAP). The highest values are obtained for transistors with thin gate oxide, which results in a low breakdown voltage. In addition, because floating capacitors are needed, the MOSCAPs require a separate well. Thus, N-type MOSCAPs (with a lower ESR than P-type) require a triple-well process. Also, the bottom-plate capacitance is expected to be the highest because of the proximity to the substrate (10%). The main advantage of bulk CMOS processes is the possibility to integrate the SCC with the rest of the system, especially in large ICs. Fringe metal capacitors, which have lower parasitics, are normally not considered because of their low capacitive density (<1 nF/mm²) [4, 8, 9, 15, 22, 23].

• An extra option on top of bulk technologies is the use of Metal-Insulator-Metal (MIM) capacitors. Here, they are considered separately from bulk CMOS because they require extra masks in the fabrication phase, which increases the cost. The main advantages are the reduced bottom-plate parasitic capacitance (1%) and normally higher breakdown voltages, although the low capacitive density (up to 2 nF/mm²) increases the required area and hence the cost [12, 17–19, 21].

• A further extension of bulk CMOS is the Silicon-On-Insulator (SOI) technology. It differs from bulk technology in that the devices are placed on a high-impedance substrate, which greatly reduces the bottom-plate parasitic capacitance of a MOSCAP. Thus, in SOI it is possible to obtain the high capacitive density of a MOSCAP with parasitic capacitances in the order of 0.1%. Again, the main drawback is the higher cost, which normally makes SOI suitable only for applications that justify it [11].

• With trench capacitors, the capacitive density can be higher than 400 nF/mm² [24], which is far beyond the other technologies. The breakdown voltage of the capacitors is also higher than in standard CMOS and the parasitics are <1%. However, trench capacitors are normally incompatible with active devices, which therefore need to be on a different die, and the interconnection between the two dies can become complex when many different capacitors are used (MR-SCC or MP-SCC). The cost of the total assembly should also be considered [25].
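To put the density figures above in perspective, the following minimal sketch estimates the silicon area needed for a given total capacitance in each technology. The density values are the bounds quoted in the list above; the 10 nF capacitance requirement and all names are hypothetical and serve only as an illustration.

```python
# Capacitance densities in nF/mm^2, taken from the bounds quoted in the text.
DENSITY_NF_PER_MM2 = {
    "fringe metal (upper bound)": 1.0,
    "MIM (upper bound)": 2.0,
    "MOSCAP (low end)": 4.0,
    "MOSCAP (high end)": 12.0,
    "trench (lower bound)": 400.0,
}

# Hypothetical total capacitance requirement; not a value from the source.
TOTAL_CAPACITANCE_NF = 10.0


def capacitor_area_mm2(capacitance_nf, density_nf_per_mm2):
    """Return the area (mm^2) needed to realize a capacitance at a given density."""
    return capacitance_nf / density_nf_per_mm2


areas = {
    name: capacitor_area_mm2(TOTAL_CAPACITANCE_NF, density)
    for name, density in DENSITY_NF_PER_MM2.items()
}

for name, area in areas.items():
    print(f"{name:28s} {area:8.3f} mm^2")
```

The sketch ignores bottom-plate parasitics, breakdown voltage and cost, which, as discussed above, weigh as heavily in the technology choice as the area itself.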


G. Villar Piqué and H.J. Bergveld

3 Integrated Inductive Converters

3.1 Efficiency and Power Loss Contributions

As for any power converter, the power efficiency of an integrated inductive switching power converter is defined by $P_o/P_{in}$, with $P_o$ the output power and $P_{in}$ the input power, and $P_{in} = P_o + P_{loss}$. The main contributions to the loss power $P_{loss}$ are [26, 27]:

• Conduction losses. Conduction loss power $P_{cond}$ results from ohmic losses in the power switches as well as the integrated inductor and is defined by:

$$P_{cond} = \sum_i I_{rms,i}^2 R_i \qquad (13.6)$$

where $I_{rms,i}$ is the rms current through resistance $R_i$. Since the inductance of the integrated inductor is generally small due to technology limitations, the current ripple will be substantial, leading to a large rms current. Moreover, the effective ohmic resistance of the inductor increases with the switching frequency due to the skin and proximity effects in the windings. When using an air-core inductor, eddy currents generated in nearby conducting areas also substantially increase the effective winding resistance. This can seriously impact the converter efficiency [28].

• Switching losses. Switching losses are associated with switching the power switches on and off and are mainly caused by charging and discharging parasitic capacitances with voltage excursions equal to the supply voltage $V_{dd}$. To minimize the on-resistance of the power switches, $V_{dd}$ is chosen as the highest available voltage ($V_{in}$ for a down converter and $V_o$ for an up converter). These losses are proportional to the switching frequency $f$ and are given by:

$$P_{switch} = \sum_i f C_i V_{dd}^2 \qquad (13.7)$$

In addition to Eq. 13.7, other switching losses include overlap losses. These losses occur because of non-zero voltage across a power switch and non-zero current flowing through it during the time it is switched on or off. Overlap losses are generally small compared to the switching losses given in Eq. 13.7.

• Dead-time losses. In any switching power converter, a dead time exists in which both power switches are in the off state, to prevent large current peaks that would otherwise occur due to shorting the input (down converter) or the output (up converter). In the case of non-zero inductor current, the current will flow through the body diode of one of the power switches during the dead-time period. This leads to losses given by:

$$P_{dead} = f I_L V_{diode} t_d \qquad (13.8)$$

where $I_L$ is the inductor current during the dead-time period ($t_d$) and $V_{diode}$ is the body-diode forward voltage.

• Inductor core losses. When the integrated inductor uses a magnetic core, core losses will result. Core losses include eddy-current and hysteresis losses, both of which are frequency-dependent [29]. Using core material has the advantage of a smaller inductor footprint and a reduction of Electro-Magnetic Interference (EMI) problems, but the resulting core losses may be substantial and require a careful choice of the core material, thickness and structure.
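The loss contributions of Eqs. 13.6 to 13.8 can be combined into a rough efficiency estimate. The sketch below is only an illustration under stated assumptions: core losses and overlap losses are neglected, and every component value (resistances, capacitances, currents, dead time, output power) is hypothetical rather than taken from any of the cited designs.

```python
def conduction_loss(branches):
    """Eq. 13.6: sum of I_rms^2 * R over (I_rms [A], R [ohm]) pairs."""
    return sum(i_rms ** 2 * r for i_rms, r in branches)


def switching_loss(f_sw, capacitances, v_dd):
    """Eq. 13.7: sum of f * C * Vdd^2 over the parasitic capacitances [F]."""
    return sum(f_sw * c * v_dd ** 2 for c in capacitances)


def dead_time_loss(f_sw, i_l, v_diode, t_d):
    """Eq. 13.8: body-diode conduction loss during the dead time."""
    return f_sw * i_l * v_diode * t_d


def efficiency(p_out, p_loss):
    """Power efficiency Po / Pin with Pin = Po + Ploss."""
    return p_out / (p_out + p_loss)


# Hypothetical operating point (illustrative values only).
F_SW = 100e6                          # switching frequency [Hz]
BRANCHES = [(0.2, 0.5), (0.2, 0.3)]   # (I_rms, R) for switches and inductor
CAPS = [20e-12, 10e-12]               # parasitic capacitances [F]
V_DD = 1.8                            # supply voltage [V]
P_OUT = 0.2                           # output power [W]

p_cond = conduction_loss(BRANCHES)
p_switch = switching_loss(F_SW, CAPS, V_DD)
p_dead = dead_time_loss(F_SW, i_l=0.15, v_diode=0.7, t_d=0.2e-9)
p_loss = p_cond + p_switch + p_dead
eta = efficiency(P_OUT, p_loss)

print(f"P_cond   = {p_cond * 1e3:.2f} mW")
print(f"P_switch = {p_switch * 1e3:.2f} mW")
print(f"P_dead   = {p_dead * 1e3:.2f} mW")
print(f"efficiency = {eta * 100:.1f} %")
```

Separating the contributions in this way makes it easy to see which term dominates at a given operating point, which is the starting point for the design trade-offs discussed in the remainder of this section.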

3.2 Control Strategies

A power converter control loop should ensure good line and load regulation, keeping the output voltage constant when the input voltage and output current vary, respectively. Since integrated switching power converters operate at high switching frequencies with fast switching times, special precautions must be taken when implementing a control loop. In general, integrated switching power converters require a simple control strategy and implementation [30].

Two basic control strategies can be distinguished: Pulse-Width Modulation (PWM) and Pulse-Frequency Modulation (PFM) [26]. With PWM, the switching frequency is fixed and the duty cycle for driving the power switches is varied. In the case of PFM, the switching frequency increases with the load current. This can be achieved, for example, by keeping the on time of the controlling power switch constant or by keeping its off time constant.

Comparing PFM to PWM for the case of integrated inductive conversion [30], PFM leads to better power efficiency, particularly at low output power, due to the lower switching losses associated with the lower switching frequency. This assumes that the switching frequency of a PFM-controlled integrated inductive converter is always lower than that of a PWM-controlled integrated converter, and only equals it at maximum output power. However, a PFM-controlled integrated inductive converter will have a larger output voltage ripple at low output power. This can only be counteracted by choosing a larger output capacitance, with the associated area penalty.

The EMI behavior of a PWM-controlled integrated inductive converter can be predicted better due to the constant switching frequency. However, due to the high switching frequency, the EMI may fall in the frequency band of the application, in which case dealing with all EMI energy concentrated at the switching frequency may be troublesome. In that case, the spreading of EMI energy across the frequency band for PFM-controlled integrated converters may be an advantage.
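The efficiency argument for PFM at low output power can be illustrated with the switching-loss expression of Eq. 13.7. The sketch below assumes, purely for illustration, that the PFM switching frequency scales linearly with output power and reaches the PWM frequency only at maximum output power; a real PFM controller need not follow such an idealized law. All numerical values and names are hypothetical.

```python
def switching_frequency(p_out, p_max, f_max, mode):
    """Switching frequency [Hz] under an idealized PWM or PFM scheme.

    Assumption (not from the source): PFM frequency is proportional to
    output power and equals f_max at p_max.
    """
    if mode == "PWM":
        return f_max
    if mode == "PFM":
        return f_max * p_out / p_max
    raise ValueError(f"unknown mode: {mode}")


def switching_loss(f_sw, c_par, v_dd):
    """Eq. 13.7 for a single lumped parasitic capacitance."""
    return f_sw * c_par * v_dd ** 2


# Hypothetical converter parameters (illustrative values only).
F_MAX = 100e6    # PWM switching frequency [Hz]
P_MAX = 0.2      # maximum output power [W]
C_PAR = 30e-12   # lumped parasitic capacitance [F]
V_DD = 1.8       # supply voltage [V]

for p_out in (0.02, 0.1, 0.2):
    losses = {
        mode: switching_loss(
            switching_frequency(p_out, P_MAX, F_MAX, mode), C_PAR, V_DD)
        for mode in ("PWM", "PFM")
    }
    print(f"Po = {p_out * 1e3:5.0f} mW: "
          f"PWM {losses['PWM'] * 1e3:.2f} mW, PFM {losses['PFM'] * 1e3:.2f} mW")
```

Under this assumption the PWM switching loss is independent of the load, so it weighs most heavily at light load, whereas the PFM switching loss shrinks together with the output power.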

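Fig. 13.5 General block diagram of a synchronous inductive buck converter with corresponding waveforms (power stage: input voltage Vin with input capacitor Cin, power switches driven by gate voltages Vg,P and Vg,N, switching node LX with parasitic capacitance Cx, inductor L and output capacitor Co at output voltage Vo; waveforms versus time t: gate voltages Vg,P and Vg,N, LX node voltage VLX, and inductor current IL with ripple ΔI)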

3.3 Implementation Issues

The inductive converter topologies considered for integration are relatively simple, since the boundary condition of integration prevents more complex structures. For example, isolated topologies are impractical due to the need for an on-chip transformer, and resonant topologies are impractical due to the low Q factors of on-chip resonance tanks. This implies that the regular buck and boost topologies are used the most.

The dimensioning of the components of these simple topologies, such as switch resistances and associated area and capacitance as well as inductance value, is not trivial. First of all, parasitic effects that are less important when designing a conventional switching power converter with external reactive components, such as the low Q factor of integrated inductors and the large impact of parasitic switch capacitances, should be taken into consideration. This requires the development of design equations that take these effects into account. Secondly, the multi-dimensional design space makes design optimization by simulation impractical. Therefore, design-space exploration based on calculations of e.g. efficiency over the complete design space, using a detailed converter model, is very useful [27, 30–32].

Due to the high switching frequency, reducing switching losses is important, particularly when Vdd is large, see Eq. 13.7. Implementing Zero-Voltage Switching (ZVS) helps to reduce switching losses, since the inductor current is used to charge and discharge some of the parasitic capacitances. This basically implies quasi-resonant switching behavior, where energy is exchanged between capacitances and the inductor instead of being dissipated in the power switches. For example, consider the power stage and corresponding switching waveforms of a synchronous inductive buck converter in Fig. 13.5 [28].
When the upper PMOST power switch is on, the inductor current ramps up linearly, and when it switches off, the positive inductor current discharges the parasitic capacitance Cx at the LX node. With proper control of the dead time between switching off the upper PMOST and switching on the lower NMOST, the NMOST can be switched on when its drain-source voltage is zero. This leads to lower switching losses, since Cx is discharged by the inductor current. A similar effect can be achieved when the NMOST is switched off at negative inductor current, where the inductor current charges Cx. Note that the transition times of the LX node voltage depend on the inductor current and therefore on the load current. Therefore, to prevent the MOSTs from being switched on too early, leading to remaining capacitive switching losses, or too late, leading to parasitic body-diode losses, the dead-time control needs to be adaptive. Whether ZVS operation makes sense depends on the loss distribution.

Various examples of adaptive dead-time control circuitry to reduce switching losses can be found in the literature. The adaptive dead-time loop presented in [33] uses a Delay-Locked Loop (DLL) to ensure that the drain-source voltage Vds crossing zero and the gate-source voltage Vgs passing the threshold voltage Vt occur at the same moment. In the adaptive dead-time loops presented in [34, 35], an additional capacitor is added in parallel to Cx to increase the high-low and low-high transition times at the LX node. This enables more precise adaptive dead-time control, since delays in e.g. comparators in the control loop and gate drivers become less influential. As in [33], Vds zero-crossing and Vgs Vt-crossing comparators are used, the outputs of which are fed to an analog DLL. The circuitry needed to implement adaptive dead-time control also consumes power. Therefore, when the switching losses are relatively low, for example because Vdd in Eq. 13.7 is relatively low, simpler implementations may be a better trade-off. An example of this reasoning can be found in [28].
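The load dependence of the ideal dead time can be made concrete with a first-order estimate. The sketch below assumes that the inductor current stays constant during the LX transition and that Cx is a linear capacitance, so that the transition time is Cx·Vin/IL; both are simplifications, and the function names and numerical values are hypothetical. It also estimates the energy lost per transition when a fixed dead time is too short (the remaining charge on Cx is dissipated in the switch) or too long (the body diode conducts).

```python
def lx_transition_time(c_x, v_in, i_l):
    """Time for a constant inductor current i_l to discharge c_x from v_in to 0."""
    return c_x * v_in / i_l


def dead_time_energy_loss(t_dead, c_x, v_in, i_l, v_diode):
    """Return (loss mechanism, energy per transition in J) for a fixed dead time.

    First-order model: constant inductor current, linear capacitance.
    """
    t_tr = lx_transition_time(c_x, v_in, i_l)
    if t_dead < t_tr:
        # Switch turns on too early: remaining voltage on Cx is dissipated.
        v_rem = v_in - i_l * t_dead / c_x
        return "capacitive", 0.5 * c_x * v_rem ** 2
    # Switch turns on too late (or exactly on time): body diode conducts.
    return "body diode", i_l * v_diode * (t_dead - t_tr)


# Hypothetical values (illustrative only).
C_X = 50e-12      # parasitic capacitance at the LX node [F]
V_IN = 1.8        # input voltage [V]
V_DIODE = 0.7     # body-diode forward voltage [V]
T_DEAD = 0.5e-9   # fixed dead time [s]

for i_l in (0.1, 0.3):
    t_tr = lx_transition_time(C_X, V_IN, i_l)
    kind, energy = dead_time_energy_loss(T_DEAD, C_X, V_IN, i_l, V_DIODE)
    print(f"IL = {i_l:.1f} A: ideal dead time {t_tr * 1e9:.2f} ns, "
          f"{kind} loss {energy * 1e12:.1f} pJ per transition")
```

With these numbers, the same fixed dead time is too short at the lower current and too long at the higher current, which is the reason given above for making the dead-time control adaptive.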
Multi-level power conversion reduces the rms current value [31, 36]. This is particularly beneficial in integrated inductive converters due to the small inductance value and associated high current ripple, as well as the low Q factor of the inductor and associated high series resistance. Therefore, multi-level converters can reduce conduction losses. Due to the lower current ripple, the output voltage ripple also decreases at the same output capacitance value, so the output capacitance can be reduced to keep the voltage ripple constant. An additional floating capacitor is needed to generate the third voltage level, but its capacitance value is small and therefore it does not take up much area. An additional advantage is the decreased voltage across the switches, which allows the use of thin-gate-oxide switches. A disadvantage is the increased complexity of generating the switch driving signals. Moreover, additional power switches are needed compared to the standard buck or boost topologies. An example of a three-level integrated buck converter is given in [36].

Multi-phase power converters consist of multiple stages or phases in parallel, each with its own power switches and inductor. All phases share a common input and output capacitor. Placing converter stages in parallel increases the maximum output power, since all of the parallel stages contribute to the output power and the maximum output power for a single stage is limited. Moreover, the multi-phase power converter can be optimized such that its phases all operate at maximum possible efficiency, which optimizes the overall converter efficiency. When the phases are operated out of phase, the ripple of the overall output current can be reduced substantially, depending on the duty cycle and the number of phases. This leads to reduced input and output capacitance compared to a single-phase converter with the same input and output voltage ripple. Alternatively, with unchanged capacitance values, a substantially lower input and output voltage ripple results. An example of a two-phase integrated buck converter with two coupled air-core inductors stacked on top of each other can be found in [37]. Coupling of inductors reduces current ripple even more [37, 38], but becomes impractical for more than two phases. An example of a four-phase integrated inductive buck converter without coupled coils can be found in [39].

In many cases, the converter input voltage will be higher than can be handled by the nm-CMOS IC process. For both switched-capacitor and inductive converters, one way of implementing a converter that can withstand higher input or output voltages using standard CMOS is to apply cascoded power switches [38, 40]. An alternative is to realize High-Voltage (HV) MOSTs in baseline CMOS without requiring any additional process masks [41]. These HV MOSTs enable high switching frequencies as well, as is supported by their application in RF Power Amplifiers (PA) [42].

Another main choice needs to be made between Continuous Conduction Mode (CCM), where the inductor current flows continuously, and Discontinuous Conduction Mode (DCM), where the inductor current is zero for some time. CCM leads to simpler generation of switch driving signals and can lead to acceptable results [32], but particularly at low output powers, moving to DCM makes more sense from an efficiency point of view. Operating in DCM requires more complex timing of the switch driving signals, for example by determining when the coil current is zero. This is not trivial and only a few practical implementation examples can be found in the literature [30, 31].
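The dependence of multi-phase ripple cancellation on the duty cycle and the number of phases, mentioned above, can be checked numerically. The sketch below is an idealized illustration: it assumes uncoupled inductors, identical phases in CCM with perfectly triangular ripple normalized to a peak-to-peak value of 1, and phase shifts of 1/N of the switching period. The function names are hypothetical.

```python
def phase_ripple(t, duty):
    """Normalized triangular ripple (0..1) of one phase at cycle fraction t."""
    if t < duty:
        return t / duty               # switch on: current ramps up
    return (1.0 - t) / (1.0 - duty)   # switch off: current ramps down


def total_ripple(n_phases, duty, samples=2000):
    """Peak-to-peak ripple of the summed phase currents, relative to one phase."""
    totals = []
    for i in range(samples):
        t = i / samples
        totals.append(sum(
            phase_ripple((t + k / n_phases) % 1.0, duty)
            for k in range(n_phases)))
    return max(totals) - min(totals)


for n in (1, 2, 4):
    ripples = [round(total_ripple(n, d), 3) for d in (0.25, 0.5)]
    print(f"{n} phase(s): ripple at duty 0.25 and 0.5 -> {ripples}")
```

In this idealized model the summed ripple cancels completely whenever the duty cycle is a multiple of 1/N, and is only partially reduced at other duty cycles, which is consistent with the statement that the benefit depends on both the duty cycle and the number of phases.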

3.4 Integration Technology Options

As described in Sect. 1, two main integration approaches can be distinguished: SiP and monolithic integration. Each of the two groups includes quite a few variations. In the SiP approach, the reactive components are realized in a different technology, or a mixture of technologies, leading to an optimum implementation. For example, using ferrite as a substrate for a solenoid inductor on which the power IC is mounted forms a Chip-Size Module in [43]. Alternatively, an eight-phase interleaved buck converter is formed in [44] using off-chip SMD air-core inductors and capacitors.

In a different approach, the reactive components are realized in IC technology, but not on the same die as the switches and control. This can be referred to as a dual-die approach, where the active die and the die with reactive components are connected to each other via bumps. The reactive components can be realized in a relatively cheap CMOS process with e.g. a feature size of 0.35 μm, where area is less expensive, and the active components can then be integrated with the load in nm CMOS [45]. A further cost reduction is possible when a dedicated low-mask-count passive-integration process is used to integrate the capacitors and inductor. Examples can be found in [28, 32], where a passive-integration process with 80 nF/mm² capacitance density is used with an 8-μm thick copper top metal layer to realize inductors with reasonable Q factor. An additional advantage is the fact that relatively large input and output capacitance values can be realized at low cost. Some photographs of an integrated buck converter realized in [32] are shown in Fig. 13.6.

Fig. 13.6 (a) Passive-integration die before dicing, (b) Active die flip-chipped on passive-integration die, (c) Sandwich double-flip-chipped on HVQFN40 lead frame, (d) HVQFN40 package including two-die sandwich

Several examples can be found of inductive converters that are monolithically integrated, i.e. using a single die in a package, for example in 130 nm CMOS [37, 39, 46] and in 180 nm CMOS [47]. A monolithically integrated buck converter in a 180 nm SiGe IC process can be found in [48]. However, in standard CMOS the inductor remains difficult to integrate in acceptable area and with acceptable performance. An alternative is to realize the inductor with bond wires, as is shown in [31, 36, 49]. Finally, the inductor performance can be increased by applying post-processing steps to a standard CMOS process. For example, a MEMS post-processed Plastic Deformation Magnetic Assembly (PDMA) inductor is used in [50]. Post-processed thin-film inductors based on amorphous CoZrTa alloy are proposed in [38, 51].


4 Comparison of Integrated Switched-Capacitor and Inductive Converters

In this section, different relevant designs from the literature are compared by placing them in a 'peak efficiency vs. power density (at the peak efficiency)' plane (Fig. 13.7). From that comparison, some conclusions about the suitability of the previously discussed options are drawn. Note that the comparison among the different examples in the literature is both complex and unfair because of the different functionality and constraints required by the various applications (step-up, step-down, particular values of Vin and Vo).

Regarding the SCC arena (Fig. 13.7a) the following is observed:

• In bulk CMOS implementations (including MIM capacitors) there is a clear trade-off between power density and peak efficiency (moving along the straight black line). The designs with higher peak efficiency implement MIM capacitors, but the lower capacitive density results in lower power densities.

• Designs in smaller feature sizes (<100 nm) can provide higher power density at reasonable efficiencies due to their capability to operate at higher switching frequencies, together with higher densities for the MOSCAPs.

• As expected, the best performance is obtained for more exotic technologies, as in [11] (32 nm on SOI) and [25] (trench capacitors).

• Comparing the results in both graphs of Fig. 13.7, it is difficult to identify which kind of converter (SCC or inductive) provides better overall performance (efficiency and power density). Also, it should be considered that the presented results strongly depend on the application, regarding Vin, Vo, Po and other constraints.

From Fig. 13.7b, the following is observed regarding inductive converters:

• The true monolithic converters all realize similar power densities [37, 39, 46, 48] without requiring special process options (although [48] is realized in SiGe instead of CMOS). Using bond wires to realize the inductor does not lead to a large increase in power density [49].

• The SiP-based converters show both lower [28, 32, 43, 45] and higher [44] power densities compared to the monolithic converters. The reason for the highest power density of [44] is probably the use of dedicated and optimized small SMD inductors and capacitors. Therefore, the main reason for choosing a SiP-based approach instead of a monolithic approach is not power density, but may for example be lower cost, since expensive CMOS area is saved.

• The converter presented in [51], dating from 2010, shows that a lot can be improved in power density when applying post-processed magnetic structures on CMOS, especially compared to [50], which is based on the same rationale but dates from 2005. Careful material choice is crucial here.


Fig. 13.7 State-of-the-art literature results for switched-capacitor (a) and inductive (b) converters. Peak efficiency vs. power density (at the peak efficiency). The numbers indicate the corresponding reference

It is also interesting to observe the maximum output power obtained by the different designs in the literature, in order to identify what kind of power converter is more suitable for what kind of application. This is depicted in Fig. 13.8 for both inductive and SCC converters. There is a very clear trend for the integrated inductive converters to go for higher power levels. In case of SCC, most of the designs target low-power applications, though recently ([11, 17] in 2010) there have been some attempts to provide similar output power levels to the inductive converters.

Fig. 13.8 State-of-the-art results for inductive and switched-capacitor converters: peak efficiency vs. maximum output power. The numbers indicate the corresponding references (vertical axis: peak efficiency (%); horizontal axis: max output power (mW), logarithmic scale from 0.5 to 5000; data series: Inductive and SCC)

5 Conclusions

This paper describes the state-of-the-art of integrated switching power converters, both switched-capacitor and inductive. Both types enable efficient up and down conversion, which is required by many applications. For switched-capacitor converters a clear trade-off between peak efficiency and power density can be distinguished. For inductive converters, monolithic integration does not necessarily lead to lower power densities compared to the SiP approach. Comparing switched-capacitor and inductive converters, inductive converters generally enable higher output powers. However, switched-capacitor converters with output powers in the range of inductive converters are appearing, whereas inductive converters are less suitable for ultra-low output powers.

References

1. H.J. Bergveld et al., Battery Management Systems: Design by Modelling. Philips Research Book Series, vol. 1 (Kluwer, Dordrecht, 2002)
2. M. Meijer et al., Post-silicon tuning capabilities of 45 nm low-power CMOS digital circuits, in IEEE Symposium on VLSI Circuits 2009, VLSI'09, Kyoto, 2009, pp. 110–111
3. M. Meijer et al., Ultra-low-power digital design with body biasing for low area and performance-efficient operation. IEEE J. Low Power Electron. 6(4), 521–532 (2010)
4. J. Kwong et al., A 65 nm Sub-Vt microcontroller with integrated SRAM and switched-capacitor DC-DC converter. IEEE J. Solid-State Circuits 44(1), 115–126 (2009)


5. M. Makowski, D. Maksimović, Performance limits of switched-capacitor DC-DC converters, in IEEE Power Electronics Specialists Conference 1995, PESC'95, Atlanta, 1995, pp. 1215–1221
6. B. Arntzen, D. Maksimović, Switched-capacitor DC/DC converters with resonant gate drive. IEEE Trans. Power Electron. 13(5), 892–902 (1998)
7. M.D. Seeman, S.R. Sanders, Analysis and optimization of switched-capacitor DC-DC converters. IEEE Trans. Power Electron. 23(2), 841–851 (2008)
8. Y.K. Ramadass, A.P. Chandrakasan, Voltage scalable switched capacitor DC-DC converter for ultra-low power on-chip applications, in IEEE Power Electronics Specialists Conference 2007, PESC'07, Orlando, 2007, pp. 2353–2359
9. D. Maksimović, S. Dhar, Switched-capacitor DC-DC converters for low-power on-chip applications, in IEEE Power Electronics Specialists Conference 1999, PESC'99, Charleston, 1999, pp. 54–59
10. G. Villar et al., Energy optimization of tapered buffers for CMOS on-chip switching power converters, in IEEE International Symposium on Circuits and Systems 2005, ISCAS'05, Kobe, 2005, pp. 4453–4456
11. H.P. Le et al., A 32 nm fully integrated reconfigurable switched-capacitor DC-DC converter delivering 0.55 W/mm² at 81% efficiency, in IEEE International Solid-State Circuit Conference 2010, ISSCC'10, Session 10, San Francisco, 2010
12. T. Van Breussegem, M. Steyaert, A fully integrated gearbox capacitive DC/DC-converter in 90 nm CMOS: optimization, control and measurements, in IEEE Workshop on Control and Modeling for Power Electronics 2010, COMPEL'10, Boulder, 2010, pp. 1–5
13. S. Bin et al., High efficiency, inductorless step-down DC/DC converter, in IEEE International Conference on ASIC 2005, ICASIC'05, Shanghai, 2005, pp. 364–367
14. A. Rao et al., Noise-shaping techniques applied to switched-capacitor voltage regulators. IEEE J. Solid-State Circuits 40(2), 422–429 (2005)
15. L. Su et al., A monolithic step-down SC power converter with frequency-programmable subthreshold z-domain DPWM control for ultra-low power microsystems, in IEEE European Solid-State Circuit Conference 2008, ESSCIRC'08, Edinburgh, 2008, pp. 56–61
16. L. Su et al., Design and analysis of monolithic step-down SC power converter with subthreshold DPWM control for self-powered wireless sensors. IEEE Trans. Circuits Syst. I 57(1), 280–290 (2010)
17. T. Van Breussegem, M. Steyaert, A fully integrated 74% efficiency 3.6 V to 1.5 V 150 mW capacitive point of load DC/DC converter, in IEEE European Solid-State Circuit Conference 2010, ESSCIRC'10, Seville, 2010, pp. 434–437
18. T. Van Breussegem, M. Steyaert, A 82% efficiency 0.5% ripple 16-phase fully integrated capacitive voltage doubler, in IEEE Symposium on VLSI Circuits 2009, VLSI'09, Kyoto, 2009, pp. 198–199
19. M. Seeman et al., An ultra-low-power power management IC for energy-scavenged wireless sensor nodes, in IEEE Power Electronics Specialists Conference 2008, PESC'08, Rhodes, 2008, pp. 925–931
20. K. Bhattacharyya et al., Embedded hybrid DC-DC converter with improved power efficiency, in IEEE International Midwest Symposium on Circuits and Systems 2009, MWSCAS 2009, Cancun, 2009, pp. 945–948
21. M. Wieckowski et al., A hybrid DC-DC converter for sub-microwatt sub-1 V implantable applications, in IEEE Symposium on VLSI Circuits 2009, VLSI'09, Kyoto, 2009, pp. 166–167
22. Y.K. Ramadass et al., A fully-integrated switched-capacitor step-down DC-DC converter with digital capacitance modulation in 45 nm CMOS. IEEE J. Solid-State Circuits 45(12), 2557–2565 (2010)
23. D. Somasekhar et al., Multi-phase 1 GHz voltage doubler charge pump in 32 nm logic process. IEEE J. Solid-State Circuits 45(4), 751–758 (2010)
24. J.H. Klootwijk et al., Ultrahigh capacitance density for multiple ALD-grown MIM capacitor stacks in 3-D silicon. IEEE Electron Device Lett. 29(7), 740–742 (2008)


25. L. Chang et al., A fully-integrated switched-capacitor 2:1 voltage converter with regulation capability and 90% efficiency at 2.3 A/mm², in IEEE Symposium on VLSI Circuits 2010, VLSI'10, Bangalore, 2010, pp. 55–56
26. R.W. Erickson, D. Maksimović, Fundamentals of Power Electronics, 2nd edn. (Kluwer, Dordrecht, 2001)
27. F. Haizoune et al., Topology comparison and design optimisation of the buck converter and the single-inductor dual-output converter for system-in-package in 65 nm CMOS, in IEEE International Power Electronics and Motion Control Conference 2009, IPEMC'09, Wuhan, 2009, pp. 295–301
28. H.J. Bergveld et al., An inductive down converter system-in-package for integrated power management in battery-powered applications, in IEEE Power Electronics Specialists Conference 2008, PESC'08, Rhodes, 2008, pp. 3335–3341
29. R. Meere et al., Analysis of microinductor performance in a 20–100 MHz DC/DC converter. IEEE Trans. Power Electron. 24(9), 2212–2218 (2009)
30. M. Wens, Monolithic inductive CMOS DC-DC converters – theory study and implementation, Ph.D. dissertation, Katholieke Universiteit Leuven, Belgium, October 2010, ISBN 978-946018-263-1
31. G. Villar Piqué, E. Alarcón, CMOS Integrated Switching Power Converters: A Structured Design Approach, 1st edn. (Springer, New York, 2011)
32. H.J. Bergveld et al., A 65-nm CMOS 100-MHz 87%-efficient DC-DC down converter based on dual-die system-in-package integration, in IEEE Energy Conversion Congress and Exhibition 2009, ECCE'09, San Jose, USA, 2009, pp. 3698–3705
33. W. Lau, S. Sanders, An integrated controller for a high-frequency buck converter, in IEEE Power Electronics Specialists Conference 1997, PESC'97, Orlando, 1997, pp. 246–254
34. S. Chen et al., Fast dead-time locked loops for a high-efficiency microprocessor-load ZVS-QSW DC/DC converter, in IEEE Conference on Electron Devices and Solid-State Circuits 2003, EDSSC'03, Hong Kong, 2003, pp. 391–394
35. O. Trescases et al., Precision gate drive timing in a zero-voltage-switching DC-DC converter, in IEEE Symposium on Power Semiconductor Devices and ICs 2004, ISPSD'04, Kitakyushu, 2004, pp. 55–58
36. G. Villar Piqué, E. Alarcón, Monolithic integration of a 3-level DCM-operated low-floating-capacitor buck converter for DC-DC step-down conversion in standard CMOS, in IEEE Power Electronics Specialists Conference 2008, PESC'08, Rhodes, 2008, pp. 4229–4235
37. J. Wibben, R. Harjani, A high-efficiency DC-DC converter using 2 nH integrated inductors. IEEE J. Solid-State Circuits 43(4), 844–854 (2008)
38. G. Schrom et al., Feasibility of monolithic and 3D-stacked DC-DC converters for microprocessors in 90 nm technology generation, in IEEE International Symposium on Low-Power Electronics And Design 2004, ISLPED'04, Newport Beach, 2004, pp. 263–268
39. M. Wens, M. Steyaert, An 800 mW fully integrated 130 nm CMOS DC-DC step-down multiphase converter with on-chip spiral inductors and capacitors, in IEEE Energy Conversion Congress and Exhibition 2009, ECCE'09, San Jose, USA, 2009, pp. 3706–3709
40. V. Kursun et al., High input voltage step-down DC-DC converters for integration in a low voltage CMOS process, in IEEE International Symposium on Quality Electronic Design 2004, ISQED'04, San Jose, 2004, pp. 517–521
41. A. Heringa, J. Sonsky, Novel power transistor design for a process independent high voltage option in standard CMOS, in IEEE Symposium on Power Semiconductor Devices and IC's 2006, ISPSD'06, Naples, 2006, pp. 1–4
42. D. Calvillo-Cortes et al., A 65 nm pulse-width-controlled driver with 8 Vpp output voltage for switched-mode RF PAs up to 3.6 GHz, in IEEE International Solid-State Circuits Conference 2011, ISSCC'11, San Francisco, 2011, pp. 12–14
43. Z. Hayashi et al., High-efficiency DC-DC converter chip size module with integrated soft ferrite. IEEE Trans. Magn. 39(5), 3068–3072 (2003)


44. G. Schrom et al., A 100 MHz eight-phase buck converter delivering 12 A in 25 mm² using air-core inductors, in IEEE Applied Power Electronics Conference 2007, APEC'07, Anaheim, 2007, pp. 727–730
45. K. Onizuka et al., Stacked-chip implementation of on-chip buck converter for distributed power supply system in SiPs. IEEE J. Solid-State Circuits 42(11), 2404–2410 (2007)
46. N. Jinhua et al., Improved on-chip components for integrated DC-DC converters in 0.13 μm CMOS, in IEEE European Solid-State Circuits Conference 2009, ESSCIRC'09, Athens, 2009, pp. 448–451
47. M. Alimadadi et al., A fully integrated 660 MHz low-swing energy-recycling DC-DC converter. IEEE Trans. Power Electron. 24(6), 1475–1485 (2009)
48. S. Abedinpour et al., A multistage interleaved synchronous buck converter with integrated output filter in 0.18 μm SiGe process. IEEE Trans. Power Electron. 22(6), 2164–2175 (2007)
49. M. Wens, M. Steyaert, A fully integrated 0.18 μm CMOS DC-DC step-down converter, using a bondwire spiral inductor, in IEEE Custom Integrated Circuits Conference 2008, CICC'08, San Jose, 2008, pp. 17–20
50. S. Musunuri, P. Chapman, Design of low power monolithic DC-DC buck converter with integrated inductor, in IEEE Power Electronics Specialists Conference 2005, PESC'05, Recife, 2005, pp. 1773–1779
51. J.T. Dibene II, Power on silicon with on-die magnetics – the start of a revolution in power delivery and power management for SoC's and high performance applications, in 2nd International Workshop on Power Supply on Chip 2010, PwrSoC'10, Cork, 2010

Chapter 14

Data Conversion Pulse-Width Modulators for Switch-Mode Power Converter Digital Control

Eduard Alarcón, Vahid Yousefzadeh, Aleksandar Prodić, and Dragan Maksimović

Abstract This chapter presents a survey and classification of architectures for integrated circuit implementation of digital pulse-width modulators (DPWM) targeting digital control of high-frequency switching DC-DC power converters. In order to optimize circuit resources in terms of occupied area and power consumption, architectures based on tapped delay lines are studied, which includes segmentation of the input digital code to drive binary-weighted delay cells and thermometer-decoded unary delay cells. Integrated circuit design of a particular example of the segmented DPWM is described.

1 Introduction

The feasibility of practical high-frequency, high-performance digital controllers for DC-DC applications was demonstrated a decade ago [1–3]. Based on custom architectures and microelectronic realizations of the key building blocks, including high-resolution high-frequency digital pulse-width modulators (DPWM), simplified discrete-time compensator schemes, and A/D converters, such controllers can offer

E. Alarcón, Department of Electronic Engineering, Technical University of Catalunya (UPC BarcelonaTech), Barcelona, Spain, e-mail: [email protected]
V. Yousefzadeh, National Semiconductors, Longmont, CO, USA
A. Prodić, Electrical and Computer Engineering Department, University of Toronto, Toronto, ON, Canada
D. Maksimović, Department of Electrical and Computer Engineering, University of Colorado, Boulder, CO, USA
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_14, © Springer Science+Business Media B.V. 2012


the advantages of lower sensitivity to parameter variations, programmability, and reduction or elimination of external passive components, without compromising dynamic performance, simplicity or cost [4–8]. The first generation of digital controllers for switching DC-DC power converters [1–3] aimed to provide comparable performance to their continuous-time analog counterparts in terms of dynamic performance as well as use of implementation resources (silicon area and power consumption) while providing extended functionality, such as programmability, and practical interest due to the ease of synthesis. A decade later, the ultimate goals for the next generation of digital controllers encompass aspects of both integration and functionality. In terms of integration, the aim is to provide simple SMPS-specific practical controller realizations with the advantages of CMOS process scaling and digital IC design flow with continuous improvements in the area/power/processing space. As far as functionality is concerned, the target is to address enhanced performance and functionality that is impractical or challenging by means of the analog approach, namely: system integration and system-level power management, improved silicon integration, full programmability, testability, self-diagnostics, self-tuning, adaptability, on-line efficiency optimization, on-line calibration and nonlinear control with enhanced dynamics.
Several partial challenges in digital SMPS are still open, namely: (a) those fundamentally related to quantization in time (inherent to discrete-time systems), such as aliasing of signal frequencies, modification of system Bode plots, and the shift of filter corner frequencies with sample rate; (b) those related to quantization in amplitude, such as limited resolution for regulation, limit-cycling disturbances, distorted waveforms, and rounding of filter corner frequencies (finite word lengths); and (c) the availability of cost-effective high-performance hardware, such as dedicated hardware processing blocks, together with controller design and coding that minimize hardware requirements. A high-frequency, high-resolution DPWM circuit is one of the critical blocks for successful practical realization of digital control for switching power converters. A high-resolution DPWM is necessary to accomplish precise voltage regulation and avoid undesirable quantization effects, such as limit-cycle oscillations [9]. Although other modulation schemes (such as sigma-delta) can be envisaged, and other control schemes with varying switching frequency (such as sliding-mode and hysteretic control) can be used, the use of constant-frequency pulse-width modulation is widespread in switching power converters. Discussions in this chapter, although general in nature, are mainly limited to single-output, trailing-edge pulse-width modulation. In standard analog trailing-edge pulse-width modulation, an analog input signal is compared to a given carrier to provide amplitude to time domain conversion, as shown in Fig. 14.1. The usual particular case of a linear ramp ("saw-tooth") as the carrier results in a linear relationship between the input signal and the output pulse width. When targeting the implementation of a pulse-width modulator with a digital signal as the input, a proper means to provide the digital to time domain conversion is required, while pursuing high-frequency capability, low area, and low power

Fig. 14.1 Conceptual block diagram of an analog PWM

Fig. 14.2 Conceptual block diagram of a digital PWM

consumption. Figure 14.2 shows a conceptual DPWM realization: instead of the carrier ramp which is used to measure the time in the analog PWM of Fig. 14.1, the time is quantized into a number of discrete time slots of length td, and a particular slot is selected by the digital control input word d. The digital comparator, which is used to select the time slot in the DPWM of Fig. 14.2, serves the purpose of the analog comparator in Fig. 14.1. There are many possible circuit implementations of the DPWM conceptually shown in Fig. 14.2 [1–3, 10–18], with different characteristics in terms of high-frequency capability, complexity, area, power consumption, sensitivity to process/temperature variations, linearity, etc.
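As a minimal behavioral sketch of the digital to time domain conversion of Fig. 14.2, assume an ideal time quantizer that divides the switching period into 2^n equal slots. The function name and its arguments are illustrative only and are not taken from any of the cited implementations.

```python
def dpwm_on_time(d, n_bits, t_s):
    """Ideal DPWM: on-time selected by the digital word d (illustrative model)."""
    slots = 2 ** n_bits            # number of time slots per switching period
    if not 0 <= d < slots:
        raise ValueError("d must fit in n_bits")
    t_d = t_s / slots              # length of one time slot
    return d * t_d                 # the pulse ends after d slots


# Example: 6-bit DPWM at fs = 1 MHz (Ts = 1 us), command d = 16,
# i.e. one quarter of the switching period.
t_on = dpwm_on_time(16, 6, 1e-6)
```

The model deliberately ignores how the slots are produced; the architectures reviewed next differ precisely in that respect.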

2 Review of DPWM Architectures

The first reported method for implementing a DPWM is based on the direct digital emulation of the ramp waveform by means of a fast-clocked counter, which is loaded with the input digital code at the beginning of the cycle, thus generating a


time-varying digital code following a sawtooth signal that is compared with zero by a digital zero detector [10]. This counter-based DPWM closely follows the general block diagram of Fig. 14.2. Excellent linearity is achieved in the digital to time domain conversion through the use of a 2^n·fs clock to divide the time period Ts = 1/fs for an n-bit DPWM. For a high switching frequency fs and a high DPWM resolution n, the need for the very high frequency clock is the main disadvantage of this approach. A subsequently proposed architecture that circumvents the high-frequency clock problem of the counter-based DPWM is based on a tapped delay line [11], as depicted in Fig. 14.3a. This circuit takes advantage of the linear propagation of a given pulse from a reference clock clk through the delay cells connected in cascade, to select a given pulse width quantized as a function of the selected number of cells. This selection can be obtained from the digital input code by means of a 2^n-to-1 multiplexer driven by the digital input code d. The multiplexer selects the signal that resets the output PWM signal, thus performing the function of selecting one of the time slots generated by the delay-line time quantizer. The delay line in Fig. 14.3a operates in an open-loop manner in the sense that the switching frequency fs is imposed by an external oscillator, while the total delay of the line should be designed in order for the maximum delay to match the switching period. As the semiconductor material properties (and therefore the cell delay td) vary with process and temperature, this condition cannot be satisfied at all process/temperature corners. Consequently, the executed duty cycle Dexec is not always the same as the duty cycle command Dcomd represented by the input digital signal. As an alternative, the delay line itself can be used to form a ring oscillator (Fig. 14.3b), which generates the clock at the switching frequency.
With this configuration, the resulting switching frequency is imposed by the delay line itself, and thus the maximum duty cycle can be guaranteed. In this case, the process/temperature variation causes a drift in the switching frequency fs rather than affecting the executed duty cycle and duty cycle command relationship, which always remains Dexec = Dcomd. The analog equivalent of the open-loop circuit of

Fig. 14.3 Delay-line based digital pulse width modulator (DPWM) architectures: (a) open-loop tapped delay line DPWM, and (b) tapped delay line DPWM forming a ring oscillator

Fig. 14.3a consists of an integrator based on a capacitor that is discharged by means of an external clock, whereas the approach of Fig. 14.3b corresponds to an astable oscillator that provides the sawtooth waveform by including the switched integrator in a closed loop configuration. In order to control the cell delay, or to enable synchronization to an external clock, design guidelines for the delay-line based DPWM cells can resort to well-known techniques in the area of DLL (delay locked loops) used in communication circuits [19]. Additionally, a DLL based scheme can be used to generate multiphase PWM signals [2]. Note as well that this delay oscillator might also be synchronized to an external system clock, if required. The DLL technique can also be utilized in open loop delay line in order to control the cell delay and thereby force a fixed relationship between Dexec and Dcomd , as discussed in Sect. 6. As far as the design of the digital delay cells composing the delay line is concerned, note that the delay of each cell depends on both its supply voltage and size, factors that directly impinge on the output current capability and the output node capacitance, and thus on the delay. Additionally, differential cells might be used to minimize noise sensitivity of delay cells to voltage supply. A reported work takes advantage of the delay dependence on the current supplying the delay cells to implement a CMOS-suited analog PWM using a plain delay-controlled line [16], that uses starved inverters [10] to limit and control the current available to charge each delay cell output node. Delay cells with the cell delay td inversely proportional to the input voltage have been used to construct feed-forward delay-line DPWM circuits [17, 18]. 
One of the disadvantages of the delay-line based approach is the nonlinearity in the digital to time domain conversion, which has its origin in the cell delay mismatch due to process variations across the length of the delay line and in any extra mismatched delay in the path from the delay cell output to the reset terminal of the flip-flop, analogous to the mismatch effects in current-steering DACs. The most important disadvantage of this architecture is the large size required by the multiplexer in charge of gating the desired delay line tap to the flip-flop. A power and area efficient solution can be implemented by considering a hybrid counter/delay-line based design, trading off multiplexer area versus counter clock frequency [1, 3, 13–15]. In the hybrid approach, the two cases reported so far employ either the counter with the most significant bits to provide a coarse pulse width and the finer pulse width being generated by the delay line selected by the least significant bits, or vice versa. The addition of coarse and fine pulse widths is accomplished by a series connection of the two sub-circuits. The work in [14] identifies the optimum degree of hybridization by evaluating power consumption versus distortion, since the application is a digitally controlled class-D switching power audio amplifier where linearity is very important. The THD is due to the mismatch of both characteristics, which, although individually monotonic, combine into a global characteristic with distortion.
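The trade-off between multiplexer size and counter clock frequency in the hybrid approach can be made concrete with a short sketch. It assumes that the counter resolves the most significant bits and the delay line the remaining bits; the function and dictionary keys are hypothetical names introduced only for this illustration.

```python
def hybrid_dpwm_resources(n_bits, n_counter, f_s):
    """Counter clock and delay-line size of an n-bit hybrid DPWM (sketch)."""
    if not 0 <= n_counter <= n_bits:
        raise ValueError("n_counter must lie between 0 and n_bits")
    n_delay = n_bits - n_counter               # bits resolved by the delay line
    return {
        "counter_clock_hz": (2 ** n_counter) * f_s,  # clock the counter needs
        "delay_line_taps": 2 ** n_delay,             # taps the multiplexer gates
    }


# 10-bit DPWM at fs = 781.25 kHz: pure delay line, 5/5 hybrid, pure counter
for n_counter in (0, 5, 10):
    print(n_counter, hybrid_dpwm_resources(10, n_counter, 781_250))
```

With n_counter = 0 the sketch reduces to the pure delay-line case (2^n taps, clock at fs), and with n_counter = n to the pure counter case (clock at 2^n·fs), so the two earlier architectures appear as the end points of the same trade-off.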


3 Segmented DPWM Architecture

All previous delay-line based architectures require the number of unary delay cells to be proportional to the total number of quantization levels, i.e. 2^n, and hence there is the need to convert the n-bit digital code to a 2^n-bit thermometer-code version to select the proper cell in delay-based architectures (which is both area and power consuming). As the DPWM resolution (n) increases, the circuit resources required for the encoding increase exponentially, making this kind of approach less desirable for small area, low power IC applications. An alternative approach considers binary-weighting the delay cells so that they can be directly driven by the digital code. This architectural approach is depicted in Fig. 14.4. In this case, the delay line is composed of cells whose delays are scaled with binary weights. The architecture considers direct use of switches driven by the digital code that either bypass or include the delay of a given binary-weighted cell in the signal path, so that the output pulse width corresponds to the expected value of the digital-to-time conversion. Note that while the binary-weighted cells can actually be implemented as a series connection of unary cells (for the sake of regularity of the integrated-circuit layout and hence matching), the required area resources of both approaches are similar as regards the delay line, but drastically reduced in the binary-weighted case due to the lack of an area-consuming multiplexer. A disadvantage of this binary-weighted DPWM architecture is that the linearity of the digital to time domain conversion degrades to an extent that even the monotonicity is not inherently guaranteed by the architecture itself, as is the case with thermometer-decoded unary cells. As the number of bits increases, it becomes increasingly difficult to guarantee monotonicity because of the large layout area in which the cells are spread out.
A certain matching between cells has to be provided to achieve a given linearity performance, requiring statistical analysis. It can be noted that similar binary-weighted architectures are used in Nyquist-rate current-steering D/A converters [20]. In order to leverage the advantages of the thermometer code approach (the inherent linearity), while obtaining a small area, a compromise can be obtained by segmentation. Therefore, in order to trade off resources versus performance, a segmented architecture (depicted in Fig. 14.5) can be considered [21], for which the

Fig. 14.4 Binary-weighted delay-line digital pulse width modulator architecture

Fig. 14.5 Segmented digital pulse width modulator architecture

DPWM circuit is segmented into two sub-DPWM circuits. The q most significant bits (the upper part of the segmented input code) are thermometer decoded and select from a delay line a given pulse width with 2^q time slots. The output of this first segment (with a guaranteed monotonic characteristic) is connected in series with a binary-weighted group of cells driven by the n−q least significant bits. The cascade connection of the two sub-circuits results in time addition and hence the final pulse width is selectable with the resolution provided by the n-bit input code.
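The time addition performed by the two segments, and the way binary-weighted cell mismatch can break monotonicity, can be sketched as follows. This is a simplified model under stated assumptions: delays are given in picoseconds, the thermometer segment is represented by a list of coarse cell delays, and all numerical values (including the deliberately short weight-4 cell) are hypothetical and chosen only for illustration.

```python
def segmented_pulse_width(d, n_bits, q, coarse_delays, binary_delays):
    """Pulse width of a two-segment DPWM: q thermometer MSBs, n-q binary LSBs."""
    n_lsb = n_bits - q
    msb = d >> n_lsb                       # selects the tap of the coarse line
    lsb = d & ((1 << n_lsb) - 1)           # drives the binary-weighted cells
    width = sum(coarse_delays[:msb])       # delay accumulated up to the tap
    width += sum(binary_delays[i] for i in range(n_lsb) if (lsb >> i) & 1)
    return width


def is_monotonic(widths):
    return all(b >= a for a, b in zip(widths, widths[1:]))


N_BITS, Q, TD = 6, 3, 100                  # hypothetical 100 ps unit delay
coarse = [8 * TD] * (2 ** Q - 1)           # each coarse cell spans 2^(n-q) slots
ideal_binary = [TD, 2 * TD, 4 * TD]
skewed_binary = [TD, 2 * TD, 280]          # weight-4 cell assumed 30% short

ideal = [segmented_pulse_width(d, N_BITS, Q, coarse, ideal_binary)
         for d in range(2 ** N_BITS)]
skewed = [segmented_pulse_width(d, N_BITS, Q, coarse, skewed_binary)
          for d in range(2 ** N_BITS)]
```

With ideal cells the width is exactly d·td. With the skewed binary cell, the width drops when going from code 3 to code 4, whereas the transitions handled by the thermometer-decoded segment remain monotonic.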

4 DPWM Classification

There are two basic processes that occur in a DPWM performing digital to time domain conversion for a given n-bit digital input code, as shown in the conceptual diagram of Fig. 14.2. These are the time quantization and the selection of time slots. The first process divides the switching period Ts into 2^n slots using either a delay line or a fast clock of frequency 2^n·fs. The second process selects the proper time slot by either tapping the desired delay cell or by comparing the counter output to the digital code. Apart from these, the DPWM may also be categorized according to frequency synchronization. This leads us to a classification based on the following three criteria.

Fig. 14.6 Classification of the DPWM according to the time quantization scheme. Categories shown: multiple-element time quantization (hybrid), with either MSBs from a counter and LSBs from a delay line (ref [1]) or MSBs from a delay line and LSBs from a counter (ref [9]); single-element time quantization (counter-based); and 2^N-element time quantization (delay-line based). The figure also carries the labels ref [4] and ref [5].

4.1 Time Quantization Scheme

For an n-bit DPWM, the total switching period Ts can be quantized into 2^n slots using a variety of possible schemes. On one extreme end is the case when a single element is used to measure each of these slots, as is the situation with the counter-based DPWM. The other extreme is when each slot is measured using an individual element dedicated to that slot. The pure delay-line based approach is an example of this category. In between these two extremes, many choices exist with varying number of elements for the time quantization, and these fall under the category of the hybrid approach. For the purpose of classification with respect to the time quantization scheme, the hybrid structure can be considered the most general case, whereas the counter-based and the delay-line based approaches are special cases. Figure 14.6 elaborates the classification according to the time quantization scheme.

4.2 Selection of Time Slots

Once the switching period Ts is quantized, the next step is the selection of a time slot. Again the possibilities are numerous; the selection of time slot using a thermometer code (Fig. 14.2) or a binary code (Fig. 14.4) or a combination of the two in the segmented approach (Fig. 14.5) being the particular examples. Similar to the classification of Sect. 4.1, this classification has the segmented scheme as the most general approach, and then the two special cases are the thermometer code and the binary weighted code. The segmented design itself can have many different

Fig. 14.7 Classification of the DPWM according to the selection of time slots. Categories shown: segmented (2-segment or multiple-segment; MSBs thermometer code with LSBs binary weighted, or MSBs binary weighted with LSBs thermometer code, etc.), binary weighted code, and thermometer code (ref [5]).

possible structures. The number of segments can vary from a minimum of two (as shown in Fig. 14.5) to any number allowed by the size of the DPWM. Furthermore it can have the thermometer code as MSBs while employing the binary code for LSBs or vice versa in the case of 2-segment architectures and many more possibilities exist in multiple-segment architectures. It should be noted that the approach of Fig. 14.5, where the MSBs are realized using the thermometer code is preferred for improved linearity. The classification according to the time-slot selection scheme is shown in Fig. 14.7. Note that different designs perform differently with respect to area/power, linearity, complexity of design etc., as discussed earlier.

4.3 Frequency Synchronization

Here the DPWM is classified into two categories:

• Open loop, 2^n·td ≠ Ts
• Closed loop, 2^n·td = Ts

The first category (open loop) is the case when the DPWM time slot td is independent of the switching frequency fs and the second (closed loop) is the situation when the two are related by a fixed ratio. The delay-line based open-loop design is an example of the first category, where time slots are equal to the cell delay td whereas fs is externally imposed and, owing to process variations, is not related to td according to a constant ratio. A delay line operating as a ring oscillator constitutes an example of the second category, where the switching frequency fs is determined by the DPWM itself and hence always has a fixed relationship with td. Similar examples can be constructed in the counter-based design, where in one case the counter clock is independent of fs and in another case the two are related by a fixed ratio.


As will be discussed in Sect. 6, delay-locked-loop (DLL) techniques can be used to control the cell delay td, or to enable synchronization of a closed-loop DPWM to an external clock.
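The difference between the two categories can be illustrated numerically. The sketch below assumes a 6-bit delay-line DPWM whose cell delay is 20% slower than nominal; both function names and the 20% figure are assumptions made for this example only.

```python
def open_loop(d, n_bits, t_d, t_s):
    """External clock fixes Ts; the executed duty follows the actual cell delay."""
    d_comd = d / 2 ** n_bits
    d_exec = min(d * t_d / t_s, 1.0)       # the pulse cannot exceed one period
    return d_comd, d_exec


def ring_oscillator(d, n_bits, t_d):
    """Delay line sets Ts = 2^n * td; the duty is preserved, fs drifts instead."""
    t_s = 2 ** n_bits * t_d
    d_exec = d * t_d / t_s
    return d_exec, 1.0 / t_s               # executed duty, switching frequency


T_S = 1e-6                                  # nominal switching period (1 MHz)
td_nominal = T_S / 2 ** 6
td_slow = 1.2 * td_nominal                  # assumed slow process corner

d_comd, d_exec_open = open_loop(16, 6, td_slow, T_S)
d_exec_ring, fs_ring = ring_oscillator(16, 6, td_slow)
```

Under these assumptions the open-loop DPWM executes a duty cycle of about 0.30 for a command of 0.25, while the ring-oscillator version keeps Dexec = Dcomd = 0.25 and lets the switching frequency fall to roughly 833 kHz.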

5 IC Implementation of a Segmented DPWM

In order to validate the proposed segmented architecture, a particular 6-bit proof-of-concept DPWM IC was designed and fabricated [17]. The design considers segmentation of the input digital code in three segments. The segmented structure of the DPWM is shown in Fig. 14.8: a positive edge of the square wave clock, after passing through a delay compensation unit, sets the SR flip-flop, which starts the output pulse. Simultaneously, the clock starts propagating through a series of delay cells: unit cell → sixteen cells → sixteen cells → sixteen cells → dummy cell. According to the values of the bits D5 and D4, one of the taps in the delay line of segment-3 is selected, and the rising edge of the signal at that tap passes through the 4-to-1 multiplexer to the next segment. At this stage again a similar transfer of the signal occurs through the selection of the desired tap. Finally the positive edge of the signal resets the SR flip-flop, marking the end of the output pulse. The presence of an extra delay cell at the beginning of each of the segmented stages, and

Fig. 14.8 Block diagram of the segmented DPWM integrated circuit prototype

Fig. 14.9 Layout of the segmented DPWM test chip

one dummy delay cell at the end of these stages aims to provide a similar driving and loading environment to each of the delay cells in the structure. Note that due to the multiplexers and the extra unit cells at the beginning of each stage, a certain amount of extra delay is introduced in the path of the signal from the clock input to the Reset input of the SR flip-flop. This extra delay is almost constant irrespective of the D5–D0 selection set. To balance this extra delay, the delay compensation unit, which is composed of matching delay and multiplexer cells, is employed. With the aim of comparing the segmented architecture with a plain thermometer-code design (shown in Fig. 14.3a), these two versions of the DPWM were implemented on prototype ICs in a 0.5 µm CMOS process. Both DPWMs are designed to operate at a switching frequency of 1 MHz. The layout of the segmented architecture is shown in Fig. 14.9. The DPWM area is 0.07 mm². There is a significant area and complexity improvement in the segmented design over the plain delay-line based design. Furthermore, due to the smaller number of units in the design, power consumption is also reduced. A comparison is presented in Table 14.1. Figure 14.10a and b show experimental results of the implemented DPWMs: the measured duty ratio of the output pulse as a function of the digital command


Table 14.1 Comparison of the segmented and the thermometer code experimental DPWM ICs

IC                 # of delay cells   # of MUX cells   Area (mm²)   Current (µA)   Routing complexity   Linearity
Segmented          75                 6                0.0675       111            Low                  Poor
Thermometer-code   67                 24               0.0833       132            High                 Good

Current at Vin = 5.0 V, fclk = 1 MHz

Fig. 14.10 Experimental results: measured duty ratio of the output pulse as a function of the 6-bit digital command k, for Vin = 3.3 V, (a) segmented DPWM IC (b) thermometer code DPWM IC

input, validating their functionality. The segmented DPWM, as expected, shows a greater degree of nonlinearity in its digital to time domain conversion. At closer inspection, it can be seen that this nonlinear character is periodic with a periodicity equal to an interval of four in the digital duty command. For closed-loop DC-DC controller applications, a degree of nonlinearity can be tolerated. Nevertheless, it is important to point out that, compared to a plain thermometer-code design, the segmented design requires a much more careful layout to ensure monotonicity.
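A short sketch shows how a three-segment structure of this kind can produce a nonlinearity that repeats every four codes. It assumes that the 6-bit command is split into three 2-bit multiplexer selects, as in Fig. 14.8, and that the unit delay of the finest segment does not exactly match one quarter of the 4-cell group delay. The delay values are hypothetical, and the mismatch mechanism is only one possible explanation; it is not a claim about the measured chip.

```python
def segment_selects(code):
    """Split a 6-bit command into the three 2-bit multiplexer selects."""
    return (code >> 4) & 0b11, (code >> 2) & 0b11, code & 0b11


def pulse_width_ps(code, unit_ps, group4_ps, group16_ps):
    """Delay added by the selected taps of segment-3, segment-2 and segment-1."""
    s3, s2, s1 = segment_selects(code)
    return s3 * group16_ps + s2 * group4_ps + s1 * unit_ps


# Hypothetical delays: the fine unit cell (17 ps) is slower than a quarter
# of the 4-cell group (60 ps / 4 = 15 ps); the 16-cell group is 4 x 60 ps.
widths = [pulse_width_ps(k, 17, 60, 240) for k in range(64)]
steps = [b - a for a, b in zip(widths, widths[1:])]
```

Under these assumptions the step sizes follow the repeating pattern 17, 17, 17, 9 ps, so the characteristic stays monotonic but shows a period of four in the digital command.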

6 Hybrid DPWM with Digital DLL

The DPWM described in this section includes a digital delay-locked loop around a delay line with discretely programmable delay cells so as to achieve constant-frequency clocked operation with the best possible resolution over a range of process or temperature variations. In [22–25], analog phase-locked loops (PLL) or delay-locked loops (DLL) are proposed to adjust the delay of the delay line in a DPWM in order to synchronize the operation to an external clock. Here, we describe a simple hybrid DPWM with


Fig. 14.11 (a) Delay cell module with adjustable delay of 1:m. (b) Hybrid DPWM with adjustable delay for the output D1

digital DLL. The proposed DPWM module is fully synthesizable, requires relatively small hardware resources, and can be synchronized to an external clock over a range of process or temperature variations. It is suitable for both FPGA and custom chip implementations. To guarantee monotonicity and near optimum linearity, the total delay of the delay line should be equal to one clock period of the input clock. However, even for a very careful delay line design, the cell delay varies with process and temperature. Therefore, an adjustable delay cell is required to achieve the desirable delay through the delay line. Furthermore, an active control scheme is required to control the delay through each individual delay cell. The delay through each individual delay cell should be adjustable to achieve control for the entire delay of the delay line. Figure 14.11a shows a typical delay cell (the kth cell of the delay line), in which the delay can be adjusted with the ratio of 1:m. There are m parallel branches that connect the input i_{k−1} to i_k. If the first branch has a delay of dt, the mth branch has a delay of m·dt. The control lines a_{1,k} to a_{m−1,k} are assigned a thermometer code that selects the appropriate delay branch and provides the desired delay through the delay cell. Figure 14.11b shows the hybrid DPWM structure with an adjustable delay for the delay line. The overall structure is similar to what is shown in Fig. 14.5, but the delay cells have adjustable delay. The signal cnt_out is the output of a counter. 32 delay cells are shown in this DPWM realization, which corresponds to 5-bit resolution. The signals d_contp are the control signals adjusting the individual delay of the delay cells. The signal Dt at the input, and Dt30 and Dt31 at the output of the delay line, are tapped out to construct the delay locked loop feedback and produce the control signals d_contp.
A flip-flop is inserted at the input of the delay line to eliminate possible glitches introduced by the combinatorial logic comparator. To compensate for the one-cycle delay introduced by the flip-flop, the output of the counter is compared with dutyMSB − 1 instead of dutyMSB. The value of dutyMSB = msb(duty) corresponds to the most significant bits of the duty cycle command. This module generates the DPWM output D1.
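The split of the duty cycle command between the counter and the delay line can be sketched as follows. The function name is hypothetical, times are in picoseconds, and the example values (a 40 ns clock period and a 5-bit delay line) are those of the implementation reported later in this section. The one-cycle flip-flop delay is not modeled separately, since comparing the counter with dutyMSB − 1 cancels it.

```python
def hybrid_reset_time_ps(duty, n_delay_bits, t_clk_ps):
    """Time from the set event (counter = 0) to the reset event of output D1."""
    duty_msb = duty >> n_delay_bits                  # compared with the counter
    duty_lsb = duty & ((1 << n_delay_bits) - 1)      # selects the delay-line tap
    dt_ps = t_clk_ps // (1 << n_delay_bits)          # nominal cell delay
    return duty_msb * t_clk_ps + duty_lsb * dt_ps    # coarse plus fine time


# 10-bit command 3C5 Hex with a 5-bit delay line and a 40 ns clock period
t_reset = hybrid_reset_time_ps(0x3C5, 5, 40_000)
```

For this command the counter contributes 30 clock periods and the delay line 5 cell delays, which is the same as 965 steps of 1.25 ns when the delay line is locked to one clock period.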


Fig. 14.12 Digital delay locked loop

The delay locked loop (DLL) is shown in Fig. 14.12. The signals Dt, Dt30 and Dt31 from the delay line are driving the DLL. In an ideal case, the delay through the delay line should be equivalent to the width of the Dt signal at the input of the delay line. Therefore, the falling edge of the input signal, Dt, should be aligned to the rising edge of the output signal, Dt31. In the DLL controller a flip-flop is used to detect the misalignment of the edges. If at the falling edge of the signal, Dt, a high value for the Dt31 is detected, the delay of the delay line should be increased. If at the falling edge of the signal Dt, a low value for the Dt31 is detected, the delay of the delay line should be decreased. To provide the delay control signals, and adjust the delay of the delay cells, a simple shift register is used. To increase the delay through the delay line, a logic zero is shifted to the left, and to decrease the delay, a logic one is shifted to the right. The output of the shift register is a thermometer coded signal that can control the delay of each individual delay cell. Figure 14.12b shows the timing diagram for a typical hybrid DPWM, which has a combination of a 2-bit counter and an l-bit delay line. The set point of the output D1, signal S1, is always fixed and occurs when the counter is zero. The reset point R1, on the other hand, may occur at any time in one clock period of input clk. Therefore, only one delay line is required to generate the signal D1. The input dead times are applied to the output signal D2. As a result the set and reset events of the output D2 may happen at any time inside of one clock period of the input clk. One or two delay lines can be used to generate the signals S2 and R2. To provide the output D2 with one delay line a more sophisticated combinatorial logic to create dutyMSB and dutyLSB is required. It should also be noted that only one counter is required for the two outputs D1 and D2. 
The two delay lines for the output D2 are identical and matched with the delay line. Therefore, the same control signals d_contp, generated from the DLL, are used to control the delay of the delay cells. An implemented hybrid DPWM with DLL has a 10-bit resolution, with a 5-bit counter and a 5-bit delay line (32 delay cells). The input clock frequency to this DPWM is fclk = 25 MHz. The DPWM provides a switching frequency of fsw = 25 MHz/2^5 = 781 kHz. The delay through the delay line should be equal to


Fig. 14.13 (a) Experimental results for the synchronized delay line (b) Output D1 for different values of duty cycle. (c) Output of the hybrid DPWM with digital DLL. (d) Comparison of 5-bit counter-based and 10-bit hybrid DPWM

one clock period, deltot = 1/(25 MHz) = 40 ns. For the 32 delay cells, the average and nominal delay of one delay cell is dt = 40 ns/32 = 1.25 ns. A delay cell is chosen with the adjustment ratio of 1:2. Therefore, one 32-bit shift register is required to provide 32 lines of delay control signals, d_contp. Figure 14.13a shows the experimental waveform of Dt, the input clock of the delay line, and Dt31, the propagated clock at the output of the delay line. It is shown that the falling edge of Dt is aligned with the rising edge of Dt31. The delay control signals, d_contp, are monitored for this experimental setup, and it is observed that 12 delay cells are adjusted to the low delay value and the remaining 20 delay cells to the high delay value. Figure 14.13b shows the change in the output of D1 when the 10-bit duty cycle command changes by one step from the value of 3C5 Hex to 3C9 Hex. Figure 14.13c shows the two outputs of the hybrid DPWM, D1 and D2. Two dead times td1 and td2 are applied to the output D2. The set and reset signals S1 and R1 determine the rising and falling edge of the signal D1. It is noted that


the experimental setup provides a 10-bit DPWM with a switching frequency of fsw = 781 kHz, while the input clock frequency is fclk = 25 MHz. This clock frequency would have produced only 5-bit resolution using a simple, counter-based DPWM with the same fsw. A simple low-pass filter using a resistor and a capacitor is placed on the DPWM output, D1. The duty cycle is then slowly increased from zero to the maximum value of d = 0.985. Figure 14.13d compares the output of the low-pass filter for the two cases of the 5-bit, counter-based DPWM and the 10-bit, hybrid DPWM, where the remaining 5 bits are provided by the delay line. In both cases, fsw = 781 kHz and fclk = 25 MHz. It is observed that the hybrid DPWM significantly increases the resolution of the counter-based DPWM, while maintaining the linearity.
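The locking behavior of the digital DLL described above can be sketched with a simple behavioral loop. The model abstracts the thermometer-coded shift register as a count of cells set to the long delay, and it represents the edge detector by comparing the total line delay with the target of one clock period. The cell delays of 770 ps and 1540 ps (ratio 1:2) are hypothetical values chosen so that the loop settles near the reported setting; they are not measured data.

```python
def simulate_dll(n_cells=32, dt_low_ps=770, dt_high_ps=1540,
                 target_ps=40_000, steps=100):
    """Bang-bang DLL sketch: one delay cell is adjusted per clock period."""
    n_high = 0                    # cells currently set to the long delay
    history = []
    for _ in range(steps):
        total_ps = (n_cells - n_high) * dt_low_ps + n_high * dt_high_ps
        # Dt31 sampled at the falling edge of Dt: a high value means the
        # rising edge of Dt31 has already arrived, so the line is too fast.
        dt31_high_at_fall = total_ps < target_ps
        if dt31_high_at_fall:
            n_high = min(n_high + 1, n_cells)    # increase the line delay
        else:
            n_high = max(n_high - 1, 0)          # decrease the line delay
        history.append(n_high)
    return history


history = simulate_dll()
```

With these assumed values the loop ramps up and then dithers between 19 and 20 long-delay cells, i.e. within one adjustment step of the 40 ns target, which is the expected steady state of a bang-bang loop with discrete delay settings.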

7 DPWM Based on Second-Order Sigma–Delta Modulator

One of the main limitations on the maximum switching frequency at which digital controllers can be effectively used in low-power applications, as discussed, is the power consumption of the digital pulse-width modulator (DPWM). It is usually proportional to the product of the switching frequency and the DPWM resolution, which needs to be sufficiently high to eliminate undesirable limit-cycle oscillations. The research works discussed before demonstrate high-resolution, low-power DPWM controllers that operate at switching frequencies between 400 kHz and 2 MHz. Compared to their readily available analog counterparts, the presented controllers still operate at five to ten times lower switching frequencies. Consequently, they require significantly larger power stages that nullify most of the advantages of digital control. Moreover, these digital systems will probably not be able to regulate state-of-the-art low-power integrated SMPS that operate at switching frequencies on the order of tens of MHz. In this section we briefly review the operation of conventional sigma-delta digital pulse-width modulators and discuss the problems of slow convergence and low-frequency tones appearing in first-order sigma-delta structures. Accordingly, this section also presents a second-order Σ–Δ DPWM architecture that minimizes the effects of these problems.

As illustrated in Fig. 14.14a, the Σ–Δ DPWM consists of a low-resolution, low-power DPWM (core DPWM) capable of operating at high switching frequencies and a Σ–Δ modulator, which improves the effective resolution of the core DPWM. In the design shown in the figure, the effective resolution of a 4-b DPWM core is improved to 10 b. The Σ–Δ operation is based on the well-known noise-shaping concept [22] widely used in analog-to-digital and digital-to-analog converters.
Over several switching cycles, the Σ–Δ modulator varies dLR[n], the low-resolution input of the core DPWM, between a few of the 16 possible values to achieve a high-resolution average duty ratio value equal to the 10-b input. When connected to a switching converter power stage, no additional hardware is needed for averaging: it is performed naturally by the filtering components of the power stage.
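The averaging behaviour described above can be illustrated with a small behavioural model. The sketch below is an assumption-laden illustration, not the authors' implementation: it models a first-order loop in which the truncation error (the six LSBs) of the previous cycle is added to the 10-b input, and the four MSBs drive the core DPWM. The function name and the test value 517 are hypothetical.

```python
def first_order_sigma_delta(d: int, n_cycles: int, lsb_bits: int = 6) -> list:
    """Return the low-resolution commands dLR[n] for a constant 10-b input d."""
    mask = (1 << lsb_bits) - 1
    err = 0                      # truncation error of the previous cycle
    commands = []
    for _ in range(n_cycles):
        x = d + err              # input plus delayed truncation error
        commands.append(x >> lsb_bits)   # truncator: keep the MSBs
        err = x & mask           # discarded LSBs are fed back
    return commands


D_IN = 517                       # illustrative 10-b duty command (8*64 + 5)
N_CYCLES = 64
seq = first_order_sigma_delta(D_IN, N_CYCLES)
# Average of the 4-b commands, rescaled to 10-b units.
average = sum(seq) * 64 / N_CYCLES

print(sorted(set(seq)), average)
```

In this model the core DPWM only ever receives the two neighbouring codes 8 and 9, yet their average over 64 cycles equals the 10-b input. Note that inputs close to full scale would push the sum beyond the 4-b range, which this simple sketch does not guard against.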

Fig. 14.14 (a) Simplified block diagram of a first-order sigma-delta DPWM (b) Block diagram of a 10-b, second-order, sigma-delta DPWM with 4-b core

Several publications [21, 22, 26–29] show DPWM architectures based on various modifications of the first-order sigma–delta concept. These systems increase the switching frequency, but still require a core DPWM with relatively high resolution (6–8 b) and often suffer from low-frequency noise at the converter output. Additionally, in most of the realizations, the bandwidth of the voltage control loop is significantly reduced. The problems of slow convergence toward the high-resolution input and of low-frequency tones in first-order Σ–Δ modulators have been extensively analyzed in research related to oversampling ADCs and digital-to-analog converters (DACs). It has been shown that second-order Σ–Δ architectures strongly suppress low-frequency tones and converge faster. In the following, a second-order multibit Σ–Δ DPWM architecture is discussed that allows operation at programmable constant switching frequencies. The design of the second-order multibit Σ–Δ DPWM, shown in Fig. 14.14b, is inspired by power DAC [21] architectures. However, the main difference is in the system complexity. Since the DPWM does not require a very fast, power-consuming ADC in the feedback path (in this case only a truncated signal is sent back), its implementation is significantly simpler and can be fully realized with digital components. To minimize size and power consumption, the second-order Σ–Δ DPWM of Fig. 14.14b is modified and realized as shown in Fig. 14.15a, b. Instead of using the 10-b

Fig. 14.15 Error-feedback second-order sigma-delta DPWM: (a) functional block diagram, (b) practical implementation, (c) die photo of a complete high-frequency digital controller IC including the second-order Σ–Δ DPWM

signal x[n], the Σ–Δ modulator processes the truncation error only. This structure is known as error-feedback and performs the same function as the second-order system described above while using much simpler digital hardware. In this case, only two adders are used, and ed[n] and dLR[n] are simply the six least significant bits (LSBs)

and four MSBs of x[n], respectively. In addition, each delay block is realized with only six D flip-flops, and the sizes of the adders are reduced accordingly. The multiply-by-two block is implemented as a simple 7-b logic shifter. The function of the limiter is to restrict dLR[n] to positive values and prevent overflows. A complete digital controller architecture is implemented on an application-specific integrated circuit (ASIC) and fabricated in a 0.18 μm CMOS process. The fabricated chip is then tested with a 3.3 V, 750-mW buck converter operating at a switching frequency of 12 MHz. In addition, transistor-level simulation results for the same design, with parameters adjusted for operation at switching frequencies beyond 100 MHz, are reported in [30] and show appropriate performance.
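The error-feedback structure can also be sketched behaviourally. The model below is an illustration under stated assumptions rather than the fabricated design: it assumes the update x[n] = d[n] + 2·ed[n−1] − ed[n−2], which is the usual second-order error-feedback form and matches the two delays, the doubling shifter and the two adders mentioned in the text. The limiter is modelled as a clamp to the 10-b range, and the function name and test input are hypothetical.

```python
def error_feedback_sigma_delta(d: int, n_cycles: int,
                               lsb_bits: int = 6, word_bits: int = 10) -> list:
    """Second-order error-feedback model: returns the 4-b commands dLR[n]."""
    mask = (1 << lsb_bits) - 1
    x_max = (1 << word_bits) - 1
    e1 = 0   # truncation error delayed by one cycle
    e2 = 0   # truncation error delayed by two cycles
    commands = []
    for _ in range(n_cycles):
        x = d + (e1 << 1) - e2        # shifter implements the multiplication by two
        x = min(max(x, 0), x_max)     # limiter: keep x non-negative and within 10 b
        commands.append(x >> lsb_bits)    # four MSBs drive the core DPWM
        e2, e1 = e1, x & mask             # six LSBs are the truncation error
    return commands


D_IN = 517          # illustrative mid-range input, so the limiter never acts
N_CYCLES = 4096
seq2 = error_feedback_sigma_delta(D_IN, N_CYCLES)
average2 = sum(seq2) * 64 / N_CYCLES

print(min(seq2), max(seq2), average2)
```

For a constant input away from the range limits, the accumulated deviation between the output codes and the input is bounded by a single truncation-error difference, so the average converges to the 10-b input as the number of cycles grows.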

8 Conclusions

A high-frequency, high-resolution digital pulse-width modulator (DPWM) is a key component for successful IC realization of practical digital control for high-frequency switching power converters. This chapter presents a survey and a comprehensive classification of DPWM architectures. The classification is based on three criteria: time quantization scheme, selection of time slots, and frequency synchronization. Previously presented designs are identified as particular cases of the proposed classification. A segmented DPWM architecture is explored that aims to optimize resources by segmenting the input digital code to drive binary-weighted delay cells and thermometer-decoded unary delay cells. A particular proof-of-concept design case, consisting of a three-segment six-bit DPWM integrated circuit operating at 1 MHz switching frequency with low power consumption and very small silicon area (less than 0.07 mm² in a standard 0.5 μm CMOS process), is described, and experimental results are shown to validate its functionality. An extended, fully synthesizable version of a DPWM places a digital delay-locked loop around a delay line with discretely programmable delay cells to achieve constant-frequency clocked operation with the best possible resolution over a range of process or temperature variations. Finally, it is shown how a second-order Σ–Δ modulator DPWM architecture allows a significant increase of switching frequency and effective DPWM resolution.

References

1. B.J. Patella, A. Prodic, A. Zirger, D. Maksimovic, High-frequency digital PWM controller IC for DC-DC converters. IEEE Trans. Power Electron. 18(1), 438–446 (2003)
2. V. Peterchev, J. Xiao, S.R. Sanders, Architecture and IC implementation of a digital VRM controller. IEEE Trans. Power Electron. 18(1), 356–364 (2003)
3. P. Dancy, R. Amirtharajah, A.P. Chandrakasan, High-efficiency multiple-output DC–DC conversion for low-voltage systems. IEEE Trans. VLSI Syst. 8(3), 252–263 (2000)
4. A.V. Peterchev, S.R. Sanders, Quantization resolution and limit cycling in digitally controlled PWM converters. IEEE Trans. Power Electron. Part 2 18(1), 301–308 (2003)
5. A. Syed, E. Ahmed, E. Alarcón, D. Maksimovic, Digital PWM architectures. IEEE PESC 6, 4689–4695 (2004)
6. M. Barai, S. Sengupta, J. Biswas, Digital controller for DVS-enabled DC–DC converter. IEEE Trans. Power Electron. 25(3), 557–573 (2010)
7. H. Ahmad, B. Bakkaloglu, A digitally controlled DC-DC buck converter using frequency domain ADCs, in IEEE APEC 2010, Palm Springs, 2010, pp. 1871–1874
8. E.G. Soenen, A. Roth, J. Shi, M. Kinyua, J. Gaither, E. Ortynska, A robust digital DC-DC converter with rail-to-rail output range in 40 nm CMOS, in IEEE ISSCC, 2010, Austin, 2010, pp. 198–200
9. H. Peng, A. Prodic, E. Alarcon, D. Maksimovic, Modeling of quantization effects in digitally controlled DC–DC converters. IEEE Trans. Power Electron. 22(1), 208–215 (2007)
10. G.Y. Wei, M. Horowitz, A low power switching power supply for self-clocked systems, in International Symposium on Low Power Electronics and Design, 1996, Monterey, 1996, pp. 313–317
11. P. Dancy, A.P. Chandrakasan, Ultra low power control circuits for PWM converters, in IEEE PESC'97, St. Louis, 1997, pp. 21–27
12. J. Goodman, A.P. Dancy, A.P. Chandrakasan, An energy/security scalable encryption processor using an embedded variable voltage dc/dc converter. IEEE J. Solid-State Circuits 33(11), 1799 (1998)
13. H. McDermott, A programmable sound processor for advanced hearing aid research. IEEE Trans. Rehabil. Eng. 6(1), 53 (1998)
14. H. Gwee, J.S. Chang, H. Li, A micro-power low-distortion digital pulse width modulator for a digital class D amplifier. IEEE Trans. Circuits Syst. – II 49(4), 245 (2002)
15. E. O'Malley, K. Rinne, A programmable digital pulse width modulator providing versatile pulse patterns and supporting switching frequencies beyond 15 MHz, in IEEE APEC 2004, Anaheim, 2004, pp. 53–59
16. A. Djemouai, M. Sawan, M. Slamani, New CMOS integrated pulse width modulator for voltage conversion application, in Proceedings of the 7th IEEE ICECS, 2000, Jounieh, 2000, pp. 116–119
17. A. Syed, Digital pulse width modulators: architectures and feed-forward compensation, M.S. thesis, Department of Electrical and Computer Engineering, University of Colorado at Boulder, May 2004
18. A. Syed, E. Ahmed, D. Maksimovic, Digital PWM controller with feed-forward compensation, in IEEE Applied Power Electronics Conference, 2004, Anaheim, 2004
19. F. Rad, W. Dally, H.T. Ng, A. Senthinathan, M.J.E. Lee, R. Rathi, J. Poulton, A low-power multiplying DLL for low-jitter multi gigahertz clock generation in highly integrated digital chips. IEEE J. Solid-State Circuits 37(12), 1414–1420 (2002)
20. J. Deveugele, M. Steyaert, A 10-bit 250-MS/s binary-weighted current-steering DAC. IEEE J. Solid-State Circuits 41(2), 320–329 (2006)
21. J.M. Goldberg, M.B. Sandler, New high accuracy pulse width modulation based digital-to-analogue convertor/power amplifier. IEEE Proc. Circuits Devices Syst 141(4), 315–324 (1994)
22. M. Norris, L. Marco, E. Alarcon, D. Maksimovic, Quantization noise shaping in digital PWM converters, in Power Electronics Specialists Conference, 2008. PESC 2008, Rodes, IEEE, 15–19 June 2008, pp. 127–133
23. E. O'Malley, K. Rinne, A programmable digital pulse width modulator providing versatile pulse patterns and supporting switching frequencies beyond 15 MHz. IEEE APEC 1, 53–59 (2004)
24. A. Djemouai, M. Sawan, M. Slamani, New CMOS integrated pulse width modulator for voltage conversion applications. IEEE Conf. Electron. Circuits Syst. ICECS 1, 116–119 (2000)
25. A. Parayandeh, A. Prodic, Programmable analog-to-digital converter for low-power DC–DC SMPS. IEEE Trans. Power Electron. 23(1), 500–505 (2008)
26. K.M. Smith, K.M. Smedley, M. Yunhong Ma, Realization of a digital PWM power amplifier using noise and ripple shaping, in Proceedings IEEE PESC Conference, 1995, Atlanta, 1995, pp. 96–102
27. Z. Lu, Z. Qian, Y. Zeng, W. Yao, G. Chen, Y. Wang, Reduction of digital PWM limit ring with novel control algorithm, in Proceedings of IEEE APEC Conference, 2001, Anaheim, 2001, pp. 521–525
28. Z. Lukic, K. Wang, A. Prodic, High-frequency digital controller for dc–dc converters based on multi-bit sigma–delta pulse-width modulation, in Proceedings of IEEE APEC Conference, 2005, Austin, 2005, pp. 35–40
29. A. Kelly, K. Rinne, High resolution DPWM in a dc–dc converter application using digital sigma–delta techniques, in Proceedings of IEEE PESC Conference, 2005, Recife, 2005, pp. 1458–1463
30. Z. Lukić, N. Rahman, A. Prodić, Multibit Σ–Δ PWM digital controller IC for DC-DC converters operating at switching frequencies beyond 10 MHz. IEEE Trans. Power Electron. 22(5), 1693–1707 (2007)

Chapter 15

Advanced Power Management for Low Power Medical Applications

Kristof Quaegebeur and Jan Crols

Abstract The challenges of designing the power management for low power medical applications are presented. Safety is the primary focus, while maintaining high efficiency at low output power and dealing with high voltage input and output supplies. This is illustrated by means of a single-inductor multiple-output switched inductor regulator and its application inside the power management of a medical implant.

1 Introduction

For decades the use of integrated circuits has helped the medical world to diagnose, cure or improve a patient's quality of life. Especially in the field of medical implants, a wide array of new applications is rapidly emerging [1]. This is made possible by a combination of medical knowledge of the human body and improvements in IC design that allow more intelligence at low power consumption and in a small volume. Medical implantable devices can be passive or active. In the passive case the system does not interact with the body. A typical example is an implantable chip that monitors parameters such as body temperature and blood pressure, which can be used in diagnosis. Another example is an implantable RFID that allows wireless identification. In the active case the system actually interacts with the body. A well-known example of an active implant is a cardiac pacemaker, where the heart rhythm is controlled by means of electric pulses. Another form of stimulation through electric pulses is nerve stimulation. Many new applications such as pain relief and

K. Quaegebeur • J. Crols, AnSem NV, Hertogstraat 141, 3001 Heverlee, Belgium, e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_15, © Springer Science+Business Media B.V. 2012

Fig. 15.1 Vagal nerve stimulation

appetite control are rapidly arriving in this area. Some are still in an experimental phase. One of these applications is vagal nerve stimulation, shown in Fig. 15.1, for curing depression. The vagus nerve is one of the cranial nerves. While vagal nerve stimulation was already known to help people with epilepsy, experiments have shown that electric pulses on this nerve affect the mood of a patient [2]. Therefore it is now used as a possible treatment for severe depression. Designing ICs for medical implants differs from other IC design in a number of ways. The key differentiating properties are safety, very low power consumption and, in the case of nerve stimulation, the use of high voltage levels. The human body is a hostile environment to the chip, but the chip must not become a hostile entity to the body. This means safety is of the utmost importance. Since the available power is limited, high efficiency is required for all building blocks. The efficiency has to be improved by using different power domains for different functions. For nerve stimulation, voltages above 10 V are needed. This means a high voltage tolerant technology must be used. Note that the need for multiple power domains certainly holds when high voltages are needed. All these elements lead to the need for dedicated power management, with safety and efficiency as key features, realized in a high voltage tolerant technology. Flexibility of the power domains will increase the efficiency, as each function will then be able to operate at its optimal power supply level. A microprocessor will have different power requirements than an analog-to-digital converter. While high efficiency high voltage power management is often discussed in literature, it seldom

deals with a low power application. In contrast to most of these applications, the targeted power here is in the microwatt or milliwatt range, requiring specific power management design techniques not normally seen in high voltage technologies. In Sect. 2 the ways to bring power inside the human body are discussed. Building blocks that derive the different required power supply levels from the input power supply are discussed in Sect. 3. An example of a switching inductor regulator is discussed in Sect. 4 and its possible application in a medical implant is shown in Sect. 5.

2 Bringing Power Inside the Body

The power management block will always run from at least one input power supply source. This is no different in a medical implant. From this supply the full system will be powered. Generally speaking, there are three methods to bring electric power inside the human body. A first method is to include a power source inside the body. This comes down to including a battery in the implanted system. The major advantage is reliability. The battery will provide a predictable and stable power supply for the system to run on. The energy stored in a battery is, however, limited. This means the battery will need replacement or recharging. In the first case this comes down to new surgery for the patient. Even though smart placement of the battery might decrease the complexity of the surgery, a certain risk and discomfort for the patient remain. Replacing batteries is therefore only acceptable if the life cycle of the battery is at least a few years. This is the case for pacemakers, which can operate at a current consumption as low as 5 μA. For most other applications, this is not possible. Another major disadvantage of a battery is the safety risk. Batteries are filled with toxic materials and special precautions must be taken when they are implanted. A second method of providing power is to use inductive power coupling through the skin. This inductive power coupling is often combined with low data rate communication over the same link. The major advantage is avoiding a battery, along with its hazardous materials and its need for replacement. In theory the life span of the implant is now a lifetime. The drawback of this approach is that the power supply generated by the inductive power coupling is not a clean and well-controlled supply. Practical use of the inductive link involves sharing it with the data communication as well as sudden unavailability of the inductive link.
The amount of power transferred through the inductive link is not constant and in most cases not actively controlled. This results in an input supply voltage that can vary widely. The ratio between the highest and lowest possible input voltage can easily be a factor of 3. Depending on the implementation, it is possible that all circuits powered by the input supply will need to be high voltage tolerant. An alternative is to passively clamp the input voltage under good receiving conditions, but this dramatically reduces efficiency. These elements complicate the design of the power management.

Fig. 15.2 Possible concept of an implanted power management including charging of a rechargeable battery (block labels: inductive link, charger, rechargeable battery, power through, core, regulated supplies)

A combination of the two methods above is also increasingly being considered as an option. This is shown in Fig. 15.2. In this case a rechargeable battery is implanted and serves as the input power supply. Recharging of the battery is done through the inductive link. This approach combines the advantages of both methods. No further surgery is needed because of a depleted battery, and the input supply is stable and predictable, independent of the availability of the inductive link. The disadvantage of inserting toxic materials remains. The last method is to scavenge power from the human body itself. This involves transforming energy from motion or body heat into electric energy. These methods are still very experimental, but as technology progresses it is likely that they will become an important way to power low power medical implants in the future [3].

3 Deriving Well Controlled References and Supplies

The previous section discussed which methods can be used to generate the input power supply for the implant's power management block. From this power supply several other power domains must be regulated. Each power domain's supply has its own characteristics in terms of voltage, programmability, accuracy and power capability. Several approaches are possible to generate these power supplies. Some of them are discussed below in the light of their usability in low power medical applications.

3.1 Bandgap

Bandgap circuits are widely used as the main voltage reference in ICs. If well designed they provide an accurate voltage reference around 1.2 V that has only a limited

Fig. 15.3 Typical CMOS implementation of a linear regulator (node labels: VDD_In, Vref, VDD_Out)

dependence on power supply, temperature and process variations. If possible, trimming can be added to further decrease sample-to-sample variations. Current references can be derived from the voltage reference. As in any design, the need for accurate references should not be underestimated. The bandgap voltage will be used as the reference for the other regulated power supplies. Therefore it is most likely that the bandgap circuit will run from the input power supply. As stated before, it is possible that the input supply has a wide voltage range, even into the high voltage range. In this case the bandgap must be made tolerant to high input voltages, and a good PSRR will be crucial.

3.2 Linear Regulator

A linear regulator is a well-known and often used circuit in power management. It consists of a common source or common drain transistor, a large capacitor at the output and circuitry to control the regulation. Charge is transferred from the input to the output supply through the pass transistor. This means that only regulation to a voltage lower than the input voltage is possible. Regulation is achieved by controlling the gate of the transistor with a signal that is proportional to the error of the output supply. A typical implementation is shown in Fig. 15.3. In terms of integration this is an attractive circuit for two reasons. In practical implementations it requires only one external component, namely the large output capacitor. Also, the noise footprint of the circuit is small. The theoretical efficiency is limited by the ratio of the output and input voltage. If multiple and flexible power supplies are needed with good efficiency, relying only on linear regulators is therefore not sufficient. If supplies higher than the input are required, using linear regulators is impossible.
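The efficiency limit mentioned above can be made concrete with a few lines of code. The sketch below assumes an ideal linear regulator with negligible quiescent current, so that the same current flows at the input and at the output; the function name and the example voltages (a 1.8 V output from a 3.2 V or 10 V input) are illustrative choices.

```python
def linear_regulator_max_efficiency(v_in: float, v_out: float) -> float:
    """Upper bound on efficiency of an ideal linear regulator: Vout / Vin.

    Assumes the quiescent current is negligible, so input and output currents match.
    """
    if v_out > v_in:
        raise ValueError("a linear regulator cannot regulate above its input voltage")
    return v_out / v_in


# Illustrative values: one fixed output, two different input voltages.
eta_low_input = linear_regulator_max_efficiency(3.2, 1.8)
eta_high_input = linear_regulator_max_efficiency(10.0, 1.8)

print(round(eta_low_input, 4), round(eta_high_input, 4))
```

With a widely varying input supply, the bound drops quickly at the high end of the input range, which is why linear regulators alone are a poor fit when several supplies must be derived efficiently.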

Fig. 15.4 Implementation of a switched inductor boost regulator (node labels: VDD_IN, VOUT; switch phases: D1, D2)

3.3 Switched Capacitor Regulator

A switched capacitor regulator consists of a number of switches that connect a few capacitors in a sequence such that an output voltage is generated. These circuits are often called charge pumps. The output supply can be higher than the input supply. This is an important advantage with respect to linear regulators. For low power requirements, a switched capacitor regulator can be fully integrated. However, the efficiency of a fully integrated solution tends to be limited to 60%. For higher efficiencies, external capacitors are needed. Moreover, switched capacitor regulators tend to have good efficiency only when the output/input ratio is an integer number [4]. This points out a major disadvantage of charge pumps: their lack of flexibility. For applications where a fixed output supply needs to be derived from an input supply that can vary, these circuits are not the best choice.
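The sensitivity to the output/input ratio can be illustrated with a commonly used idealized model, which is an assumption here and not taken from the chapter: a step-up charge pump with integer gain N produces N·Vin internally, and regulating down to a lower Vout costs efficiency just as in a linear regulator, so the ideal efficiency is Vout/(N·Vin). The function name and voltages below are hypothetical.

```python
import math


def charge_pump_ideal_efficiency(v_in: float, v_out: float):
    """Return (integer gain N, ideal efficiency) for a step-up charge pump.

    Idealized model (assumption): the smallest integer gain N with N*Vin >= Vout
    is selected, and the efficiency is Vout / (N * Vin).
    """
    gain = max(1, math.ceil(v_out / v_in - 1e-12))   # epsilon avoids round-off at exact ratios
    return gain, v_out / (gain * v_in)


V_OUT = 10.0   # illustrative fixed output voltage
gain_a, eta_a = charge_pump_ideal_efficiency(5.0, V_OUT)   # ratio is exactly 2
gain_b, eta_b = charge_pump_ideal_efficiency(4.5, V_OUT)   # ratio is about 2.22

print(gain_a, round(eta_a, 3), gain_b, round(eta_b, 3))
```

In this model a small drop of the input voltage from 5.0 V to 4.5 V forces the next integer gain and lowers the ideal efficiency from 100% to about 74%, which illustrates why a varying input supply suits charge pumps poorly.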

3.4 Switched Inductor Regulator

A switched inductor regulator consists of at least one inductor, some switches and possibly one or more diodes. Like a switched capacitor regulator, the switched inductor regulator can generate a supply higher than the input power supply. This is called a boost regulator. A typical implementation is shown in Fig. 15.4. During the first phase, D1, current is built up in the inductor. This comes down to storing energy from the input supply in the inductor. In a second phase, D2, this energy is dumped on the output. This principle is also valid for the buck regulator, where the output supply is lower than the input supply. Due to this mechanism a theoretical efficiency of 100% is achievable. In practical implementations the efficiency will be lower due to conduction losses in the inductor and switches as well as the losses incurred in driving the capacitance of the switches. In a good design the efficiency of a switched inductor regulator will be better than that of a linear regulator over a wide range of input/output ratios. The large noise footprint caused by the switching is one of the drawbacks of these regulators. Another disadvantage is the poor possibility for full integration.

Fig. 15.5 Inductor current I(L) and D1 signal versus time in discontinuous current mode (DCM)

Fig. 15.6 Alternatives for feedback, shown on D1 and I(L) versus time: (1) Pulse width modulation (2) Pulse frequency modulation (3) Current mode feedback

For a switched inductor regulator, full integration is not possible due to the inductance and parasitic resistance that are needed for a practical implementation. If there is continuously current in the inductor, the regulator is said to work in continuous current mode (CCM). At low loads, this mode is not attractive, as the efficiency will be poor. Low power applications will run in discontinuous current mode (DCM). This means the inductor current returns to zero before a new pulse is generated. This mode also allows the regulator to stop switching for a while, which is crucial to obtain good efficiency at very low load. The typical inductor current as well as the phase D1 is shown in Fig. 15.5. Regulation is achieved by applying an appropriate switching scheme to the regulator. Controlling the feedback of a switching regulator tends to be rather complicated and comes in different flavors. The parameters to control are the width and frequency of the switching pulses. Therefore pulse width modulation (PWM) and pulse frequency modulation (PFM) driven by the measured error of the output supply are two straightforward approaches. Combinations, variations and automatic adaptations to the output load are possible, in analog and digital implementations [5–8]. A third category is current mode feedback [9]. In this method, the inductor current is sensed. In the first phase, the inductor current is built up to a value proportional to the output supply error. This method provides better load and line regulation. Moreover, it has an inherent current limiting feature, which improves safety. A conceptual overview of these three categories is shown in Fig. 15.6.
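A single DCM pulse of a boost regulator can be worked out with the standard ideal inductor equations. The sketch below is a lossless illustration under stated assumptions: ideal switches, constant input and output voltages during the pulse, and hypothetical component values (4 V input, 20 V output, 10 μH inductor, 1 μs on-time, 10 mW load).

```python
def dcm_boost_pulse(v_in: float, v_out: float, l_henry: float, t_on: float):
    """Ideal (lossless) boost pulse in DCM.

    Returns (peak inductor current, discharge time, energy delivered to the output).
    """
    i_peak = v_in * t_on / l_henry                 # phase D1: current ramps up from zero
    t_off = l_henry * i_peak / (v_out - v_in)      # phase D2: current ramps back to zero
    energy_out = 0.5 * v_out * i_peak * t_off      # output voltage times delivered charge
    return i_peak, t_off, energy_out


def pulse_rate_for_load(p_out: float, energy_per_pulse: float) -> float:
    """Pulse frequency needed to supply p_out, as in pulse frequency modulation."""
    return p_out / energy_per_pulse


# Hypothetical values chosen for round numbers.
i_pk, t_off, e_pulse = dcm_boost_pulse(v_in=4.0, v_out=20.0, l_henry=10e-6, t_on=1e-6)
f_pulse = pulse_rate_for_load(10e-3, e_pulse)

print(i_pk, t_off, e_pulse, f_pulse)
```

With these values each pulse lasts 1.25 μs in total and delivers 1 μJ, so a 10 mW load needs only about 10,000 pulses per second. The inductor is idle for most of the 100 μs period, which is the situation that makes it possible to stop the regulator from switching between pulses.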

In short: switching regulators are complex circuits that have more than one external component and introduce a lot of switching noise, but they offer the best efficiency and the possibility of output voltages higher than the input supply. In medical implants with high voltage output supplies, switching regulators are unavoidable. Switched inductor regulators have a higher flexibility in output/input ratios while maintaining good efficiency over a wide output power range, which is exactly what is needed for implantable applications. Note however that there is a risk involved when including an inductor in an implant. If a patient is exposed to a large magnetic field, as in an MRI scan, this might influence the characteristics of the inductor coil. It is possible for multiple regulators to share the same inductor. This complicates the control even more. However, since the space in a medical implant is limited, sharing the inductor is sometimes preferred to including extra inductors.

4 A SIMO Switched Inductor Regulator

In this section the design of a single-inductor multiple-output (SIMO) switched inductor regulator is discussed. The goal is to derive three output supplies from one input supply. Some target specifications are shown in Table 15.1. One of these output supplies is a boost supply, generating up to 20 V for nerve stimulation. The others are buck supplies for various functions, including a rate-adaptive supply for a digital signal processor that keeps the digital power consumption to the minimum needed to achieve the timing performance under any given condition. These three supplies can be realized with a hybrid boost-buck structure as shown in Fig. 15.7. The most important specifications of the output supplies are listed below. The input supply has a range between 3.2 and 10 V. Given the input and output voltages, high voltage tolerant switches are needed at many places in the design. In the example design these were implemented using DMOS devices. All switches indicated with the letter D are DMOS devices. The SIMO regulation is obtained by time-multiplexing. This means regulation for a given output can only occur in an assigned time slot. For every supply, the corresponding time slot is fixed. This is done for simplicity. More dynamic

Table 15.1 Target specifications

Parameter                      VDD_Boost   VDD_Buck1   VDD_Buck2
Range [V]                      4.5–20      0.9–1.3     1.4–1.8
Programmable? [Y/N]            Y           Y           Y
Max output power [mW]          50          10          10
Ripple [mVptp]                 100         20          20
Target optimal power [mW]      10          1           1
Efficiency at Popt [%]         85          75          80
Efficiency at Pmax [%]         75          65          70

Fig. 15.7 Switched Inductor SIMO (boost – buck – buck) regulator (node labels: VDD_In, VDD_Boost, VDD_Buck1, VDD_Buck2; switches: DMp1, DMp2, DMp3, DMn1, DMn2, DMn3, Mn1, Mn2)

slot assignment solutions exist [10]. The regulator works in DCM because the output power is low and in order to decrease cross regulation between the supplies. The inductor current must be zero at the start of each time slot. The target output power for optimal efficiency is low. Losses that degrade the efficiency can be split up in three categories. First, there are losses that are present whether the regulator is switching or not. Typical examples are biasing currents or digital control switching. We call these the quiescent losses. Secondly, there are losses that are present every time the regulator switches but are independent of the amount of charge that is being transferred. The charging and discharging of the switch gates is a typical example of this. We call these the switching losses. Finally there are losses that are proportional to the amount of charge that is being transferred to the output. These will be mainly the losses in the non-zero resistances in the regulator and will be called conducting losses. Conducting losses and switching losses can be traded off. Increasing the size of a switch will increase the switching losses, but decrease the conducting losses. However, once the maximum desired output power and the target efficiency at that power are defined, little freedom is left in this trade-off. At higher output load, the losses tend to be dominated by the conducting losses. The targeted efficiency at maximum output power defines the maximum allowable resistance in the conducting path. It makes little sense to go below this value. The remaining freedom lies in distributing the resistance over the different switches. Since this regulator is a hybrid system, it is possible that the optimized values for boost conflict with the values for buck. Another way to optimize the switching versus conducting losses is to drive the switches with an appropriate Vgs.
In most designs the maximum possible Vgs is chosen to minimize resistive losses and possibly simplify the switch driver design. However, charging the gate of the switch to 3 V consumes more than double the power that is necessary to charge the same gate to 2 V. Since the on resistance only


decreases slowly with Vgs above the threshold voltage, the decreased conducting loss will seldom make up for the extra switching loss, even at high output power. To illustrate this idea, some formulas are derived. The total loss caused by one switch that is conducting during a full buck cycle at maximum output is given by:

$$P_{loss} = P_{loss,cond} + P_{loss,sw} = 3 \cdot \frac{4}{3} R_{sw} I_{out}^2 + C_{sw} V_{gs}^2 f \qquad (15.1)$$

The factor 3 originates from the fact that the regulator can only use one of the three time slots. Now we will make the crude assumption that the resistance of the switch varies inversely proportionally with the Vgs of the switch. We call R1V the switch resistance at a Vgs of 1 V. This results in:

$$P_{loss} = 4 \frac{R_{1V}}{V_{gs}} I_{out}^2 + C_{sw} V_{gs}^2 f \qquad (15.2)$$

Setting the derivative of this expression with respect to Vgs to zero yields the Vgs for minimum loss:

$$V_{gs,opt} = \sqrt[3]{\frac{2 R_{1V} I_{out}^2}{C_{sw} f}} \qquad (15.3)$$
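The trade-off behind (15.2) can be explored numerically. The following is a minimal sketch, not part of the original design flow: it evaluates the loss expression of (15.2) on a grid of Vgs values and picks the minimum. The function names and all component values in the usage note are hypothetical and chosen for illustration only.

```python
def switch_loss(vgs, r1v, iout, csw, f):
    """Total loss of one switch per Eq. (15.2): conducting plus switching loss."""
    conducting = 4.0 * (r1v / vgs) * iout ** 2  # crude assumption: Rsw = R1V / Vgs
    switching = csw * vgs ** 2 * f              # charging and discharging the gate
    return conducting + switching


def best_vgs(r1v, iout, csw, f, lo=0.3, hi=5.0, steps=4701):
    """Brute-force search for the Vgs that minimizes Eq. (15.2) on a 1 mV grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: switch_loss(v, r1v, iout, csw, f))


def gate_energy_ratio(v_high, v_low):
    """Ratio of gate-charging loss at two drive levels (loss scales with Vgs squared)."""
    return (v_high / v_low) ** 2
```

With assumed values of R1V = 1 Ω, Csw = 200 pF and f = 1 MHz, calling `best_vgs(1.0, 0.01, 200e-12, 1e6)` and `best_vgs(1.0, 0.08, 200e-12, 1e6)` shows how the optimum moves up with the output current until it would exceed what the technology allows. `gate_energy_ratio(3.0, 2.0)` reproduces the "more than double" statement about driving a gate to 3 V instead of 2 V.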

Note that the derived formulas are not usable in the actual design but illustrate that the Vgs of the switch is a design parameter which is often overlooked. This is an optimization that will only be usable for low power applications. As the output current in (15.3) increases, the optimal Vgs will sooner or later hit the technology limit.

Dedicated drivers are needed to control the switch gates. These drivers need to charge and discharge a large load, which a switch gate will unavoidably be. For most switches, fast toggling is required, as the available time in a time slot is limited. This has to be done while operating on a variable and possibly high voltage power supply. The resulting Vgs must be close to a predefined value, as was discussed above, although the accuracy requirement is not that stringent. The source node of the switch is not necessarily a fixed or slowly varying voltage. And finally, this should be done in an efficient way, such that the amount of charge needed by the driver itself is negligible compared to the charge needed to charge the switch gate. Quickly driving a large load in a controlled way is not possible with a simple OTA at the required power consumption. Therefore a large part of the charging is done without feedback. Only when the output is already close to its final value does feedback take over. The implementation of a fast high voltage NMOS switch driver is shown in Fig. 15.8. Transient waveforms are shown in Fig. 15.9. The charge for the switch gate will be delivered by the current source DMp1. When the switch gate needs to be charged, the inverse controlling signal is capacitively coupled to the gate of the charging PMOS current source. The gate is temporarily pulled down and the current source will instantly start charging the switch gate. By the time the


Fig. 15.8 Topology of fast high voltage NMOS switch driver (node labels in the figure: VDD_IN, DMp1, IN, REP, G, S)

gate of the current source is back to its initial value, a simple OTA with limited current consumption has taken over the regulation of the switch gate towards a replica voltage that is two diodes above the switch source. As a result, the driver is able to charge a switch gate of more than 200 pF to 1.5 V in 30 ns. This is shown in Fig. 15.9. The efficiency of the driver is 95%. In practice, this comes down to an increase of the switching losses by 5%. A driver for a PMOS switch can be designed with the same approach.

Quiescent current consumption has a big impact on the efficiency at low target output powers. Therefore analog circuits must be powered down as much as possible whenever they are not in use, and the remaining biasing current must be kept small. This approach is also followed in the design of the drivers. There is however a risk in reducing the current, especially in the branch controlling the switch gate. If the gate becomes too high impedance, it becomes vulnerable to coupling from the large switching signals in the regulator. Another major source of quiescent current is the digital circuitry. Recall that quiescent current was defined as the current that is consumed even if there is no switching activity. The preferred approach for the digital design is to keep everything as simple as possible and avoid unnecessary transitions by sticking with asynchronous design. Finally, using a low power digital library will further decrease these quiescent losses.

The feedback system that is implemented is PWM feedback with PFM below a programmable threshold of the pulse width. Using PWM leads to a low output ripple at low output power. However, if the pulses become too small, the capacitive losses are much bigger than the actual charge that is transferred to the output. Therefore it


Fig. 15.9 Fast charging of an NMOS switch gate (panels from top to bottom: IN [V], gate of DMp1 [V], G [V], Ron [Ohm]; time axis from 4.7 µs to 5.5 µs)

is better to determine a minimum pulse width below which PFM will regulate the loop. To improve efficiency the minimum pulse width varies with the output supply setting and the input supply voltage. Although not implemented, current mode feedback might be a good alternative, especially because of its inherent control of the maximum inductor current. Another possibility is to use a fully digital feedback loop. If the complexity is kept low and a low power library is used, the current consumption might favor this digital solution. However, the biggest advantage would be avoiding a lot of sensitive analog circuits close to the noisy switching circuit. An additional advantage of a fully digital control would be a decrease in area. A digital feedback loop also allows more flexibility towards adding new features. As stated before, however, the extra quiescent current consumption caused by digital switching must be kept limited.
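The reason for switching from PWM to PFM at small pulse widths can be illustrated with a very rough loss model. The sketch below is an assumption-laden illustration, not the implemented controller: it ignores conducting losses, uses a minimum pulse energy as a stand-in for the minimum pulse width, and all names and numbers are hypothetical.

```python
def efficiency_pwm(p_out, f_sw, e_sw, p_q):
    """Fixed-frequency PWM: the switching energy e_sw is spent in every period."""
    return p_out / (p_out + e_sw * f_sw + p_q)


def efficiency_pfm(p_out, e_pulse, e_sw, p_q):
    """PFM: pulses of fixed energy e_pulse, issued only as often as the load needs."""
    pulse_rate = p_out / e_pulse
    return p_out / (p_out + e_sw * pulse_rate + p_q)


def efficiency(p_out, f_sw, e_pulse_min, e_sw, p_q):
    """Use PWM while the energy per pulse stays above the minimum, PFM below it."""
    if p_out / f_sw >= e_pulse_min:
        return efficiency_pwm(p_out, f_sw, e_sw, p_q)
    return efficiency_pfm(p_out, e_pulse_min, e_sw, p_q)
```

For example, with an assumed 1 MHz switching frequency, 1 nJ of switching energy per pulse, 10 µW of quiescent loss and a 10 nJ minimum pulse, comparing `efficiency_pwm(1e-3, 1e6, 1e-9, 10e-6)` with `efficiency(1e-3, 1e6, 10e-9, 1e-9, 10e-6)` shows how much a 1 mW load gains from pulse skipping in this simplified model.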


Table 15.2 Measured results

                          VDD Boost   VDD Buck1   VDD Buck2
Range [V]                 4.5–20      0.9–1.3     1.4–1.8
Max output power [mW]     >50         >10         >10
Efficiency at Popt [%]    79          72          75

Fig. 15.10 Possible architecture of the power management for a medical implant (blocks: SIMO regulator, HV bandgap, LV bandgap, MUX, two linear regulators, bias generation, power on/off control, valid control; supplies: VDD_In, VDD_Boost, VDD_Buck1, VDD_Buck2, VDD_Analog, VDD_Digital)

Programmability is implemented by level-shifting the output supply with a programmable voltage to the input of the feedback loop. This is done by means of a programmable level shifter instead of a resistive ladder. This allows better control of mismatch and will increase the accuracy at the input of the feedback loop, especially for the high voltage output. This design was realized in a 0.35 µm BCD 80 V tolerant technology with an area of 5.3 mm². Some measured specifications are shown in Table 15.2. Note that the SIMO regulator is a multidimensional system. While the voltage of the regulated supply is not influenced by the conditions of other supplies, the efficiency of one supply depends on the setting and loading of the other ones. This makes it hard to have a complete definition of the efficiency of one output supply.

5 Power Management of a Medical Implant

Figure 15.10 shows a possible architecture of a power management system for a medical implant. Apart from the discussed SIMO switched inductor regulator, it also consists of two bandgaps, bias current generation and two linear regulators. One block also controls the power on and power off sequence, will generate the power on reset


signal and checks if all supplies are in their valid operating area. The last feature is mainly for safety reasons. This will enable the chip to respond quickly to unwanted behavior. Safety is a primary focus throughout the design. This comprises robust design techniques and taking into account process variations. However, this is not enough. One-time events, like cosmic rays, should not be able to bring the implant into a state where it will harm the patient or damage the chip. Given the mixed high voltage – low voltage nature of the chip, there are a lot of potentially hazardous nodes in the design. Damaging the chip would mean that the patient needs to undergo new surgery, which involves extra health risks. When designing for medical implants, these one-time events and their possible consequences also need to be taken into account and covered by precautions on system or block level.

Special attention must be given to the start-up sequence, since the power management must take care of its own power-up. As VDD_In rises, the high voltage bandgap voltage will be generated. This bandgap can be powered by a high voltage supply like VDD_In. It generates a voltage around 1.2 V but with a limited accuracy of 10% across corner conditions. This voltage will be used as a first reference for the linear regulator that generates the supply VDD_Analog. This is the supply that will power the bulk part of the low voltage analog circuits. Due to the limited accuracy of the high voltage bandgap, VDD_Analog will also show this inaccuracy during startup. To ease the design of the analog blocks, a more precise VDD_Analog is needed. The inaccurate VDD_Analog will power a low voltage bandgap. Due to the better characteristics of the low voltage devices and due to the inclusion of sample-to-sample trimming, the low voltage bandgap will generate a very accurate 1.2 V reference. This will serve as the new reference for the linear regulator, which will now be able to regulate VDD_Analog within 1% accuracy. After the precise VDD_Analog is present, the switched inductor regulator can be started. A supply for digital circuitry, VDD_Digital, is generated from one of the buck outputs as soon as it has settled.
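The start-up order described above can be written down as a small sequencer. This is only a sketch of the ordering, assuming each block reports a "valid" flag once its output has settled; the step names and the `next_step` helper are hypothetical and do not come from the actual chip.

```python
from enum import Enum, auto


class Step(Enum):
    HV_BANDGAP = auto()          # coarse reference around 1.2 V, powered from VDD_In
    VDD_ANALOG_COARSE = auto()   # linear regulator running on the coarse reference
    LV_BANDGAP = auto()          # trimmed 1.2 V reference, powered from VDD_Analog
    VDD_ANALOG_PRECISE = auto()  # regulator switched over to the trimmed reference
    SIMO_REGULATOR = auto()      # switched inductor regulator enabled
    VDD_DIGITAL = auto()         # derived from a buck output once it has settled


STARTUP_ORDER = list(Step)


def next_step(done):
    """Return the next block to enable, or None when start-up is complete.

    `done` is the set of steps whose outputs have been reported valid
    (the existence of such flags is an assumption of this sketch).
    """
    for step in STARTUP_ORDER:
        if step not in done:
            return step
    return None
```

Calling `next_step(set())` returns the first block to enable, and feeding back each completed step walks through the sequence in the order given in the text. A step reported out of order does not let the sequencer skip an earlier one.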

6 Conclusions

New applications are rapidly emerging for medical implants. In the design of the power management of a medical implant, safety and efficiency are key features. Safety can be guaranteed on system level by implementing fail-safe scenarios. Efficiency can be gained on system level by selecting the optimal type of building block as well as by optimizing the building block itself. Medical implants are typically low power. This means some commonly used techniques in power management are not usable, while other aspects like quiescent current gain importance. As an example, the power management of a medical implant was discussed.


References

1. S. Oesterle, P. Gerrish, P. Cong, New interfaces to the body through implantable-system integration, in Proceedings of the 58th IEEE International Solid-State Circuits Conference, San Francisco, Feb 2011, pp. 9–14
2. G. Elger, C. Hoppe, P. Falkai et al., Vagus nerve stimulation is associated with mood improvements in epilepsy patients. Epilepsy Res. 42, 203–210 (2000)
3. Biothermal power source for implantable devices, Patent US6640137
4. G. Palumbo, D. Pappalardo, M. Gaibotti, Charge pump circuits: power consumption optimization. IEEE Circuits Syst. Mag. 49(11), 1535–1542 (2002)
5. J. Xiao, A. Peterchev, J. Zhang, S. Sanders, A 4-µA quiescent-current dual-mode digitally controlled buck converter IC for cellular phone applications. IEEE J. Solid State Circuits 39(12), 2342–2348 (2004)
6. F. Luo, D. Ma, Integrated switching DC-DC converter with a pulse-train/PWM control. IEEE Trans. Circuits Syst. Part II 56(2), 152–156 (2009)
7. M. Belloni, E. Bonizzoni, E. Kiseliovas, P. Malcovati, F. Maloberti, T. Peltola, T. Teppo, A 4-output single-inductor dc-dc buck converter with self-boosted switch drivers and 1.2A total output current, in Proceedings of the 55th International Solid-State Circuits Conference, San Francisco, Feb 2008, pp. 444–445
8. I. Furukawa, Y. Sugimoto, A synchronous, step-down from 3.6V to 1.0V, 1MHz PWM CMOS DC/DC converter, in Proceedings of the 27th European Solid-State Circuits Conference, Florence, Sept 2001, pp. 96–99
9. Y. Sugimoto, A CMOS current-mode buck DC-DC converter with a 240-kHz loop bandwidth and unaltered frequency characteristics using a quadratic and input-voltage-dependent compensation slope, in Proceedings of the 35th European Solid-State Circuits Conference, Athens, Sept 2009, pp. 460–463
10. Controlled multi-output DC-DC converter, Patent US6636022

Chapter 16

Feedforward Control of Switching Regulators

Richard Redl

Abstract Feedforward control is a conceptually simple, highly effective and extremely robust, but not very well known or appreciated, technique for improving the dynamic regulation of switching regulators. Feedforward control is also used for stabilizing the switching frequency or the loop gain of free-running converters. The paper presents an overview of feedforward control, starting with small-signal input-voltage feedforward and load-current feedforward in the basic dc-dc converters, both with voltage-mode and current-mode control. Then the extension of the feedforward concept to large input-voltage perturbations without or with load-current feedforward will be discussed. This will be followed by the introduction of feedforward pulse-width modulators. The paper concludes with a discussion of miscellaneous practical topics (implementing input voltage feedforward in isolated converters with secondary-side control, feedforward to the input of the error amplifier, and feedforward stabilization of the switching frequency or the dc value of the control-to-output voltage gain).

1 Introduction

Feedback control regulates the output voltage of the converter with a negative feedback loop, while feedforward control uses the input voltage, load current, or, in the case of programmed converters, the programming voltage for a predictive regulation of the output voltage. Since in pure feedforward control the output voltage is not sensed, the dc regulation tends to be mediocre, and for this reason feedforward control is usually combined with feedback control. Figure 16.1 shows the block schematic of the converter with feedback and multiple feedforward control loops.

R. Redl, ELFI S.A., Montévaux 14, Farvagny-le-Petit, CH-1726 Switzerland, e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_16, © Springer Science+Business Media B.V. 2012


Fig. 16.1 Converter with combined feedforward and feedback control loops (blocks: error amplifier, PWM, power converter, feedback; feedforward paths: Fv for Vin, Fi for Iout sensed through Rsense, Fr for Vref)

The main advantage of feedforward control over feedback control is the fast dynamic regulation, i.e., the correction for an input-voltage or load-current perturbation happens very rapidly, virtually at the same time as the perturbation commences. On the other hand, with feedback control the correcting action starts only after the output voltage has changed due to the perturbation. Furthermore since feedforward control can ensure fast regulation without a wide-band feedback loop, the likelihood of fast-scale instability is much reduced. This also means that with feedforward the dynamic behavior essentially becomes independent from the compensation of the feedback loop. Feedforward control can be effectively used for improving the line or load transient responses and also for eliminating the sensitivity to the presence of an input filter. Additionally, feedforward control can be used for stabilizing the switching frequency of free-running converters or for stabilizing the loop gain. The main disadvantage of feedforward control is that a sensor is needed for sensing the input voltage or the load current.

2 Small-Signal Input-Voltage Feedforward

Small-signal input-voltage feedforward is easy to implement and it effectively improves the transient response to input-voltage perturbations, especially in situations where the magnitude of the perturbation is only a small fraction of the steady-state value. Figure 16.2 shows the block schematics of a feedback-regulated converter with input-voltage feedforward to the input of the pulse-width modulator. In order to determine the optimal transfer function of the feedforward signal processor we need an equivalent small-signal representation of the system. Any of several available small-signal models of the converter can be employed. A suitable model that is independent of the converter topology or operating mode is the admittance-parameter model. That model describes the converter with the

Fig. 16.2 Input voltage feedforward to the input of the pulse-width modulator of a feedback regulated voltage-mode or current-mode controlled dc-dc converter

Fig. 16.3 Small-signal admittance-parameter based equivalent circuit of the converter with feedback and input-voltage feedforward loops

relationships between the output-port and input-port currents [io(s) and ii(s), respectively] and the control, output and input voltages. Eq. 16.1 shows the relationships among those parameters.

$$\begin{bmatrix} i_o(s) \\ i_i(s) \end{bmatrix} = \begin{bmatrix} y_{oc}(s) & y_{oo}(s) & y_{oi}(s) \\ y_{ic}(s) & y_{io}(s) & y_{ii}(s) \end{bmatrix} \begin{bmatrix} v_c(s) \\ v_{out}(s) \\ v_{in}(s) \end{bmatrix} \qquad (16.1)$$

Note that, although possible, it is not customary to include the admittance of the input or output filter capacitors or the load in the input admittance (yii) or output admittance (yoo) parameters. Figure 16.3 shows the admittance-parameter based equivalent circuit of the converter with the feedback and input-voltage feedforward loops. The optimal input-voltage feedforward gain can be determined from the condition that any change in the input voltage that would affect the output voltage is cancelled by the change introduced into the control port via the feedforward path, i.e.,

$$v_{in}(s)\, y_{oi}(s) - v_{in}(s)\, F_{v(opt)}(s)\, y_{oc}(s) = 0 \qquad (16.2)$$

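Table 16.1 Output-port admittance parameters of the buck, boost, and buck-boost converters with voltage-mode control in CCM

$$\begin{array}{l|ccc}
 & y_{oc}(s) & y_{oo}(s) & y_{oi}(s) \\ \hline
\text{Buck} & A\,\dfrac{1}{sL} & \dfrac{1}{sL} & \dfrac{M}{sL} \\
\text{Boost} & A\left(\dfrac{1}{sL} - \dfrac{M^2}{R}\right) & \dfrac{1}{sL\,M^2} & \dfrac{1}{sL\,M} \\
\text{Buck-boost} & A\left(\dfrac{1}{sL} + \dfrac{M(1-M)}{R}\right) & \dfrac{1}{sL\,(1-M)^2} & \dfrac{M}{sL\,(1-M)^2}
\end{array}$$

Table 16.2 Output-port admittance parameters of the buck, boost, and buck-boost converters with voltage-mode control in DCM

$$\begin{array}{l|ccc}
 & y_{oc}(s) & y_{oo}(s) & y_{oi}(s) \\ \hline
\text{Buck} & A\,\dfrac{2}{R}\sqrt{\dfrac{1-M}{K}} & \dfrac{1}{(1-M)R} & \dfrac{(2-M)M}{(1-M)R} \\
\text{Boost} & A\,\dfrac{2}{R}\sqrt{\dfrac{M}{K(M-1)}} & \dfrac{M}{(M-1)R} & \dfrac{(2M-1)M}{(M-1)R} \\
\text{Buck-boost} & A\,\dfrac{2}{R}\,\dfrac{1}{\sqrt{K}} & \dfrac{1}{R} & \dfrac{2M}{R}
\end{array}$$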

From (16.2) the optimal input-voltage feedforward gain is

$$F_{v(opt)}(s) = \frac{y_{oi}(s)}{y_{oc}(s)} \qquad (16.3)$$

The expression for Fv(opt)(s) is valid for any converter, in any operating mode. Table 16.1 presents the admittance parameters for the output ports of the three basic converters (buck, boost, and buck-boost) with voltage-mode control and continuous-inductor-current mode (or "CCM"). Table 16.2 presents the same parameters for discontinuous-inductor-current mode ("DCM"). From these tables the optimal input-voltage feedforward gain can be easily calculated. For example, for the buck and boost converters in CCM the optimal gains are

$$F_{v(opt)(buck)}(s) = \frac{M}{A} \qquad (16.4)$$

$$F_{v(opt)(boost)}(s) = \frac{1}{M A} \cdot \frac{1}{1 - \dfrac{sL}{R} M^2} \qquad (16.5)$$

Note that in (16.4), (16.5) and in Tables 16.1 and 16.2 L is the inductance of the energy-storage inductor in the converter, Vin and Vout are the dc input and output voltages, M = Vout/Vin is the dc voltage gain of the converter, A = Vin/Vramp(pp), where Vramp(pp) is the peak-to-peak value of the PWM ramp, and K = 2L/(RT).
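As a numerical illustration of (16.3) with the CCM entries of Table 16.1, the following sketch evaluates the ratio yoi/yoc for the buck and boost converters. The helper names are hypothetical, and the 5 V output voltage used in the usage note is an assumption that is not stated in the text.

```python
import math


def dc_gains(vin, vout, vramp_pp):
    """Return (M, A) with M = Vout/Vin and A = Vin/Vramp(pp)."""
    return vout / vin, vin / vramp_pp


def y_buck(s, L, M, A):
    """(yoc, yoi) of the voltage-mode buck converter in CCM, per Table 16.1."""
    return A / (s * L), M / (s * L)


def y_boost(s, L, R, M, A):
    """(yoc, yoi) of the voltage-mode boost converter in CCM, per Table 16.1."""
    return A * (1 / (s * L) - M ** 2 / R), 1 / (s * L * M)


def fv_opt(yoc, yoi):
    """Optimal input-voltage feedforward gain, Eq. (16.3)."""
    return yoi / yoc


def s_at(freq_hz):
    """Complex frequency s = j*2*pi*f."""
    return 2j * math.pi * freq_hz
```

For instance, with L = 5 µH, R = 10 Ω, Vramp(pp) = 1 V and an assumed 5 V output, `fv_opt(*y_boost(s_at(10.0), 5e-6, 10.0, *dc_gains(2.0, 5.0, 1.0)))` gives the low-frequency boost gain, and repeating the call with a different input voltage shows that this dc value does not move. The same experiment with `y_buck` shows a gain that does not depend on frequency but scales with the inverse square of the input voltage.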

Fig. 16.4 Simulated transient responses of a voltage-mode controlled boost converter in CCM to large step changes in the input voltage, without feedforward (Vout_1) and with optimal dc feedforward (Vout_2); time axis from 2 ms to 5 ms. The converter parameters are: Vin = 2 V, ΔVin = 1.5 V, fsw = 400 kHz, L = 5 µH, C = 33 µF, ESR of C: 10 mΩ, R = 10 Ω, Vramp(pp) = 1 V. The feedback loop is compensated with a Type 3 amplifier; the loop gain has a unity-gain bandwidth of 12 kHz and a phase margin of 40°

The optimal input-voltage feedforward gain of the voltage-mode controlled buck converter in CCM is independent from the frequency and is inversely proportional to the square of the dc input voltage. The optimal input-voltage feedforward gain of the voltage-mode controlled boost converter in CCM is dependent on the frequency (it has a right-half-plane, or RHP, pole at the same frequency as the RHP zero of the yoc parameter), but the dc value of the gain is independent from the input voltage. It is not practically possible to implement a transfer function with a RHP pole, but it can be demonstrated by simulation that by providing only dc feedforward the line transient response is still greatly improved. Due to the fact that the dc feedforward gain is independent from Vin, the optimal dc gain provides good line transient rejection even when the step change in Vin is comparable to the dc value (Fig. 16.4).

Input-voltage feedforward can also be used with current-mode controlled converters. Note that the buck converter with current-mode control has an inherently good rejection of the input-voltage transients and adding input-voltage feedforward yields only little improvement. The current-mode-controlled boost converter in CCM has a much better rejection of the input-voltage transients than the voltage-mode-controlled version, but the input-voltage feedforward can still be useful for improving the transient response. The two relevant output-port admittance parameters of the boost converter with constant-frequency peak-current control and small stabilizing ramp are:

$$y_{oc}(s) = \frac{1}{M}\left(1 - \frac{sL}{R} M^2\right) \frac{1}{R_s}\, f_h(s) \qquad (16.6)$$

Fig. 16.5 Simulated transient responses of a current-mode controlled boost converter in CCM to step changes in the input voltage, without feedforward (Vout_1) and with optimal dc feedforward (Vout_2); time axis from 2 ms to 3 ms. The converter parameters are: Vin = 2 V, ΔVin = 0.4 V, fsw = 400 kHz, L = 5 µH, C = 33 µF, ESR of C: 10 mΩ, R = 10 Ω, Vramp(pp) = 50 mV, Rs = 0.1 Ω. The feedback loop is compensated with a Type 2 amplifier; the loop gain has a unity-gain bandwidth of 12 kHz and a phase margin of 40°

and

$$y_{oi}(s) = \frac{M}{R} \qquad (16.7)$$

where

$$f_h(s) = \frac{1}{1 + sT\left[\left(1 + \dfrac{L}{A R_s T}\right)(1 - D) - 0.5\right] + \left(\dfrac{sT}{\pi}\right)^2} \qquad (16.8)$$

is the high-frequency response of the current loop, Rs is the transresistance of the current sensor, T is the switching period, and D is the steady-state duty ratio of the power switch. Taking into account only the dc terms, the optimal input-voltage feedforward gain of the current-mode-controlled boost converter is

$$F_{v(opt)} = \frac{R_s}{R} M^2 \qquad (16.9)$$

Figure 16.5 shows the simulated responses of a current-mode-controlled boost converter without feedforward and with optimal dc feedforward to step changes in the input voltage. As can be seen, although the improvement is not as great as in the case of the voltage-mode controlled converter it is still substantial.
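The dc gain of (16.9) follows directly from (16.3) when the dc values of the two admittance parameters are inserted. A minimal sketch, with a hypothetical function name and an assumed 5 V output voltage in the usage note:

```python
def fv_opt_dc_cm_boost(vin, vout, rs, r_load):
    """DC value of the optimal feedforward gain for the current-mode boost converter."""
    m = vout / vin
    yoc_dc = 1.0 / (m * rs)  # Eq. (16.6) at s = 0, where fh(0) = 1
    yoi_dc = m / r_load      # Eq. (16.7)
    return yoi_dc / yoc_dc   # Eq. (16.3), equal to Rs * M**2 / R
```

With Rs = 0.1 Ω and R = 10 Ω, `fv_opt_dc_cm_boost(2.0, 5.0, 0.1, 10.0)` and `fv_opt_dc_cm_boost(2.4, 5.0, 0.1, 10.0)` give different values. Unlike the voltage-mode boost converter, the dc gain here scales with M squared, so a fixed gain is exactly optimal at one input voltage only.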

Fig. 16.6 Load-current feedforward to the input of a feedback regulated voltage-mode or current-mode controlled dc-dc converter

Fig. 16.7 Small-signal admittance-parameter based equivalent circuit of the converter with feedback and load-current feedforward loops

3 Small-Signal Load-Current Feedforward

Figure 16.6 shows the block schematics of a feedback-regulated converter with load-current feedforward to the input of the pulse-width modulator. The optimal feedforward gain (i.e., the gain that sets the output impedance of the converter to zero) can be determined from the small-signal admittance-parameter based equivalent circuit (Fig. 16.7).


From the equivalent circuit the output impedance is

$$Z_{out}(s) = \frac{v_{out}(s)}{i_{out}(s)} = \left[\frac{1}{y_{oo}(s)} \,\Big\|\, \left(R_c + \frac{1}{sC}\right)\right] \left[1 - R_{so} F_i(s)\, y_{oc}(s)\right] \qquad (16.10)$$

The output impedance is zero when the feedforward gain is optimal. From (16.10)

$$F_{i(opt)}(s) = \frac{1}{R_{so}\, y_{oc}(s)} \qquad (16.11)$$

The expression for Fi(opt)(s) is valid for any converter, in any operating mode. From Table 16.1 the optimal load-current feedforward gain of the three basic converters with voltage-mode control in CCM can be calculated. The results for the buck and boost converters are as follows.

$$F_{i(opt)(buck)}(s) = sL\, \frac{1}{R_{so} A} \qquad (16.12)$$

$$F_{i(opt)(boost)}(s) = sL\, \frac{1}{R_{so} A} \cdot \frac{1}{1 - \dfrac{sL}{R} M^2} \qquad (16.13)$$

As can be seen, these functions have a zero at the origin, and the optimal gain of the boost converter also has a RHP pole. If we disregard the unrealizable RHP pole then, in the time domain, both functions become pure differentiators, with a gain that is inversely proportional to the input voltage. Figure 16.8 shows the simulated load transient response of a voltage-mode controlled buck converter in CCM without and with optimal load-current feedforward, and Fig. 16.9 shows the simulated load transient response of a voltage-mode-controlled boost converter in CCM without feedforward and with a quasi-optimal load-current feedforward that does not include the RHP pole. The presence of a differentiator in the feedforward loop makes that loop sensitive to ripple and high-frequency noise. This fact, combined with the inherent difficulty of sensing the output current and the dependence of the optimal gain on the input voltage, reduces the attractiveness of load-current feedforward in voltage-mode controlled converters. On the other hand, load-current feedforward is particularly interesting for the current-mode controlled buck converter operating in CCM. Neglecting the high-frequency double poles in the current loop, the yoc(s) admittance parameter is equal to 1/Rs. This means that except for the simple multiplication by Rs/Rso, the output voltage of the load current sensor can be used as a feedforward signal without further processing, at any input voltage. The optimal load-current feedforward gain of the current-mode controlled boost converter in CCM (neglecting both the high-frequency double poles of the current loop and the RHP zero of the boost converter) is MRs/Rso.
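The cancellation expressed by (16.10) to (16.12) can be checked numerically for the voltage-mode buck converter. The sketch below evaluates the output impedance expression with and without the optimal feedforward gain. The function names are hypothetical, and the value of Rso in the usage note is an assumed sensor transresistance.

```python
import math


def parallel(z1, z2):
    """Impedance of two branches in parallel."""
    return z1 * z2 / (z1 + z2)


def zout_buck_vm(s, L, C, Rc, A, Rso, Fi):
    """Output impedance as written in Eq. (16.10), voltage-mode buck in CCM."""
    yoo = 1 / (s * L)  # Table 16.1
    yoc = A / (s * L)  # Table 16.1
    z_passive = parallel(1 / yoo, Rc + 1 / (s * C))
    return z_passive * (1 - Rso * Fi * yoc)


def fi_opt_buck_vm(s, L, A, Rso):
    """Eq. (16.12): a differentiator whose gain scales with 1/A, i.e. with 1/Vin."""
    return s * L / (Rso * A)
```

With L = 100 µH, C = 470 µF, an ESR of 20 mΩ, A = 10 and an assumed Rso = 10 mΩ, evaluating `zout_buck_vm` at s = j2π·1 kHz once with `Fi=0` and once with `Fi=fi_opt_buck_vm(...)` shows the impedance collapsing to numerical zero in the second case. Because the optimal gain contains A, the cancellation holds only at the input voltage for which the gain was set.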

Fig. 16.8 Simulated load transient responses of a voltage-mode controlled buck converter in CCM without and with optimal load-current feedforward (traces: no feedforward; optimal feedforward). The converter parameters are: Vin = 30 V, fsw = 100 kHz, L = 100 µH, C = 470 µF, ESR of C: 20 mΩ, R = 6 Ω, Vramp(pp) = 3 V. The feedback loop is compensated with a Type 3 amplifier; the loop gain has a unity-gain bandwidth of 5 kHz and a phase margin of 40°

Fig. 16.9 Simulated load transient responses of a voltage-mode controlled boost converter in CCM without feedforward and with quasi-optimal load-current feedforward, i.e., without the RHP pole (traces: no feedforward; optimal feedforward, but w/o RHP pole). See Fig. 16.4 for the converter parameters

Fig. 16.10 Simulated load transient responses of a current-mode controlled buck converter in CCM, without load-current feedforward and with approximately optimal load-current feedforward (traces: no feedforward; feedforward). The converter parameters are: Vin = 20 V, fsw = 100 kHz, L = 20 µH, C = 470 µF, ESR of C: 10 mΩ, R = 5 Ω, Vramp(pp) = 500 mV, Rs = 0.1 Ω. The feedback loop is compensated with a Type 2 amplifier; the loop gain has a unity-gain bandwidth of 4.5 kHz and a phase margin of 60°

Fig. 16.11 Simulated load transient responses of a current-mode controlled boost converter in CCM, without load-current feedforward and with approximately optimal load-current feedforward (traces: no feedforward; feedforward). For the converter parameters see Fig. 16.5

Figures 16.10 and 16.11 show the simulated load transient responses of current-mode controlled buck and boost converters in CCM using the above discussed approximately optimal load-current feedforward. In most applications sensing the load current is difficult. The simplest solution is adding a small resistor between the output of the converter and the load and monitoring the voltage drop across it. Unfortunately, this resistor increases the output resistance and the losses of the converter. Furthermore it requires extra

Fig. 16.12 Synthesizing the load current from the current injected toward the output and the current in the output capacitor; left: buck converter, iout(t)Rs = [iL(t) − iC(t)]Rs; right: boost converter, iout(t)Rs = [iD(t) − iC(t)]Rs

active circuitry if the resistor cannot be placed in the ground connection between the converter and the load. Alternative solutions (magnetic or Hall-effect current sensors) are expensive and cannot be justified in most low-power converters. The load current can be synthesized, however, by sensing the current injected toward the output (inductor current in the case of the buck converter, diode current in the case of the boost and buck-boost converters) and the current in the output capacitor, and subtracting the capacitor current from the injected current. The inductor current is often available in a converter, and sensing the capacitor current can be done with a series RC network that is in parallel with the output capacitor and has a time constant that is approximately equal to the RcC time constant of the output capacitor. Figure 16.12 shows the concept of load-current synthesis for the buck and boost converters. It is to be noted here that a current-mode controlled buck converter with load-current feedforward is essentially equivalent to a buck converter with capacitor current control, since the voltage differences between the two inputs of the PWM comparator are identical. Figure 16.13 shows the buck converter with capacitor current control and with a capacitor current sensor based on an RC network in parallel with the output capacitor, having matched time constants.
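Why matched time constants give a faithful replica of the capacitor current can be seen from a short frequency-domain check. The sketch below assumes that the sense voltage is taken across the resistor R1 of the series R1-C1 network; the function names and the value of C1 in the usage note are hypothetical.

```python
import math


def cap_current(vout, s, C, Rc):
    """Current in an output capacitor C with equivalent series resistance Rc."""
    return vout / (Rc + 1 / (s * C))


def sense_voltage(vout, s, R1, C1):
    """Voltage across R1 of a series R1-C1 network connected across the output."""
    return vout * R1 / (R1 + 1 / (s * C1))


def sense_ratio(freq_hz, C, Rc, R1, C1):
    """Ratio of sense voltage to capacitor current at a given frequency."""
    s = 2j * math.pi * freq_hz
    return sense_voltage(1.0, s, R1, C1) / cap_current(1.0, s, C, Rc)
```

With C = 470 µF, Rc = 20 mΩ and an assumed C1 = 10 nF, the matched resistor is R1 = RcC/C1 = 940 Ω. Evaluating `sense_ratio` at widely different frequencies then returns the same real value, R1C1/C, so the sense voltage is a scaled copy of the capacitor current. If R1 is detuned, the ratio becomes frequency dependent.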

4 Feedforward of Large Perturbations

It is not always possible to perfectly cancel large input-voltage perturbations with linear feedforward signal processors, but quasi-perfect cancellation can be achieved with a relatively simple nonlinear manipulation of the input voltage. Table 16.3 presents the required large-signal feedforward transfer functions vff(vin) of the three basic converters with voltage-mode control and in CCM.

Fig. 16.13 Buck converter with capacitor current control. The sensing network R1, C1 in parallel with the output capacitor satisfies R1C1 = RcC

Table 16.3 Large-signal input-voltage feedforward transfer functions for the three basic converters with voltage-mode control and in CCM

Buck: $v_{ff} = \dfrac{V_{out}\,V_{ramp(pp)}}{v_{in}}$
Boost: $v_{ff} = \dfrac{(V_{out} - v_{in})\,V_{ramp(pp)}}{V_{out}}$
Buck-boost: $v_{ff} = \dfrac{V_{out}\,V_{ramp(pp)}}{V_{out} - v_{in}}$
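The entries of Table 16.3 can be checked against the ideal conversion ratios. The following sketch rests on several assumptions that are not stated in the chapter: lossless CCM conversion ratios, a PWM ramp that starts at zero so that d = vff/Vramp(pp), and a negative Vout for the inverting buck-boost converter (as the expressions in Fig. 16.17 suggest). All function names are hypothetical.

```python
def vff_voltage_mode(topology, v_in, v_out, v_ramp_pp):
    """Feedforward control voltage per Table 16.3 (buck-boost: v_out < 0)."""
    if topology == "buck":
        return v_out * v_ramp_pp / v_in
    if topology == "boost":
        return (v_out - v_in) * v_ramp_pp / v_out
    if topology == "buck-boost":
        return v_out * v_ramp_pp / (v_out - v_in)
    raise ValueError(f"unknown topology: {topology}")

def ideal_output(topology, v_in, d):
    """Lossless CCM output voltage for duty ratio d (assumed ideal model)."""
    if topology == "buck":
        return d * v_in
    if topology == "boost":
        return v_in / (1.0 - d)
    if topology == "buck-boost":
        return -v_in * d / (1.0 - d)
    raise ValueError(f"unknown topology: {topology}")

def output_with_feedforward(topology, v_in, v_out_target, v_ramp_pp=2.0):
    """Output voltage when the comparator is driven by v_ff alone."""
    d = vff_voltage_mode(topology, v_in, v_out_target, v_ramp_pp) / v_ramp_pp
    return ideal_output(topology, v_in, d)
```

Under these assumptions, `output_with_feedforward("boost", v_in, 24.0)` should return 24 V for any input voltage below 24 V, which is the cancellation of large input-voltage perturbations that the table is meant to provide.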

Table 16.4 Combined large-signal input-voltage and load-current feedforward transfer functions for the three basic converters with current-mode control and in CCM

Buck: $v_{ff} = R_s\, i_{out}$
Boost: $v_{ff}(v_{in}) = R_s\, i_{out}\, \dfrac{V_{out}}{v_{in}}$
Buck-boost: $v_{ff} = R_s\, i_{out}\, \dfrac{V_{out} - v_{in}}{v_{in}}$

In current-mode controlled converters it is possible to combine large-signal input-voltage feedforward with load-current feedforward. Table 16.4 presents the required feedforward transfer functions vff(vin, iout). Figure 16.14 shows the simulated waveforms of a buck-boost converter with hysteretic current-mode control and with combined large-signal input-voltage and load-current feedforward.
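One way to read Table 16.4 is that the feedforward voltage equals the sense resistance times the average inductor current that a lossless converter needs in order to deliver iout. This reading is an assumption of the sketch below, not a statement from the chapter. For the inverting buck-boost converter both Vout and iout are taken as negative, and the function names are hypothetical.

```python
def vff_current_mode(topology, v_in, v_out, i_out, r_s):
    """Combined feedforward voltage per Table 16.4."""
    if topology == "buck":
        return r_s * i_out
    if topology == "boost":
        return r_s * i_out * v_out / v_in
    if topology == "buck-boost":          # v_out < 0 and i_out < 0
        return r_s * i_out * (v_out - v_in) / v_in
    raise ValueError(f"unknown topology: {topology}")

def average_inductor_current(topology, v_in, v_out, i_out):
    """Magnitude of the lossless CCM average inductor current (ideal model)."""
    if topology == "buck":
        return abs(i_out)
    if topology == "boost":
        d = (v_out - v_in) / v_out
    elif topology == "buck-boost":
        d = v_out / (v_out - v_in)
    else:
        raise ValueError(f"unknown topology: {topology}")
    return abs(i_out) / (1.0 - d)

# Example operating points (illustrative values only)
boost_vff = vff_current_mode("boost", 5.0, 12.0, 1.0, 0.1)
buck_boost_vff = vff_current_mode("buck-boost", 5.0, -12.0, -1.0, 0.1)
```

For both example operating points the feedforward voltage should equal 0.1 Ω times the corresponding average inductor current.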

Fig. 16.14 Simulated input-voltage and load transient responses of a buck-boost converter with hysteretic current-mode control in CCM, without feedforward and with combined large-signal input-voltage and load-current feedforward. The converter parameters are: L = 15 µH, C = 100 µF, ESR of C: 5 mΩ, Rs = Rso = 1 Ω, Ihyst = 0.5 A. The feedback loop is compensated with a Type 2 amplifier; at 5-V input voltage and 1-A load current the loop gain has a unity-gain bandwidth of 2 kHz and a phase margin of 75°

5 Feedforward Pulse-Width Modulators

As was shown in Sect. 4, implementing large-signal input-voltage feedforward in voltage-mode controlled converters requires nonlinear signal processing. An alternative solution that does not require multipliers or dividers is based on feedforward pulse-width modulators. In such a modulator the amplitude and/or the position of the PWM ramp is a simple linear function of the input and output voltages. Figure 16.15 shows how the feedforward modulator for the buck converter can be derived. The PWM ramp can be made proportional to vin by integrating vin with a resettable integrator that is reset by the clock pulse. The common implementation is to charge a capacitor with a current that is proportional to vin and to discharge the capacitor with a switch that is turned on by the clock pulse. Figures 16.16 and 16.17 show the derivations of the feedforward pulse-width modulators for the boost and buck-boost converters, respectively. Considering that in a regulated converter the output voltage is constant, the only practical difference between the feedforward pulse-width modulator and the standard input-voltage feedforward in the boost converter is that in the feedforward modulator a fraction of the input voltage is added to the ramp at the inverting input of the PWM comparator instead of being subtracted from the error voltage at the noninverting input of the same comparator.
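The effect of a ramp whose amplitude tracks vin can be illustrated with a few lines of Python. This is a minimal sketch, assuming an ideal comparator, a ramp that rises linearly from zero to c·vin within each period, and the ideal CCM buck relation Vout = d·vin. The function names and the scale factor value are hypothetical.

```python
def buck_ff_pwm_duty(v_control, v_in, c):
    """Duty ratio of a PWM whose ramp rises from 0 to c*v_in every period."""
    ramp_peak = c * v_in
    return min(max(v_control / ramp_peak, 0.0), 1.0)

def buck_output(v_control, v_in, c):
    """Ideal CCM buck output: duty ratio times input voltage."""
    return buck_ff_pwm_duty(v_control, v_in, c) * v_in

# The same control voltage applied at three different input voltages
outputs = [buck_output(0.5, v_in, 0.1) for v_in in (8.0, 12.0, 20.0)]
```

Because the duty ratio is v_control/(c·vin), the product d·vin no longer depends on vin: all three entries of `outputs` should equal v_control/c, and no multiplier or divider is needed.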

Fig. 16.15 Deriving the feedforward pulse-width modulator for the voltage-mode-controlled buck converter. Multiplying both PWM comparator inputs by c vin/Vramp(pp) turns the optimal large-signal feedforward voltage Vout Vramp(pp)/vin and the ramp of amplitude Vramp(pp) into the control voltage c Vout and a ramp of amplitude c vin

Fig. 16.16 Deriving the feedforward pulse-width modulator for the voltage-mode-controlled boost converter. Multiplying both PWM comparator inputs by c Vout/Vramp(pp) and adding c vin to both turns the optimal large-signal feedforward voltage Vramp(pp)(Vout − vin)/Vout and the ramp of amplitude Vramp(pp) into the control voltage c Vout and a ramp of amplitude c Vout that starts at c vin

In the case of the buck-boost converter the PWM ramp can be made proportional to vin − vout by integrating vin − vout with an integrator that is reset by the clock pulse. A practical implementation is to charge a capacitor with the sum of a suitably selected fixed current and a current that is proportional to vin, and to discharge the capacitor with a switch that is turned on by the clock pulse.
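The boost and buck-boost modulators can be sketched in the same way. The code below is an interpretation of Figs. 16.16 and 16.17 under stated assumptions, not the chapter's implementation: for the boost converter the ramp has amplitude c·Vout and sits on a pedestal c·vin, and for the inverting buck-boost converter (Vout negative) the ramp has amplitude c·(vin − Vout) and the control voltage is −c·Vout. Function names are hypothetical.

```python
def ramp_pwm_duty(v_control, ramp_start, ramp_amplitude):
    """Fraction of the period during which v_control exceeds a linear ramp."""
    d = (v_control - ramp_start) / ramp_amplitude
    return min(max(d, 0.0), 1.0)

def boost_ff_duty(v_in, v_out, c):
    """Control c*v_out against a ramp of amplitude c*v_out on a pedestal c*v_in."""
    return ramp_pwm_duty(c * v_out, c * v_in, c * v_out)

def buck_boost_ff_duty(v_in, v_out, c):
    """Inverting converter (v_out < 0): control -c*v_out, ramp c*(v_in - v_out)."""
    return ramp_pwm_duty(-c * v_out, 0.0, c * (v_in - v_out))

d_boost = boost_ff_duty(5.0, 12.0, 0.1)            # ideal value (12 - 5)/12
d_buck_boost = buck_boost_ff_duty(12.0, -5.0, 0.1)  # ideal value 5/17
```

Both duty ratios coincide with the ideal CCM values (Vout − vin)/Vout and Vout/(Vout − vin), although the modulator only adds and scales voltages.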

Fig. 16.17 Deriving the feedforward pulse-width modulator for the voltage-mode-controlled buck-boost converter. Multiplying both PWM comparator inputs by c (vin − Vout)/Vramp(pp) turns the optimal large-signal feedforward voltage Vramp(pp) Vout/(Vout − vin) and the ramp of amplitude Vramp(pp) into the control voltage −c Vout and a ramp of amplitude c (vin − Vout)

6 Miscellaneous

In isolated converters with secondary-side control the input voltage is not available directly for implementing input-voltage feedforward. It is often possible, however, to have access to the voltage across the output winding of the power transformer. In most converters that voltage is proportional to the input voltage during the on-time of the power switches; therefore it can be used, either directly or after rectification, for input-voltage feedforward. Figure 16.18 shows how input-voltage feedforward can be implemented in a two-switch forward converter with secondary-side control.

Fig. 16.18 Implementing input-voltage feedforward in a forward converter with secondary-side control

In converters with integrated PWM controllers the PWM comparator or the ramp generator is usually not accessible externally, and therefore input-voltage feedforward cannot be implemented as usual. The solution is to provide feedforward to the inverting input of the voltage-error amplifier and to select the

impedance between the input voltage and the inverting input such that, except for the negative sign, the gain of the amplifier for the input voltage is equal to the required feedforward gain. Figure 16.19 shows the concept.

Fig. 16.19 Implementing input-voltage feedforward to the inverting input of the voltage-error amplifier. The optimal feedforward impedance is $Z_{ffin(opt)}(s) = \dfrac{Z_{fb}(s)}{F_{v(opt)}(s)} = Z_{fb}(s)\,\dfrac{y_{oc}(s)}{y_{oi}(s)}$

In addition to improving the dynamic behavior of the converters, feedforward can also be used for stabilizing the switching frequency of free-running converters. In those converters the switching frequency can change over an unacceptably wide range when the input voltage varies between the minimum and maximum values and/or when the output voltage is programmed. Table 16.5 shows the control laws required for stabilizing the switching frequencies of constant-on-time and constant-off-time converters, and of converters with hysteretic current-mode control, when they operate in CCM.

Table 16.5 Control laws for stabilizing the switching frequencies of constant-on-time and constant-off-time converters, and of converters with hysteretic current-mode control (CMC), when they operate in CCM (Vout is negative for the buck-boost converter, as in Fig. 16.17)

Buck
  Constant-on-time: $t_{on} = T\,\dfrac{V_{out}}{v_{in}}$
  Constant-off-time: $t_{off} = T\,\dfrac{v_{in} - V_{out}}{v_{in}}$
  Hysteretic CMC: $I_{hyst} = \dfrac{T}{L}\,\dfrac{(v_{in} - V_{out})\,V_{out}}{v_{in}}$
Boost
  Constant-on-time: $t_{on} = T\,\dfrac{V_{out} - v_{in}}{V_{out}}$
  Constant-off-time: $t_{off} = T\,\dfrac{v_{in}}{V_{out}}$
  Hysteretic CMC: $I_{hyst} = \dfrac{T}{L}\,\dfrac{(V_{out} - v_{in})\,v_{in}}{V_{out}}$
Buck-boost
  Constant-on-time: $t_{on} = T\,\dfrac{-V_{out}}{v_{in} - V_{out}}$
  Constant-off-time: $t_{off} = T\,\dfrac{v_{in}}{v_{in} - V_{out}}$
  Hysteretic CMC: $I_{hyst} = \dfrac{T}{L}\,\dfrac{v_{in}\,V_{out}}{V_{out} - v_{in}}$

A varying input voltage changes the dc value of the control-to-output response and as a result it also changes the unity-gain frequency of the feedback loop and the phase margin. The dc gain can be stabilized by making the peak-to-peak ramp voltage a function of the input voltage, i.e., by providing nonlinear feedforward into the ramp generator. Table 16.6 presents the peak-to-peak ramp voltages that produce constant control-to-output gain Gdc with varying input voltages, for voltage-mode-controlled converters operating in CCM.
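The frequency-stabilizing laws of Table 16.5 can be illustrated for the buck converter. The sketch below assumes ideal steady-state CCM operation, so that ton/T = Vout/vin and the inductor ripple equals (vin − Vout)·ton/L. The function names and numerical values are hypothetical.

```python
def buck_period_constant_on_time(t_on, v_in, v_out):
    """Steady-state CCM period: the duty ratio t_on/T must equal v_out/v_in."""
    return t_on * v_in / v_out

def buck_on_time_law(t_target, v_in, v_out):
    """Table 16.5, buck, constant-on-time: t_on = T * v_out / v_in."""
    return t_target * v_out / v_in

def buck_period_hysteretic(i_hyst, inductance, v_in, v_out):
    """Period for which the peak-to-peak inductor ripple equals i_hyst."""
    return i_hyst * inductance * v_in / ((v_in - v_out) * v_out)

def buck_hysteresis_law(t_target, inductance, v_in, v_out):
    """Table 16.5, buck, hysteretic CMC."""
    return t_target * (v_in - v_out) * v_out / (inductance * v_in)

T_TARGET, V_OUT, L_H = 2e-6, 3.3, 10e-6     # illustrative values
V_INS = (8.0, 12.0, 20.0)

fixed = [buck_period_constant_on_time(0.5e-6, v, V_OUT) for v in V_INS]
adaptive = [buck_period_constant_on_time(buck_on_time_law(T_TARGET, v, V_OUT), v, V_OUT)
            for v in V_INS]
hysteretic = [buck_period_hysteretic(buck_hysteresis_law(T_TARGET, L_H, v, V_OUT), L_H, v, V_OUT)
              for v in V_INS]
```

With a fixed on-time the period in `fixed` grows in proportion to vin, whereas `adaptive` and `hysteretic` stay at the target period for every input voltage.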

Table 16.6 Ramp voltages that produce constant control-to-output gain Gdc for voltage-mode-controlled converters operating in CCM

Buck: $V_{ramp(pp)} = f(v_{in}) = \dfrac{v_{in}}{G_{dc}}$
Boost: $V_{ramp(pp)} = f(v_{in}) = \dfrac{V_{out}^2}{v_{in}\,G_{dc}}$
Buck-boost: $V_{ramp(pp)} = f(v_{in}) = \dfrac{(v_{in} - V_{out})^2}{v_{in}\,G_{dc}}$
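The ramp laws of Table 16.6 can be checked by differentiating the ideal conversion ratios numerically. The sketch assumes lossless CCM converters, a duty ratio d = vc/Vramp(pp), and a negative Vout for the buck-boost converter; the gain is compared in magnitude. All names are hypothetical.

```python
def ideal_output(topology, v_in, d):
    """Lossless CCM output voltage for duty ratio d (assumed ideal model)."""
    if topology == "buck":
        return d * v_in
    if topology == "boost":
        return v_in / (1.0 - d)
    return -v_in * d / (1.0 - d)              # inverting buck-boost

def duty_ratio(topology, v_in, v_out):
    """Steady-state duty ratio of the ideal converter."""
    if topology == "buck":
        return v_out / v_in
    if topology == "boost":
        return (v_out - v_in) / v_out
    return v_out / (v_out - v_in)             # buck-boost, v_out < 0

def ramp_for_constant_gain(topology, v_in, v_out, g_dc):
    """Peak-to-peak ramp voltage per Table 16.6."""
    if topology == "buck":
        return v_in / g_dc
    if topology == "boost":
        return v_out ** 2 / (v_in * g_dc)
    return (v_in - v_out) ** 2 / (v_in * g_dc)

def dc_gain(topology, v_in, v_out, v_ramp_pp, h=1e-6):
    """|dVout/dvc| by central difference, with d = vc / Vramp(pp)."""
    d = duty_ratio(topology, v_in, v_out)
    slope = (ideal_output(topology, v_in, d + h)
             - ideal_output(topology, v_in, d - h)) / (2.0 * h)
    return abs(slope) / v_ramp_pp

def gain_with_feedforward(topology, v_in, v_out, g_dc=10.0):
    """DC gain obtained when the ramp follows Table 16.6."""
    ramp = ramp_for_constant_gain(topology, v_in, v_out, g_dc)
    return dc_gain(topology, v_in, v_out, ramp)
```

For any input voltage, `gain_with_feedforward` should return a value very close to the chosen Gdc, which keeps the unity-gain frequency of the loop independent of vin.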

7 Summary

Feedforward control is a powerful technique for improving the dynamic response of switching converters to input-voltage or load-current perturbations without the need for increasing the gain of the feedback loop. This paper reviewed the theory behind input-voltage feedforward and load-current feedforward, and presented optimal feedforward gain functions and simulation verifications. The extension of the feedforward concept to large perturbations was also investigated, together with the idea of feedforward into the pulse-width modulators. Implementing input-voltage feedforward in isolated converters with secondary-side control and in converters with integrated controllers that do not allow access to the input of the PWM comparator was also discussed. Lastly, the paper showed how the frequency of free-running converters or the dc value of the control-to-output voltage transfer function can be stabilized with input-voltage feedforward.

8 Literature

There are relatively few papers discussing the feedforward control of switching regulators. The most comprehensive overview of feedforward was presented at APEC 2009, as a professional education seminar [1]. Much of the content of this paper is based upon that seminar. The list below comprises the most relevant publications, including a book on the dynamic analysis of switching converters using admittance parameters [8], and mentions the computer program that was used for all simulation verifications [12].

References

1. R. Redl, Feedforward control of switching regulators, Professional Education Seminar, APEC 2009
2. N.O. Sokal, Feed-forward control for switched-mode power converters, in Proceedings of Powercon 3, June 1976, pp. E2-1 through E2-13, Beverly Hills, California
3. R. Redl, N.O. Sokal, Optimizing dynamic behavior with input and output feed-forward and current-mode control, in Proceedings of Powercon 7, 24–27 Mar 1980, pp. H1-1 through H1-16, San Diego, California
4. R. Redl, N.O. Sokal, Near-optimum dynamic regulation of DC-DC converters using feedforward of output current and input voltage with current-mode control. IEEE Trans. Power Electr. 1(3), 181–191 (1986)
5. G.K. Schoneman, D.M. Mitchell, Output impedance considerations for switching regulators with current injected control. IEEE Trans. Power Electr. 4(1), 25–35 (1989)
6. L. Calderone, L. Pinola, V. Varoli, Optimal feed-forward compensation for PWM dc/dc converters with linear and quadratic conversion ratio. IEEE Trans. Power Electr. 7(2), 349–355 (1992)
7. B. Arbetter, D. Maksimovic, Feedforward pulse width modulators for switching power converters. IEEE Trans. Power Electr. 12(2), 361–368 (1997)
8. A.S. Kislovski, R. Redl, N.O. Sokal, Dynamic Analysis of Switching-Mode Dc/Dc Converters (Design Automation, Lexington, 2003)
9. A.V. Peterchev, Digital pulse-width modulation control in power electronic circuits: theory and applications, Ph.D. Thesis, University of California, Berkeley, 2005
10. J.-P. Sjoroos et al., Dynamic performance of buck converter with input voltage feedforward control, in Proceedings of the 2005 European Conference on Power Electronics and Applications (Dresden, Germany, 2005)
11. M. Karppanen et al., Dynamical characterization of peak-current-mode-controlled buck converter with output-current feedforward. IEEE Trans. Power Electr. 22(2), 444–451 (2007)
12. PSIM v. 8.0, from Powersim, Inc., http://www.powersimtech.com

Chapter 17

Device Optimization to Assess Losses and Ringing Issues in Integrated Synchronous Buck Converters

J. Roig and F. Bauwens

Abstract In this work integrated power MOSFETs are optimized to improve DC/DC converter performance in Multi-Voltage Smart Power technologies. Hence, lateral and trench-based power MOSFETs are analyzed to integrate 12 V-input and 48 V-input synchronous buck converters in the same technological platform. An extensive investigation on the device architecture for 100 and 24 V power MOSFETs is carried out by measurements and TCAD mixed-mode simulations to determine their internal losses as well as the inherent ringing effects.

1 Introduction

Nowadays, monolithically integrated DC/DC converters are used for a wide range of applications. In this sense, the Smart-Power technologies are suitable when high-voltage conversion (>10 V) is required. Due to its simplicity and versatility, the most usual power switch in Smart Power technologies is the lateral power MOSFET (nLDMOS). Compared to the trench-based power MOSFET, the nLDMOS usually exhibits a lower gate-to-drain charge (Qgd), which makes it attractive for 12-to-1.2 V conversion [1]. However, in high-voltage conversion the conduction losses become critical due to the high nLDMOS specific on-state resistance (sRon). Hence, a 100 V trench-based power MOSFET (nTrenchMOS) is preferable for 48-to-5 V conversion. Providing full compatibility between integrated nLDMOS and nTrenchMOS is a difficult task that requires the implementation of very deep trenches and a combination of high and low-resistive silicon layers (see Fig. 17.1).

Fig. 17.1 Cross section of the multi-voltage Smart-Power technology combining a 100 V nTrenchMOS with a 24 V nLDMOS

This work analyzes the optimum structure and biasing conditions for nLDMOS and nTrenchMOS taking into account their technological compatibility. In this sense, the nLDMOS body-diode reverse recovery is evaluated under different n-type epitaxy conditions. In fact, the layer between P-RESURF and NBL (see n-epi in Fig. 17.1), with a thickness defined by Tn, is expected to increase in thickness and resistivity with the maximum voltage capability of the Smart Power platforms. Note that Tn depends on the epitaxial thickness (Tepi), which in turn is optimized for the maximum voltage range of the quasi-vertical nTrenchMOS (see Fig. 17.1).

In a first section, a 24 V nLDMOS is electrically analyzed by experiment and TCAD simulations for an existing 50 V rated Smart Power platform (P50) [2]. Special emphasis is placed on the reverse recovery comparison between the optimized nLDMOS integrated in P50 and in a 100 V rated Smart Power platform (P100) which is currently under development [3, 4]. A second section contains a comparative analysis between different nTrenchMOS. Fabricated discrete versions of our nTrenchMOS candidates are tested in an on-board synchronous buck converter. From these measurements, the impact on the circuit efficiency and the waveform oscillations are extracted. Moreover, the experimental data are used to calibrate our mixed-mode simulations [5] and to explore new device topologies or to extend this study to different load currents.

J. Roig • F. Bauwens, Power Technology Centre, ON Semiconductor, Westerring 15, B-9700 Oudenaarde, Belgium, e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_17, © Springer Science+Business Media B.V. 2012

2 nLDMOS Optimization

An excessively slow recovery of the inherent nLDMOS body-diode has recently been demonstrated in simple bulk Smart Power technologies [6]. The large reverse recovery time (Trr), usually correlated with a large reverse recovery charge (Qrr), produces significant losses during the High-Side-FET (HS-FET) turn-on transient, otherwise called the lead edge. As a consequence, the system efficiency is degraded and the reliability is compromised by the increase in the current/voltage ringing amplitudes [7]. It has already been demonstrated that Trr and Qrr are minimized in Smart-Power technologies by implementing RESURF nLDMOS in junction-isolated pockets [8, 9]. As observed in the 24 V nLDMOS cross section in Fig. 17.1, the pocket isolation consists of an NBL (n-type buried layer) and an N-Sinker. The latter can be electrically connected to the drain (Drain-to-NBL) or to the source (Source-to-NBL) through the top metal layers. The drawbacks and advantages related to device energy capability and minority/majority carrier injection in the substrate have been studied in [10, 11]. The Drain-to-NBL configuration provides a larger nLDMOS SOA than the Source-to-NBL one. However, a highly resistive NBL (>80 Ohm/square) induces substrate majority carrier injection in the Drain-to-NBL configuration. Aside from the reduced SOA, another drawback of the Source-to-NBL configuration is the potential punch-through effect in a lowly doped P-RESURF. Other works have focused on the reverse recovery behavior of the body-diode [8, 9]. These works concluded that the body-diode response is faster and the current capability larger in the Source-to-NBL case. Moreover, the impact of the P-RESURF layer thickness on the reverse recovery is assessed in [8].

Fig. 17.2 (a) Parasitic transistors in an nLDMOS basic cell. (b) Source-to-NBL and (c) Drain-to-NBL configurations in the simulated nLDMOS structure

2.1 Device Description and Fabrication

A schematic cross-section of the nLDMOS indicating the T1 and T2 parasitic bipolar transistors is shown in Fig. 17.2a. T1 is the classical horizontal nLDMOS parasitic bipolar (emitter = N+ source, base = P-body, collector = N+ drain) while T2 is a vertical parasitic bipolar (emitter = NBL, base = P-RESURF, collector = N+ drain) whose electrical behavior strongly depends on the NBL biasing conditions. The electrical schemes of the Source-to-NBL and Drain-to-NBL configurations are displayed in Fig. 17.2b, c, respectively. The simulated nLDMOS structures are obtained by TCAD process simulations [5]. These simulations are based on calibrated input decks for a 0.35 µm Smart-Power technology, named here P50. After performing additional TCAD electrical simulations, the implanted P-RESURF layer has been optimized to avoid the punch-through effect in the Drain-to-NBL case [11]. A P-RESURF implanted dose of $4 \times 10^{12}$ cm$^{-2}$ combined with an additional implant for the drift region gives an optimum BV-sRon trade-off of 24 V and 17.3 mOhm·mm² for any NBL biasing condition. The optimized nLDMOS gives similar electrical results when an identical process is simulated under the epitaxy characteristics of P100. The value of Tepi in P100 is about twice the P50 one while the n-epi doping concentration is four times smaller than in P50.

2.2 Reverse Recovery Analysis

In this subsection the nLDMOS reverse recovery is explored under resistive load conditions. Initially, the nLDMOS devices integrated in P50 are simulated and measured. Then the calibrated mixed-mode simulations are used to predict the reverse recovery behavior of nLDMOS in P100. An experimental setup similar to the one in [12] has been used to measure the reverse recovery on wafer level. With the gate connected to the source, the output current (Iout) is measured at the scope impedance when applying a voltage pulse to the drain by means of a pulse generator. A voltage pulse (Vf = Vds) is defined to forward-bias the body-diode for 500 ns. After this time, the negative voltage is reversed to Vr = −10 V in order to generate the reverse recovery event. The measured curves in the P50 nLDMOS are represented in Fig. 17.3 with an acceptable accuracy for Trr values below 20 ns. In accordance with the observations in [8, 9], the Source-to-NBL configuration clearly reduces Qrr with respect to the Drain-to-NBL one.

Mixed-mode TCAD simulations are performed to investigate the reverse recovery under the same conditions as in the experimental setup. In both simulation and measurement curves, Qrr is extracted by integrating the positive Iout curve with the conditions described in the inset of Fig. 17.3. As inferred from Figs. 17.4 and 17.5, TCAD correctly reproduces the Qrr trend for different forward currents (If) with Vf comprised between 0.8 and 2 V. Furthermore, the Qrr difference between Source and Drain-to-NBL (above one order of magnitude in all cases) is also predicted. Nevertheless, the simulated Qrr values are slightly above the experimental ones. This is probably due to additional parasitic inductances or resistances that are not accounted for in the simulation. The Qrr reduction at Source-to-NBL is elucidated in [8] for a structure without n-epi region (i.e., P-RESURF on top of NBL). Indeed, T2 is triggered to operate in the weak active region when the body-diode is forward biased. As a result, the electrons are partially removed from the P-RESURF region. The activation of T2 is clear when applying a negative Vds voltage under stationary regime (grounded Vg and Vs). This activation is proved by experiment and simulation in [13], where the electron and hole density profiles captured along the AA' cut (see Fig. 17.2) give additional physical insight into the reverse recovery curves.

Fig. 17.3 Measured reverse recovery curves under resistive load for Source and Drain-to-NBL in P50 (Iout in A versus time in ns, Vr = −10 V). Inset: description of the Qrr extraction method (labels: IF, trr, 10% IRmax, IRmax, Qrr, Tau = Qrr/IF)

Fig. 17.4 Simulated reverse recovery curves under resistive load for Source and Drain-to-NBL in P50 and P100 (Iout in mA/µm versus time in ns, TCAD with Vr = −10 V)

Fig. 17.5 Measured and simulated Qrr/If (s) vs. If/W (A/µm) in Source and Drain-to-NBL in P50 and P100 (only simulation in P100)

An extension of the TCAD reverse recovery study is carried out in P100 nLDMOS. Figure 17.4 shows that the reverse current peak (IRmax defined in Fig. 17.3) decreases in Drain-to-NBL when comparing P100 to P50. In contrast, the IRmax value is higher in P100 when Source-to-NBL is analyzed. These trends are similar in an nLDMOS with reduced P-RESURF doping concentration [8]. Everything thus indicates that the IRmax variation can be attributed to a higher T2 gain in P100. Such a gain is also noticed in simulations of Ids vs. Vds, which present slightly higher Ids values than in P50. The higher T2 gain is a consequence of the deeper P-RESURF layer diffusion into a less doped epitaxial layer. A more remarkable effect is the pronounced current tail observed in the P100 Drain-to-NBL case. As a consequence of the additional reverse recovery current tail, Qrr and Trr are increased in P100 nLDMOS. A comparison between simulated Qrr in P50 and P100 is found in Fig. 17.5. The increase of Qrr in P100 is about 2 and 5 times in Drain and Source-to-NBL, respectively. The appearance of the current tail is closely related to the slow hole removal from the n-epi region. Remember that in P100 Tepi is larger and the n-epi doping level is lower, which translates into a larger recombination time for minority charges in the n-epi region.
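The Qrr extraction used in this subsection can be sketched in a few lines. This is only an illustration under stated assumptions: the recovery current is taken as a positive magnitude, the integration is assumed to stop when the current has decayed to 10% of IRmax (one reading of the inset labels of Fig. 17.3), and the waveform is synthetic. The function name, the waveform shape and all numerical values are hypothetical.

```python
import math

def extract_qrr(times, currents, threshold=0.10):
    """Return (Q_rr, t_rr, I_Rmax) from a sampled recovery-current magnitude.

    Integration runs from the first sample until the current has decayed to
    threshold * I_Rmax after its peak (trapezoidal rule).
    """
    i_rmax = max(currents)
    end = currents.index(i_rmax)
    while end + 1 < len(currents) and currents[end] > threshold * i_rmax:
        end += 1
    q_rr = sum(0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
               for k in range(end))
    return q_rr, times[end] - times[0], i_rmax

# Synthetic waveform (hypothetical): linear rise to the peak, exponential tail
I_RMAX, T_PEAK, TAU_TAIL, DT = 0.05, 2e-9, 3e-9, 1e-11
times = [k * DT for k in range(3001)]
currents = [I_RMAX * t / T_PEAK if t <= T_PEAK
            else I_RMAX * math.exp(-(t - T_PEAK) / TAU_TAIL) for t in times]

q_rr, t_rr, i_rmax = extract_qrr(times, currents)
tau = q_rr / 0.01    # Tau = Qrr / IF for an assumed forward current of 10 mA
```

For measured data the same function would be applied to the recorded samples after the current crosses zero. A longer tail, such as the one reported for P100, increases both the extracted Qrr and trr.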

Fig. 17.6 Schematic description of the simulated converter (Vin = 12 V, Vout = 1.2 V) with nLDMOS as HS and LS-FET. In order to mimic the Drain and Source-to-NBL configurations BLN is connected to node 1 and 2, respectively

2.3 Efficiency and Ringing Analysis

The impact of the different reverse recovery in P50 and P100 on DC/DC converter losses is not obvious. On the one hand, Qrr and Trr are smaller in P50; on the other hand, the reverse recovery in P100 shows a less abrupt decay. To determine the relevance of the reverse recovery for the application efficiency, mixed-mode simulations are performed in a synchronous DC/DC buck converter with the nLDMOS acting as LS and HS-FET (see Fig. 17.6). The buck converter converts a 12 V input voltage to 1.2 V, operating at a frequency of 1 MHz. It is worth remarking that the mixed-mode simulations only account for the losses at the HS-FET and LS-FET switches, which are the predominant ones in this kind of application. The efficiency results obtained for different deadtimes and Iloads are summarized in Figs. 17.7 and 17.8, respectively. Figure 17.7 shows that, for deadtimes below 30 ns, the efficiency drops due to the shoot-through effect as pointed out in [14]. In contrast, for deadtimes above 30 ns the efficiency diminishes as a consequence of the reverse recovery effect. Effectively, a larger deadtime translates into a larger forward time for the LS-FET body-diode and, as a consequence, there is more time to charge the LS-FET drift region.

Fig. 17.7 Simulated efficiencies for Drain and Source-to-NBL configurations in P50 and P100 for different Deadtime values (F = 1 MHz, Iload = 12 A, Vg = 5 V)

Focusing our analysis on the efficiencies for deadtimes higher than 30 ns, it is derived that Source-to-NBL is significantly more efficient than Drain-to-NBL in P50, with an improvement of 0.8%. Surprisingly, the difference in efficiency between Source and Drain-to-NBL becomes smaller in P100 (0.2%), with a boosted overall efficiency, which is about 1% above the P50 one. This means that in our application the softness of the reverse recovery is more important than the Qrr or Trr values. This trend is confirmed by Fig. 17.8, where the efficiency improvement provided by P100 is more evident at low Iloads with predominant transient losses. After analyzing the waveforms for the four cases under study, it is confirmed that the electrical stress and power losses take place at the lead edge. The current peaks in the lead edge at HS and LS-FET are displayed in Fig. 17.9. It is noticed that the amplitude of these peaks correlates with the efficiency. Hence, in the worst efficiency case (Drain-to-NBL in P50) the current peaks reach 52 A and 36 A for HS and LS-FET, respectively. On the other hand, the best efficiency case (Source-to-NBL in P100) shows current peaks of 47 A and 23 A for HS and LS-FET, respectively. Interestingly, the ringing time is slightly shorter in Source-to-NBL P50 than in Source-to-NBL P100 despite the lower efficiency and larger current amplitude. The power loss contributions for the best and worst efficiency cases are represented in Fig. 17.10, thus unveiling the predominance of the losses at the HS-FET during the lead edge.
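To put the quoted efficiency differences into perspective, they can be converted into dissipated power. The sketch below is illustrative arithmetic only: the baseline efficiency of 86.0% is an assumed value chosen within the range plotted in Fig. 17.7, and the 0.8% improvement is interpreted as percentage points. The output power follows from the 1.2 V output and the 12 A load stated in the text.

```python
def switch_loss_w(p_out_w, efficiency):
    """Total loss implied by an efficiency: P_loss = P_out * (1/eta - 1)."""
    return p_out_w * (1.0 / efficiency - 1.0)

P_OUT = 1.2 * 12.0              # W, 1.2 V output at a 12 A load
ETA_DRAIN = 0.860               # assumed baseline efficiency (illustrative)
ETA_SOURCE = ETA_DRAIN + 0.008  # 0.8 percentage points higher

loss_drain = switch_loss_w(P_OUT, ETA_DRAIN)
loss_source = switch_loss_w(P_OUT, ETA_SOURCE)
saved_w = loss_drain - loss_source
```

Since the simulations only account for the losses in the two switches, `saved_w` would correspond to the reduction in power dissipated in the HS and LS-FET under these assumptions.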

Fig. 17.8 Simulated efficiencies for Drain and Source-to-NBL configurations in P50 and P100 for different Iload values (F = 1 MHz, Iload = 12 A, Deadtime = 50 ns)

Fig. 17.9 Simulated Id, Vgs and Vds waveforms for HS and LS-FET (F = 1 MHz, Iload = 16 A, Deadtime = 50 ns). Curves: Drain-to-NBL in P50, Source-to-NBL in P50, Source-to-NBL in P100. Annotated current peaks: 52 A, 48 A, 43 A, 36 A, 32 A and 27 A

Fig. 17.10 Simulated power loss contributions for HS and LS-FET (F = 1 MHz, Iload = 16 A, Deadtime = 50 ns)

Drain-to-NBL in P50: HS Lead Edge 47%, Body-Diode 24%, HS On-state 14%, LS Lead Edge 6%, LS On-state 5%, LS Trail Edge 3%, HS Trail Edge 1%
Source-to-NBL in P100: HS Lead Edge 36%, Body-Diode 27%, HS On-state 19%, LS On-state 7%, LS Lead Edge 5%, LS Trail Edge 5%, HS Trail Edge 1%

3 nTrenchMOS Optimization

Since the structure selected to integrate the nTrenchMOS device should be accommodated in the epitaxial layer, Tepi and the n-epi doping concentration play a crucial role in the nTrenchMOS choice. As detailed in the following subsection, the P100 epitaxial layer is more suitable for planar or trench-gate power MOSFETs while P50 enables the integration of charge-compensation techniques in the drift region. In P50 and P100, the BLN layer combined with the N-sinker constitutes a low ohmic path (<10 Ohm/square) for the current flowing towards the top drain terminal. As a result, the integrated nTrenchMOS is catalogued as a quasi-vertical device. Due to the lack of experimental data in an integrated version of our application, discrete parts of the nTrenchMOS under study are measured in a synchronous buck converter to provide a comparative analysis. One advantage of this study is the use of realistic TCAD structures obtained from detailed process simulations and validated by comparing their electrical characteristics with those of our own fabricated nTrenchMOS. Previously, a comparative analysis between ideal nTrenchMOS TCAD structures has been reported for 12-to-1.2 V conversion in [15].

Fig. 17.11 Cross section and electrical characteristics for (a) GT, (b) GDT, (c) SG, (d) GDT with half of the gates active (GDTHG; the remaining gates are connected gate-to-source). Vth = 3 V in all cases. Test devices: Ron = 3 mΩ, 5 mΩ and 6 mΩ

(a) GT: sRon = Ron·A1, Qg/A1, Qrr/A1, BV ~ 100 V
(b) GDT: sRon = 0.35·Ron·A1, 0.8·Qg/A1, 0.3·Qrr/A1, BV ~ 100 V
(c) SG: sRon = 0.45·Ron·A1, 0.4·Qg/A1, 0.3·Qrr/A1, BV ~ 100 V
(d) GDTHG: sRon = 0.55·Ron·A1, 0.3·Qg/A1, 0.3·Qrr/A1, BV ~ 100 V

3.1 Device Description and Fabrication

The four 100 V nTrenchMOS structures under study and their relevant electrical characteristics are summarized in Fig. 17.11. Historically, the preferred switch for low-voltage DC/DC converters was the trench-gate power MOSFET (GT in Fig. 17.11a) [16–18]. In GT structures, a relatively large and lowly doped drift region is normally designed to reach an optimum sRon-BV trade-off. Hence, the epitaxial layer in P100 is appropriate to integrate GT. Recently, power MOSFETs with deeper trenches and different gate oxide thicknesses (GDT and SG in Fig. 17.11b, c) were demonstrated to significantly reduce sRon in a voltage range between 50 V and 200 V [19–21]. For the first time, an integrated version of GDT was presented in [22] with a similar process flow to the GDT evaluated in this work. The deep trenches etched in GDT and SG act as alternated MOS capacitors. Consequently, the short and highly doped drift region is compatible with the P50 epitaxial layer. As inferred from the data in Fig. 17.11, not only the sRon is reduced but also the Qrr and Qg per unit area. By splitting the polysilicon inside the trenches, a shielding plate electrode is achieved in SG structures [21, 23]. When the shielding plate electrode is connected to the source electrode, Qg is drastically reduced compared to GDT. The reduction of the accumulation region length formed by the gate in the bird's beak is responsible for the slight sRon increase in SG. An additional GDTHG structure with deep trenches is analyzed. The GDTHG structure is derived by connecting half of the GDT gates to the source as described in Fig. 17.11d. Although GDTHG devices are not tested on the buck converter circuit, they have been successfully fabricated with the same GDT technological process. Detrimentally, the deactivation of half of the channel penalizes the sRon in GDTHG. It is worth remarking that the deep trench, as well as the split polysilicon structure, increases the cost of the final product, which is a relevant variable to account for in the selection of our nTrenchMOS.
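The normalized values of Fig. 17.11 can be combined into a single number for a quick comparison. The sketch below uses the product of specific on-resistance and gate charge per area, a common switch figure of merit that is not used in the chapter itself and is introduced here as an assumption. The dictionary and function names are hypothetical; the numbers are the factors relative to GT listed in Fig. 17.11.

```python
# (sRon factor, Qg-per-area factor) relative to the GT structure, Fig. 17.11
STRUCTURES = {
    "GT": (1.00, 1.0),
    "GDT": (0.35, 0.8),
    "SG": (0.45, 0.4),
    "GDTHG": (0.55, 0.3),
}

def normalized_fom(s_ron_factor, q_g_factor):
    """Ron*Qg figure of merit relative to GT (lower is better)."""
    return s_ron_factor * q_g_factor

fom = {name: normalized_fom(*factors) for name, factors in STRUCTURES.items()}
ranking = sorted(fom, key=fom.get)
```

Such a ranking is only a first-order indication. It ignores Qrr, Cgd and ringing, so it should not be expected to predict the converter efficiencies on its own.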

3.2 Efficiency and Ringing Analysis

An initial calibration of the TCAD mixed-mode parasitic elements is carried out by means of a limited number of measurements. Unlike in the schematic circuit described in Fig. 17.6, the well-known parasitic elements related to the package and the board are also included. Furthermore, the unknown parasitic elements attributed to the driver have been tuned to reproduce a large variety of waveforms (see Figs. 17.12 and 17.13). These waveforms account for the switching node (Vsw), the High-Side gate (VgHS) and the Low-Side gate (VgLS) in three different HS/LS-FET configurations: SG/SG, GDT/GDT and GT/SG. Figure 17.12 shows that SG/SG exhibits short transient times with an intense ringing effect at the lead edge. The efficiencies corresponding to these measurements, plotted in Fig. 17.14, reach values of 85.5%. Aside from increasing the transient losses, the Vsw ringing is an undesirable effect that causes EMC issues in the neighboring circuitry. Moreover, the voltage capability of the LS-FET must be overrated to avoid the avalanche phenomenon. Since VgHS and Vsw oscillate in phase, the Vgs at the HS-FET does not oscillate and the shoot-through events are avoided. In GDT/GDT an increase of Qg and Cgd with respect to SG/SG suppresses the lead edge oscillations, as confirmed in Fig. 17.13. On the other hand, the large HS-FET turn-on time translates into a critical efficiency reduction with values below 70% and device destruction for Iload > 5 A. Regarding the measurements, the GT/SG case shows the best efficiency at small Iload, improving the efficiency by 5% with respect to SG/SG for Iload < 1.5 A and alleviating the ringing during the lead edge. In general, TCAD simulations reproduce the measured efficiencies in a qualitative way, thus being useful for a comparative analysis. Note that, since the TCAD analysis only accounts for the power MOSFET losses, the simulated efficiencies are above the measured ones. Exceptionally, the cases showing low efficiency exhibit higher efficiency in the experiment.

Fig. 17.12 (a) Measured and (b) simulated waveforms (Vsw, Vg HS and Vg LS versus time) for a case with SG in HS and LS-FET (Iload = 5 A, F = 0.2 MHz, Deadtime = 65 ns). In this case an efficiency of 85.5% is measured

Fig. 17.13 (a) Measured and (b) simulated waveforms (Vsw, VgHS and VgLS versus time) for a case with GDT in HS and LS-FET (Iload = 5 A, F = 0.2 MHz, Deadtime = 65 ns). In this case an efficiency of 68.3% is measured

Fig. 17.14 Measured (Exp.) and simulated (TCAD) efficiencies as a function of Iload for GT/SG, SG/SG and GDT/GDT converter configurations. In GDT/GDT, 5 A results in device destruction (F = 0.2 MHz, Deadtime = 65 ns)

Fig. 17.15 Simulated efficiencies as a function of Iload for SG/SG, GDTHG/GDTHG, GDT/GDT and GT/GT converter configurations (F = 0.2 MHz, Deadtime = 65 ns)

Once our TCAD mixed-mode simulations are calibrated, the study can be extended to cases that have not been measured. For instance, a GDTHG/GDTHG solution boosts the efficiency compared to GDT/GDT, as seen in Fig. 17.15. However, the overall efficiency is still low compared to the SG/SG and GT/SG configurations.


In the GT/SG case, the TCAD simulations predict a stronger drop in efficiency with increasing Iload than in SG/SG. This is basically due to the larger conduction losses in GT. The losses at high Iload are even larger in the simulated GT/GT combination where, aside from the conduction losses, the reverse recovery losses make a relevant contribution. In contrast, the GT/GT combination provides very high efficiency at low Iload, where the ringing effects are drastically reduced. A more extended analysis of the different loss contributions, as well as some straightforward solutions to damp the ringing peaks in the SG/SG configuration, is provided in [24].

4 Conclusions

This paper investigates the combined integration of quasi-vertical and lateral power MOSFETs in Multi-Voltage Smart Power platforms for different conversion voltage ranges. In order to monolithically integrate HS and LS-FETs for 12 V-input and 48 V-input converters, different nLDMOS and nTrenchMOS are evaluated by TCAD and experiment. In the nLDMOS, the body-diode reverse recovery is expected to be of utmost importance in the transient losses. Hence, a Source-to-NBL configuration is preferred over a Drain-to-NBL one. Moreover, the effect of the epitaxial region is analyzed in two existing Smart Power platforms which, in turn, are each restricted to accommodating different nTrenchMOS structures. For the 12 V-input conversion, the best choice is a Multi-Voltage Smart Power technology combining a 100 V trench-gate power MOSFET with a 24 V RESURF nLDMOS. This technology is also the preferred one for 48 V-input conversion at low Iload. In contrast, a technology compatible with a split-gate trench power MOSFET is more appropriate for 48 V-input conversion at high Iload.

Acknowledgements This work was carried out within the framework of the GREENFETS project and financed by the IWT (Flanders, Belgium).

References

1. Y.C. Liang, R. Oruganti, T.B. Oh, Design considerations of power MOSFET for high frequency synchronous rectification. IEEE Trans. Power Electron. 10(3 May), 388–389 (1995)
2. F. De Pestel, P. Moens, H. Hakim, H. De Vleeschouwer, K. Reynders, T. Colpaert, P. Colson, P. Coppens, S. Boonen, D. Bolognesi, M. Tack, Development of a robust 50 V 0.35 μm based Smart Power technology, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Piscataway, April 2003, pp. 182–185
3. J. Roig, S. Mouhoubi, P. Gassot, R. Charavel, A. Suvkhanov, P. Moens, F. Bauwens, M. Tack, New VDMOS structure with Discontinuous Thick Inter-Body Oxide to reduce gate-to-drain charge, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Hiroshima, June 2010, pp. 397–400
4. R. Charavel, J. Roig, S. Mouhoubi, P. Gassot, F. Bauwens, P. Vanmeerbeek, B. Desoete, P. Moens, E. De Backer, Next generation of Deep Trench Isolation for Smart Power technologies with 120V high-voltage devices. Microelectron. Reliab. 50(9–11 Sept), 1758–1762 (2010)
5. Sentaurus TCAD Tools Suite. Synopsys 2012
6. Y. Xiong, H. Jia, W. Deschaine, S. Sun, X. Cheng, G. Dashney, D. Okada, Z.J. Shen, Optimization of body diode reverse recovery characteristics of lateral power MOSFETs for synchronous rectifier DC-DC converters, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Orlando, May 2008, pp. 99–102
7. R. Elferich, T. Lopez, Impact of gate voltage bias on reverse recovery losses of power MOSFETs, in Proceedings of the Applied Power Electronics Conference and Exposition, IEEE, Aachen, March 2006, pp. 6
8. S. Pendharkar, R. Ramanathan, T. Efland, L. Zheng, Performance trade-offs and optimization of low side low voltage integrated FETs, in Proceedings of the Bipolar/BiCMOS Circuits and Technology Meeting, IEEE, Piscataway, Sept 2004, pp. 160–163
9. S. Pendharkar, J. Trogolo, High speed junction diodes in BiCMOS technologies, in Proceedings of the Bipolar/BiCMOS Circuits and Technology Meeting, IEEE, Boston, Sept 2007, pp. 82–85
10. V. Khemka, V. Parthasarathy, R. Zhu, A. Bose, T. Roggenbauer, Trade-off between high-side capability and substrate minority carrier injection in deep sub-micron smart power technologies, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Piscataway, Apr 2003, pp. 241–244
11. R. Zhu, V. Khemka, A. Bose, T. Roggenbauer, Substrate majority carrier-induced NLDMOSFET failure and its prevention in advanced smart power IC technologies. IEEE Trans. Device Mater. Reliab. 6(3 Sept), 386–392 (2006)
12. M.J. Swanenberg, A.W. Ludikhuize, A. Grakist, Applying DMOSTs, diodes and thyristors above and below substrate in thin-layer SOI, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Nijmegen, Apr 2003, pp. 232–235
13. J. Roig, S. Mouhoubi, B. Crnković, F. Bauwens, P. Moens, P. Gassot, M. Tack, NBL biasing impact on reverse recovery for RESURF nLDMOS in multi-voltage Smart-Power platform, in Proceedings of the International Seminar on Power Semiconductors, Prague, Sept 2010, pp. 269–275
14. D. Polenov, T. Reiter, R. Baburske, H. Probstle, J. Lutz, The influence of turn-off dead time on the reverse-recovery behaviour of synchronous rectifiers in automotive DC/DC-converters, in Proceedings of the European Conference on Power Electronics and Applications, IEEE, Barcelona, Sept 2009, pp. 1–8
15. X. Cheng, Y. Xiong, X. Wang, P. Kumar, Z.J. Shen, Performance analysis of trench power MOSFETs in synchronous buck converter applications, in Proceedings of the Applied Power Electronics Conference, IEEE, Anaheim, Mar 2007, pp. 332–338
16. R.K. Williams, W. Grabowski, M. Darwish, M. Chang, H. Yilmaz, K. Owyang, A 1 million-cell 2.0-mΩ 30-V TrenchFET utilizing 32 Mcell/in² density with distributed voltage clamping, in Proceedings of the International Electron Devices Meeting, IEEE, Washington, Dec 1997, pp. 363–366
17. M. Darwish, C. Yue, Kam Hong Lui, F. Giles, B. Chan, Kuo-in Chen, D. Pattanayak, Qufei Chen, K. Terrill, K. Owyang, A new power W-gated trench MOSFET (WMOSFET) with high switching performance, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Santa Clara, Apr 2003, pp. 24–27
18. D. Kinzer, Advances in power switch technology for 40 V–300 V applications, in Proceedings of the European Conference on Power Electronics and Applications, IEEE, Dresden, Sept 2005, pp. 11
19. R. Siemieniec, F. Hirler, A. Schlogl, M. Rosch, N. Soufi-Amlashi, J. Ropohl, U. Hiller, A new and rugged 100 V power MOSFET, in Proceedings of the International Power Electronics and Motion Control Conference, Shanghai, Aug 2006, pp. 32
20. J. Roig, B. Desoete, P. Moens, M. Tack, Theoretical analysis of XtreMOS power transistors, in Proceedings of the European Solid State Device Research Conference, Washington, DC, Sept 2007, pp. 422–425
21. P. Goarin, G.E.J. Koops, R. van Dalen, C. Le Cam, J. Saby, Split-gate RESURF stepped oxide (RSO) MOSFETs for 25 V applications with record low gate-to-drain charge, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Jeju Island, May 2007, pp. 61–64
22. P. Moens, F. Bauwens, B. Desoete, J. Baele, K. Vershinin, H. Ziad, E.M. Shankara Narayanan, M. Tack, Record-low on-Resistance for 0.35 μm based integrated XtreMOS™ Transistors, in Proceedings of the International Symposium on Power Semiconductor Devices and ICs, IEEE, Jeju Island, May 2007, pp. 57–60
23. C.F. Tong, P.A. Mawby, J.A. Covington, 'Field balanced' SG-RSO structure showing tremendous potential for low voltage trench MOSFETs, in Proceedings of the European Conference on Power Electronics and Applications, IEEE, Barcelona, Sept 2009, pp. 1–5
24. J. Roig, D. Lee, F. Bauwens, B. Burra, A. Rinaldi, J. McDonald, B. Desoete, Suitable operation conditions for different 100V trench-based power MOSFETs in 48V-input synchronous buck converters, in Proceedings of the European Conference on Power Electronics and Applications, IEEE, Birmingham, Sept 2011

Chapter 18

Control of Fully Integrated DC-DC Converters in CMOS

Tom Van Breussegem, Mike Wens, and Michiel Steyaert

Abstract Monolithic integration has boosted the development of both industrial and consumer electronics. The ongoing scaling of CMOS technology and the decreasing supply voltages are continuously reducing power consumption in System on Chip solutions. The power supply turned out to be the last building block to remain discrete. But recently, techniques have been developed to miniaturize and integrate these building blocks in CMOS. This paper demonstrates the most recent circuit implementations, control techniques and evolutions in the field of fully integrated DC-DC converters in standard CMOS.

1 Introduction

Power management in state-of-the-art SoCs relies heavily on the availability of distributed, power-dense power supplies. To date, linear regulators are used to supply multiple voltage rails in SoCs. But linear regulators have a maximum theoretical efficiency which corresponds to their Voltage Conversion Ratio (VCR). The VCR is the ratio of the output voltage to the input voltage of a voltage regulator. Thus a linear regulator has a low efficiency if used for conversions with a small VCR. An external DC-DC converter requires bulky discrete components, which are expensive due to the additional assembly cost, the silicon area used for bond pads and the cost of the discrete passive devices. Additionally, the use of extra chips contradicts the integration paradigm [1]. This paradigm looks for opportunities to integrate more and more functionality on a single silicon die or in the same integrated circuit. Over the past decade, this paradigm has brought us compact electronic radios, RFID solutions and much more. Therefore it is a reasonable approach to consider the use of integrated DC-DC converters for dealing with the existing power management issues. This introduces an additional layer, the power management interface, between the system's subblocks and the external power supply.

T. Van Breussegem • M. Wens • M. Steyaert
ESAT MICAS K.U.Leuven, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
e-mail: [email protected]
M. Steyaert et al. (eds.), Analog Circuit Design: Low Voltage Low Power; Short Range Wireless Front-Ends; Power Management and DC-DC, DOI 10.1007/978-94-007-1926-2_18, © Springer Science+Business Media B.V. 2012

1.1 Power Supply Miniaturization

The miniaturization of power supplies can be approached from two different perspectives: PowerSiP and PowerSoC. PowerSiP is a single-package (SiP) approach, where as much functionality as possible is integrated on a single chip and the passives are packaged along with the chip. The PowerSoC approach aims for full integration (System on Chip) of the control functionality, the switches and the passive components.

1.2 Efficiency Enhancement Factor

Until recently, DC-DC converters were primarily qualified by means of their power density or their efficiency ratings. But especially for buck-type converters, the efficiency rating turned out to be an inadequate figure of merit. First of all, because there is no obvious relationship between the efficiency of the converter and the reduction in power loss due to the converter. Secondly, because the efficiency should be normalized with respect to the conversion factor; this is demonstrated in the following paragraphs. First one has to recognize that a down-conversion DC-DC converter can be benchmarked against the most straightforward down converter: the linear regulator. The linear regulator [2] is a closed-loop circuit which dissipates the excess voltage in a series pass device. This circuit has the drawback of executing the conversion at an efficiency which corresponds to the ratio between output and input voltage. As a consequence, the theoretical maximum efficiency of linear regulators is small for conversions with a small VCR. In general, linear regulators can be built in a very compact way and operate at an efficiency very close to their theoretical maximum. Hence linear regulators are excellent vehicles to benchmark down-conversion DC-DC converters. Moreover, each down-conversion DC-DC converter should perform better than a linear regulator. The Efficiency Enhancement Factor (EEF) is defined in Eq. 18.1 and illustrated in Fig. 18.1. In Eq. 18.1, η_SW represents the efficiency of a DC-DC converter and η_Lin is the maximum efficiency of a linear regulator executing the same conversion.

$$\mathrm{EEF} = 1 - \frac{\eta_{Lin}}{\eta_{SW}} \tag{18.1}$$


Fig. 18.1 EEF illustrated for a standalone DC-DC converter

Fig. 18.2 EEF illustrated for a cascaded system of a DC-DC converter followed with a series linear regulator

It can be proven [3] that the EEF corresponds to the power loss reduction (ΔPin) due to the DC-DC converter, normalized with respect to the power consumed by a hypothetical linear regulator (Plin,in) performing the same conversion as the DC-DC converter.

$$\mathrm{EEF} = \frac{\Delta P_{in}}{P_{lin,in}} \tag{18.2}$$

Thus the EEF is a true measure of the power loss reduction. Moreover, the following explanation demonstrates the conservation of the EEF in a system. In some cases an additional linear regulator will be required if the ripple at the output of the DC-DC converter is too high (Fig. 18.2). But this does not alter the efficiency enhancement of the system. This is demonstrated by the following interpretation of the EEF in a cascaded system. The EEF of a standalone DC-DC converter is defined as:

$$\mathrm{EEF} = 1 - \frac{\eta_{Lin,1}}{\eta_{SW}} \tag{18.3}$$

In this formula η_Lin,1 is the maximum theoretical efficiency of an ideal series regulator that executes the same conversion as the DC-DC converter.


If the DC-DC converter is followed by a series regulator with efficiency η_Lin,2, the formulation is altered accordingly. The reference linear efficiency is now η_Lin = η_Lin,1 η_Lin,2, and the efficiency of the cascade is the standalone converter efficiency η_SW multiplied by η_Lin,2. For the new system this reduces to:

$$\mathrm{EEF} = 1 - \frac{\eta_{Lin,1}\,\eta_{Lin,2}}{\eta_{SW}\,\eta_{Lin,2}} = 1 - \frac{\eta_{Lin,1}}{\eta_{SW}} \tag{18.4}$$

This demonstrates that if a converter is followed by a linear regulator, the efficiency enhancement of the entire system (DC-DC + post regulator) equals the EEF of the standalone converter with efficiency η_SW, provided the operating point of the converter remains the same. This is an additional motivation to appreciate the EEF formulation.
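The EEF definitions and the cascade argument can be checked numerically. The following is a minimal sketch, assuming an ideal linear regulator with efficiency Vout/Vin; the function names and all voltage and efficiency values are hypothetical and chosen only for illustration.

```python
def linear_efficiency(v_out, v_in):
    """Maximum efficiency of an ideal series (linear) regulator: Vout / Vin."""
    return v_out / v_in


def eef(eta_sw, eta_lin):
    """Efficiency Enhancement Factor as in Eq. 18.1: 1 - eta_lin / eta_sw."""
    return 1.0 - eta_lin / eta_sw


def eef_from_power(p_out, eta_sw, eta_lin):
    """EEF as in Eq. 18.2: input-power reduction relative to the linear regulator."""
    p_lin_in = p_out / eta_lin   # input power of the hypothetical linear regulator
    p_sw_in = p_out / eta_sw     # input power of the switching converter
    return (p_lin_in - p_sw_in) / p_lin_in


# Hypothetical example: 3.3 V -> 1.2 V converter, followed by a 1.2 V -> 1.0 V regulator.
v_in, v_mid, v_out = 3.3, 1.2, 1.0
eta_dcdc = 0.70                                  # assumed standalone converter efficiency

eta_lin1 = linear_efficiency(v_mid, v_in)
eef_standalone = eef(eta_dcdc, eta_lin1)

eta_lin2 = linear_efficiency(v_out, v_mid)
eef_cascade = eef(eta_dcdc * eta_lin2, eta_lin1 * eta_lin2)

print(f"EEF standalone: {eef_standalone:.4f}")
print(f"EEF cascade:    {eef_cascade:.4f}")
```

Both printed values coincide because η_Lin,2 cancels in the ratio, which is the conservation property stated in Eq. 18.4.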

2 Converter Basics

The use of standard CMOS technology puts severe constraints on the range of possible DC-DC converter topologies. The absence of ferromagnetic material in a standard CMOS process limits the useful solutions to basic DC-DC converters using nothing but solid-state switches, air-core inductors and capacitors. Basically two types of integrated DC-DC converters are observed: inductive converters and capacitive converters. The former use an inductor to transfer energy, the latter use a capacitor as the energy-transferring element.

2.1 Inductive DC-DC Converters

Inductive DC-DC converters use one or multiple inductors and capacitors as energy storing/transferring elements. As opposed to capacitive DC-DC converters, this approach does not restrict the voltage conversion ratio. Also, the power conversion efficiency is ideally not dependent on the voltage conversion ratio and can theoretically reach 100%. This can be understood by considering the transient step-response current and voltages of an ideal series LC-circuit, as shown in Fig. 18.3. It is clear that each time the current passes zero, the voltage over the capacitor is maximal, as it then also contains the energy from the inductor. In a real LC-circuit a parasitic series resistance is present and the step response will be damped. Therefore, the first zero crossing of the current is the optimal timing for charging the capacitor, as the resistive losses are then still minimal. This specific timing is typical for inductive DC-DC converters, whereas capacitive DC-DC converters should be switched slower than their time constant (SSL) for maximal power conversion efficiency.
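The step response discussed here can be evaluated in closed form. The sketch below uses the standard underdamped series RLC solution to show the capacitor voltage at the first zero crossing of the current, with and without a parasitic series resistance. The component values are hypothetical and are not taken from the chapter.

```python
import math


def damped_frequency(l, c, r):
    """Damped angular frequency of a series RLC circuit (underdamped case only)."""
    alpha = r / (2.0 * l)
    w0_sq = 1.0 / (l * c)
    if alpha ** 2 >= w0_sq:
        raise ValueError("circuit is not underdamped")
    return math.sqrt(w0_sq - alpha ** 2)


def rlc_step(u_in, l, c, r, t):
    """Capacitor voltage and inductor current at time t after a voltage step u_in."""
    alpha = r / (2.0 * l)
    wd = damped_frequency(l, c, r)
    decay = math.exp(-alpha * t)
    v_c = u_in * (1.0 - decay * (math.cos(wd * t) + alpha / wd * math.sin(wd * t)))
    i_l = u_in / (wd * l) * decay * math.sin(wd * t)
    return v_c, i_l


def first_zero_crossing(l, c, r):
    """Time of the first zero crossing of the current after the step."""
    return math.pi / damped_frequency(l, c, r)


# Hypothetical values: 10 nH, 10 nF, 1 V step.
for r in (0.0, 0.2):
    t0 = first_zero_crossing(10e-9, 10e-9, r)
    v_c, i_l = rlc_step(1.0, 10e-9, 10e-9, r, t0)
    print(f"R = {r} ohm: t0 = {t0 * 1e9:.2f} ns, v_C = {v_c:.3f} V, i_L = {i_l:.1e} A")
```

In the lossless case the capacitor voltage reaches twice the step voltage at the first zero crossing; with series resistance the peak is lower, which is why later zero crossings are less attractive switching instants.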


Fig. 18.3 The current and voltages of the transient step-response of an ideal RLC-circuit

Fig. 18.4 An ideal DC-DC buck converter.

The ideal DC-DC buck converter shown in Fig. 18.4 is suitable for full integration due to its simplicity and will be used to explain some basic concepts and circuit topology variations. The buck converter is suited to generate a Uout lower than Uin. Two conduction modes (CM) can be distinguished for inductive DC-DC converters: continuous conduction mode (CCM) and discontinuous conduction mode (DCM). In CCM the inductor current always has a positive, finite value, whereas in DCM the inductor current is zero during a certain time. CCM is generally not suited for monolithic DC-DC converters because of the very high required switching frequency, which would cause excessive losses. Therefore, only DCM is considered. The steady-state transient currents and voltages of an ideal buck converter in DCM are shown in Fig. 18.5. During Φ1, L and C are charged by Uin, thereby also powering RL. Afterwards, during Φ2, L is discharged into C and RL. Finally, during Φ3, both switches remain open and RL is powered by C. It can be understood that by introducing this dead-time in DCM the switching frequency can be effectively decreased, thereby also decreasing the switching losses, which is beneficial for the overall power conversion efficiency. Many inductive DC-DC converter topologies exist, all yielding similar functionality: decreasing, increasing or inverting Uout, or a combination of these possibilities. However, it can be proven that the basic buck, boost and buck-boost topologies, in addition to their variations, are best suited for monolithic integration [3]. The most important of these topology variations for the buck converter example are discussed here.
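The three DCM phases can be turned into a small timing calculation. This is a sketch under idealized assumptions (lossless switches, constant Uout during a cycle, linear current ramps); the function name and the example values are hypothetical.

```python
def dcm_buck_cycle(u_in, u_out, l, t_on, f_sw):
    """Idealized DCM buck cycle: peak current, freewheel time, idle time, mean output current."""
    if not 0.0 < u_out < u_in:
        raise ValueError("a buck converter needs 0 < u_out < u_in")
    i_pk = (u_in - u_out) * t_on / l        # phase 1: inductor current ramps up from zero
    t_off = i_pk * l / u_out                # phase 2: inductor current ramps back to zero
    t_idle = 1.0 / f_sw - t_on - t_off      # phase 3: both switches open
    if t_idle < 0.0:
        raise ValueError("t_on + t_off exceeds the switching period: not in DCM")
    charge = 0.5 * i_pk * (t_on + t_off)    # area of the triangular current pulse
    return {"i_pk": i_pk, "t_off": t_off, "t_idle": t_idle, "i_out": charge * f_sw}


# Hypothetical example: 2 V -> 1 V, 10 nH inductor, 2 ns on-time.
for f_sw in (100e6, 50e6):
    cycle = dcm_buck_cycle(2.0, 1.0, 10e-9, 2e-9, f_sw)
    print(f"f_sw = {f_sw / 1e6:.0f} MHz: I_pk = {cycle['i_pk']:.2f} A, "
          f"I_out = {cycle['i_out'] * 1e3:.0f} mA, idle = {cycle['t_idle'] * 1e9:.0f} ns")
```

With a fixed on-time, lengthening the idle phase lowers the switching frequency and the delivered current in proportion, which is the mechanism that reduces switching losses at light load.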


Fig. 18.5 The steady-state transient current and voltages of a DC-DC buck converter in DCM

Fig. 18.6 An ideal multi-phase DC-DC buck converter

The first variation is the multi-phase topology, where multiple inductors and switch networks are used to power the output. The ideal multi-phase DC-DC buck converter is illustrated in Fig. 18.6. The possible advantages of this topology, which may be combined with one another, are:

1. Pout: Can be increased by a factor equal to the number of phases, compared to a single-phase converter.
2. η_SW: The optimal efficiency of a single-phase converter mostly occurs at Pout < Pout,max. Hence, the overall power conversion efficiency can be increased by using a multi-phase converter for the same Pout as the single-phase converter.
3. Area: By interleaving the different phases, the current towards C and RL is smeared out in the time domain, relaxing the specifications for C and thus its required on-chip area.
4. ΔUout: The output voltage ripple will decrease because of the interleaving effect, compared to a single-phase converter at the same Pout.

The second variation is the single-inductor multiple-output (SIMO) topology, shown in Fig. 18.7. This topology enables the generation of multiple different Uout values by dividing the energy from the inductor over multiple outputs. However, this topology has an intrinsic disadvantage for monolithic integration when one or more Uout values are higher than the nominal technology supply voltage. In that case the output capacitors need to be implemented physically as a series stack of multiple capacitors to effectively divide the voltage over them. It is clear that this requires a significant amount of (expensive) chip area. For this reason a new SIMO-like topology for monolithic integration is defined: the series multiple output converter (SMOC) [3], which is shown in Fig. 18.8. In this topology the output capacitors of the different outputs are placed in series. As such, the output voltages are effectively divided over the different output capacitors, eliminating the need to implement the individual capacitors as a series stack. In this way, an area reduction in the order of 1/3 can be achieved, depending on the number of outputs and the voltages. Finally, it is noted that the multi-phase and SMOC topologies can be combined with each other.

Fig. 18.7 An ideal SIMO DC-DC buck converter

Fig. 18.8 An ideal SMOC DC-DC buck converter

2.2 Capacitive DC-DC Converters

Capacitive DC-DC converters are Variable Structure Systems that transfer charge from their input terminal to their output terminal by means of (flying) capacitors [4]. The major part of the capacitive DC-DC converters operate in a two-phase cycle. By switching between two different configurations, charge is transferred from the input to the output of the DC-DC converter. During a two-phase cycle the flying capacitors are charged and discharged. If a flying capacitor is charged in one phase, it is inevitably discharged in the complementary phase. The simplest example of a capacitive converter is shown in Fig. 18.9. By alternating phases Φ1 and Φ2, a 2:1 conversion is obtained. For correct operation the switches must be driven by non-overlapping clock signals to avoid short-circuit currents.

Fig. 18.9 A basic Capacitive DC-DC converter

The capacitive converter can be modeled [5] as an ideal voltage source with a non-zero output impedance. At rather low switching frequencies, when the system's time constants are smaller than the switching period, the output impedance is inversely proportional to the switching frequency Fsw and to the total amount of flying capacitance Cfly. The converter is then operating in the Slow Switching Limit (SSL) [6]:

$$R_{OUT,SSL} = \frac{K_c}{C_{fly}\,F_{sw}}$$

At high switching frequencies the system's time constants exceed the switching period, and the switch conductance Gtot in combination with the duty cycle D of the switching signal dominates the output impedance. This region is called the Fast Switching Limit (FSL) [6]:

$$R_{OUT,FSL} = \frac{K_s}{D\,G_{tot}}$$

The constants Kc and Ks depend only on the topology and the sizing of the switches. Each capacitive converter topology has a unique ideal voltage conversion ratio (VCR). Typically a higher VCR requires a larger number of flying capacitors. Moreover, more complex fractional converters also require an increasing number of flying capacitors. The relationship between the VCR and the number of components is described by the Fibonacci number in [5]. For this ideal conversion ratio the highest theoretical efficiency is 100%. When deviating from this VCR, the upper limit of the efficiency is determined by the ratio of the actual output voltage to the ideal output voltage.
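The two output-impedance limits and the efficiency bound can be put into numbers. The sketch below is illustrative only: the values of Kc, Ks, Cfly, Gtot and D are hypothetical placeholders, the VCR is taken as Vout/Vin as defined in the introduction, and the corner frequency is simply the point where the two asymptotes are equal.

```python
def r_out_ssl(k_c, c_fly, f_sw):
    """Output resistance in the Slow Switching Limit: Kc / (Cfly * Fsw)."""
    return k_c / (c_fly * f_sw)


def r_out_fsl(k_s, duty, g_tot):
    """Output resistance in the Fast Switching Limit: Ks / (D * Gtot)."""
    return k_s / (duty * g_tot)


def corner_frequency(k_c, k_s, c_fly, duty, g_tot):
    """Switching frequency at which the SSL and FSL asymptotes are equal."""
    return k_c * duty * g_tot / (k_s * c_fly)


def efficiency_limit(v_out, v_in, ideal_vcr):
    """Upper efficiency bound: actual output voltage over ideal output voltage."""
    return v_out / (ideal_vcr * v_in)


# Hypothetical 2:1 converter (ideal VCR = 0.5); all parameter values are assumptions.
K_C, K_S = 0.25, 2.0
C_FLY, G_TOT, DUTY = 1e-9, 4.0, 0.5

print(f"R_SSL at 100 MHz: {r_out_ssl(K_C, C_FLY, 100e6):.2f} ohm")
print(f"R_FSL:            {r_out_fsl(K_S, DUTY, G_TOT):.2f} ohm")
print(f"corner frequency: {corner_frequency(K_C, K_S, C_FLY, DUTY, G_TOT) / 1e6:.0f} MHz")
print(f"efficiency limit for 2.0 V -> 0.9 V: {efficiency_limit(0.9, 2.0, 0.5):.2f}")
```

Below the corner frequency the SSL term dominates and raising Fsw lowers the output resistance; above it, the switch conductance sets a floor that further frequency increases cannot lower.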

2.3 Voltage Capability

State-of-the-art CMOS technologies demonstrate a limited voltage capability. The most recent technologies (45–90 nm) contain both low-voltage devices with a breakdown voltage of 1–1.2 V and thick-gate devices with a breakdown voltage of 2.5–3.3 V. Therefore special techniques are required to deal with higher voltages. Two techniques can be used to deal with voltages higher than the maximum nominal voltage: device stacking and voltage domain stacking. The most popular technique is device stacking [7]. By combining two devices in series, each device is exposed to only half of the voltage. As a consequence, however, the devices must be sized up to compensate for the increase in channel resistance due to the stacking, which is a heavy drawback of this technique. A second technique is voltage domain stacking [8]. With this method each switch is located in a single voltage domain, and within this voltage domain the switch is never exposed to a voltage higher than the nominal voltage. Therefore the devices do not need upsizing and the switching losses are reduced with respect to the device stacking technique. This does require careful start-up techniques to ensure that none of the devices is exposed to a voltage higher than the breakdown voltage.
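The sizing penalty of device stacking can be estimated with a first-order calculation. The sketch assumes that the on-resistance of a switch is inversely proportional to its width and that all stacked devices see the same gate overdrive; both are simplifying assumptions made here, not statements from the chapter.

```python
def stacked_switch_sizing(n_stack, w_single):
    """Width per device and total width for a series stack that keeps the
    on-resistance of a single device of width w_single (assumes Ron ~ 1/W)."""
    if n_stack < 1:
        raise ValueError("n_stack must be at least 1")
    w_each = n_stack * w_single     # each device must have 1/n of the target resistance
    w_total = n_stack * w_each      # n such devices in series
    return w_each, w_total


# Hypothetical example: a 100 um switch replaced by a stack of two devices.
w_each, w_total = stacked_switch_sizing(2, 100.0)
print(f"width per device: {w_each:.0f} um, total width: {w_total:.0f} um")
```

Under these assumptions the total gate width, and with it the gate capacitance to be driven, grows with the square of the number of stacked devices, which illustrates why the upsizing is considered a heavy drawback.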

3 Control

A switched-mode converter is expected to regulate its output voltage at a constant level even when load current variations occur or when the input voltage fluctuates. The ability to cope with these conditions is specified by the converter's load and line regulation characteristics, respectively.


Fig. 18.10 The principle of PWM (left) and PFM (right) control systems

3.1 Control of Inductive DC-DC Converters

In order to keep the output voltage at the desired level under load and line variations, inductive DC-DC converters are controlled through pulse-width modulation (PWM) or pulse-frequency modulation (PFM). Both feedback and feed-forward implementations are possible. The principle of PWM and PFM is shown in Fig. 18.10. For PWM the converter operates at a constant switching frequency and Uout is controlled by adapting the duty cycle. In PFM control mode the duration of the switch pulses is kept constant and their repetition rate is altered to control Uout, which is equivalent to altering the switching frequency. Both control techniques have a significant impact on the power conversion efficiency as a function of Pout in a real (lossy) converter. This is illustrated in Fig. 18.11, which shows η as a function of Pout for a PWM (grey curve) and a PFM (black curves) controlled DC-DC converter. It is observed that when the switching frequency of the PFM controller at Pout,max is chosen equal to that of the PWM controller (solid black curve), the value of η will always be higher for the PFM controlled DC-DC converter. This is due to the overall lower switching frequency of the PFM controller in that case. However, the output voltage ripple will be higher for the PFM controlled DC-DC converter compared to PWM. In the other extreme case the switching frequency of the PFM controller is made equal to that of the PWM one at Pout,min (dashed black curve). Although the output voltage ripple of the PFM converter will then always be lower than that of the PWM converter, its η will also always be lower. It can be understood that this leads to an efficiency versus area (of the output capacitor) trade-off in a PFM controlled DC-DC converter. Nevertheless, the PFM control technique is preferred for monolithic DC-DC converters because of the potentially higher overall power conversion efficiency compared to a PWM controlled DC-DC converter.
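The efficiency difference between PWM and PFM can be reproduced qualitatively with a toy loss model. The model below is an assumption made for illustration, not the authors' model: switching loss is a fixed energy per cycle times the switching frequency, conduction loss grows with the square of Pout and is taken to be identical for both schemes, and the PFM frequency is proportional to the load and equal to the PWM frequency at Pout,max (the solid-black-curve case). All parameter values are hypothetical.

```python
def efficiency(p_out, f_sw, e_sw, k_cond):
    """Toy efficiency model: switching loss e_sw * f_sw plus conduction loss k_cond * p_out**2."""
    p_loss = e_sw * f_sw + k_cond * p_out ** 2
    return p_out / (p_out + p_loss)


def pwm_efficiency(p_out, f_max, e_sw, k_cond):
    """PWM: the switching frequency stays at f_max for every load."""
    return efficiency(p_out, f_max, e_sw, k_cond)


def pfm_efficiency(p_out, p_out_max, f_max, e_sw, k_cond):
    """PFM: the switching frequency scales with the load and equals f_max at p_out_max."""
    f_sw = f_max * p_out / p_out_max
    return efficiency(p_out, f_sw, e_sw, k_cond)


# Hypothetical parameters: 100 MHz, 0.2 nJ per cycle, 0.2 W maximum load.
P_OUT_MAX, F_MAX, E_SW, K_COND = 0.2, 100e6, 0.2e-9, 0.5   # K_COND in 1/W

for p_out in (0.02, 0.05, 0.1, 0.2):
    eta_pwm = pwm_efficiency(p_out, F_MAX, E_SW, K_COND)
    eta_pfm = pfm_efficiency(p_out, P_OUT_MAX, F_MAX, E_SW, K_COND)
    print(f"Pout = {p_out:.2f} W: PWM {eta_pwm:.1%}, PFM {eta_pfm:.1%}")
```

In this simplified picture the two curves meet at Pout,max and PFM is ahead everywhere below it, because its switching loss shrinks with the load while the PWM switching loss stays constant.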

Fig. 18.11 The power conversion efficiencies of PWM and PFM compared

Fig. 18.12 The principle of a COOT control system

For the purpose of controlling monolithic DC-DC converters, the control systems should be able to achieve high speeds, as switching frequencies in the order of a few hundred MHz are used. These high speed requirements also imply that the delay in the signal path of the control systems should be minimized, requiring simple and straightforward implementations. A first PFM variant that is suited for controlling monolithic DC-DC converters is the constant on/off-time (COOT) control scheme [9], of which the timing is illustrated in Fig. 18.12. When the on-time is kept constant, the inductor is always charged until the same current flows through it. Thus the energy stored in the inductor is also constant. This implies that the off-time can be kept constant as well, provided Uin and Uout remain constant. In this way no current sensing is required for the freewheeling switch in DCM. This simplifies the control system and increases its potential speed. In practice, however, Uin and Uout will vary, resulting in non-optimal timing. This will in turn result in either bulk conduction or discharge of the output capacitor through the freewheeling switch. Measurements have nevertheless demonstrated that a variation of Uin and Uout in the order of 20% is still acceptable for the power conversion efficiency. Moreover, this disadvantage is in most cases outweighed by the simplicity of this control system, which can be implemented using mostly digital and simple analog building blocks.

Fig. 18.13 The power conversion efficiencies of COOT and SCOOT control schemes compared

A variant of the COOT control scheme is the semi-constant on/off-time (SCOOT) control scheme [10]. This control scheme is developed for multi-phase monolithic converters. It uses a larger on/off-time combination for larger Pout values, resulting in lower maximal switching frequencies. As a consequence, the overall power conversion efficiency of a SCOOT controlled converter is higher than that of a COOT one. This is illustrated in Fig. 18.13. COOT1 uses a higher switching frequency, yielding a lower output voltage ripple compared to COOT2, which uses a lower switching frequency. The combination of both, COOT1 for low Pout and COOT2 for high Pout, yields a higher overall power conversion efficiency while maintaining a low output voltage ripple. The fact that the output voltage ripple can be kept sufficiently low is also due to the interleaving strategy of the different phases of the converter. This is explained by means of Fig. 18.14, which shows both low and high load SCOOT timing. For low load timing the maximal inductor current is limited through the smaller on/off-time and the offset-time between the phases is chosen large to effectively smear out the current to the output in the time domain. This limits the output voltage ripple. For high load timing the inductor current is allowed to be higher and the different phases are allowed to follow each other faster.
Nevertheless, the output voltage ripple will still be low because the inductor current is divided between the load and the output capacitor. At high loads only a small fraction of the inductor current flows through the capacitor, hence yielding a low output voltage ripple. The SCOOT timing scheme shares the potential disadvantage of COOT: the limited allowable variation of Uin. This is overcome by the feed-forward semi-constant

18 Control of Fully Integrated DC-DC Converters in CMOS


Fig. 18.14 The principle of a SCOOT control system

Fig. 18.15 The power conversion efficiencies of the COOT and F²SCOOT timing schemes compared (left) and the on- and off-time of an F²SCOOT control system as a function of the input voltage (right)

on/off-time (F²SCOOT) control scheme [3]. A standard PFM control system keeps the on-time constant. At low Uin values this leads to a low inductor current, requiring an increased switching frequency. The higher losses associated with this increased frequency lead to a limited power conversion efficiency at low Uin, which is illustrated by the grey curve in the left graph of Fig. 18.15. By decreasing the on-time with increasing Uin, the inductor current can be limited and the low-Uin power conversion efficiency is drastically increased (black curve in the left graph of Fig. 18.15). Because the on-time varies, the off-time has to be varied as well in order to avoid current sensing of the freewheeling switch. The on/off-time values as a function of Uin are shown in the right graph of Fig. 18.15. The fact that the on/off-time is varied with Uin results in an additional feed-forward loop. The advantages of F²SCOOT are similar to those of COOT, with the additional advantage of a large Uin range.
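The link between on-time, off-time and switching frequency described above can be illustrated with a first-order sketch. It assumes an ideal, lossless step-down converter in discontinuous conduction, in which the inductor current ramps linearly from zero during the on-time and back to zero during the off-time. All function names, the inverse-proportional on-time schedule and the numerical values are hypothetical illustrations; they are not the timing law of Fig. 18.15 or of [3].

```python
"""First-order on/off-time sketch for an ideal step-down converter.

Assumptions (not from the source): lossless switches, constant u_in and
u_out during one pulse, triangular inductor current that starts at zero.
"""


def scheduled_on_time(u_in, u_in_ref, t_on_ref):
    """Hypothetical feed-forward law: on-time shrinks as u_in rises."""
    if u_in <= 0 or u_in_ref <= 0 or t_on_ref <= 0:
        raise ValueError("voltages and reference on-time must be positive")
    return t_on_ref * u_in_ref / u_in


def matched_off_time(u_in, u_out, t_on):
    """Off-time that returns the inductor current to zero.

    Follows from volt-second balance: (u_in - u_out) * t_on = u_out * t_off,
    so no current sensing of the freewheeling switch is needed.
    """
    if not 0 < u_out < u_in:
        raise ValueError("step-down operation requires 0 < u_out < u_in")
    return t_on * (u_in - u_out) / u_out


def pulse_frequency(u_in, u_out, l_ind, t_on, i_out):
    """Pulse rate needed to deliver an average output current i_out."""
    i_peak = (u_in - u_out) * t_on / l_ind          # peak of the triangle
    t_off = matched_off_time(u_in, u_out, t_on)
    charge_per_pulse = 0.5 * i_peak * (t_on + t_off)  # area of the triangle
    return i_out / charge_per_pulse


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    L_IND, U_OUT, I_OUT = 10e-9, 1.2, 0.1
    U_IN_REF, T_ON_REF = 3.6, 2e-9
    for u_in in (2.0, 2.8, 3.6):
        t_on = scheduled_on_time(u_in, U_IN_REF, T_ON_REF)
        t_off = matched_off_time(u_in, U_OUT, t_on)
        f_ff = pulse_frequency(u_in, U_OUT, L_IND, t_on, I_OUT)
        f_const = pulse_frequency(u_in, U_OUT, L_IND, T_ON_REF, I_OUT)
        print(f"Uin={u_in:.1f} V  t_on={t_on * 1e9:.2f} ns  "
              f"t_off={t_off * 1e9:.2f} ns  "
              f"f(feed-forward)={f_ff / 1e6:.0f} MHz  "
              f"f(constant on-time)={f_const / 1e6:.0f} MHz")
```

In this simplified model a constant on-time produces smaller current pulses at low Uin and therefore needs a higher pulse rate for the same load, which is the loss mechanism the text attributes to standard PFM control.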


T. Van Breussegem et al.

Fig. 18.16 A VCO based FM controlled Capacitive DC-DC converter

3.2 Control of Capacitive DC-DC Converters

Capacitive DC-DC converters can be controlled both by switching frequency modulation (FM) and by pulse width modulation (PWM), also called duty cycle modulation. In integrated DC-DC converters there is a trend to prefer FM over PWM, mainly because of the poor low-load efficiency of PWM control, which is also observed for inductive DC-DC converters. With FM control the switching power losses scale with the output power, which facilitates a flat efficiency curve over a broad output power range.

Frequency modulation can be achieved by introducing an error amplifier (EA) and a voltage-controlled oscillator (VCO) and embedding the capacitive DC-DC converter in a closed-loop system, as shown in Fig. 18.16. This VCO-based FM-controlled converter [11] requires a high gain-bandwidth error amplifier to reduce the static offset in the loop. Moreover, due to the multi-pole nature of the loop, dominant-pole compensation is used to stabilize the converter, which slows down its response to load and line variations. This makes the approach a straightforward but slow and power-consuming solution for the control of a capacitive DC-DC converter. On the other hand, this technique is fully compatible with an interleaved multiphase switching scheme.

Multiphase interleaved switching reduces the output noise by virtually increasing the switching frequency of the converter. Instead of using a single converter, the converter is fragmented into N equivalent parallel converters. Each of these converters is clocked out of phase, so that the output noise is spread out in time. Multiphase interleaved switching can be employed to reduce the output voltage ripple while keeping the output power, the actual switching frequency and the output buffer capacitance constant. Alternatively, for the same output voltage ripple, a higher output power density can be achieved by reducing Cout and hence the die area.
A VCO based control loop can be extended to an interleaved solution as is shown in Fig. 18.17 and demonstrated in [12].
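The trade-off between ripple, number of phases and Cout can be made concrete with a rough estimate. The sketch below assumes that each of the N fragments delivers one instantaneous charge packet per switching period, that the packets are evenly spaced in time, and that the load current is constant. Under these assumptions the ripple is approximately the packet charge divided by Cout. The function names and example values are hypothetical and are not taken from [11] or [12].

```python
"""Rough ripple estimate for an N-phase interleaved capacitive converter.

Assumptions (not from the source): one instantaneous charge packet per
fragment per switching period, evenly spaced packets, constant load.
"""


def phase_offsets(n_phases, f_sw):
    """Start times of the N fragment clocks within one switching period."""
    if n_phases < 1 or f_sw <= 0:
        raise ValueError("need n_phases >= 1 and f_sw > 0")
    period = 1.0 / f_sw
    return [k * period / n_phases for k in range(n_phases)]


def ripple_estimate(i_out, f_sw, c_out, n_phases):
    """Approximate peak-to-peak ripple: packet charge divided by c_out."""
    if i_out < 0 or f_sw <= 0 or c_out <= 0 or n_phases < 1:
        raise ValueError("invalid operating point")
    charge_packet = i_out / (n_phases * f_sw)  # charge per fragment pulse
    return charge_packet / c_out


def c_out_for_ripple(i_out, f_sw, ripple, n_phases):
    """Output capacitance that gives the requested ripple in this model."""
    if ripple <= 0:
        raise ValueError("ripple must be positive")
    return i_out / (n_phases * f_sw * ripple)


if __name__ == "__main__":
    # Hypothetical operating point chosen only for illustration.
    I_OUT, F_SW, C_OUT = 0.05, 20e6, 10e-9
    for n in (1, 4, 16):
        dv = ripple_estimate(I_OUT, F_SW, C_OUT, n)
        print(f"N={n:2d}  ripple ~ {dv * 1e3:.1f} mV")
    # Same ripple as the single-phase case, but with 16 phases:
    dv_single = ripple_estimate(I_OUT, F_SW, C_OUT, 1)
    c_small = c_out_for_ripple(I_OUT, F_SW, dv_single, 16)
    print(f"Cout for equal ripple with N=16: {c_small * 1e9:.3f} nF")
```

In this model the ripple scales with 1/N at fixed Cout, or equivalently Cout can shrink by a factor N at fixed ripple, which mirrors the two uses of interleaving named in the text.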


Fig. 18.17 An Interleaved VCO based FM controlled Capacitive DC-DC converter

Fig. 18.18 A hysteretic FM controlled Capacitive DC-DC converter

In the technique of Fig. 18.17, the VCO runs at an N times higher frequency and a clock divider logic block turns this high frequency into N phase-shifted clock signals for the N-dimensional converter array. Of course the drawbacks remain the same and a rather slow solution is obtained.

For DC-DC converters that require a fast response, hysteretic control is a feasible alternative. A hysteretic controller bases its switching behavior on the output of a comparator. Figure 18.18 depicts a basic implementation of a hysteretic controller. The comparator observes the output voltage and clocks the converter when the output voltage falls below the control voltage. The comparator is a digital comparator, so that the clock is passed through as long as the boundary is violated. In this way a dual switching behavior is obtained. During start-up the converter


Fig. 18.19 Waveforms in a hysteretic FM controlled Capacitive DC-DC converter

switches at maximum frequency (half of the clock frequency), and in steady state the switching frequency depends on the load condition of the converter (load current and load voltage) but is smaller than or equal to half the clock frequency. The timing sequence of a typical hysteretic controller operating in steady state is depicted in Fig. 18.19. Vout is compared with the control voltage reference Vcontrol. The comparator passes the clock according to the comparison decision. The T-FF latch filters out the falling edges, since these are merely a clock artifact and do not correspond to a boundary violation.

Hysteretic control is unconditionally stable as long as the delay in the control loop remains smaller than the switching period, and it provides minimum response times in the range of twice the clock period. Moreover, hysteretic control can be merged with multiphase interleaving by providing a control loop for each converter fragment and clocking each of the comparators with a phase-shifted clock signal [13]. This is shown in Fig. 18.20. This leads to a quasi-digital solution which can be implemented at the cost of only a small power loss.

The control techniques described in the previous paragraphs belong to the class of output impedance control techniques. They directly address the output impedance and are above all very effective in dealing with load variations for a fixed output voltage. To deal with line variations in an energy-efficient fashion another technique is required: topology reconfiguration. It was already mentioned that each converter topology corresponds to an ideal voltage conversion ratio. The maximum efficiency of a capacitive DC-DC converter is limited by the ratio between the actual VCR and the ideal VCR (iVCR) of the converter. Therefore it is important that the VCR remains close to this iVCR, typically within a factor of 0.9, so that


Fig. 18.20 A hysteretic Interleaved FM controlled Capacitive DC-DC converter

90% is the upper limit for the theoretical efficiency. If a certain application requires a broad input/output range and thus a broad range of VCR, multiple topologies can be implemented in a single capacitive DC-DC converter. Based on the ratio of output voltage to input voltage, the converter is reconfigured in real time to facilitate the most efficient conversion. This requires a switch array that can be reconfigured according to the desired voltage conversion ratio. In the optimum case the total amount of capacitance is used in each mode/topology.
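The selection rule behind topology reconfiguration can be sketched in a few lines. The sketch assumes a step-down converter whose output can only be regulated downward from iVCR times Uin by raising the output impedance, so a topology is usable only if iVCR times Uin is at least Uout, and it takes the efficiency bound VCR/iVCR stated above. The set of available iVCR values, the function names and the voltages are hypothetical and do not describe a specific converter from the references.

```python
"""Topology (gear) selection sketch for a reconfigurable capacitive converter.

Assumptions (not from the source): step-down operation, output regulated
downward from ivcr * u_in, efficiency bounded by VCR / iVCR.
"""
from fractions import Fraction

# Hypothetical set of ideal voltage conversion ratios offered by the array.
HYPOTHETICAL_IVCRS = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1, 1)]


def efficiency_bound(u_in, u_out, ivcr):
    """Upper efficiency limit VCR / iVCR for one topology."""
    vcr = u_out / u_in
    return vcr / float(ivcr)


def select_topology(u_in, u_out, ivcrs=HYPOTHETICAL_IVCRS):
    """Pick the usable topology whose iVCR is closest to the actual VCR."""
    if u_in <= 0 or u_out <= 0:
        raise ValueError("voltages must be positive")
    usable = [r for r in ivcrs if float(r) * u_in >= u_out]
    if not usable:
        raise ValueError("no topology can reach the requested output voltage")
    best = min(usable)  # smallest usable iVCR gives the highest bound
    return best, efficiency_bound(u_in, u_out, best)


if __name__ == "__main__":
    # Hypothetical input voltage sweep for a fixed 1.5 V output.
    for u_in in (2.0, 2.5, 3.0, 3.6):
        ivcr, eta = select_topology(u_in, 1.5)
        print(f"Uin={u_in:.1f} V  iVCR={ivcr}  efficiency bound={eta:.1%}")
```

A practical controller would add hysteresis around the switching thresholds to avoid toggling between topologies; that refinement is omitted here.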

4 Conclusions

This paper has illustrated the most recent advances in the miniaturization of power supplies. The most critical issues regarding the control of fully integrated DC-DC converters have been highlighted and discussed. These advances indicate that comparator-based frequency control is the most promising technique for controlling integrated DC-DC converters in an energy-efficient and high-speed fashion. For both inductive and capacitive DC-DC converters, these techniques can seamlessly be merged with multiphase interleaving and integrated with high-speed digital logic.


References

1. M. Steyaert, P. Vancorenland, CMOS: a paradigm for low power wireless?, in Proceedings of the 39th Design Automation Conference 2002, New Orleans, 2002, pp. 836–841
2. T. Endoh, K. Sunaga, H. Sakuraba, F. Masuoka, An on-chip 96.5% current efficiency CMOS linear regulator using a flexible control technique of output current. IEEE J. Solid-State Circ. 36(1 Jan), 34–39 (2001)
3. M. Wens, Monolithic inductive CMOS DC-DC converters: theoretical study & implementation, Ph.D. thesis promoted by Steyaert M., 2010
4. O.C. Mak, A. Ioinovici, Small size and low weight DC/DC converter with no magnetic elements, in 16th International Telecommunications Energy Conference 1994. INTELEC ’94, New York, 30 Oct–3 Nov 1994, pp. 573–580
5. M.S. Makowski, Realizability conditions and bounds on synthesis of switched-capacitor DC-DC voltage multiplier circuits. IEEE Trans. Circuits Syst. I: Fundam. Theory. Appl. 44(8 Aug), 684–691 (1997)
6. M.D. Seeman, S.R. Sanders, Analysis and optimization of switched-capacitor DC–DC converters. IEEE Trans. Power Electron. 23(2 Mar), 841–851 (2008)
7. M. Wens, M. Steyaert, A fully-integrated 0.18 µm CMOS DC-DC step-down converter, using a bondwire spiral inductor, in Custom Integrated Circuits Conference, 2008. CICC 2008, IEEE, San Jose, 21–24 Sept 2008, pp. 17–20
8. V.W. Ng, M.D. Seeman, S.R. Sanders, High-efficiency, 12 V-to-1.5 V DC-DC converter realized with Switched-Capacitor architecture, in VLSI Circuits, 2009 Symposium on, Kyoto, 16–18 June 2009, pp. 168–169
9. M. Wens, M. Steyaert, A fully-integrated 130 nm CMOS DC-DC step-down converter, regulated by a constant on/off-time control system, in IEEE Proceedings of the 34th European Solid-State Circuits Conference (ESSCIRC), Edinburgh, Sept 2008, pp. 62–65
10. M. Wens, M. Steyaert, A fully-integrated CMOS 800 mW 4-Phase semi-constant on/off-time step-down converter. IEEE Trans. Power Electron. 26(2), 326–333 (2011)
11. P. Favrat, P. Deval, M.J. Declercq, A high-efficiency CMOS voltage doubler. IEEE J. Solid-State Circ. 33(3 Mar), 410–416 (1998)
12. T. Van Breussegem, M. Steyaert, A 82% efficiency 0.5% ripple 16-phase fully integrated capacitive voltage doubler, in VLSI Circuits, 2009 Symposium on, Kyoto, 16–18 June 2009, pp. 198–199
13. T. Van Breussegem, M. Steyaert, A fully integrated 74% efficiency 3.6 V to 1.5 V 150 mW capacitive point-of-load DC/DC-converter, in Proceedings of the ESSCIRC, 2010, Seville, 14–16 Sept 2010, pp. 434–437